LANCZOS AND GOLUB-KAHAN REDUCTION METHODS
APPLIED TO ILL-POSED PROBLEMS
A dissertation submitted to
Kent State University in partial
fulfillment of the requirements for the
degree of Doctor of Philosophy
by
Enyinda N. Onunwor
May, 2018

Dissertation written by
Enyinda N. Onunwor
A.A., Cuyahoga Community College, 1998
B.S., Youngstown State University, 2001
M.S., Youngstown State University, 2003
M.A., Kent State University, 2011
Ph.D., Kent State University, 2018
Approved by
Lothar Reichel, Chair, Doctoral Dissertation Committee
Jing Li, Member, Doctoral Dissertation Committee
Jun Li, Member, Doctoral Dissertation Committee
Arden Ruttan, Member, Outside Discipline
Arvind Bansal, Member, Graduate Faculty Representative

Accepted by
Andrew Tonge, Chair, Department of Mathematical Sciences
James L. Blank, Dean, College of Arts and Sciences

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
NOTATION
1 Introduction
1.1 Overview
1.2 Regularization methods
1.2.1 Truncated singular value decomposition (TSVD)
1.2.2 Truncated eigenvalue decomposition (TEVD)
1.2.3 Tikhonov regularization
1.2.4 Regularization parameter: the discrepancy principle
1.3 Krylov subspace methods
1.3.1 The Arnoldi method
1.3.2 The symmetric Lanczos process
1.3.3 Golub-Kahan bidiagonalization
1.3.4 Block Krylov methods
1.4 The test problems
1.4.1 Descriptions of the test problems
2 Reduction methods applied to discrete ill-posed problems
2.1 Introduction
2.2 Application of the symmetric Lanczos method
2.3 Application of the Golub–Kahan reduction method
2.4 Computed examples
2.5 Conclusion
3 Computation of a truncated SVD of a large linear discrete ill-posed problem
3.1 Introduction
3.2 Symmetric linear discrete ill-posed problems
3.3 Nonsymmetric linear discrete ill-posed problems
3.4 Computed examples
3.5 Conclusion
4 Solution methods for linear discrete ill-posed problems for color image restoration
4.1 Solution by partial block Golub–Kahan bidiagonalization
4.2 The GGKB method and Gauss-type quadrature
4.3 Golub–Kahan bidiagonalization for problems with multiple right-hand sides
4.4 Computed examples
4.5 Conclusion
BIBLIOGRAPHY
LIST OF FIGURES

1 Behavior of the bounds (2.2.1) (left), (2.2.7) (center), and (2.3.1) (right), with respect to the iteration index ℓ. The first test matrix is symmetric positive definite, the second is symmetric indefinite, and the third is unsymmetric. The left-hand side of each inequality is represented by crosses, the right-hand side by circles.
2 The graphs in the left column display the relative error R_{λ,k} between the eigenvalues of the symmetric test problems and the corresponding Ritz values generated by the Lanczos process. The right column shows the behavior of R_{σ̂,k} for the unsymmetric problems; see (2.4.1) and (2.4.3).
3 Distance between the subspace spanned by the first ⌈k/3⌉ eigenvectors (resp. singular vectors) of the symmetric (resp. nonsymmetric) test problems, and the subspace spanned by the corresponding Lanczos (resp. Golub–Kahan) vectors; see (2.4.2) and (2.4.4).
4 Distance ‖V_{k,i}^T V_{n−i}^{(2)}‖, i = 1, 2, ..., k, between the subspace spanned by the first i eigenvectors of the Foxgood (left) and Shaw (right) matrices, and the subspace spanned by the corresponding i Ritz vectors at iteration k = 10.
5 Distance between the subspace spanned by the first ⌈k/2⌉ eigenvectors (resp. singular vectors) of selected symmetric (resp. nonsymmetric) test problems and the subspace spanned by the corresponding Lanczos (resp. Golub–Kahan) vectors. The index ℓ ranges from 1 to either the dimension of the matrix (n = 200) or to the iteration where there is a breakdown in the factorization process.
6 Distance max{‖V_{k,i}^T V_{n−i}^{(2)}‖, ‖U_{k,i}^T U_{n−i}^{(2)}‖}, i = 1, 2, ..., k, between the subspace spanned by the first i singular vectors of the Heat (left) and Tomo (right) matrices and the subspace spanned by the corresponding i Golub–Kahan vectors at iteration k = 100.
7 The first four LSQR solutions to the Baart test problem (thin lines) are compared to the corresponding TSVD solutions (dashed lines) and to the exact solution (thick line). The size of the problem is n = 200, the noise level is δ = 10^{-4}. The thin and dashed lines are very close.
8 Convergence history for the LSQR and TSVD solutions to the Tomo example of size n = 225, with noise level δ = 10^{-2}. The error E_LSQR has a minimum at k = 66, while E_TSVD is minimal for k = 215.
9 Solution by LSQR and TSVD to the Tomo example of size n = 225, with noise level δ = 10^{-2}: exact solution (top left), optimal LSQR solution (top right), TSVD solution corresponding to the same truncation parameter (bottom left), optimal TSVD solution (bottom right).
1 Example 2: Original image (left), blurred and noisy image (right).
2 Example 2: Restored image by Algorithm 5 (left), and restored image by Algorithm 6 (right).
3 Example 3: Cross-channel blurred and noisy image (left), restored image by Algorithm 6 (right).
LIST OF TABLES

1 Solution of symmetric linear systems: the errors E_Lanczos and E_TEIG are optimal for truncated Lanczos iteration and truncated eigenvalue decomposition. The corresponding truncation parameters are denoted by k_Lanczos and k_TEIG. Three noise levels δ are considered; ℓ denotes the number of Lanczos iterations performed.
2 Solution of nonsymmetric linear systems: the errors E_LSQR and E_TSVD are optimal for LSQR and TSVD. The corresponding truncation parameters are denoted by k_LSQR and k_TSVD. Three noise levels are considered; ℓ denotes the number of Golub–Kahan iterations performed.
1 foxgood test problem.
2 shaw test problem.
3 shaw test problem.
4 phillips test problem.
5 baart test problem.
6 baart test problem.
7 Inverse Laplace transform test problem.
8 Example 3.6: Relative errors and number of matrix-vector products, δ̃ = 10^{-2}. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is a unit random vector.
9 Example 3.6: Relative errors and number of matrix-vector products, δ̃ = 10^{-2}. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is b/‖b‖.
10 Example 3.6: Relative errors and number of matrix-vector products, δ̃ = 10^{-4}. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is b/‖b‖.
11 Example 3.6: Relative errors and number of matrix-vector products, δ̃ = 10^{-6}. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is b/‖b‖.
12 Relative errors and number of matrix-vector product evaluations, δ̃ = 10^{-2}.
13 Relative errors and number of matrix-vector product evaluations, δ̃ = 10^{-4}.
14 Relative errors and number of matrix-vector product evaluations, δ̃ = 10^{-6}.
1 Results for the phillips test problem.
2 Results for the baart test problem.
3 Results for the shaw test problem.
4 Results for Example 2.
To Olivia and Kristof
ACKNOWLEDGEMENTS
This work would not have been possible without the wisdom, support, and tireless assistance of my advisor,
Lothar Reichel. I genuinely appreciate both his patience with me and the guidance he has given me over the years. His invaluable encouragement and counsel have been critical in facilitating the progress I have made to this point. He has truly been a blessing, and he has made a positive impact on my life.
In addition, I extend my undying gratitude to my committee: Jing Li, Jun Li, Arden Ruttan, and Arvind
Bansal. I am tremendously indebted to them for their collective time, effort, and direction.
I would be remiss if I failed to recognize the important contributions made by the following collaborators:
Silvia Gazzola, Giuseppe Rodriguez, Mohamed El Guide, Abdeslem Bentbib, and Khalide Jbilou. A special thanks to Xuebo Yu for helping me debug my codes and for his valuable input.
I honor the memory of my parents, HRH Sir Wobo Weli Onunwor and Dame Nchelem Onunwor. Their legacy of love, strength, determination, support, and faith imbued me with the courage I needed to achieve this objective, and they will forever endure in my spirit and in my work. My sister, Chisa, is one of the most brilliant people I know; her fortitude and determination are unmatched, and I am inspired by her integrity and work ethic. My oldest brother, HRH Nyema Onunwor, sets the example for the rest of us; he helps us maintain a calm demeanor in the face of the challenges we encounter and remains a constant voice of reason.
I offer my deep respect and admiration to my other siblings, Rommy, Acho, and Emenike, for helping me maintain my sanity through this process. Their stimulating conversations and the familial communion we share sustained and comforted me when I was in need of a respite during challenging moments. My thanks to
Dike Echendu for his wisdom and advice. Special thanks to two of my closest friends, Dennis Frank-Ito and
Ian Miller, for their mathematical insights and constant encouragement. My cousins Anderson, Blessing,
Charles, Mary-Ann, and Gloria are like siblings to me, and their parents, Dr. Albert and Ezinne Charity
Nnewihe, have acted as my parental figures. I will be eternally grateful to them for their emotional support
and loving guidance. Thanks to Chiso Obiandu for his pep talks and for pushing the right motivational buttons. Special thanks to Dorothy Acheru Agbude, who ensured that this dissertation came to fruition.
I am glad that the battle is over. I extend my thanks to the remainder of my family and friends for their unconditional love and support.
Finally, I offer my deepest gratitude to my caring, loving, and supportive wife, Kinga. Your consistent encouragement, unending patience, and unflagging faith in me through the rough times have sustained me more than words can express. Thank you so much!
NOTATION
Unless stated otherwise, the following notation will be used throughout this dissertation. Standard notation
is used whenever possible.
A          an m × n matrix
I_n        the n × n identity matrix
‖ · ‖      the Euclidean vector norm, or the induced operator norm
tr(A)      the trace of an n × n matrix A, i.e., the sum of its diagonal entries, tr(A) = ∑_{i=1}^{n} a_{ii}
‖A‖_F      the Frobenius norm of A, defined by ‖A‖_F = (tr(A^T A))^{1/2}
⟨u, v⟩     the inner product of the vectors u and v
x_true     the exact but unknown true solution
b_true     the exact data
e          the error or noise vector, i.e., the perturbation in the data
e^{(i)}    an error vector in a problem with multiple right-hand sides
e_i        the i-th standard basis vector of appropriate dimension
A^T        the transpose of A
A^*        the Hermitian conjugate (Hermitian adjoint) of A
A^†        the Moore–Penrose pseudoinverse of A
A ⊗ B      the Kronecker product of the matrices A and B
A_{i,j}    the leading principal i × j submatrix of A
R(·)       the range or column space
N(·)       the null space
κ(·)       the condition number
λ          the regularization parameter
λ_i        an eigenvalue
A = WΛW^*  the spectral factorization of a matrix A = A^T, where
           W = [w_1, w_2, ..., w_n] ∈ R^{n×n} is orthogonal and
           Λ = diag[λ_1, λ_2, ..., λ_n] ∈ R^{n×n} with |λ_1| ≥ |λ_2| ≥ ··· ≥ |λ_n| ≥ 0
SVD        the singular value decomposition (SVD) of a matrix A ∈ R^{m×n}, m ≥ n, is the factorization A = Û Σ̂ V̂^T, where
           Û ∈ R^{m×m} is orthogonal,
           V̂ ∈ R^{n×n} is orthogonal, and
           Σ̂ = diag[σ̂_1, ..., σ̂_n] ∈ R^{m×n} with σ̂_1 ≥ ··· ≥ σ̂_n ≥ 0
GSVD       the generalized singular value decomposition (GSVD) of the matrix pair {A, B}, with A ∈ R^{m×n} and B ∈ R^{p×n} satisfying m ≥ n ≥ p, consists of the factorizations A = UΣX and B = VMX, where
           U ∈ R^{m×m} with U^T U = I_m,
           V ∈ R^{p×p} with V^T V = I_p,
           X ∈ R^{n×n} is nonsingular,
           Σ = diag[σ_1, ..., σ_p, 1, ..., 1] ∈ R^{m×n},
           M = [diag[µ_1, ..., µ_p], 0, ..., 0] ∈ R^{p×n}, and
           σ_i² + µ_i² = 1 for 1 ≤ i ≤ p
L          the p × n regularization matrix
L_1        the upper bidiagonal regularization matrix obtained from a scaled finite difference approximation of the first derivative operator,

                   ⎡ 1 −1           ⎤
            L_1 =  ⎢    1 −1        ⎥ ∈ R^{(n−1)×n}
                   ⎢      ⋱  ⋱      ⎥
                   ⎣         1 −1   ⎦

L_2        the tridiagonal regularization matrix obtained from a scaled finite difference approximation of the second derivative operator,

                   ⎡ −1  2 −1             ⎤
            L_2 =  ⎢     −1  2 −1         ⎥ ∈ R^{(n−2)×n}
                   ⎢         ⋱  ⋱  ⋱      ⎥
                   ⎣           −1  2 −1   ⎦
CHAPTER 1
Introduction
1.1 Overview
We are concerned with the solution of large least-squares problems

(1.1.1)  min_{x∈R^n} ‖Ax − b‖,  A ∈ R^{m×n}, b ∈ R^m, m ≥ n,

with a matrix A whose singular values decay gradually to zero without a significant gap. In particular, A is very ill-conditioned and may be rank-deficient. To simplify the notation, we assume that m ≥ n, but this restriction can be removed. Least-squares problems with a matrix of this kind are commonly referred to as linear discrete ill-posed problems. They arise, for instance, from the discretization of linear ill-posed problems, such as Fredholm integral equations of the first kind with a continuous kernel. Discretization transfers continuous models and equations into discrete counterparts; it is used to derive an approximate problem with finitely many unknowns. In linear discrete ill-posed problems that arise in applications in science and engineering, the vector b typically represents data contaminated by a measurement error e ∈ R^m. We sometimes refer to the vector e as "noise." Thus,
(1.1.2)  b = b_true + e,
where b_true ∈ R^m represents the unknown error-free vector associated with the available vector b. We assume that the noise vector e in (1.1.2) has normally distributed pseudorandom entries with mean zero, and that it is normalized to correspond to a chosen noise level.
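For illustration, the noise model (1.1.2) is easy to realize numerically. The following NumPy sketch is not part of the computed examples of this thesis; the helper name add_noise and the scaling convention ‖e‖ = δ‖b_true‖ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility

def add_noise(b_true, delta):
    """Return contaminated data b = b_true + e (1.1.2), where e has normally
    distributed entries scaled so that ||e|| = delta * ||b_true||."""
    e = rng.standard_normal(b_true.shape)
    e *= delta * np.linalg.norm(b_true) / np.linalg.norm(e)
    return b_true + e, e

b_true = np.ones(200)
b, e = add_noise(b_true, 1e-2)   # noise level delta = 1e-2
```

The returned vector e then realizes exactly the chosen noise level relative to ‖b_true‖.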
Let A^† denote the Moore–Penrose pseudoinverse of A. We would like to determine an approximation of x_true = A^† b_true by computing an approximate solution of (1.1.1). Note that the vector x = A^† b = x_true + A^† e typically is a useless approximation of x_true because the condition number of A, given by κ(A) = ‖A‖ ‖A^†‖, is very large. Throughout this thesis, ‖ · ‖ denotes the Euclidean vector norm or the spectral matrix norm. Generally, ‖A^† e‖ ≫ ‖x_true‖, so x can be very far from x_true. Because of the ill-conditioning of A, our goal is to reformulate the problem so that the new solution is less sensitive to perturbations; that is, we regularize the problem so that the solution becomes more stable.
1.2 Regularization methods
The severe ill-conditioning of A makes the naive solution very sensitive to any perturbation of b. This is handled by regularization, i.e., by replacing the system (1.1.1) with a nearby system that is less sensitive to the error e in b. A meaningful solution can be recovered by imposing smoothness on the computed solution. Many regularization methods have been developed over the years, and they are very effective for solving linear discrete ill-posed problems. We will use two of the most common approaches: truncated iterations (specifically the truncated singular value decomposition and the truncated eigenvalue decomposition) and Tikhonov regularization.
1.2.1 Truncated singular value decomposition (TSVD)
Let A ∈ R^{m×n} be the matrix in (1.1.1). Its singular value decomposition (SVD) is the factorization

(1.2.1)  A = Û Σ̂ V̂^T,

where Û = [û_1, ..., û_m] ∈ R^{m×m} and V̂ = [v̂_1, ..., v̂_n] ∈ R^{n×n} are orthogonal matrices, and Σ̂ = diag[σ̂_1, σ̂_2, ..., σ̂_n] ∈ R^{m×n} with σ̂_1 ≥ σ̂_2 ≥ ··· ≥ σ̂_r > σ̂_{r+1} = ··· = σ̂_n = 0, where r = rank(A). We call the σ̂_i the singular values, while the û_i and v̂_i are the left and right singular vectors, respectively. Problems whose singular values decay quickly are referred to as severely ill-posed.
The truncated SVD (TSVD) regularization method solves (1.1.1) by replacing A with its closest rank-k approximation

(1.2.2)  A_k = ∑_{i=1}^{k} σ̂_i û_i v̂_i^T,  k ≤ r = rank(A).

The Moore–Penrose pseudoinverse of A_k can be expressed as A_k^† = V̂ Σ̂_k^† Û^T, or

(1.2.3)  A_k^† = ∑_{i=1}^{k} σ̂_i^{-1} v̂_i û_i^T.

When A in (1.1.1) is replaced by A_k, we obtain the new least-squares problem

min_{x∈R^n} ‖A_k x − b‖.

Its solution is given by x_k = A_k^† b, which we can express as

(1.2.4)  x_k = ∑_{i=1}^{k} (û_i^T b / σ̂_i) v̂_i,  k ≤ r.

This is referred to as the truncated SVD (TSVD) solution. The truncation parameter k in (1.2.4) is determined by the index i at which the coefficients |û_i^T b| begin to level off due to the noise. Since the singular values decay gradually to zero, the small singular values lead to difficulties; several of the smallest nonvanishing singular values in problems of interest to us are tiny. The TSVD reduces the influence of the noise by omitting the right singular vectors corresponding to these tiny singular values. The computed examples reported in this work use the discrepancy principle to determine the regularization parameter k; this is discussed in Section 1.2.4.
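Once the SVD is available, the TSVD solution (1.2.4) is a few lines of code. The following NumPy sketch is our own illustration (the function name, the Vandermonde test matrix, and the data are hypothetical, not taken from the thesis):

```python
import numpy as np

def tsvd_solution(A, b, k):
    """TSVD solution (1.2.4): sum over the k largest singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])

# small illustration on a made-up ill-conditioned matrix
n = 16
A = np.vander(np.linspace(0.0, 1.0, n), increasing=True)
b = A @ np.ones(n)
x4 = tsvd_solution(A, b, 4)      # rank-4 regularized solution
```

For k = rank(A) the formula reproduces the ordinary least-squares solution; smaller k discards the components associated with tiny singular values.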
1.2.2 Truncated eigenvalue decomposition (TEVD)
When the matrix A ∈ R^{n×n} is symmetric, it suffices to compute a few of its eigenvalues of largest magnitude and the associated eigenvectors. We refer to pairs consisting of the eigenvalues of largest magnitude and associated eigenvectors of A as eigenpairs of largest magnitude of A. In these situations, the TSVD simplifies to the truncated eigenvalue decomposition.
We introduce the eigenvalue (or spectral) decomposition
(1.2.5)  A = W Λ W^T,

where the matrix W = [w_1, w_2, ..., w_n] ∈ R^{n×n} has orthonormal columns, and

Λ = diag[λ_1, λ_2, ..., λ_n] ∈ R^{n×n}.

The eigenvalues λ_i are assumed to be ordered according to

(1.2.6)  |λ_1| ≥ |λ_2| ≥ ··· ≥ |λ_n|.
Thus, the magnitudes of the eigenvalues are the singular values of A, and the columns of the matrix W, with appropriate signs, are the associated right and left singular vectors.
We define the truncated eigenvalue decomposition (TEVD)
(1.2.7)  A_k = W_k Λ_k W_k^T,

where W_k = [w_1, w_2, ..., w_k] ∈ R^{n×k} and

Λ_k = diag[λ_1, λ_2, ..., λ_k] ∈ R^{k×k}

for some 1 ≤ k ≤ n. Thus, A_k is the best rank-k approximation of A in the spectral norm.
Replacing A by A_k in (1.1.1) for a suitable (small) value of k, and solving the reduced problem so obtained, often gives a better approximation of x_true than A^† b. Substituting (1.2.7) into (1.1.1), replacing b by W_k W_k^T b (i.e., by the orthogonal projection of b onto the range of W_k), and setting y = W_k^T x yields the minimization problem

min_{y∈R^k} ‖Λ_k y − W_k^T b‖.

Assuming that λ_k ≠ 0, its solution is given by y_k = Λ_k^{-1} W_k^T b, which yields the approximate solution x_k = A_k^† b = W_k Λ_k^{-1} W_k^T b = W_k y_k of (1.1.1). This approach of computing an approximate solution of (1.1.1) is known as the TEVD method.
It is analogous to the TSVD method for nonsymmetric problems.
1.2.3 Tikhonov regularization
A widely used method for solving discrete ill-posed problems is the regularization method due to Tikhonov
[69]. The general form solves (1.1.1) by replacing it with a penalized least squares problem
(1.2.8)  min_x { ‖Ax − b‖² + λ‖Lx‖² },

where A ∈ R^{m×n} and L ∈ R^{p×n} satisfy m ≥ n ≥ p ≥ 1. The matrix L is called the regularization matrix. The term ‖Ax − b‖ measures goodness of fit: its size determines how well the regularized solution fits the initial problem. The quantity ‖Lx‖ measures the regularity of the solution. We assume that L is such that

N(A) ∩ N(L) = {0}.

The scalar λ ≥ 0 is called the regularization parameter; it determines how sensitive the solution of the regularized system (1.2.8) is to the error e.
The Tikhonov problem has two equivalent formulations: the normal equations

(1.2.9)  (A^T A + λ L^T L) x = A^T b,

and the stacked least-squares problem, which can be solved stably,

(1.2.10)  min_x ‖ [A; √λ·L] x − [b; 0] ‖,

where the semicolon denotes vertical stacking. The Tikhonov minimization problem (1.2.9) is said to be in general form. When the regularization matrix is L = I_n, it is in standard form:

(1.2.11)  (A^T A + λ I_n) x = A^T b.
The Tikhonov solution in standard form is then given by

(1.2.12)  x_λ = argmin_x { ‖Ax − b‖² + λ‖x‖² }.

We can express the regularized Tikhonov solution x_λ in terms of the SVD of A. Substituting the SVD into (1.2.11) and using I_n = V̂ V̂^T gives

(Σ̂^T Σ̂ + λ I_n) V̂^T x_λ = Σ̂^T Û^T b.

Since Σ̂^T Σ̂ + λ I_n is nonsingular for λ > 0,

(1.2.13)  x_λ = V̂ (Σ̂^T Σ̂ + λ I_n)^{-1} Σ̂^T Û^T b,

or

(1.2.14)  x_λ = ∑_{i=1}^{n} φ_i^{[λ]} (û_i^T b / σ̂_i) v̂_i,

where φ_i^{[λ]} = σ̂_i² / (σ̂_i² + λ) is the standard-form Tikhonov filter factor. This factor damps components of the solution that correspond to the small singular values.
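For a small matrix, (1.2.14) can be evaluated directly from the filter factors. The sketch below is our own illustration (function name and test data are hypothetical); as a sanity check it agrees with solving the normal equations (1.2.11):

```python
import numpy as np

def tikhonov_solution(A, b, lam):
    """Standard-form Tikhonov solution (1.2.14) via SVD filter factors
    phi_i = sigma_i^2 / (sigma_i^2 + lambda)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    phi = s**2 / (s**2 + lam)
    return Vt.T @ (phi * (U.T @ b) / s)

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)
x = tikhonov_solution(A, b, 0.1)
# same result as the normal equations (1.2.11):
x_ne = np.linalg.solve(A.T @ A + 0.1 * np.eye(5), A.T @ b)
```

Evaluating the solution through the filter factors avoids forming A^T A, which would square the condition number.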
1.2.4 Regularization parameter: the discrepancy principle
We now address how to choose the regularization parameter, such as k (for truncated iterations) or λ (for Tikhonov regularization), in a dependable and automated way. Several techniques are available, including the discrepancy principle, generalized cross validation (GCV), the L-curve criterion, and the normalized cumulative periodogram (NCP) method; see [23, 25, 26, 40, 45, 61] for discussions of these and other methods for choosing an appropriate regularization parameter. The parameter choice rule used throughout this work is the discrepancy principle, which was first discussed by Morozov in [54]. It requires that a bound for the error e in b be known a priori:

‖e‖ ≤ ε.
We will apply the discrepancy principle as follows:
Truncated iterations
For the TSVD, we find the smallest integer k ≥ 0 such that
(1.2.15)  ‖A x_k − b‖ ≤ τ ε,

where τ ≥ 1 is a user-chosen constant independent of ε. The more accurate the estimate of our available error bound, the closer we can choose τ to 1. Ideally, we would like to choose k such that ‖A x_k − b‖ = τ ε, but this equality is rarely attained in practice.
The same approach applies to the TEVD. We remark that one can compute ‖A x_k − b‖ without evaluating a matrix-vector product with A by observing that

‖A x_k − b‖ = ‖b − W_k W_k^T b‖.

It can be shown that x_k → x_true as ‖e‖ → 0; see, e.g., [23] for a proof in a Hilbert space setting. The proof is for the situation when A is nonsymmetric, i.e., for the truncated singular value decomposition.
Tikhonov regularization
Given a bound ε for the norm of the error vector e, we seek λ so that the residual norm matches this bound. To accomplish this, we solve the nonlinear equation in λ,

‖A x_λ − b‖² = τ² ε²,

for instance by Newton's method.
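To illustrate the idea (this is a simple substitute for the Newton iteration, not the implementation used in the thesis), one can exploit the fact that the residual norm ‖A x_λ − b‖ increases monotonically with λ and bisect on a logarithmic scale; all names below are our own:

```python
import numpy as np

def tik_residual(A, b, lam):
    """Residual norm ||A x_lam - b|| of the standard-form Tikhonov solution."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    x = Vt.T @ ((s / (s**2 + lam)) * (U.T @ b))
    return np.linalg.norm(A @ x - b)

def discrepancy_lambda(A, b, target, lo=1e-14, hi=1e8):
    """Bisection (on a log scale) for lambda with ||A x_lambda - b|| = target,
    using that the residual norm is an increasing function of lambda."""
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        if tik_residual(A, b, mid) < target:
            lo = mid        # residual too small -> lambda must grow
        else:
            hi = mid
    return np.sqrt(lo * hi)

A = np.diag([2.0, 1.0])
b = np.array([4.0, 3.0])
lam = discrepancy_lambda(A, b, 1.0)   # here target plays the role of tau*epsilon
```

Newton's method converges much faster, but the monotonicity that makes this bisection safe is the same property Newton's method relies on.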
1.3 Krylov subspace methods
Linear discrete ill-posed problems like (1.1.1) are commonly solved with the aid of the singular value decomposition (SVD) of A when A is a small matrix; see, e.g., [40, 56] and references therein. However, it is expensive to compute the SVD of a general large matrix; the computation of the SVD of an n × n matrix requires about 22n³ arithmetic floating-point operations (flops). See, e.g., [35, Chapter 8] for details as well as for flop counts for the situation when m > n. In particular, the SVD of a large general m × n matrix is very expensive to compute. Therefore, large-scale linear discrete ill-posed problems (1.1.1) are sometimes solved by hybrid methods that first reduce the large least-squares problem to a least-squares problem of small size by a Krylov subspace method, and then solve the latter by using the SVD of the reduced matrix so obtained.
Given the matrix A ∈ Rn×n and the vector b ∈ Rn, the Krylov subspace generated by A and b is defined by
(1.3.1)  K_ℓ(A,b) = span{b, Ab, A²b, ..., A^{ℓ−1}b},  ℓ ≥ 1.

A Krylov method seeks an approximate solution of (1.1.1) in the space (1.3.1). Krylov subspace methods access A only through matrix-vector products, not through A directly. As a result, they are very effective when A is very large and sparse. To construct a Krylov sequence, begin with the initial vector b; multiply by A to get the next vector, Ab; then multiply that vector by A to get A²b, and so on. Hence, the matrix A² is not explicitly formed; the matrix-vector product A²b is evaluated as A(Ab), etc. These vectors are not orthogonal, and already for relatively small values of ℓ they may become nearly linearly dependent. We would like to determine an orthonormal basis of the Krylov subspace, as orthonormal bases are the easiest to work with. Several well-known Krylov subspace methods generate orthonormal bases, including the Arnoldi method, the symmetric Lanczos process, and the Golub–Kahan bidiagonalization method.
1.3.1 The Arnoldi method
The Arnoldi method [2] is a widely used Krylov subspace method. It builds an orthonormal basis of the Krylov subspace K_{ℓ+1}(A,b) for a general square, nonsymmetric matrix. It is summarized in Algorithm 1.
Application of ℓ steps of Algorithm 1 yields the Arnoldi decomposition

(1.3.2)  A V_ℓ = V_{ℓ+1} H_{ℓ+1,ℓ},

where the matrix V_{ℓ+1} = [v_1, v_2, ..., v_{ℓ+1}] ∈ R^{n×(ℓ+1)} has orthonormal columns with v_1 = b/‖b‖ and span{v_1, v_2, ..., v_{ℓ+1}} = K_{ℓ+1}(A,b); V_ℓ consists of the first ℓ columns of V_{ℓ+1}.

Algorithm 1 The Arnoldi Process
 1: Input: A, b ≠ 0, ℓ
 2: Initialize: v_1 = b/‖b‖
 3: for j = 1, 2, ..., ℓ do
 4:   w = A v_j
 5:   for i = 1, ..., j do
 6:     h_{i,j} = ⟨w, v_i⟩
 7:     w = w − h_{i,j} v_i
 8:   end for
 9:   h_{j+1,j} = ‖w‖
10:   if h_{j+1,j} = 0 then Stop
11:   v_{j+1} = w / h_{j+1,j}
12: end for

Furthermore, the matrix H_{ℓ+1,ℓ} is upper Hessenberg,

                      ⎡ h_{1,1}  h_{1,2}  ···   h_{1,ℓ}   ⎤
                      ⎢ h_{2,1}  h_{2,2}  ···   h_{2,ℓ}   ⎥
(1.3.3)  H_{ℓ+1,ℓ} =  ⎢            ⋱       ⋱      ⋮       ⎥  ∈ R^{(ℓ+1)×ℓ}.
                      ⎢              h_{ℓ,ℓ−1}  h_{ℓ,ℓ}   ⎥
                      ⎣                         h_{ℓ+1,ℓ} ⎦

We denote the leading ℓ × ℓ submatrix of H_{ℓ+1,ℓ} by H_ℓ. The vector w in Algorithm 1 is obtained by multiplying the previous Arnoldi vector v_j by A; it is then orthonormalized against all previous Arnoldi vectors v_i by the modified Gram–Schmidt iteration. The algorithm terminates in line 10 when h_{j+1,j} = 0. This situation implies that w ∈ span{v_1, v_2, ..., v_j}, so span{v_1, v_2, ..., v_j} is an invariant subspace of A, and (1.3.2) simplifies to A V_j = V_j H_j.
The eigenvalues of H_j, where

(1.3.4)  H_j = V_j^T A V_j ∈ R^{j×j},

are called the Ritz values of A. The Ritz values are typically good approximations of the extreme eigenvalues of A, especially when A is symmetric.
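A compact NumPy transcription of Algorithm 1 (our own illustrative code, not from the thesis) makes the decomposition (1.3.2) easy to verify numerically:

```python
import numpy as np

def arnoldi(A, b, ell):
    """Algorithm 1: returns V (n x (ell+1)) with orthonormal columns and the
    upper Hessenberg H ((ell+1) x ell) with A @ V[:, :ell] == V @ H,
    assuming no breakdown occurs."""
    n = A.shape[0]
    V = np.zeros((n, ell + 1))
    H = np.zeros((ell + 1, ell))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(ell):
        w = A @ V[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:              # breakdown: invariant subspace found
            raise RuntimeError("breakdown at step %d" % (j + 1))
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(2)
A = rng.standard_normal((9, 9))
b = rng.standard_normal(9)
V, H = arnoldi(A, b, 5)
```

The eigenvalues of the leading square part of H are then the Ritz values (1.3.4).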
1.3.2 The symmetric Lanczos process
The symmetric Lanczos process [47] is the special case of the Arnoldi process obtained when A is symmetric. It has some very attractive properties, including significant computational savings. When A is real and symmetric, then H_ℓ^T = (V_ℓ^T A V_ℓ)^T = H_ℓ is also symmetric, hence tridiagonal. Since h_{i,j} = 0 for i < j − 1, we introduce new notation for the tridiagonal matrix:

α_j = h_{j,j}  and  β_j = h_{j+1,j} = h_{j,j+1},  j = 1, ..., ℓ.

Then (1.3.3) becomes the tridiagonal matrix

                      ⎡ α_1  β_2                   ⎤
                      ⎢ β_2  α_2  β_3              ⎥
                      ⎢      β_3  α_3   ⋱          ⎥
(1.3.5)  T_{ℓ+1,ℓ} =  ⎢            ⋱    ⋱    β_ℓ   ⎥  ∈ R^{(ℓ+1)×ℓ}.
                      ⎢                β_ℓ    α_ℓ  ⎥
                      ⎣                     β_{ℓ+1}⎦
Consequently, Algorithm 1 takes the form:

Algorithm 2 The Symmetric Lanczos Process
 1: Input: A, b ≠ 0, ℓ
 2: Initialize: v_1 = b/‖b‖, β_1 = 0, v_0 = 0
 3: for j = 1, 2, ..., ℓ do
 4:   y = A v_j − β_j v_{j−1}
 5:   α_j = ⟨y, v_j⟩
 6:   y = y − α_j v_j
 7:   β_{j+1} = ‖y‖
 8:   if β_{j+1} = 0 then Stop
 9:   v_{j+1} = y / β_{j+1}
10: end for
The Lanczos vectors v_j generated by the algorithm are orthonormal, and we define the matrix V_{ℓ+1} = [v_1, v_2, ..., v_{ℓ+1}] ∈ R^{n×(ℓ+1)}. A matrix interpretation of the recursion relations of Algorithm 2 gives the (partial) Lanczos decomposition

(1.3.6)  A V_ℓ = V_{ℓ+1} T_{ℓ+1,ℓ}.
It follows from the recursions of the Lanczos method that the columns v_j of V_{ℓ+1} can be expressed as

v_j = q_{j−1}(A) b,  j = 1, 2, ..., ℓ + 1,

where q_{j−1} is a polynomial of exact degree j − 1. Consequently, {v_j}_{j=1}^{ℓ} is an orthonormal basis of the Krylov subspace (1.3.1). In exact arithmetic the v_j are orthogonal; in floating-point arithmetic, however, they quickly lose their orthogonality. We use reorthogonalization schemes in our computed examples to circumvent this issue.
We finally comment on the situation when Algorithm 2 breaks down. This happens when some coefficient β_{j+1} vanishes. We have then determined an invariant subspace of A. If this subspace contains all the desired eigenvectors, then we compute an approximation of x_true in this subspace; otherwise, we restart Algorithm 2 with an initial vector that is orthogonal to the invariant subspace already found. Since the occurrence of breakdown is rare, we will not dwell on this situation. See, e.g., Saad [64] for a thorough discussion of the properties of Algorithm 2.
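The three-term recursion of Algorithm 2 is equally short in code. The sketch below is our own illustration (names and test data are hypothetical); it builds the matrices of the decomposition (1.3.6):

```python
import numpy as np

def lanczos(A, b, ell):
    """Algorithm 2 for symmetric A: returns V (n x (ell+1)) and the
    tridiagonal T ((ell+1) x ell) of (1.3.5), with A @ V[:, :ell] == V @ T,
    assuming no breakdown occurs."""
    n = A.shape[0]
    V = np.zeros((n, ell + 1))
    T = np.zeros((ell + 1, ell))
    V[:, 0] = b / np.linalg.norm(b)
    beta = 0.0
    for j in range(ell):
        y = A @ V[:, j]
        if j > 0:
            y = y - beta * V[:, j - 1]      # subtract beta_j * v_{j-1}
        alpha = y @ V[:, j]
        y = y - alpha * V[:, j]
        beta = np.linalg.norm(y)
        T[j, j] = alpha
        if j > 0:
            T[j - 1, j] = T[j, j - 1]       # symmetry: beta_j above the diagonal
        T[j + 1, j] = beta
        if beta == 0.0:                      # breakdown: invariant subspace found
            raise RuntimeError("breakdown at step %d" % (j + 1))
        V[:, j + 1] = y / beta
    return V, T

rng = np.random.default_rng(3)
M = rng.standard_normal((10, 10))
A = M + M.T                                  # symmetric test matrix
b = rng.standard_normal(10)
V, T = lanczos(A, b, 6)
```

In practice the loss of orthogonality mentioned above would be countered by reorthogonalizing y against the stored columns of V.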
1.3.3 Golub-Kahan bidiagonalization
A large nonsymmetric matrix A ∈ R^{m×n} can be reduced to a small bidiagonal matrix by applying a few steps of Golub–Kahan bidiagonalization (also known as the Lanczos bidiagonalization algorithm). This is described by Algorithm 3.

Algorithm 3 Golub–Kahan Bidiagonalization
 1: Input: A, b ≠ 0, ℓ
 2: Initialize: β_1 = ‖b‖, p_1 = b/β_1, q = A^T p_1, α_1 = ‖q‖, q_1 = q/α_1
 3: for j = 2, ..., ℓ + 1 do
 4:   p = A q_{j−1} − α_{j−1} p_{j−1}
 5:   β_j = ‖p‖
 6:   if β_j = 0 then Stop
 7:   p_j = p/β_j
 8:   q = A^T p_j − β_j q_{j−1}
 9:   α_j = ‖q‖
10:   if α_j = 0 then Stop
11:   q_j = q/α_j
12: end for
Using the vectors p_j and q_j determined by Algorithm 3, we define the matrices P_{ℓ+1} = [p_1, ..., p_ℓ, p_{ℓ+1}] ∈ R^{m×(ℓ+1)} and Q_{ℓ+1} = [q_1, ..., q_ℓ, q_{ℓ+1}] ∈ R^{n×(ℓ+1)} with orthonormal columns; P_ℓ consists of the first ℓ columns of P_{ℓ+1}. These vectors form orthonormal bases of the Krylov subspaces K_ℓ(AA^T, b) and K_ℓ(A^T A, A^T b), respectively. The scalars α_j and β_j computed by the algorithm define the lower bidiagonal matrix

                 ⎡ α_1                 ⎤
                 ⎢ β_2  α_2            ⎥
                 ⎢      β_3  α_3       ⎥
(1.3.7)  C̄_ℓ =  ⎢           ⋱    ⋱    ⎥  ∈ R^{(ℓ+1)×ℓ}.
                 ⎢          β_ℓ    α_ℓ ⎥
                 ⎣              β_{ℓ+1}⎦

A matrix interpretation of the recursions of Algorithm 3 gives the Golub–Kahan decompositions

(1.3.8)  A Q_ℓ = P_{ℓ+1} C̄_ℓ,  A^T P_ℓ = Q_ℓ C_ℓ^T,

where the leading ℓ × ℓ submatrix of C̄_ℓ is denoted by C_ℓ. We assume ℓ is chosen small enough that the decompositions (1.3.8) with the stated properties exist. See [12] for a recent discussion of this decomposition.
If we combine the Golub–Kahan decompositions (1.3.8), we get

(1.3.9)  A^T A Q_ℓ = Q_{ℓ+1} C̄_ℓ^T C̄_ℓ,

where C̄_ℓ^T C̄_ℓ is a symmetric tridiagonal matrix. Observe that this decomposition is equivalent to applying the Lanczos process (1.3.6) to the symmetric positive semidefinite matrix A^T A.
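Algorithm 3 and the identities (1.3.8) can be checked with a short NumPy sketch (our own illustrative code; it assumes no breakdown, i.e., no α_j or β_j vanishes):

```python
import numpy as np

def golub_kahan(A, b, ell):
    """Algorithm 3: partial Golub-Kahan bidiagonalization. Returns
    P (m x (ell+1)), Q (n x (ell+1)) and the square lower bidiagonal
    C ((ell+1) x (ell+1)); its first ell columns are the matrix (1.3.7)."""
    m, n = A.shape
    P = np.zeros((m, ell + 1))
    Q = np.zeros((n, ell + 1))
    C = np.zeros((ell + 1, ell + 1))
    beta = np.linalg.norm(b)
    P[:, 0] = b / beta
    q = A.T @ P[:, 0]
    alpha = np.linalg.norm(q)
    Q[:, 0] = q / alpha
    C[0, 0] = alpha
    for j in range(1, ell + 1):
        p = A @ Q[:, j - 1] - alpha * P[:, j - 1]
        beta = np.linalg.norm(p)
        P[:, j] = p / beta
        C[j, j - 1] = beta
        q = A.T @ P[:, j] - beta * Q[:, j - 1]
        alpha = np.linalg.norm(q)
        Q[:, j] = q / alpha
        C[j, j] = alpha
    return P, Q, C

rng = np.random.default_rng(4)
A = rng.standard_normal((12, 7))
b = rng.standard_normal(12)
P, Q, C = golub_kahan(A, b, 4)
```

Here C[:, :ell] plays the role of C̄_ℓ and C[:ell, :ell] that of C_ℓ in (1.3.8).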
14 1.3.4 Block Krylov methods
There are situations where (1.1.1) is a least-squares problem with the vector b replaced by a matrix B ∈ R^{m×s}, 1 ≤ s ≪ m; see, e.g., [1, 19, 22, 43, 51, 63]. In some of these cases it is beneficial to use block generalizations of Krylov subspace methods, in which the matrix A ∈ R^{n×n} operates on a group of vectors instead of a single vector. We proceed by discussing an extension of the Arnoldi algorithm, the block Arnoldi algorithm. It is described in Algorithm 4, an adaptation of Algorithm 1.

Algorithm 4 Block Arnoldi Algorithm
 1: Input: A ∈ R^{n×n}, B ∈ R^{n×s}, ℓ, and the block size s, where 1 ≤ s ≪ n
 2: Compute the QR decomposition B = V_1 H_{1,1}
 3: for j = 1, 2, ..., ℓ do
 4:   W_j = A V_j
 5:   for i = 1, ..., j do
 6:     H_{i,j} = V_i^T W_j
 7:     W_j = W_j − V_i H_{i,j}
 8:   end for
 9:   Compute the QR decomposition W_j = V_{j+1} H_{j+1,j}
10: end for
The blocks V_i ∈ R^{n×s} have orthonormal columns. Furthermore, the V_i are mutually orthogonal, and they form the matrix V̄_ℓ = [V_1, ..., V_ℓ] ∈ R^{n×ℓs}, whose columns give an orthonormal basis of the block Krylov subspace

(1.3.10)  K_ℓ(A, V_1) = span{V_1, A V_1, A² V_1, ..., A^{ℓ−1} V_1},  ℓ ≥ 1.

Consequently, a matrix interpretation of the recursion relations of Algorithm 4 gives the block Arnoldi decomposition

A V̄_ℓ = V̄_ℓ H̄_ℓ + V_{ℓ+1} H_{ℓ+1,ℓ} E_ℓ^T,

where H̄_ℓ ∈ R^{ℓs×ℓs} is no longer upper Hessenberg but block upper Hessenberg (an upper triangular matrix with s subdiagonals), with block entries H_{i,j} ∈ R^{s×s} for 1 ≤ i, j ≤ ℓ and H_{i,j} ≡ 0 when i > j + 1, and E_ℓ consists of the last s columns of the identity matrix I_{ℓs} ∈ R^{ℓs×ℓs}.
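A block version of the earlier Arnoldi sketch illustrates Algorithm 4 (our own code; it assumes every QR factor has full rank, i.e., no breakdown):

```python
import numpy as np

def block_arnoldi(A, B, ell):
    """Algorithm 4: block Arnoldi with block size s = B.shape[1].
    Returns the blocks V_1..V_{ell+1} (as a list) and the block entries
    H[i][j] of the block Hessenberg matrix."""
    n, s = B.shape
    V = [None] * (ell + 1)
    H = [[None] * ell for _ in range(ell + 1)]
    V[0], _ = np.linalg.qr(B)                # B = V_1 H_{1,1}
    for j in range(ell):
        W = A @ V[j]
        for i in range(j + 1):               # block Gram-Schmidt
            H[i][j] = V[i].T @ W
            W = W - V[i] @ H[i][j]
        V[j + 1], H[j + 1][j] = np.linalg.qr(W)  # W_j = V_{j+1} H_{j+1,j}
    return V, H

rng = np.random.default_rng(5)
A = rng.standard_normal((12, 12))
B = rng.standard_normal((12, 2))             # s = 2 right-hand sides
V, H = block_arnoldi(A, B, 3)
Vbar = np.hstack(V[:3])                      # orthonormal basis of K_3(A, V_1)
```

Assembling the H[i][j] into H̄_ℓ reproduces the block Arnoldi decomposition stated above.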
15 1.4 The test problems
Most MATLAB codes for determining the discrete ill-posed problems in the computed examples of this
thesis stem from Regularization Tools by Hansen [41]. These linear systems were obtained by discretizing
Fredholm integral equations of the first kind. We assume that the system matrix A ∈ R^{n×n} as well as the exact solution x_true ∈ R^n are available. When the error-free right-hand side is not provided, it is obtained by computing

b_true = A x_true.
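To fix ideas, the following hedged NumPy sketch builds one such test problem by a midpoint quadrature (Nyström-type) rule for the deriv2 kernel described below; the function deriv2 from [41] uses a Galerkin discretization instead, so the entries differ slightly from the MATLAB version. The function name deriv2_like is our own.

```python
import numpy as np

def deriv2_like(n):
    """Midpoint (Nystrom-type) discretization of the deriv2 kernel (1.4.2)."""
    h = 1.0 / n
    t = (np.arange(n) + 0.5) * h                        # midpoint nodes
    S, T = np.meshgrid(t, t, indexing="ij")
    K = np.where(S < T, S * (T - 1.0), T * (S - 1.0))   # Green's function
    A = h * K                                           # quadrature weight
    x_true = t.copy()                                   # solution x(t) = t
    b_true = A @ x_true                                 # discrete right-hand side
    return A, x_true, b_true

A, x_true, b_true = deriv2_like(200)
assert np.allclose(A, A.T)                  # symmetric
assert np.linalg.eigvalsh(A).max() < 0      # negative definite
```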
1.4.1 Descriptions of the test problems
baart: The Fredholm integral equation of the first kind

(1.4.1)  ∫_0^π exp(s cos(t)) x(t) dt = 2 sin(s)/s,  0 ≤ s ≤ π/2,

is discussed by Baart [3]. It has the solution x(t) = sin(t). The integral equation is discretized by a Galerkin method with piecewise constant test and trial functions using the function baart from [41]. This gives a nonsymmetric matrix.

deriv2: The Fredholm integral equation of the first kind
(1.4.2)  ∫_0^1 K(s,t) x(t) dt = g(s),  0 ≤ s,t ≤ 1,

where the kernel K is Green's function for the second derivative,

K(s,t) = { s(t − 1),  s < t,
           t(s − 1),  s ≥ t.

The right-hand side is given by g(s) = (s^3 − s)/6 and the solution is x(t) = t. The integral equation is discretized by a Galerkin method using the MATLAB function deriv2 from [41]. The matrix produced is symmetric and negative definite. This problem is mildly ill-conditioned, i.e., its singular values decay slowly to zero.

foxgood: This is the Fredholm integral equation of the first kind
(1.4.3)  ∫_0^1 (s^2 + t^2)^{1/2} x(t) dt = (1/3)((1 + s^2)^{3/2} − s^3),  0 ≤ s,t ≤ 1,

with solution x(t) = t, originally discussed by Fox and Goodwin [27]. The function foxgood from [41] is used to determine a discretization by a Nyström method. This gives a symmetric indefinite matrix that is numerically singular; the problem is severely ill-posed.
gravity: A one-dimensional gravity surveying model problem resulting in the first-kind Fredholm integral equation

(1.4.4)  ∫_0^1 (1/4) (1/16 + (s − t)^2)^{−3/2} x(t) dt = g(s),  0 ≤ s,t ≤ 1,

with solution

x(t) = sin(πt) + (1/2) sin(2πt).

Discretization is carried out by a Nyström method based on the midpoint quadrature rule using the function gravity from [41]. The resulting matrix is symmetric positive definite and the exact right-hand side is computed as b_true = A x_true.
heat: The inverse heat equation [18] used in this thesis is a Volterra integral equation of the first kind. The kernel is given by K(s,t) = k(s − t), where

k(t) = (1 / (2 √π t^{3/2})) exp(−1/(4t)).

The discretization of the integral equation is done by simple collocation and the midpoint rule with n points. The matrix produced is lower triangular and ill-conditioned. An exact solution is constructed and then the discrete right-hand side is computed as b_true = A x_true. This is a severely ill-posed problem.

i_laplace: The Fredholm integral equation of the first kind
(1.4.5)  ∫_0^∞ exp(−st) x(t) dt = 16/(2s + 1)^3,  s ≥ 0, t ≥ 0,

represents the inverse Laplace transform, with the solution x(t) = t^2 exp(−t/2). It is discretized by means of Gauss–Laguerre quadrature using the MATLAB function i_laplace from [41]. The nonsymmetric matrix so obtained is numerically singular.
phillips: We now consider the Fredholm integral equation of the first kind discussed by Phillips [60],

(1.4.6)  ∫_{−6}^{6} K(s,t) x(t) dt = g(s),  −6 ≤ s,t ≤ 6,

where the solution x(t), kernel K(s,t), and right-hand side g(s) are given by

x(t) = { 1 + cos(πt/3),  |t| < 3,
         0,              |t| ≥ 3,

K(s,t) = x(s − t),

g(s) = (6 − |s|)(1 + (1/2) cos(πs/3)) + (9/(2π)) sin(π|s|/3).

The integral equation is discretized by a Galerkin method using the MATLAB function phillips from [41]. The matrix produced is symmetric and indefinite.
shaw: The Fredholm integral equation of the first kind discussed by Shaw [66],

(1.4.7)  ∫_{−π/2}^{π/2} K(s,t) x(t) dt = g(s),  −π/2 ≤ s,t ≤ π/2,

with kernel

K(s,t) = (cos(s) + cos(t))^2 (sin(π(sin(s) + sin(t))) / (π(sin(s) + sin(t))))^2

and solution

x(t) = 2 exp(−6(t − 0.8)^2) + exp(−2(t + 0.5)^2),

which define the right-hand side function g. Discretization is carried out by a Nyström method based on the midpoint quadrature rule using the function shaw from [41]. The resulting matrix is symmetric indefinite and numerically singular. The discrete right-hand side is computed as b_true = A x_true. This problem is severely ill-posed.
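The severe ill-posedness of a problem such as shaw can be seen directly from the singular values of a discretization. The sketch below is our own midpoint-rule stand-in for the function shaw from [41]; the entries differ from the MATLAB version, but the character of the singular value decay is the same.

```python
import numpy as np

def shaw_like(n):
    """Midpoint Nystrom discretization of the shaw kernel (1.4.7);
    a rough stand-in for the function shaw from [41]."""
    h = np.pi / n
    t = -np.pi / 2 + (np.arange(n) + 0.5) * h    # midpoint nodes
    S, T = np.meshgrid(t, t, indexing="ij")
    u = np.sin(S) + np.sin(T)
    # np.sinc(u) = sin(pi u)/(pi u), handling the removable singularity u = 0
    A = h * (np.cos(S) + np.cos(T)) ** 2 * np.sinc(u) ** 2
    return A, t

A, t = shaw_like(200)
s = np.linalg.svd(A, compute_uv=False)
assert np.allclose(A, A.T)        # symmetric
assert s[29] / s[0] < 1e-10       # rapid singular value decay: severely ill-posed
```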
CHAPTER 2
Reduction methods applied to discrete ill-posed problems
2.1 Introduction
Consider (1.1.1) with a large symmetric matrix A ∈ R^{n×n}. Many solution methods for such large-scale problems first reduce the system of equations to a problem of small size. The symmetric Lanczos method discussed in Section 1.3.2 is a popular reduction method. Application of ℓ steps of Algorithm 2 to A with initial vector b yields a decomposition of the form (1.3.6).
The Lanczos method determines the diagonal and subdiagonal elements

α_1, β_2, α_2, β_3, ..., α_ℓ, β_{ℓ+1}

of T_{ℓ+1,ℓ} in order. Generically, the subdiagonal entries β_j are positive, and then the decomposition (1.3.6) with the stated properties exists.
The solution of (1.1.1) by truncated iteration proceeds by solving

(2.1.1)  min_{x ∈ K_ℓ(A,b)} ‖Ax − b‖ = min_{y ∈ R^ℓ} ‖T_{ℓ+1,ℓ} y − e_1 ‖b‖‖,

where the right-hand side is obtained by substituting the decomposition (1.3.6) into the left-hand side and by exploiting the properties of the matrices involved. Here and in the following, e_j denotes the jth axis vector. Let y_ℓ be a solution of the least-squares problem on the right-hand side of (2.1.1). Then x_ℓ := V_ℓ y_ℓ is a solution of the constrained least-squares problem on the left-hand side of (2.1.1), as well as an approximate solution of (1.1.1). By choosing ℓ suitably small, propagation of the error e in b into the computed solution x_ℓ is reduced. This relies on the fact that the condition number of T_{ℓ+1,ℓ}, given by κ(T_{ℓ+1,ℓ}) := ‖T_{ℓ+1,ℓ}‖ ‖T_{ℓ+1,ℓ}^†‖, is an increasing function of ℓ. A large condition number indicates that the solution y_ℓ of the right-hand side of (2.1.1) is sensitive to errors in the data and to round-off errors introduced during the computations. We will discuss this and other solution methods below. For overviews and analyses of solution methods for linear discrete ill-posed problems, we refer to [23, 40].
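A minimal NumPy sketch of this truncated-iteration scheme (illustrative code with our own names, not the implementation used in the computed examples) is:

```python
import numpy as np

def lanczos(A, b, ell):
    """l steps of the symmetric Lanczos process with reorthogonalization;
    returns V_{l+1} and T_{l+1,l} with A V_l = V_{l+1} T_{l+1,l}."""
    n = len(b)
    V = np.zeros((n, ell + 1))
    T = np.zeros((ell + 1, ell))
    V[:, 0] = b / np.linalg.norm(b)
    beta = 0.0
    for j in range(ell):
        w = A @ V[:, j] - (beta * V[:, j - 1] if j > 0 else 0.0)
        alpha = V[:, j] @ w
        w = w - alpha * V[:, j]
        w = w - V[:, :j + 1] @ (V[:, :j + 1].T @ w)  # reorthogonalize
        beta = np.linalg.norm(w)
        T[j, j] = alpha
        T[j + 1, j] = beta
        if j + 1 < ell:
            T[j, j + 1] = beta
        V[:, j + 1] = w / beta
    return V, T

def truncated_lanczos_solution(A, b, ell):
    V, T = lanczos(A, b, ell)
    rhs = np.zeros(ell + 1)
    rhs[0] = np.linalg.norm(b)                # e_1 ||b||
    y = np.linalg.lstsq(T, rhs, rcond=None)[0]
    return V[:, :ell] @ y                     # x_l = V_l y_l

rng = np.random.default_rng(1)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(1.0 / (1.0 + np.arange(n)) ** 4) @ Q.T  # eigenvalues cluster at 0
b = A @ rng.standard_normal(n)
r = [np.linalg.norm(A @ truncated_lanczos_solution(A, b, ell) - b)
     for ell in (4, 8)]
assert r[1] <= r[0] + 1e-10   # nested Krylov subspaces: residuals do not grow
```

The residual norms are nonincreasing in ℓ because the Krylov subspaces are nested; regularization comes from stopping the iteration at a suitably small ℓ.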
We will investigate the structure of the matrix (1.3.5) obtained by applying the Lanczos method to a symmetric matrix whose eigenvalues “cluster” at the origin. We will give upper bounds for the size of the subdiagonal entries. These bounds shed light on the solution subspaces generated by the symmetric Lanczos method. In particular, the bounds indicate that the ranges of the matrices V_ℓ essentially contain the span of the k = k(ℓ) eigenvectors of A associated with the k eigenvalues of largest magnitude, where k(ℓ) is an increasing function of ℓ and, generally, k(ℓ) < ℓ. This observation suggests that it may not be necessary to compute a partial eigendecomposition of A, but that it suffices to determine a few Lanczos vectors, which is much cheaper. We also will investigate the solution subspaces determined by application of ℓ steps of
Golub–Kahan bidiagonalization to a nonsymmetric matrix A, whose singular values cluster at the origin.
We find the solution subspaces determined by ℓ steps of Golub–Kahan bidiagonalization applied to A to essentially contain the spans of the k = k(ℓ) right and left singular vectors of A associated with the k largest singular values, where k = k(ℓ) is an increasing function of ℓ and, generally, k(ℓ) < ℓ. This suggests that it may not be necessary to compute singular value or partial singular value decompositions of A, but that it suffices to carry out a few steps of Golub–Kahan bidiagonalization, which is much cheaper. The results for the spans of the solution subspaces determined by partial Golub–Kahan bidiagonalization follow from bounds for singular values. These bounds provide an alternative to the bounds shown by Gazzola et al. [30, 31].
Related bounds also are presented by Novati and Russo [56].
2.2 Application of the symmetric Lanczos method
This section discusses the convergence of the subdiagonal and diagonal entries of the matrix T_{ℓ+1,ℓ} in (1.3.6) with increasing dimensions. The proofs use the spectral factorization (1.2.5).
Theorem 2.2.1. Let the matrix A ∈ R^{n×n} be symmetric and positive semidefinite, and let its eigenvalues be ordered according to (1.2.6). Assume that the Lanczos method applied to A with initial vector b does not break down, i.e., that n steps of the method can be carried out. Let β_2, β_3, ..., β_{ℓ+1} be the subdiagonal entries of the matrix T_{ℓ+1,ℓ} determined by ℓ steps of the Lanczos method; cf. (1.3.6). Define β_{n+1} := 0. Then

(2.2.1)  ∏_{j=2}^{ℓ+1} β_j ≤ ∏_{j=1}^{ℓ} λ_j,  ℓ = 1,2,...,n.
Proof. Introduce the monic polynomial p_ℓ(t) = ∏_{j=1}^{ℓ} (t − λ_j) defined by the ℓ largest eigenvalues of A. Using the spectral factorization (1.2.5), we obtain

‖p_ℓ(A)‖ = ‖p_ℓ(Λ)‖ = max_{ℓ+1 ≤ j ≤ n} |p_ℓ(λ_j)| ≤ |p_ℓ(0)| = ∏_{j=1}^{ℓ} λ_j,

where the inequality follows from the fact that all λ_j are nonnegative. Hence,

(2.2.2)  ‖p_ℓ(A)b‖ ≤ ‖b‖ ∏_{j=1}^{ℓ} λ_j.
Application of n steps of the symmetric Lanczos method gives the decomposition A V_n = V_n T_n, where T_n ∈ R^{n×n} is symmetric and tridiagonal, and V_n ∈ R^{n×n} is orthogonal with V_n e_1 = b/‖b‖. We have

(2.2.3)  p_ℓ(A)b = V_n p_ℓ(T_n) V_n^T b = V_n p_ℓ(T_n) e_1 ‖b‖.

This relation gives the equality below,

(2.2.4)  ‖p_ℓ(A)b‖ = ‖p_ℓ(T_n) e_1‖ ‖b‖ ≥ ‖b‖ ∏_{j=2}^{ℓ+1} β_j.
The inequality above follows by direct computation. Specifically, one can show by induction on ℓ that

‖p_ℓ(T_n) e_1‖ ≥ |e_{ℓ+1}^T p_ℓ(T_n) e_1| = ∏_{j=2}^{ℓ+1} β_j.

For ℓ = 1 the result is trivial. Assume that it is valid for 1 ≤ ℓ < n. Then

e_{ℓ+2}^T p_{ℓ+1}(T_n) e_1 = e_{ℓ+2}^T (T_n − λ_{ℓ+1} I) p_ℓ(T_n) e_1 = φ_{ℓ+2}^T p_ℓ(T_n) e_1,  with

φ_{ℓ+2} = β_{ℓ+2} e_{ℓ+1} + (α_{ℓ+2} − λ_{ℓ+1}) e_{ℓ+2} + β_{ℓ+3} e_{ℓ+3}.

Since p_ℓ(T_n) is (2ℓ + 1)-banded, we obtain

φ_{ℓ+2}^T p_ℓ(T_n) e_1 = β_{ℓ+2} e_{ℓ+1}^T p_ℓ(T_n) e_1 = β_{ℓ+2} ∏_{j=2}^{ℓ+1} β_j = ∏_{j=2}^{ℓ+2} β_j.

Combining (2.2.2) and (2.2.4) shows the theorem.
In practice, the bound (2.2.1) is often quite sharp; we will give a numerical illustration of this in Section 2.4.
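Such an illustration can also be scripted directly. The sketch below (our own code, with reorthogonalization) applies the Lanczos process to a symmetric positive semidefinite matrix with prescribed eigenvalues and checks (2.2.1), allowing a tiny slack for rounding errors.

```python
import numpy as np

def lanczos_coeffs(A, b, ell):
    """Subdiagonal entries beta_2,...,beta_{l+1} of T_{l+1,l} (with reorth.)."""
    V = [b / np.linalg.norm(b)]
    betas = []
    beta = 0.0
    for j in range(ell):
        w = A @ V[-1] - (beta * V[-2] if j > 0 else 0.0)
        alpha = V[-1] @ w
        w = w - alpha * V[-1]
        for u in V:                         # reorthogonalization
            w = w - (u @ w) * u
        beta = np.linalg.norm(w)
        betas.append(beta)
        V.append(w / beta)
    return np.array(betas)

rng = np.random.default_rng(2)
n = 60
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.exp(-np.arange(n, dtype=float))    # lambda_1 >= lambda_2 >= ... > 0
A = Q @ np.diag(lam) @ Q.T                  # symmetric positive semidefinite
betas = lanczos_coeffs(A, rng.standard_normal(n), 8)
for ell in range(1, 9):
    # (2.2.1): beta_2 ... beta_{l+1} <= lambda_1 ... lambda_l
    assert np.prod(betas[:ell]) <= np.prod(lam[:ell]) * (1 + 1e-8)
```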
Moreover, we can easily derive bounds of the form
(2.2.5)  β_{j+1} ≤ k_j λ_j,  j = 1,...,n − 1,  with  k_j := (∏_{i=1}^{j−1} λ_i) / (∏_{i=1}^{j−1} β_{i+1}) ≥ 1.

However, if the bound (2.2.1) is not sharp, then k_j ≫ 1, resulting in a meaningless estimate (2.2.5). A result analogous to Theorem 2.2.1 for nonsymmetric matrices A, with the Lanczos method replaced by the Arnoldi method, has been shown by Novati and Russo [56] and Gazzola et al. [30, 31]. It should be emphasized that the bounds proved in [30, 31, 56] are similar to (2.2.5), but assume b_true as starting vector for the Arnoldi algorithm, involve constants whose values are not explicitly known, and are only valid for moderately to severely ill-posed problems; see, e.g., Hansen [40] for this classification of ill-posed problems. The restriction to symmetric matrices allows us to give a bound which does not explicitly depend on the starting vector of the Lanczos method.
Corollary 2.2.2. Let A ∈ R^{n×n} be symmetric and positive semidefinite. Assume that the eigenvalues of A “cluster” at the origin and that the Lanczos method applied to A with initial vector b does not break down. Also assume that, for all j > s, β_j ≤ C min_{1≤i≤s} β_i for a constant C independent of j and s. Then both the diagonal and subdiagonal entries α_j and β_j of the tridiagonal Lanczos matrix T_{ℓ+1,ℓ}, cf. (1.3.6), approach zero when j increases and is large enough.
Proof. We first remark that when we let the index j increase, we also may have to increase ℓ in (1.3.6). The fact that the β_j's approach zero as j increases follows from (2.2.1) and the clustering of the eigenvalues at zero. We turn to the diagonal entries α_j of the tridiagonal Lanczos matrix T_n. This matrix is similar to A. Therefore its eigenvalues cluster at the origin, which is the only cluster point. Application of Gershgorin disks to the rows of T_n, using the fact that the off-diagonal entries are “tiny” for large row numbers, shows that the corresponding diagonal entries are accurate approximations of the eigenvalues of T_n. These entries therefore have to approach zero as j increases.
We remark that the decrease of the subdiagonal entries β_j of T_{ℓ+1,ℓ} to zero with increasing j follows from the clustering of the eigenvalues of A; it is not necessary that they cluster at the origin. This can be seen by replacing the matrix A in Corollary 2.2.2 by A + cI_n for some constant c ∈ R. This substitution adds c to the diagonal entries of T_{ℓ+1,ℓ} in (1.3.6), and it is employed with a suitable parameter c > 0 to perform Lavrentiev-type regularization; cf. [40].
The assumption in Theorem 2.2.1 that n steps of the Lanczos method can be carried out simplifies the proof, but is not essential. We state the corresponding result when the Lanczos method breaks down at step k < n.
Corollary 2.2.3. Let the matrix A ∈ R^{n×n} be symmetric and positive semidefinite, and let its eigenvalues be ordered according to (1.2.6). Assume that the Lanczos method applied to A with initial vector b breaks down at step k. Then

∏_{j=2}^{ℓ+1} β_j ≤ ∏_{j=1}^{ℓ} λ_j,  ℓ = 1,2,...,k,

where β_{k+1} := 0.
Proof. Application of k steps of the Lanczos method to the matrix A with initial vector b gives the decomposition

A V_k = V_k T_k,

where V_k = [v_1, v_2, ..., v_k] ∈ R^{n×k} has orthonormal columns with V_k e_1 = b/‖b‖ and

        [ α_1  β_2                     ]
        [ β_2  α_2  β_3                ]
T_k  =  [      β_3  α_3   ⋱            ]  ∈ R^{k×k}
        [           ⋱     ⋱     β_k    ]
        [                 β_k   α_k    ]

is symmetric and tridiagonal with positive subdiagonal entries. The inequality (2.2.2) holds for ℓ = 1,2,...,n; however, the relation (2.2.3) has to be replaced by

(2.2.6)  p_ℓ(A)b = V_k p_ℓ(T_k) e_1 ‖b‖,  1 ≤ ℓ < k.
This relation can be shown by induction on ℓ. Indeed, for ℓ = 1, one immediately has

p_1(A)b = (A − λ_1 I) V_k e_1 ‖b‖ = (A V_k − λ_1 V_k) e_1 ‖b‖ = V_k (T_k − λ_1 I) e_1 ‖b‖ = V_k p_1(T_k) e_1 ‖b‖.

Assuming that (2.2.6) holds for ℓ < k − 1, for ℓ + 1 one gets

p_{ℓ+1}(A)b = (A − λ_{ℓ+1} I) p_ℓ(A)b = (A − λ_{ℓ+1} I) V_k p_ℓ(T_k) e_1 ‖b‖
            = V_k (T_k − λ_{ℓ+1} I) p_ℓ(T_k) e_1 ‖b‖ = V_k p_{ℓ+1}(T_k) e_1 ‖b‖.

Analogously to (2.2.4), we obtain

‖p_ℓ(A)b‖ = ‖p_ℓ(T_k) e_1‖ ‖b‖ ≥ ‖b‖ ∏_{j=2}^{ℓ+1} β_j,

and the corollary follows.
We turn to symmetric indefinite matrices. For notational simplicity, we will assume that the Lanczos method does not break down, but this requirement can be relaxed similarly as in Corollary 2.2.3.
Theorem 2.2.4. Let the eigenvalues {λ_j}_{j=1}^{n} of the symmetric matrix A ∈ R^{n×n} be ordered according to (1.2.6). Assume that the Lanczos method applied to A with initial vector b does not break down. Then

(2.2.7)  ∏_{j=2}^{ℓ+1} β_j ≤ ∏_{j=1}^{ℓ} (|λ_{ℓ+1}| + |λ_j|),  ℓ = 1,2,...,n − 1.
Proof. Let p_ℓ(t) be the monic polynomial of the proof of Theorem 2.2.1. Then, just like in that proof,

‖p_ℓ(A)‖ = ‖p_ℓ(Λ)‖ = max_{ℓ+1 ≤ j ≤ n} |p_ℓ(λ_j)|.

It follows from the ordering (1.2.6) of the eigenvalues that the interval [−|λ_{ℓ+1}|, |λ_{ℓ+1}|] contains all the eigenvalues λ_{ℓ+1}, λ_{ℓ+2}, ..., λ_n. Therefore,

max_{ℓ+1 ≤ j ≤ n} |p_ℓ(λ_j)| ≤ max_{−|λ_{ℓ+1}| ≤ t ≤ |λ_{ℓ+1}|} |p_ℓ(t)| ≤ ∏_{j=1}^{ℓ} (|λ_{ℓ+1}| + |λ_j|).

The inequality (2.2.7) now follows similarly to the proof of the analogous inequality (2.2.1).
Assume that the eigenvalues of A cluster at the origin. Then Theorem 2.2.4 shows that the factors |λ_{ℓ+1}| + |λ_j| decrease to zero as ℓ and j, with 1 ≤ j ≤ ℓ, increase. Furthermore, the more Lanczos steps are taken, the tighter the bound for the product of the subdiagonal elements of the matrix T_{ℓ+1,ℓ}. Sharper bounds for the product of subdiagonal entries of T_{ℓ+1,ℓ} can be obtained if more information about the spectrum of A is available. For instance, if all but a few eigenvalues of A are known to be nonnegative, then only the factors with the negative eigenvalues have to be modified as in Theorem 2.2.4, resulting in improved bounds for products of the β_j. Simpler, but cruder, bounds than (2.2.7) also can be derived. The following is an example.
Corollary 2.2.5. Let the eigenvalues {λ_j}_{j=1}^{n} of the symmetric matrix A ∈ R^{n×n} be ordered according to (1.2.6). Assume that the Lanczos method applied to A with initial vector b does not break down. Then

(2.2.8)  ∏_{j=2}^{ℓ+1} β_j ≤ ∏_{k=1}^{ℓ} (2|λ_k|),  ℓ = 1,2,...,n − 1.
Proof. The result follows from the observation that |λ_{ℓ+1}| ≤ |λ_k| for 1 ≤ k ≤ ℓ.
Introduce the set of ε-pseudoeigenvectors of A ∈ R^{n×n}:

(2.2.9)  V_ε := {x ∈ R^n unit vector : ∃ λ ∈ R such that ‖Ax − λx‖ ≤ ε}.
The λ-values associated with ε-pseudoeigenvectors are ε-pseudoeigenvalues of A; see, e.g., Trefethen and
Embree [72] for an insightful treatment of pseudospectra of matrices and operators.
Substituting the decomposition A = V_n T_n V_n^T into (2.2.9) and applying Theorem 2.2.4 show that, for a given ε > 0 and for j sufficiently large, the Lanczos vectors v_j are ε-pseudoeigenvectors of A associated with eigenvalues close to zero. Indeed, by (1.3.6) we get

A v_j = A V_ℓ e_j = V_{ℓ+1} T_{ℓ+1,ℓ} e_j = α_j v_j + β_j v_{j−1} + β_{j+1} v_{j+1}.

Since α_j and β_j approach 0 as j increases, we can conclude that the v_j are ε-pseudoeigenvectors for j large.
Let w_j denote the jth column of the matrix W in (1.2.5), i.e., let w_j be the jth eigenvector of A. Therefore, the space span{w_j}_{j=1}^{k} is essentially contained in span{v_j}_{j=1}^{ℓ} for k = k(ℓ) ≤ ℓ sufficiently small. The notion of “essentially contained” will be made precise and illustrated in Section 2.4.
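This behavior is easy to observe numerically. In the sketch below (our own code and thresholds), the residuals ‖A v_j − 0·v_j‖ = ‖A v_j‖ of the Lanczos vectors decay as j grows, so the late v_j are ε-pseudoeigenvectors of A associated with λ = 0:

```python
import numpy as np

def lanczos_vectors(A, b, ell):
    """Lanczos vectors v_1,...,v_{l+1} (with reorthogonalization)."""
    n = len(b)
    V = np.zeros((n, ell + 1))
    V[:, 0] = b / np.linalg.norm(b)
    beta = 0.0
    for j in range(ell):
        w = A @ V[:, j] - (beta * V[:, j - 1] if j > 0 else 0.0)
        alpha = V[:, j] @ w
        w = w - alpha * V[:, j]
        w = w - V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # reorthogonalize
        beta = np.linalg.norm(w)
        V[:, j + 1] = w / beta
    return V

rng = np.random.default_rng(3)
n = 80
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.exp(-np.arange(n, dtype=float))) @ Q.T  # clustered spectrum
V = lanczos_vectors(A, rng.standard_normal(n), 20)
residuals = np.linalg.norm(A @ V, axis=0)   # ||A v_j - 0 * v_j||, cf. (2.2.9)
assert residuals[15] < 1e-3                 # late v_j: eps-pseudoeigenvectors
assert residuals[15] < residuals[0]
```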
The above observation about the subspaces span{w_j}_{j=1}^{k} and span{v_j}_{j=1}^{ℓ} for k = k(ℓ) has implications for computations. One of the most popular methods for solving linear discrete ill-posed problems is the truncated singular value decomposition (TSVD); see Section 1.2.1. The truncated eigenvalue decomposition for symmetric matrices is analogous to the TSVD; see Section 1.2.2. It is based on expressing an approximate solution of (1.1.1) as a linear combination of the first few eigenvectors, say {w_j}_{j=1}^{k}, of A; cf. (1.2.5). The computation of these eigenvectors is more expensive than the determination of the Lanczos vectors {v_j}_{j=1}^{ℓ} for a reasonable k = k(ℓ) ≤ ℓ, because typically several Lanczos decompositions with different initial vectors have to be computed in order to determine the desired eigenvectors; see, e.g., Baglama et al. [4] and Saad [65] for discussions of methods for computing a few eigenpairs of a large matrix. Since the span of the Lanczos vectors {v_j}_{j=1}^{ℓ} essentially contains the set of the eigenvectors {w_j}_{j=1}^{k}, there is generally no need to compute the latter. This is illustrated by numerical examples in Section 2.4.
It is sometimes beneficial to determine an approximate solution of (1.1.1) in a shifted Krylov subspace
(2.2.10)  K_ℓ(A,Ab) = span{Ab, A^2 b, ..., A^ℓ b}
instead of in the standard Krylov subspace (1.3.1). This is discussed and illustrated in [15, 20, 55]. Let
(2.2.11)  A V̆_ℓ = V̆_{ℓ+1} T̆_{ℓ+1,ℓ},

where V̆_{ℓ+1} = [v̆_1, v̆_2, ..., v̆_{ℓ+1}] ∈ R^{n×(ℓ+1)} has orthonormal columns with v̆_1 = Ab/‖Ab‖, V̆_ℓ = [v̆_1, v̆_2, ..., v̆_ℓ] ∈ R^{n×ℓ}, and the tridiagonal matrix T̆_{ℓ+1,ℓ} is of the same form as (1.3.5). Then, analogously to (2.1.1), we formally obtain

min_{x ∈ K_ℓ(A,Ab)} ‖Ax − b‖^2 = min_{y ∈ R^ℓ} ‖T̆_{ℓ+1,ℓ} y − V̆_{ℓ+1}^T b‖^2 + ‖(I − V̆_{ℓ+1} V̆_{ℓ+1}^T) b‖^2.

Let y̆_ℓ ∈ R^ℓ denote the solution of the minimization problem on the right-hand side of the above relation. Then x̆_ℓ := V̆_ℓ y̆_ℓ solves the constrained least-squares problem on the left-hand side of the above equation and is an approximate solution of (1.1.1). Computed examples show the vector x̆_ℓ to generally approximate the desired solution x_true more accurately than the solution x_ℓ of the minimization problem (2.1.1). In the computed examples of Section 2.4, we therefore compute the vectors x̆_1, x̆_2, ... .
We note that the analysis in this section is independent of the initial vector b in the Krylov subspace (1.3.1), except for how this vector affects the occurrence of breakdown. In particular, our analysis carries over to shifted Krylov subspaces of the form (2.2.10).
2.3 Application of the Golub–Kahan reduction method
A nonsymmetric matrix A ∈ R^{m×n} can be reduced to a small bidiagonal matrix by a few steps of Golub–Kahan bidiagonalization; see Section 1.3.3. This reduction method is the basis for the popular LSQR algorithm [59] for the solution of least-squares problems (1.1.1), where the vector b ∈ R^m can be written as (1.1.2). Application of ℓ ≪ n steps of Golub–Kahan bidiagonalization to A with initial vector b gives the decompositions (1.3.8). Throughout this section, α_j and β_j refer to entries of the matrix (1.3.7).
The LSQR method applied to the solution of (1.1.1) solves in step ℓ the minimization problem

min_{x ∈ K_ℓ(A^T A, A^T b)} ‖Ax − b‖ = min_{y ∈ R^ℓ} ‖C̄_ℓ y − e_1 ‖b‖‖,

where the right-hand side is obtained by substituting (1.3.8) into the left-hand side. Denote the solution of the right-hand side by y′_ℓ. Then the ℓth step of LSQR yields the solution x′_ℓ = Q_ℓ y′_ℓ of the left-hand side, which is an approximate solution of (1.1.1).
The decomposition (1.3.9) is a Lanczos decomposition, and it allows us to apply Theorem 2.2.1.
Corollary 2.3.1. Let A ∈ R^{m×n} have the singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_n ≥ 0, and assume that the Golub–Kahan bidiagonalization method applied to A with initial vector b does not break down. Then

(2.3.1)  ∏_{j=2}^{ℓ+1} α_j β_j ≤ ∏_{j=1}^{ℓ} σ_j^2,  ℓ = 1,2,...,m − 1,

where the α_j and β_j are entries of the bidiagonal matrix (1.3.7).
Proof. The subdiagonal entries of the matrix C̄_ℓ^T C̄_ℓ in (1.3.9) are α_j β_j, and the eigenvalues of A^T A are σ_j^2. The result therefore follows from Theorem 2.2.1.
The above corollary shows that if the singular values σ_j cluster at zero for large j, then so do the products α_j β_j of the entries of the matrix (1.3.7). Bounds related to (2.3.1) have been shown by Gazzola et al. [31].
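The bound (2.3.1) can be checked numerically along the following lines; golub_kahan_coeffs is our own illustrative routine, not code from the thesis:

```python
import numpy as np

def golub_kahan_coeffs(A, b, steps):
    """Entries alpha_1, alpha_2, ... and beta_2, beta_3, ... of the lower
    bidiagonal matrix (1.3.7), computed with reorthogonalization."""
    P = [b / np.linalg.norm(b)]
    q = A.T @ P[0]
    alphas = [np.linalg.norm(q)]
    Q = [q / alphas[0]]
    betas = []
    for _ in range(steps):
        p = A @ Q[-1] - alphas[-1] * P[-1]
        for u in P:                           # reorthogonalize left vectors
            p = p - (u @ p) * u
        beta = np.linalg.norm(p)
        betas.append(beta)
        P.append(p / beta)
        q = A.T @ P[-1] - beta * Q[-1]
        for u in Q:                           # reorthogonalize right vectors
            q = q - (u @ q) * u
        alpha = np.linalg.norm(q)
        alphas.append(alpha)
        Q.append(q / alpha)
    return np.array(alphas), np.array(betas)

rng = np.random.default_rng(4)
m, n = 70, 50
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
W, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.exp(-0.5 * np.arange(n))           # singular values cluster at 0
A = U @ np.diag(sigma) @ W.T
alphas, betas = golub_kahan_coeffs(A, rng.standard_normal(m), 8)
for ell in range(1, 8):
    # (2.3.1): prod_{j=2}^{l+1} alpha_j beta_j <= prod_{j=1}^{l} sigma_j^2
    lhs = np.prod(alphas[1:ell + 1]) * np.prod(betas[:ell])
    assert lhs <= np.prod(sigma[:ell] ** 2) * (1 + 1e-8)
```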
The computation of the SVD (1.2.1) is feasible for problems of small to moderate size, but expensive for large-scale problems. The computational expense for large-scale problems can be reduced by computing a partial singular value decomposition of A, {Û_k, V̂_k, Σ̂_k}, instead of {Û, V̂, Σ̂}; see [6, 8, 58] and references therein for suitable numerical methods. The computation of Û_k, V̂_k, and Σ̂_k generally requires that several Golub–Kahan decompositions (1.3.8) with different initial vectors be evaluated.
Corollary 2.3.1 indicates why it may not be necessary to compute the matrices Û_k, V̂_k, and Σ̂_k. The columns of the matrix V̂ in (1.2.1) are eigenvectors of A^T A and the σ̂_j^2 are eigenvalues. It is a consequence of Corollary 2.3.1, and the fact that the singular values σ̂_j of A cluster at the origin, that the columns q_j of the matrix Q_ℓ in (1.3.8) for small j are accurate approximations of eigenvectors of A^T A. This follows from an argument analogous to the discussion in Section 2.2 based on Corollaries 2.2.2 and 2.2.5. Therefore, it generally is not necessary to compute the partial singular value decomposition {Û_k, V̂_k, Σ̂_k} of A. Instead, it suffices to determine a partial Golub–Kahan bidiagonalization (1.3.8), which is cheaper. This is illustrated in the following section.
2.4 Computed examples
To investigate the properties discussed in the previous sections, we applied the symmetric Lanczos and
Golub–Kahan bidiagonalization methods to a set of test matrices whose singular values cluster at the origin.
The numerical experiments were carried out using MATLAB R2014a in double precision arithmetic, that is, with about 15 significant decimal digits.
The symmetric test matrices are listed in Table 1, while the nonsymmetric ones are in Table 2. Among the symmetric matrices, one example is negative definite (Deriv2), one is positive definite (Gravity), and the others are indefinite. All matrices except one are from the Regularization Tools package [41]. The
Lotkin test matrix was generated by the gallery function, which is available in the standard MATLAB distribution. All test matrices are of order 200 except when explicitly stated otherwise.
Figure 1: Behavior of the bounds (2.2.1) (left), (2.2.7) (center), and (2.3.1) (right), with respect to the iteration index ℓ. The first test matrix is symmetric positive definite, the second is symmetric indefinite, and the third is unsymmetric. The left-hand side of each inequality is represented by crosses, the right-hand side by circles.
Figure 1 displays, in logarithmic scale, the values taken by each side of the inequalities (2.2.1), (2.2.7), and (2.3.1), with the number of iterations ℓ ranging from 1 to the index corresponding either to a breakdown of the algorithm or to the last nonzero value of both inequality sides. The graphs show that the bounds provided by Theorems 2.2.1 and 2.2.4, and by Corollary 2.3.1, are quite sharp.
Now we illustrate that the subspaces R(V̆_k) generated by the Lanczos method (2.2.11) essentially contain subspaces of eigenvectors of A associated with the eigenvalues of largest magnitude. We also discuss the convergence of the largest eigenvalues of the matrices T̆_k in (2.2.11) to eigenvalues of A of largest magnitude.
Here T̆_k ∈ R^{k×k} is the matrix obtained by neglecting the last row of the matrix T̆_{k+1,k} ∈ R^{(k+1)×k} defined by (2.2.11) with ℓ replaced by k. The Lanczos method is applied for n steps or until breakdown occurs, that is, until a subdiagonal element of T̆_k is smaller than 10^{−12}; ℓ denotes the number of steps performed by the method. The initial column of the matrices V̆_k is A b_true / ‖A b_true‖.
Let {λ̆_i^{(k)}}_{i=1}^{k} denote the eigenvalues of the matrix T̆_k. We compare the eigenvalues λ̆_i^{(k)} of largest magnitude to the corresponding eigenvalues λ_i of the matrix A. All eigenvalues are ordered according to decreasing magnitude. For each Lanczos step k, we compute the relative difference

(2.4.1)  R_{λ,k} := max_{i=1,2,...,⌈k/3⌉} |λ̆_i^{(k)} − λ_i| / |λ_i|.

Thus, we evaluate the maximum relative difference over the ⌈k/3⌉ eigenvalues of largest modulus; ⌈η⌉ denotes the integer closest to η ∈ R. The graphs for R_{λ,k}, for k = 1,2,...,n, are displayed in the left column of
Figure 2 for each of the 5 symmetric test matrices.
We turn to a comparison of subspaces. For each k, let T̆_k = W̆_k Λ̆_k W̆_k^T be the spectral factorization of T̆_k, where

Λ̆_k = diag[λ̆_1^{(k)}, λ̆_2^{(k)}, ..., λ̆_k^{(k)}],  W̆_k = [w̆_1^{(k)}, w̆_2^{(k)}, ..., w̆_k^{(k)}],

and introduce the matrix V_{k,i} = [v_1^{(k)}, v_2^{(k)}, ..., v_i^{(k)}] consisting of the first i columns of V̆_k W̆_k. The columns of V_{k,i} are the Ritz vectors of A associated with the i Ritz values of largest magnitude, λ̆_1^{(k)}, λ̆_2^{(k)}, ..., λ̆_i^{(k)}. Partition the matrix containing the eigenvectors of A, cf. (1.2.5), according to W = [W_i^{(1)}  W_{n−i}^{(2)}], where W_i^{(1)} ∈ R^{n×i} contains the first i eigenvectors, and W_{n−i}^{(2)} ∈ R^{n×(n−i)} the remaining ones. The columns of W_i^{(1)} and W_{n−i}^{(2)} span orthogonal subspaces.
Figure 2: The graphs in the left column display the relative error R_{λ,k} between the eigenvalues of the symmetric test problems and the corresponding Ritz values generated by the Lanczos process. The right column shows the behavior of R_{σ̂,k} for the unsymmetric problems; see (2.4.1) and (2.4.3).
We compute, for k = 1,2,...,ℓ, the quantities

(2.4.2)  R_{w,k} := max_{i=1,2,...,⌈k/3⌉} ‖V_{k,i}^T W_{n−i}^{(2)}‖.

The norm ‖V_{k,i}^T W_{n−i}^{(2)}‖ measures the distance between the subspaces R(V_{k,i}) and R(W_i^{(1)}); see, e.g., [35]. Thus, R_{w,k} is small when span{w_j}_{j=1}^{⌈k/3⌉} is approximately contained in span{v_j^{(k)}}_{j=1}^{k}, that is, when the solution subspace generated by the Lanczos vectors essentially contains the space generated by the first ⌈k/3⌉ eigenvectors. The graphs in the left column of Figure 3 show R_{w,k}, for k = 1,2,...,ℓ, for the symmetric test matrices. The distances between subspaces ‖V_{k,i}^T W_{n−i}^{(2)}‖ are displayed in Figure 4, for k = 10 and i = 1,...,k, for two symmetric test matrices.
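The quantity (2.4.2) can be computed as in the following sketch (our own names, and a synthetic matrix with prescribed spectrum standing in for the test matrices):

```python
import numpy as np

def lanczos_tridiag(A, b, k):
    """k Lanczos steps with reorthogonalization: returns V_k and T_k."""
    n = len(b)
    V = np.zeros((n, k + 1))
    V[:, 0] = b / np.linalg.norm(b)
    alphas, betas = np.zeros(k), np.zeros(k)
    beta = 0.0
    for j in range(k):
        w = A @ V[:, j] - (beta * V[:, j - 1] if j > 0 else 0.0)
        alphas[j] = V[:, j] @ w
        w = w - alphas[j] * V[:, j]
        w = w - V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        beta = np.linalg.norm(w)
        betas[j] = beta
        V[:, j + 1] = w / beta
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    return V[:, :k], T

rng = np.random.default_rng(5)
n, k, i = 100, 15, 5                          # i = ceil(k/3)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.exp(-np.arange(n, dtype=float))) @ Q.T
Vk, T = lanczos_tridiag(A, rng.standard_normal(n), k)
theta, WT = np.linalg.eigh(T)
order = np.argsort(-np.abs(theta))            # Ritz values, largest magnitude first
ritz = Vk @ WT[:, order]                      # columns of V_k W_k
W2 = Q[:, i:]                                 # trailing eigenvectors W^(2)_{n-i}
dist = np.linalg.norm(ritz[:, :i].T @ W2, 2)  # the norm appearing in (2.4.2)
assert dist < 1e-2   # first i eigenvectors essentially in the Ritz span
```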
A few comments on the left-hand side graphs of Figures 2 and 3 are in order. The left graphs of Figure 3 show that the span of the first ⌈k/3⌉ eigenvectors of A is numerically contained in the span of the first k Lanczos vectors already for quite small values of k. We remark that this is not true if we compare the spaces spanned by the first k eigenvectors of A and by its first k Lanczos vectors. Graphs that compare the span of the first ⌈k/2⌉ eigenvectors of A with the span of the first k Lanczos vectors look similar to the graphs shown, but display slower convergence; see the graphs in the left column of Figure 5. Thus, k has to be larger in order for the first ⌈k/2⌉ eigenvectors of A to be numerically in the span of the first k Lanczos vectors. Figure 2 shows excellent agreement between the first ⌈k/3⌉ Ritz values of A and the corresponding eigenvalues already for small k. The convergence of the first ⌈k/2⌉ Ritz values to the corresponding eigenvalues is somewhat slower than the convergence displayed.
We use the Lanczos decomposition (2.2.11) in our illustrations because this decomposition gives approximate solutions of (1.1.1) of higher quality than the decomposition (1.3.6). Analogues of Figures 2 and 3 based on the Lanczos decomposition (1.3.6) look essentially the same as the figures shown. Finally, we remark that the Lanczos decomposition (2.2.11) is computed with reorthogonalization. Without reorthogonalization the convergence illustrated by Figures 2 and 3 does not hold.
Figure 3: Distance between the subspace spanned by the first ⌈k/3⌉ eigenvectors (resp. singular vectors) of the symmetric (resp. nonsymmetric) test problems, and the subspace spanned by the corresponding Lanczos (resp. Golub–Kahan) vectors; see (2.4.2) and (2.4.4).
Figure 4: Distance ‖V_{k,i}^T W_{n−i}^{(2)}‖, i = 1,2,...,k, between the subspace spanned by the first i eigenvectors of the Foxgood (left) and Shaw (right) matrices, and the subspace spanned by the corresponding i Ritz vectors at iteration k = 10.
We turn to nonsymmetric matrices A. The Lanczos method is replaced by the Golub–Kahan method (1.3.8) and the spectral factorization by the singular value decomposition (1.2.1). The index ℓ denotes either the order n of the matrix or the step at which an element of the bidiagonal projected matrix C̄_ℓ is less than 10^{−12}, i.e., the step at which a breakdown happens. The graphs in the right-hand side column of Figure 2 show the relative differences

(2.4.3)  R_{σ̂,k} := max_{i=1,2,...,⌈k/3⌉} |σ̆_i^{(k)} − σ̂_i| / |σ̂_i|

between the singular values {σ̆_i^{(k)}}_{i=1}^{k} of C̄_k and those of A. The graphs are similar to those in the left-hand side column, except for the example Tomo, which displays slow convergence: this behavior is probably linked to the fact that the Tomo test problem is much less ill-conditioned than the other test problems.
Let Û and V̂ be the orthogonal matrices in the singular value decomposition (1.2.1) of A, and partition these matrices similarly as we did for symmetric matrices A, i.e., Û = [Û_i^{(1)}  Û_{n−i}^{(2)}] and V̂ = [V̂_i^{(1)}  V̂_{n−i}^{(2)}], where the submatrices Û_i^{(1)}, V̂_i^{(1)} contain the first i singular vectors and Û_{n−i}^{(2)}, V̂_{n−i}^{(2)} the remaining n − i ones. To investigate the convergence of subspaces, we substitute the singular value decomposition C̄_k =
Figure 5: Distance between the subspace spanned by the first ⌈k/2⌉ eigenvectors (resp. singular vectors) of selected symmetric (resp. nonsymmetric) test problems and the subspace spanned by the corresponding Lanczos (resp. Golub–Kahan) vectors. The index ℓ ranges from 1 to either the dimension of the matrix (n = 200) or to the iteration where there is a breakdown in the factorization process.
Ŭ_{k+1} Σ̆_{k+1,k} V̆_k^T into (1.3.8) and consider

(2.4.4)  R_{(û,v̂),k} := max_{i=1,2,...,⌈k/3⌉} max{‖V_{k,i}^T V̂_{n−i}^{(2)}‖, ‖U_{k,i}^T Û_{n−i}^{(2)}‖},

where V_{k,i} and U_{k,i} are made up of the first i columns of Q_k V̆_k and P_{k+1} Ŭ_{k+1}, respectively. Then R_{(û,v̂),k} measures the distance between subspaces determined by the singular vectors of A and those defined by vectors computed with the Golub–Kahan method.
The quantities R_{(û,v̂),k} are displayed, for k = 1,2,...,ℓ, in the right column of Figure 3. Figure 5 depicts graphs for the quantities R_{v̂,k} and R_{(û,v̂),k} with the maximum computed over the first ⌈k/2⌉ vectors for four test problems. Figure 6 shows the value taken by max{‖V_{k,i}^T V̂_{n−i}^{(2)}‖, ‖U_{k,i}^T Û_{n−i}^{(2)}‖}, i = 1,2,...,100, which expresses the distance between spaces spanned by singular vectors and vectors determined by the Golub–Kahan method.
We now compare the performance of different regularization methods. The test problems from [41] define both a matrix A and a solution x_true; the solution of the Lotkin example is the same as for the Shaw example.
Figure 6: Distance max{‖V_{k,i}^T V̂_{n−i}^{(2)}‖, ‖U_{k,i}^T Û_{n−i}^{(2)}‖}, i = 1,2,...,k, between the subspace spanned by the first i singular vectors of the Heat (left) and Tomo (right) matrices and the subspace spanned by the corresponding i Golub–Kahan vectors at iteration k = 100.
Table 1: Solution of symmetric linear systems: the errors E_Lanczos and E_TEIG are optimal for truncated Lanczos iteration and truncated eigenvalue decomposition. The corresponding truncation parameters are denoted by k_Lanczos and k_TEIG. Three noise levels δ are considered; ℓ denotes the number of Lanczos iterations performed.
noise       matrix     ℓ     E_Lanczos    k_Lanczos   E_TEIG       k_TEIG
δ = 10^-6   Deriv2    200    2.0×10^-2       49       2.1×10^-2     199
            Foxgood    24    6.8×10^-4        6       6.3×10^-4       6
            Gravity    46    1.2×10^-3       15       1.2×10^-3      16
            Phillips  200    5.8×10^-4       22       5.8×10^-4      32
            Shaw       19    1.9×10^-2       10       1.9×10^-2      10
δ = 10^-4   Deriv2    200    1.2×10^-1       10       1.1×10^-1      51
            Foxgood    24    1.4×10^-2        3       4.5×10^-3       4
            Gravity    45    1.5×10^-2        7       6.3×10^-3      12
            Phillips  200    4.8×10^-3       12       3.9×10^-3      15
            Shaw       19    4.7×10^-2        7       3.4×10^-2       9
δ = 10^-2   Deriv2    200    3.1×10^-1        3       2.3×10^-1      12
            Foxgood    24    7.7×10^-2        2       2.9×10^-2       2
            Gravity    45    8.0×10^-2        3       3.3×10^-2       7
            Phillips  200    4.3×10^-2        6       2.2×10^-2       8
            Shaw       19    1.2×10^-1        7       9.1×10^-2       7
The error-free data vector is defined by b_true := A x_true and the contaminated data vector is given by (1.1.2) with

e := (δ/√n) ‖b_true‖ e_b,

where the random vector e_b ∈ R^n models Gaussian noise with mean zero and variance one and δ is a chosen noise level. In our experiments we let δ = 10^-6, 10^-4, 10^-2.
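In code, this scaling of the noise vector may be sketched as follows (Python/NumPy; the vector b_true and the seed are purely illustrative). Since ‖e_b‖ is concentrated around √n for Gaussian e_b, the realized noise level ‖e‖/‖b_true‖ is close to δ:

```python
import numpy as np

def add_noise(b_true, delta, rng):
    # e := (delta / sqrt(n)) * ||b_true|| * e_b with e_b ~ N(0, I),
    # so that E||e||^2 = delta^2 ||b_true||^2.
    n = b_true.size
    e = (delta / np.sqrt(n)) * np.linalg.norm(b_true) * rng.standard_normal(n)
    return b_true + e, e

rng = np.random.default_rng(1)
b_true = np.ones(200)                  # hypothetical error-free data vector
b, e = add_noise(b_true, 1e-2, rng)
# The realized noise level is close to delta = 1e-2.
print(np.linalg.norm(e) / np.linalg.norm(b_true))
```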
We measure the accuracy attainable by each regularization method by the relative error
(2.4.5)  E_method = ‖x_{k_method} − x_true‖ / ‖x_true‖ = min_{k=1,2,...,ℓ} ‖x_k − x_true‖ / ‖x_true‖,

which is obtained by choosing the value k = k_method that minimizes the error for the method under consideration.
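The error measure (2.4.5) is simply the best relative error over the stored iterates; a minimal sketch (Python/NumPy, with hypothetical iterates) is:

```python
import numpy as np

def optimal_error(iterates, x_true):
    # E_method = min_k ||x_k - x_true|| / ||x_true||, as in (2.4.5),
    # together with the minimizing (1-based) truncation index k_method.
    errs = [np.linalg.norm(x - x_true) / np.linalg.norm(x_true) for x in iterates]
    k = int(np.argmin(errs)) + 1
    return errs[k - 1], k

# Hypothetical iterates that first approach x_true and then drift away,
# mimicking the semiconvergence of iterative regularization methods.
x_true = np.array([1.0, 2.0, 3.0])
d = np.array([1.0, 0.0, -1.0])
iterates = [x_true + t * d for t in (0.9, 0.3, 0.05, 0.4)]
E, k = optimal_error(iterates, x_true)
print(k)   # 3: the third iterate is closest to x_true
```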
Table 1 reports the results obtained by comparing truncated Lanczos iteration (2.1.1) to truncated eigenvalue solution for symmetric test problems. The minimal errors (2.4.5) obtained by applying the Lanczos method and the truncated eigenvalue decomposition method, denoted by E_Lanczos and E_TEIG, are reported in the fourth and sixth columns, respectively. The truncation parameters that produce the minimal errors are listed in the fifth and seventh columns. The third column shows how many Lanczos iterations were executed; an entry smaller than 200 indicates that breakdown occurred. Both the errors and the truncation parameter values are averages over 20 realizations of the random noise. Three noise levels δ were considered. The results in Table 1 suggest that, for the test problems considered, the truncated Lanczos projection method produces solutions essentially equivalent to those obtained by truncated eigenvalue decomposition, with a cheaper algorithm, since the number of Lanczos iterations required is sometimes far smaller than the number of eigenvalues required.
Table 2 reports results obtained for nonsymmetric linear discrete ill-posed problems (1.1.1). Here the LSQR method is compared to truncated singular value decomposition (TSVD). The table confirms the conclusions deduced from Table 1.
Table 2: Solution of nonsymmetric linear systems: the errors E_LSQR and E_TSVD are optimal for LSQR and TSVD. The corresponding truncation parameters are denoted by k_LSQR and k_TSVD. Three noise levels are considered; ℓ denotes the number of Golub–Kahan iterations performed.
noise       matrix     ℓ     E_LSQR       k_LSQR      E_TSVD       k_TSVD
δ = 10^-6   Baart      10    5.1×10^-2        6       5.1×10^-2        6
            Heat      196    5.3×10^-3       54       5.4×10^-3       74
            Lotkin     18    3.1×10^-1       10       3.1×10^-1       10
            Tomo      195    7.6×10^-3      195       7.6×10^-3      195
            Wing        7    3.3×10^-1        5       3.3×10^-1        5
δ = 10^-4   Baart      10    7.7×10^-2        5       7.7×10^-2        5
            Heat      196    1.5×10^-2       26       1.5×10^-2       37
            Lotkin     18    4.3×10^-1        7       4.3×10^-1        7
            Tomo      195    2.1×10^-2      195       2.3×10^-2      195
            Wing        7    4.5×10^-1        4       4.5×10^-1        4
δ = 10^-2   Baart      10    1.5×10^-1        3       1.5×10^-1        3
            Heat      196    9.4×10^-2       13       9.8×10^-2       21
            Lotkin     18    4.5×10^-1        3       4.5×10^-1        3
            Tomo      195    1.9×10^-1       48       2.0×10^-1      180
            Wing        7    6.0×10^-1        2       6.0×10^-1        2
Figure 7: The first four LSQR solutions to the Baart test problem (thin lines) are compared to the corresponding TSVD solutions (dashed lines) and to the exact solution (thick line). The size of the problem is n = 200, the noise level is δ = 10^-4. The thin and dashed lines are very close.
Figure 7 displays the first four regularized solutions produced by the LSQR and TSVD methods when applied to solve the Baart test problem with a noise-contaminated vector b. The noise level is δ = 10−4.
The approximate solutions determined by the LSQR and TSVD methods can be seen to approach each other when the number of iterations k or the truncation parameter k is increased from one to four.
The Tomo test problem arises from the discretization of a 2D tomography problem. Its numerical solution displays some interesting features. It is clear from Table 2 that, when the noise level is large, LSQR produces an approximate solution after k_LSQR steps that is of essentially the same quality as approximate solutions determined by TSVD with a truncation parameter k_TSVD that is much larger than k_LSQR. To better understand this behavior, we consider an image of size 15 × 15 pixels. This gives rise to a minimization problem
(1.1.1) with a matrix A ∈ R^{225×225}. Figure 8 shows the relative errors E_LSQR and E_TSVD as functions of the parameter k. LSQR is seen to give much faster convergence to x_true. The best attainable approximate solutions by LSQR and TSVD are displayed in Figure 9. The LSQR method yields the best approximation of x_true at step k_LSQR = 66. The upper right plot displays this computed solution; the image x_true is shown in the upper left plot. For comparison, the lower left plot of Figure 9 shows the TSVD solution for k = 66. This restoration is seen to be of poor quality. The best approximate solution determined by TSVD has truncation index k_TSVD = 216; it can be seen to be of about the same quality as the best LSQR solution. The results for nonsymmetric problems agree with the ones presented by Hanke [38].

Figure 8: Convergence history for the LSQR and TSVD solutions to the Tomo example of size n = 225, with noise level δ = 10^-2. The error E_LSQR has a minimum at k = 66, while E_TSVD is minimal for k = 215.
2.5 Conclusion
This chapter shows that the largest eigenvalues (in magnitude) of symmetric matrices and the largest singular values of nonsymmetric matrices that are defined by linear discrete ill-posed problems are well approximated by the corresponding eigenvalues and singular values of projected problems determined by a few steps of the
Lanczos or Golub–Kahan bidiagonalization methods, respectively. The same holds for the corresponding eigenvectors and singular vectors. This suggests that it often suffices to use a partial Lanczos decomposition or a partial Golub–Kahan bidiagonalization, which are cheaper to compute than partial eigenvalue or singular value decompositions, to determine a solution. Computed examples provide illustrations.
Figure 9: Solution by LSQR and TSVD to the Tomo example of size n = 225, with noise level δ = 10^-2: exact solution (top left), optimal LSQR solution (top right), TSVD solution corresponding to the same truncation parameter (bottom left), optimal TSVD solution (bottom right).
CHAPTER 3
Computation of a truncated SVD of a large linear discrete ill-posed problem
3.1 Introduction
The need to compute the largest or a few of the largest singular values and, generally, also the associated right and left singular vectors of a large matrix of a linear discrete ill-posed problem arises in a variety of applications, including the approximate minimization of the generalized cross validation function for determining the amount of regularization [26], the solution of large-scale discrete ill-posed problems with two constraints on the computed solution [50], and the solution of large-scale discrete ill-posed problems with a nonnegativity constraint on the solution [10]. This chapter will focus on the solution of minimization problems (1.1.1) with the aid of the truncated SVD (TSVD). We will refer to the triplets made up of the largest singular values and associated right and left singular vectors of a matrix A as the largest singular triplets of A.
We will illustrate that the largest singular triplets of a large matrix A that stems from the discretization of a linear ill-posed problem typically can be computed inexpensively by implicitly restarted Golub–Kahan bidiagonalization methods such as those described in [6–9, 44]. This is true, in particular, in the common situation when the largest singular values are fairly well separated. Computed examples show the number of matrix-vector product evaluations required with the matrices A and A^T by these methods to be only a small multiple (larger than one) of the number needed to compute a partial Golub–Kahan bidiagonalization. This behavior is suggested by results shown in Chapter 2. Typically, only a few of the largest singular triplets of A are required to determine a useful approximation of x_true. The computation of these triplets is much cheaper than the computation of the (full) SVD of the matrix. We remark that in the applications mentioned above it is convenient or necessary to use the largest singular triplets rather than a partial Golub–Kahan bidiagonalization of the matrix.
Many methods have been proposed for the solution of large-scale linear discrete ill-posed problems (1.1.1).
For instance, several iterative methods are available and they do not require the computation of the largest singular triplets of the matrix A; see, e.g., [16, 21, 23, 31, 40, 52, 53, 55] and references therein. However, knowledge of a few of the largest singular values and associated singular vectors often provides valuable insight into the properties of the problem being solved. We will show that the computation of a few of the largest singular triplets generally is quite inexpensive.
3.2 Symmetric linear discrete ill-posed problems
Let the matrix A ∈ Rn×n in (1.1.1) be symmetric. We are interested in smooth approximate solutions of
(1.1.1). These typically can be represented as a linear combination of some of the first eigenvectors w_1, w_2, w_3, ... of A. The last eigenvectors generally represent discretizations of highly oscillatory functions. They model “noise” and should not be part of the computed approximate solution. A few of the eigenpairs associated with eigenvalues of largest magnitude of many symmetric matrices that stem from the discretization of linear discrete ill-posed problems can be computed efficiently by implicitly restarted Lanczos methods such as those described in [4, 5, 13].
Recall the truncated eigenvalue decomposition (TEVD) from (1.2.7), with k replaced by s,
A_s = W_s Λ_s W_s^T,

where W_s = [w_1, w_2, ..., w_s] ∈ R^{n×s} and

Λ_s = diag[λ_1, λ_2, ..., λ_s] ∈ R^{s×s}

for some 1 ≤ s ≪ n. We now turn to the computation of the matrices W_s and Λ_s. The most popular approaches to compute a few extreme eigenvalues and associated eigenvectors of a large symmetric matrix are based on the symmetric Lanczos process, which is displayed by Algorithm 2. Assume for the moment
that the input parameter ℓ in Algorithm 2 is small enough so that the algorithm does not break down, i.e., β_{j+1} > 0 for 1 ≤ j ≤ ℓ. The scalars α_j and β_j determined by Algorithm 2 then define the symmetric tridiagonal matrix (1.3.5). The vectors v_j generated by the algorithm are orthonormal and define the matrix V_ℓ = [v_1, v_2, ..., v_ℓ] ∈ R^{n×ℓ}. A matrix interpretation of the recursion relations of Algorithm 2 gives the
(partial) Lanczos decomposition (1.3.6). We will use Theorem 2.2.1, but note that the choice of initial unit vector v1 does not have to be the same as in Algorithm 2.
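A bare-bones version of the symmetric Lanczos process may clarify how the quantities α_j, β_j, and V_ℓ arise (a Python/NumPy sketch without the reorthogonalization that production codes employ; the random test matrix is purely illustrative):

```python
import numpy as np

def lanczos(A, v1, ell):
    # Plain symmetric Lanczos (cf. Algorithm 2): returns V_ell, the recursion
    # coefficients alpha and beta, and the next Lanczos vector, so that
    # A V = V T + beta[-1] * v_next * e_ell^T with T tridiagonal, cf. (1.3.6).
    n = A.shape[0]
    V = np.zeros((n, ell))
    alpha = np.zeros(ell)
    beta = np.zeros(ell)               # beta[j]: subdiagonal entry from step j+1
    v = v1 / np.linalg.norm(v1)
    v_prev = np.zeros(n)
    b_prev = 0.0
    for j in range(ell):
        V[:, j] = v
        w = A @ v - b_prev * v_prev
        alpha[j] = v @ w
        w = w - alpha[j] * v
        beta[j] = np.linalg.norm(w)
        if beta[j] == 0.0:             # breakdown: an invariant subspace was found
            break
        v_prev, b_prev = v, beta[j]
        v = w / beta[j]
    return V, alpha, beta, v           # v is the next Lanczos vector

# Verify the decomposition (1.3.6) on a random symmetric test matrix.
rng = np.random.default_rng(2)
n, ell = 30, 8
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
V, alpha, beta, v_next = lanczos(A, rng.standard_normal(n), ell)
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
R = A @ V - V @ T
R[:, -1] -= beta[-1] * v_next
print(np.linalg.norm(R))               # tiny residual: the decomposition holds
```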
Since the matrix A defines a linear discrete ill-posed problem, its eigenvalues λ_j “cluster” at the origin for large j. Therefore, by (2.2.1) or (2.2.8), the off-diagonal entries β_j of the matrix (1.3.5) also “cluster” at zero for j large. We used this property in Chapter 2 to show that, for sufficiently large j, the vectors v_j generated by Algorithm 2 are accurate approximations of eigenvectors associated with eigenvalues close to the origin. Computed examples in Chapter 2 illustrate that for several common linear discrete ill-posed test problems, the space span{v_1, v_2, ..., v_ℓ} essentially contains the space span{w_1, w_2, ..., w_⌈ℓ/3⌉} already for quite small values of ℓ. As before, ⌈η⌉ denotes the smallest integer bounded below by η ≥ 0.
We would like to determine the first few eigenvalues, ordered according to (1.2.6), and associated eigenvectors of a large symmetric matrix of a linear discrete ill-posed problem. The fact that the vectors v_j determined by Algorithm 2 are accurate approximations of eigenvectors for j large enough suggests that only a few iterations with an implicitly restarted symmetric Lanczos method, such as the methods described in [4, 5, 17, 68], are required. These methods compute a sequence of Lanczos decompositions of the form (1.3.6) with different initial vectors. Let v_1 be the initial vector in the presently available Lanczos decomposition (1.3.6). An initial vector for the next Lanczos decomposition is determined by applying a polynomial filter q(A) to v_1 to obtain the initial vector q(A)v_1/‖q(A)v_1‖ of the next Lanczos decomposition. The computation of q(A)v_1 is carried out without evaluating additional matrix-vector products with A. The implementations [4, 5, 17, 68] use different polynomials q. It is the purpose of the polynomial filter q(A) to damp components of unwanted eigenvectors in the vector v_1. We are interested in damping components of eigenvectors associated with eigenvalues of small magnitude. The implicitly restarted symmetric Lanczos
method was first described in [17, 68]. We will use the implementation [4, 5] of the implicitly restarted symmetric block Lanczos method with block size one in computations reported in Section 3.4.
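The effect of a polynomial filter can be illustrated explicitly (Python/NumPy; a crude filter q(t) = t^3 applied to a hypothetical diagonal matrix — the implicitly restarted methods of [4, 5, 17, 68] achieve this damping without forming q(A) or evaluating extra matrix-vector products):

```python
import numpy as np

# The eigenvectors of this hypothetical diagonal matrix are the canonical
# basis vectors; the eigenvalues cluster at the origin, as for a discrete
# ill-posed problem.
lam = 10.0 ** -np.arange(8)            # eigenvalues 1, 1e-1, ..., 1e-7
A = np.diag(lam)
v1 = np.ones(8) / np.sqrt(8.0)         # equal weight on every eigenvector

# A crude filter q(t) = t^3: small near the origin, so components along
# eigenvectors with small eigenvalues are damped.
w = np.linalg.matrix_power(A, 3) @ v1  # q(A) v1, formed explicitly here
w = w / np.linalg.norm(w)

# After filtering, the vector is dominated by the eigenvector belonging
# to the eigenvalue of largest magnitude.
print(abs(w[0]))                       # very close to 1
print(abs(w[1]))                       # damped, on the order of 1e-3
```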
We remark that there are several reasons for solving linear discrete ill-posed problems (1.1.1) with a symmetric matrix A by the TEVD method. One of them is that the singular values of A, i.e., the magnitudes of the eigenvalues of A, provide important information about the matrix A and thereby about properties of the linear discrete ill-posed problem (1.1.1). For instance, the decay rate of the singular values with increasing index is an important property of a linear discrete ill-posed problem. Moreover, the truncated eigendecomposition (1.2.7) may furnish an economical storage format for the important part of the matrix A. Large matrices A that stem from the discretization of a Fredholm integral equation of the first kind are generally dense; however, it often suffices to store only a few of their largest eigenpairs. Storage of these eigenpairs typically requires much less computer memory than storage of the matrix A.
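The economical storage aspect can be sketched as follows (Python/SciPy; the matrix with geometrically decaying eigenvalues is a hypothetical surrogate, and SciPy's eigsh stands in for the implicitly restarted Lanczos codes discussed above). The spectral-norm error of the TEVD equals the magnitude of the first discarded eigenvalue:

```python
import numpy as np
from scipy.sparse.linalg import eigsh

# Hypothetical symmetric matrix with geometrically decaying eigenvalues,
# mimicking a discretized Fredholm integral operator.
rng = np.random.default_rng(4)
n, s = 300, 8
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = 0.5 ** np.arange(n)
A = (Q * lam) @ Q.T                     # A = Q diag(lam) Q^T, dense

# Compute only the s eigenpairs of largest magnitude.
vals, W = eigsh(A, k=s, which='LM')
A_s = (W * vals) @ W.T                  # the TEVD approximation (1.2.7)

# The spectral-norm error is |lambda_{s+1}| = 0.5**s, and storing W and
# vals takes n*s + s numbers instead of the n*n needed for A itself.
print(np.linalg.norm(A - A_s, 2))
```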
3.3 Nonsymmetric linear discrete ill-posed problems
This section is concerned with the solution of linear discrete ill-posed problems (1.1.1) with a large nonsymmetric matrix A ∈ R^{m×n}. Such a matrix can be reduced to a small matrix by application of a few steps of Golub–Kahan bidiagonalization. This is described by Algorithm 3 in Section 1.3.3.
The connection between the Golub–Kahan bidiagonalization and the Lanczos decomposition (1.3.6) is applied in Section 2.3 to show Corollary 2.3.1. The initial unit vector p_1 in the Golub–Kahan bidiagonalization (1.3.8) is not required to be b/‖b‖. It follows from Corollary 2.3.1 and the fact that the singular values of A cluster at the origin, that the columns q_j of the matrix Q_ℓ in (1.3.8) for large j are accurate approximations of eigenvectors of A^T A, as discussed in Section 2.3. This suggests that the implicitly restarted Golub–Kahan bidiagonalization methods described in [6–9], for many matrices A that stem from the discretization of a linear ill-posed problem, only require fairly few matrix-vector product evaluations to determine a truncated singular value decomposition (1.2.4) with s fairly small. Illustrations are presented in the following section.
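A minimal sketch of the bidiagonalization itself (Python/NumPy; random data for illustration, and without the reorthogonalization that robust codes add) shows the defining relation of (1.3.8), A Q_ℓ = P_{ℓ+1} B with B lower bidiagonal:

```python
import numpy as np

def golub_kahan(A, p1, ell):
    # Golub-Kahan bidiagonalization (cf. Algorithm 3): returns P (m x (ell+1)),
    # Q (n x ell) with orthonormal columns and a lower bidiagonal B
    # ((ell+1) x ell) satisfying A Q = P B, cf. (1.3.8).
    m, n = A.shape
    P = np.zeros((m, ell + 1))
    Q = np.zeros((n, ell))
    B = np.zeros((ell + 1, ell))
    p = p1 / np.linalg.norm(p1)
    P[:, 0] = p
    q = np.zeros(n)
    beta = 0.0
    for j in range(ell):
        w = A.T @ p - beta * q
        alpha = np.linalg.norm(w)
        q = w / alpha
        Q[:, j] = q
        B[j, j] = alpha
        w = A @ q - alpha * p
        beta = np.linalg.norm(w)
        p = w / beta
        P[:, j + 1] = p
        B[j + 1, j] = beta
    return P, Q, B

# Random rectangular data, purely for illustration.
rng = np.random.default_rng(5)
A = rng.standard_normal((40, 25))
P, Q, B = golub_kahan(A, rng.standard_normal(40), 6)
print(np.linalg.norm(A @ Q - P @ B))   # tiny: the decomposition holds
```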
We remark that the solution of linear discrete ill-posed problems (1.1.1) with a nonsymmetric matrix by the TSVD method is of interest for the same reasons as the TEVD is attractive to use for the solution of linear discrete ill-posed problems with a symmetric matrix A; see the end of Section 3.2 for a discussion.
3.4 Computed examples
The main purpose of the computed examples is to illustrate that a few of the largest singular triplets of a large matrix A (or a few of the largest eigenpairs when A is symmetric) can be computed quite inexpensively when A defines a linear discrete ill-posed problem (1.1.1). All computations are carried out using MATLAB
R2012a with about 15 significant decimal digits. A Sony computer running Windows 10 with 4 GB of
RAM was used. MATLAB codes for determining the discrete ill-posed problems in the computed examples stem from Regularization Tools by Hansen [41]. When not explicitly stated otherwise, the matrices A are obtained by discretizing a Fredholm integral equation of the first kind, and are square and of order n = 500.
For some examples finer discretizations, resulting in larger matrices, are used.
The first few examples illustrate that the number of matrix-vector product evaluations required to compute the k eigenpairs of largest magnitude or the k largest singular triplets of a large matrix obtained by the discretization of an ill-posed problem is a fairly small multiple of k. The computations for a symmetric matrix A can be organized in two ways: We use the code irbleigs described in [4, 5] or the code irbla presented in [7]. The former code has an input parameter that specifies whether the k largest or the k smallest eigenvalues are to be computed. Symmetric semidefinite matrices A may only require one call of irbleigs to determine the desired eigenpairs, while symmetric indefinite matrices require at least one call for computing a few of the largest eigenvalues and associated eigenvectors and at least one call for computing a few of the smallest eigenvalues and associated eigenvectors. We have found that the irbla code, which determines a few of the largest singular values and associated singular vectors, can be competitive with irbleigs for symmetric indefinite matrices because it is possible to compute all the required eigenpairs (i.e., singular triplets) with only one call of irbla.
We use the MATLAB code irbleigs [5] with block size one to compute the k eigenvalues of largest magnitude
of a large symmetric matrix A. The code carries out ⌈2.5k⌉ Lanczos steps between restarts; i.e., a sequence of Lanczos decompositions (1.3.6) with ℓ = ⌈2.5k⌉ are computed with different initial vectors v_1 until the k desired eigenvalues and associated eigenvectors have been determined with specified accuracy. The default criterion for accepting computed approximate eigenpairs is used, i.e., a computed approximate eigenpair {λ̃_j, w̃_j}, with ‖w̃_j‖ = 1, is accepted as an eigenpair of A if

(3.4.1)  ‖A w̃_j − w̃_j λ̃_j‖ ≤ ε η(A),  j = 1,2,...,k,

where η(A) is an easily computable approximation of ‖A‖ and ε = 10^-6. The irbleigs code uses a computed approximation of the largest singular value of A as η(A).
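The acceptance test (3.4.1) is a one-line residual check; a sketch (Python/NumPy, with a hypothetical 3 × 3 example) is:

```python
import numpy as np

def accept_eigenpair(A, lam_j, w_j, eta, eps=1e-6):
    # Stopping test (3.4.1): accept {lam_j, w_j} (with ||w_j|| = 1) when the
    # residual norm is below eps * eta, where eta approximates ||A||.
    return np.linalg.norm(A @ w_j - lam_j * w_j) <= eps * eta

# Hypothetical example: an exact eigenpair passes, a perturbed vector
# fails the tolerance.
A = np.diag([3.0, 1.0, 0.1])
w = np.array([1.0, 0.0, 0.0])
print(accept_eigenpair(A, 3.0, w, eta=3.0))       # True
w_bad = np.array([0.99, 0.14, 0.0])
w_bad = w_bad / np.linalg.norm(w_bad)
print(accept_eigenpair(A, 3.0, w_bad, eta=3.0))   # False
```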
Table 1: foxgood test problem.
Number of desired   Size of the largest   Number of
eigenpairs k        tridiagonal matrix    matrix-vector products
 5                  ⌈2.5k⌉                 24
10                  ⌈2.5k⌉                 32
15                  ⌈2.5k⌉                 32
20                  ⌈2.5k⌉                 40
25                  ⌈2.5k⌉                 50
Example 3.4.1. We illustrate the performance of the irbleigs method [5] when applied to the foxgood test problem (1.4.3), a discretization of the Fredholm integral equation of the first kind. Table 1 displays the average number of matrix-vector product evaluations required by irbleigs over 1000 runs rounded to the closest integer when applied as described above to compute the k eigenpairs of largest magnitude for k = 5,
10,...,25. The number of matrix-vector product evaluations is seen to grow about linearly with k for the larger k-values. Since irbleigs chooses the initial vector in the first Lanczos decomposition computed to be a unit random vector, the number of matrix-vector product evaluations may vary somewhat between different calls of irbleigs.
We remark that the choice of ℓ = ⌈2.5k⌉ steps with the Lanczos method between restarts is somewhat arbitrary and so is the choice of block size one. While the exact number of matrix-vector product evaluations depends on these choices, the linear growth of the number of matrix-vector products computed with the number of desired eigenpairs can be observed for many choices of ℓ and block sizes. □
In the following examples we use the MATLAB code irbla [7], which implements a restarted Golub–Kahan block bidiagonalization method. We set the block size to one. In order to determine the k largest singular triplets, irbla determines a sequence of Golub–Kahan decompositions (1.3.8) with ℓ chosen to be ⌈1.5k⌉ or smaller for different initial vectors p_1 until the k largest singular triplets have been computed with desired accuracy. The default stopping criterion is used, which is analogous to (3.4.1). The initial vector p_1 in the first Golub–Kahan bidiagonalization (1.3.8) determined by irbla is a unit random vector. The number of matrix-vector product evaluations therefore may vary somewhat between different calls of irbla. The numbers of matrix-vector product evaluations with A (and with A^T when A is nonsymmetric) reported in the tables are averages over 1000 runs rounded to the closest integer.
When solving linear discrete ill-posed problems (1.1.1) by truncated iteration using Golub–Kahan bidiago- nalization, one generally uses the initial vector p1 = b/kbk; see, e.g., [31, 40]. This suggests that it could be beneficial to use this vector as initial vector in the first Golub–Kahan bidiagonalization computed by irbla. It turns out that choosing p1 in this manner, instead of as a random unit vector, changes the number of matrix-vector product evaluations required only very little. We will illustrate this below.
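This insensitivity to the starting vector can be reproduced with any restarted SVD code that accepts a starting vector; the sketch below uses SciPy's svds solely as a stand-in for irbla (its v0 argument plays the role of p_1; the matrix and data vector are hypothetical):

```python
import numpy as np
from scipy.sparse.linalg import svds

# Hypothetical matrix with decaying singular values, mimicking a
# discretized ill-posed problem.
rng = np.random.default_rng(6)
n, k = 120, 5
A = rng.standard_normal((n, n)) @ np.diag(0.7 ** np.arange(n))
b = A @ np.ones(n)                         # hypothetical data vector

# Compute the k largest singular triplets twice: once from a random
# starting vector, once from b/||b||.
U1, s1, Vt1 = svds(A, k=k, v0=rng.standard_normal(n))
U2, s2, Vt2 = svds(A, k=k, v0=b / np.linalg.norm(b))

# Both starting vectors lead to the same singular values.
print(np.allclose(np.sort(s1), np.sort(s2), rtol=1e-6))
```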
Table 2: shaw test problem.
Number of desired   Size of the largest   Number of
eigenpairs k        bidiagonal matrix     matrix-vector products
 5                  ⌈1.5k⌉                 19
10                  ⌈1.5k⌉                 30
15                  ⌈1.5k⌉                 46
20                  ⌈1.5k⌉                 60
25                  ⌈1.5k⌉                 76
Example 3.4.2. We now consider the shaw test problem (1.4.7). Table 2 shows the number of matrix-vector product evaluations required for computing the k eigenpairs of largest magnitude using the code irbla for k = 5,10,...,25 when the number of steps between restarts is ⌈1.5k⌉. The number of matrix-vector products is seen to grow about linearly with k for k ≥ 10. Thus, the computational effort required is quite small.

Table 2 displays averages over 1000 runs with random initial unit vectors p_1 for the first Golub–Kahan bidiagonalization computed. When instead using p_1 = b/‖b‖, the number of matrix-vector products is unchanged for k = 10,15,20,25, and is reduced to 16 for k = 5. Thus, the effect of changing the initial vector in irbla is small.

Table 3: shaw test problem.
Number of desired   Size of the largest   Number of
eigenpairs k        bidiagonal matrix     matrix-vector products
 5                  k + 3                  19
10                  k + 2                  24
15                  k + 2                  34
20                  k + 2                  44
25                  k + 1                  52
The number of required matrix-vector product evaluations depends on the number of steps, ℓ, with the Golub–Kahan bidiagonalization method between restarts. We found that choosing ℓ very small may increase the required number of matrix-vector product evaluations with A and A^T. Moreover, choosing a large ℓ-value does not always result in a reduced number of matrix-vector product evaluations. This is illustrated by Table 3, in which the number of bidiagonalization steps between restarts is smaller than for Table 2, and so are the numbers of matrix-vector product evaluations required to determine the desired singular triplets.

The number ℓ of bidiagonalization steps between restarts that requires the smallest number of matrix-vector product evaluations is difficult to determine a priori. The important observation is that for many choices of ℓ the number of matrix-vector product evaluations with A and A^T is quite small. This makes it possible to compute a few of the largest singular triplets of a large matrix fairly inexpensively.
Table 3 displays averages over 1000 runs with random initial unit vectors p_1 for the first Golub–Kahan bidiagonalization computed. When instead using p_1 = b/‖b‖, the number of matrix-vector product evaluations is unchanged for k = 10,15,20,25, and is increased to 22 for k = 5. □
Table 4: phillips test problem.
Number of desired   Size of the largest   Number of
eigenpairs k        bidiagonal matrix     matrix-vector products
 5                  ⌈1.5k⌉                 22
10                  ⌈1.5k⌉                 36
15                  ⌈1.5k⌉                 46
20                  ⌈1.5k⌉                 60
25                  ⌈1.5k⌉                 76
Example 3.4.3. We now consider the phillips test problem (1.4.6). Table 4 displays the number of matrix-vector product evaluations required to compute the k eigenpairs of largest magnitude (i.e., the k largest singular triplets) for k = 5,10,...,25 by irbla. A random unit vector p_1 is used as initial vector for the first
Golub–Kahan bidiagonalization computed by irbla and the number of matrix-vector product evaluations are averages over 1000 runs. The number of matrix-vector product evaluations is seen to grow about linearly with k for k ≥ 10. If instead p1 = b/kbk is used as initial vector for the first Golub–Kahan bidiagonalization computed by irbla, the number of matrix-vector product evaluations is the same for k = 5,10,...,25. Thus, the choice of initial vector is not important.
The number of required matrix-vector product evaluations is quite insensitive to how finely the integral equation (1.4.6) is discretized, for all fine enough discretizations. For instance, when the integral equation is discretized by a Galerkin method using the MATLAB function phillips from [41] to obtain a matrix A ∈ R^{5000×5000} and the initial vector for the first Golub–Kahan bidiagonalization computed by irbla is chosen to be p_1 = b/‖b‖, the number of required matrix-vector product evaluations is the same as for A ∈ R^{500×500}.
We conclude that the computational expense to compute a truncated singular value decomposition of A is modest also for large matrices. □
The following examples are concerned with nonsymmetric matrices. The k largest singular triplets are computed with the irbla code [7] using block size one.
Example 3.4.4. This example uses the baart test problem (1.4.1). Table 5 shows the number of required matrix-vector product evaluations to grow roughly linearly with the number of desired largest singular triplets. Both the initial vector p_1 = b/‖b‖ and the average over 1000 runs with a random unit initial vector p_1 for the initial Golub–Kahan bidiagonalization computed by irbla yield the same entries of the last column.

Table 5: baart test problem.

Number of desired    Size of the largest   Number of
singular triplets k  bidiagonal matrix     matrix-vector products
 5                   ⌈1.5k⌉                 16
10                   ⌈1.5k⌉                 30
15                   ⌈1.5k⌉                 46
20                   ⌈1.5k⌉                 60
25                   ⌈1.5k⌉                 76

Table 6: baart test problem.
Number of desired    Size of the largest   Number of
singular triplets k  bidiagonal matrix     matrix-vector products
 5                   k + 2                  14
10                   k + 1                  22
15                   k + 1                  32
20                   k + 1                  42
25                   k + 1                  52
Table 6 is analogous to Table 5. Only the number of steps ℓ between restarts of the Golub–Kahan bidiagonalization differs. They are smaller in Table 6 than in Table 5, and so are the required numbers of matrix-vector product evaluations with A and A^T. Also in Table 6 the number of matrix-vector products needed is seen to grow about linearly with k. The initial vector p_1 = b/‖b‖ and the average over 1000 runs with a random unit initial vector p_1 for the first Golub–Kahan bidiagonalization computed by irbla yield the same entries in the last column of Table 6. □
Example 3.4.5. This example uses the i_laplace test problem (1.4.5). Table 7 displays the number of matrix-vector product evaluations required to compute the k largest singular triplets. This number is seen to grow about linearly with k. Therefore, the computational effort is quite small. The initial vector p_1 = b/‖b‖ and the average over 1000 runs with a random unit initial vector p_1 for the initial Golub–Kahan bidiagonalization computed by irbla yield the same entries in the last column of Table 7. □

Table 7: Inverse Laplace transform test problem.

Number of desired    Size of the largest   Number of
singular triplets k  bidiagonal matrix     matrix-vector products
 5                   ⌈1.5k⌉                 22
10                   ⌈1.5k⌉                 30
15                   ⌈1.5k⌉                 46
20                   ⌈1.5k⌉                 60
25                   ⌈1.5k⌉                 76
Table 8: Example 3.4.6: Relative errors and number of matrix-vector products, δ̃ = 10^-2. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is a unit random vector.
Problem      s   MVP   E_psvd        E_tsvd       E_s
shaw         7    22   8.23×10^-15   5.06×10^-2   5.06×10^-2
phillips     7    22   7.77×10^-9    2.53×10^-2   2.53×10^-2
baart        3    10   7.94×10^-8    1.67×10^-1   1.67×10^-1
i_laplace    7    22   3.67×10^-7    2.24×10^-1   2.24×10^-1
Table 9: Example 3.4.6: Relative errors and number of matrix-vector products, δ̃ = 10^-2. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is b/‖b‖.
Problem      s   MVP   E_psvd        E_tsvd       E_s
shaw         7    22   8.29×10^-15   4.97×10^-2   4.97×10^-2
phillips     7    22   9.25×10^-10   2.50×10^-2   2.50×10^-2
baart        3    10   1.15×10^-9    1.68×10^-1   1.68×10^-1
i_laplace    7    22   1.00×10^-8    2.24×10^-1   2.24×10^-1
Example 3.4.6. This example compares the quality of approximate solutions of linear discrete ill-posed problems computed by truncated singular value decomposition. The decompositions are determined as described in this chapter as well as by computing the full SVD (1.2.1) with the MATLAB function svd. It is the aim of this example to show that the decompositions computed in these ways yield approximations of xtrue of the same quality. We use the discrepancy principle (1.2.15) to determine the truncation index s.
Table 10: Example 3.4.6: Relative errors and number of matrix-vector products, δ̃ = 10^-4. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is b/‖b‖.
Problem      s   MVP   E_psvd        E_tsvd       E_s
shaw         9    28   5.01×10^-14   3.21×10^-2   3.21×10^-2
phillips    12    42   3.33×10^-8    4.23×10^-3   4.23×10^-3
baart        4    12   1.71×10^-10   1.15×10^-1   1.15×10^-1
i_laplace   12    40   5.46×10^-13   1.71×10^-1   1.71×10^-1
Table 11: Example 3.4.6: Relative errors and number of matrix-vector products, δ̃ = 10^-6. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is b/‖b‖.
Problem      s   MVP   E_psvd        E_tsvd       E_s
shaw        10    30   1.22×10^-12   1.94×10^-2   1.94×10^-2
phillips    27    82   2.47×10^-13   7.99×10^-4   7.99×10^-4
baart        5    16   1.67×10^-12   5.26×10^-2   5.26×10^-2
i_laplace   18    54   4.32×10^-12   1.43×10^-1   1.43×10^-1
The number of matrix-vector product evaluations used to compute approximations of x_true depends on the number of singular triplets required to satisfy the discrepancy principle. The latter number typically increases as the error in the vector b in (1.1.1) decreases, because then the least-squares problem has to be solved more accurately. In realistic applications, one would first compute the ℓ largest singular triplets, for some user-chosen value of ℓ, and if it turns out that additional singular triplets are required to satisfy the discrepancy principle, then one would determine the next, say, ℓ largest singular triplets, etc. This approach may require the evaluation of somewhat more matrix-vector products than if all required largest singular triplets were computed together. Conversely, if only the ℓ/2 largest singular triplets turn out to be needed to satisfy the discrepancy principle, then the number of matrix-vector products can be reduced by initially computing only these triplets instead of the ℓ largest singular triplets.
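The stopping rule itself can be sketched as follows (Python/NumPy; for brevity the triplets are taken from a full SVD rather than computed incrementally with irbla, and the diagonal problem is hypothetical — the point is finding the smallest s with ‖A x_s − b‖ ≤ τ ‖e‖):

```python
import numpy as np

def tsvd_discrepancy(A, b, noise_norm, tau=1.0):
    # Smallest truncation index s with ||A x_s - b|| <= tau * ||e||.
    # The triplets come from a full SVD purely for brevity; the text's
    # approach computes them a few at a time.
    U, sig, Vt = np.linalg.svd(A)
    c = U.T @ b
    for s in range(1, sig.size + 1):
        x_s = Vt[:s].T @ (c[:s] / sig[:s])   # TSVD solution with s triplets
        if np.linalg.norm(A @ x_s - b) <= tau * noise_norm:
            return x_s, s
    return x_s, sig.size

# Hypothetical diagonal ill-posed problem with additive noise.
rng = np.random.default_rng(7)
n = 60
A = np.diag(0.5 ** np.arange(n))          # rapidly decaying singular values
x_true = np.ones(n)
e = 1e-4 * rng.standard_normal(n)
b = A @ x_true + e
x_s, s = tsvd_discrepancy(A, b, np.linalg.norm(e))
print(s)                                  # a modest truncation index
```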
To reduce the influence on the number of matrix-vector products required by the somewhat arbitrary choice of the initial number of largest singular triplets to be computed, we proceed as follows. We first compute the SVD (1.2.1) using the MATLAB function svd and then determine the smallest truncation index s so that the discrepancy principle (1.2.15) holds. We denote the approximation of x_true so determined by x_tsvd. Then we use irbla to compute the s largest singular triplets of A. Several tables compare the quality of the so computed approximations of x_true in different ways. The quantity
E_psvd = ‖x_s − x_tsvd‖ / ‖x_tsvd‖
shows the relative difference between the approximate solution x_tsvd and the approximate solution x_s determined by computing a truncated singular value decomposition (1.2.4) of the matrix A using the irbla method with the number of Golub–Kahan bidiagonalization steps between restarts set to ⌈1.5s⌉. We also display the relative difference
E_tsvd = ‖x_tsvd − x_true‖ / ‖x_true‖,

which shows how well the approximate solution determined by the full singular value decomposition approximates the desired solution x_true. The analogous relative difference
E_s = ‖x_s − x_true‖ / ‖x_true‖

shows how well the approximate solution determined by the truncated singular value decomposition approximates x_true.
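The truncation-index selection by the discrepancy principle and the error measure E_s can be sketched in Python/NumPy. (The dissertation's experiments use MATLAB's svd and irbla; the function name tsvd_discrepancy and the synthetic test matrix below are our own illustrations, not from Regularization Tools.)

```python
import numpy as np

def tsvd_discrepancy(A, b, delta, tau=1.0):
    """Smallest truncation index s with ||A x_s - b|| <= tau * delta."""
    U, sigma, Vt = np.linalg.svd(A)
    c = U.T @ b                       # coefficients of b in the left singular basis
    x = np.zeros(A.shape[1])
    for s in range(1, sigma.size + 1):
        x = Vt[:s].T @ (c[:s] / sigma[:s])          # TSVD solution with s triplets
        if np.linalg.norm(A @ x - b) <= tau * delta:
            return x, s
    return x, sigma.size

# Synthetic ill-conditioned test problem (ours, not from Regularization Tools):
rng = np.random.default_rng(0)
n = 50
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag(10.0 ** -np.arange(n)) @ Q2.T      # rapidly decaying singular values
x_true = Q2[:, 0] + 0.5 * Q2[:, 1]
b_true = A @ x_true
e = rng.standard_normal(n)
e *= 1e-4 * np.linalg.norm(b_true) / np.linalg.norm(e)   # noise level 1e-4
b = b_true + e
x_s, s = tsvd_discrepancy(A, b, delta=np.linalg.norm(e))
E_s = np.linalg.norm(x_s - x_true) / np.linalg.norm(x_true)
print(s, E_s)
```

Since the TSVD residual ‖Ax_s − b‖ decreases monotonically in s, the loop always terminates; here the two dominant singular triplets suffice and E_s is small.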
Tables 8–11 report results for three noise levels δ̃ = ‖e‖/‖b_true‖ (δ̃ = 10^{-t} for t = 2, 4, 6) and τ = 1 in (1.2.15). The error vector e models white Gaussian noise. Thus, given the vector b_true, which is generated by the MATLAB function that determines the matrix A, a vector e that models white Gaussian noise is added to b_true to obtain the error-contaminated vector b; cf. (1.1.2). The vector e is scaled to correspond to a prescribed noise level δ̃. We generate this additive noise vector, e, with
e := ê ‖b_true‖ δ̃ / √n,
where ê ∈ R^{500} is a random vector whose entries are from a normal distribution with mean zero and variance one.
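This noise construction can be sketched as follows in Python/NumPy (a stand-in for the MATLAB code; the helper name noise_vector is ours). Note that the scaling makes ‖e‖/‖b_true‖ close to δ̃ only in expectation, since E‖ê‖² = n:

```python
import numpy as np

def noise_vector(b_true, delta_tilde, rng):
    """Return e = e_hat * ||b_true|| * delta_tilde / sqrt(n), with e_hat ~ N(0, I)."""
    n = b_true.size
    e_hat = rng.standard_normal(n)
    return e_hat * (np.linalg.norm(b_true) * delta_tilde / np.sqrt(n))

rng = np.random.default_rng(1)
b_true = rng.standard_normal(500)
e = noise_vector(b_true, 1e-2, rng)
ratio = np.linalg.norm(e) / np.linalg.norm(b_true)
print(ratio)   # close to delta_tilde = 1e-2
```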
For Tables 9–11, the initial vector for the first Golub–Kahan bidiagonalization computed by irbla is chosen to be p_1 = b/‖b‖. This choice is quite natural, because we would like to solve least-squares problems (1.1.1) with data vector b. The table columns with heading “MVP” display the number of matrix-vector product evaluations required by irbla to compute the truncated singular value decomposition required to satisfy the discrepancy principle. Table 8 shows averages over 1000 runs with random unit initial vectors p_1 for the
first Golub–Kahan bidiagonalization computed by irbla. The entries of this table and Table 9 are quite close.
The closeness of corresponding entries also can be observed for smaller noise levels. We therefore do not display tables analogous to Table 8 for smaller noise levels.
Tables 8–11 show that the approximate solutions determined by using irbla are as good approximations of x_true as the approximate solutions x_tsvd computed with the aid of the full SVD (1.2.1), while being much cheaper to evaluate. □
Example 3.4.7. The LSQR iterative method for the solution of the minimization problem (1.1.1) determines partial bidiagonalizations of A with initial vector b/‖b‖. These bidiagonalizations are closely related to the bidiagonalizations (1.3.8); see, e.g., [12, 35] for details. In the ith step, LSQR computes an approximate solution x_i in the Krylov subspace K_i(A^T A, A^T b) = span{A^T b, (A^T A)A^T b, …, (A^T A)^{i−1} A^T b}. This approximate solution satisfies
‖Ax_i − b‖ = min_{x ∈ K_i(A^T A, A^T b)} ‖Ax − b‖.
The number of steps i is determined by the discrepancy principle, i.e., i is chosen to be the smallest integer such that the computed approximate solution x_i satisfies
‖Ax_i − b‖ ≤ τδ;
cf. (1.2.15). Details on the application of LSQR with the discrepancy principle are discussed in, e.g.,
[23, 31, 40].
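The combination of LSQR and the discrepancy principle can be sketched in Python/NumPy. The loop below is a mathematically equivalent stand-in, not the actual LSQR implementation: it builds the Golub–Kahan bidiagonalization with full reorthogonalization and, at each step, minimizes ‖Ax − b‖ over the growing Krylov subspace; the function name and the synthetic test problem are ours.

```python
import numpy as np

def lsqr_discrepancy(A, b, delta, tau=1.0, max_steps=50):
    """LSQR-type iteration stopped by the discrepancy principle.

    Step i minimizes ||A x - b|| over K_i(A^T A, A^T b), whose basis is built
    by Golub-Kahan bidiagonalization with initial vector b/||b|| and full
    reorthogonalization.
    """
    n = A.shape[1]
    U = [b / np.linalg.norm(b)]
    V = []
    x = np.zeros(n)
    for i in range(max_steps):
        w = A.T @ U[-1]                    # next Krylov direction
        for v in V:                        # reorthogonalize against V
            w -= (v @ w) * v
        nw = np.linalg.norm(w)
        if nw == 0:
            break
        V.append(w / nw)
        u = A @ V[-1]
        for uu in U:                       # reorthogonalize against U
            u -= (uu @ u) * u
        nu = np.linalg.norm(u)
        if nu > 0:
            U.append(u / nu)
        Vi = np.stack(V, axis=1)
        y, *_ = np.linalg.lstsq(A @ Vi, b, rcond=None)   # projected problem
        x = Vi @ y
        if np.linalg.norm(A @ x - b) <= tau * delta:     # discrepancy principle
            break
    return x, len(V)

# Synthetic ill-conditioned test problem (ours):
rng = np.random.default_rng(6)
n = 40
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag(10.0 ** -np.arange(n)) @ Q2.T
x_true = Q2[:, 0]
b_true = A @ x_true
e = rng.standard_normal(n)
e *= 1e-6 * np.linalg.norm(b_true) / np.linalg.norm(e)
b = b_true + e
x, steps = lsqr_discrepancy(A, b, np.linalg.norm(e))
print(steps, np.linalg.norm(x - x_true))
```

Because the Krylov subspaces are nested, the residual is nonincreasing in i, so terminating at the first i with ‖Ax_i − b‖ ≤ τδ is well defined.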
This example considers the situation when there are several data vectors, i.e., we would like to solve
(3.4.2) min_{x^(j) ∈ R^n} ‖Ax^(j) − b^(j)‖,  j = 1, 2, …, ℓ.
We let the matrix A ∈ R^{500×500} be determined by one of the functions in Regularization Tools [41] already used above. This function also determines the error-free data vector b_true^(1) ∈ R^{500}. The remaining error-free data vectors b_true^(j) ∈ R^{500} are obtained by choosing discretizations x_true^(j), j = 2, 3, …, ℓ, of functions of the form α sin(βt) + γ, α cos(βt) + γ, α arctan(βt) + γ, and αt^2 + βt + γ, where α, β, and γ are randomly generated scalars, and letting b_true^(j) = A x_true^(j). A “noise” vector e^(j) ∈ R^{500} with normally distributed random entries with zero mean is added to each data vector b_true^(j) to determine an error-contaminated data vector b^(j); see (1.1.2). The noise vectors e^(j) are scaled to correspond to a specified noise level. This is simulated with
e^(j) := ê^(j) ‖b_true^(j)‖ δ̃ / √n,

where δ̃ is the noise level, and ê^(j) ∈ R^n is a vector whose elements are normally distributed random numbers with mean zero and variance one.
Assume that the data vectors are available sequentially. Then the linear discrete ill-posed problems (3.4.2) can be solved one by one by
(A) applying the LSQR iterative method to each one of the ℓ linear discrete ill-posed problems (3.4.2).
The iterations for each system are terminated by the discrepancy principle. Since the data vectors b^(j)
are distinct, each one of these vectors requires that a new partial Golub–Kahan bidiagonalization be
computed, and by
(B) computing one TSVD of the matrix A by the irbla method, and then using this decomposition to
determine an approximate solution of each one of the problems (3.4.2). The discrepancy principle is
applied to compute the approximate solution of each least-squares problem. Thus, we determine the
parameter s in (1.2.4) as small as possible so that the discrepancy principle (1.2.15) holds with τ = 1.
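Approach (B) can be sketched in Python/NumPy: one SVD of A is computed once and reused for every data vector, with a per-problem truncation index chosen by the discrepancy principle. (A full SVD stands in here for the partial SVD computed by irbla; the function name and the synthetic test problem are ours.)

```python
import numpy as np

def tsvd_multi_rhs(A, Bs, deltas, tau=1.0):
    """Approach (B): one SVD of A reused for all data vectors b^(j)."""
    U, sigma, Vt = np.linalg.svd(A)
    sols = []
    for b, delta in zip(Bs, deltas):
        c = U.T @ b
        x = np.zeros(A.shape[1])
        for s in range(1, sigma.size + 1):
            x = Vt[:s].T @ (c[:s] / sigma[:s])
            if np.linalg.norm(A @ x - b) <= tau * delta:   # discrepancy principle
                break
        sols.append(x)
    return sols

# Synthetic ill-conditioned matrix and three data vectors (ours):
rng = np.random.default_rng(5)
n = 50
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag(10.0 ** -np.arange(n)) @ Q2.T
Bs, deltas, xs_true = [], [], []
for j in range(3):
    x_true = Q2[:, j]
    b_true = A @ x_true
    e = rng.standard_normal(n)
    e *= 1e-6 * np.linalg.norm(b_true) / np.linalg.norm(e)
    Bs.append(b_true + e)
    deltas.append(np.linalg.norm(e))
    xs_true.append(x_true)
sols = tsvd_multi_rhs(A, Bs, deltas)
errs = [np.linalg.norm(x - xt) for x, xt in zip(sols, xs_true)]
print(errs)
```

The matrix decomposition is computed only once, while approach (A) rebuilds a partial bidiagonalization for every right-hand side.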
Seed methods furnish another solution approach that is commonly applied when seeking to solve several linear systems of equations with the same matrix and different right-hand sides that stem from the discretization of well-posed problems, such as Dirichlet boundary value problems for elliptic partial differential equations; see, e.g., [1, 63, 67] and references therein. A Golub–Kahan bidiagonalization-based seed method applied to the solution of the ℓ > 1 least-squares problems (3.4.2) could proceed as follows. First one computes a partial Golub–Kahan bidiagonalization for the least-squares problem with one of the data vectors, say b^(1), and then uses this bidiagonalization to solve all the remaining least-squares problems (3.4.2). The
Golub–Kahan bidiagonalization is determined by the matrix A and the initial vector p_1 = b^(1)/‖b^(1)‖. We will not apply this technique in this example but will explore a variation of it in Section 4.3.
Tables 12–14 compare the number of matrix-vector products required by the approaches (A) and (B) for ℓ = 10 and noise-contaminated data vectors b^(j) corresponding to the noise levels δ̃ = 10^{-2}, δ̃ = 10^{-4}, and δ̃ = 10^{-6}. For each noise level, 100 noise-contaminated data vectors b^(j) are generated for each 1 ≤ j ≤ ℓ.
The initial Golub–Kahan bidiagonalization determined by irbla uses the initial vector p_1 = b^(1)/‖b^(1)‖.
The tables display averages over the realizations of the noise-contaminated data vectors b^(j). As in Example 3.4.5, the number of matrix-vector product evaluations required by the method of this chapter depends on the initial choice of the number of singular triplets to be computed. To simplify the comparison, we assume this number to be known. The competitiveness of the method of this chapter is not significantly affected by this assumption; it is straightforward to compute more singular triplets if the initial number of computed triplets is found to be too small to satisfy the discrepancy principle. To prevent round-off errors introduced during the computations with LSQR from delaying convergence, Golub–Kahan bidiagonalization is carried out with reorthogonalization; see [40] for a discussion of the effect of round-off errors on convergence.
The columns of Tables 12–14 are labeled similarly to those in the previous examples. The error in the computed solutions is the maximum of the errors for each one of the least-squares problems (3.4.2). The columns labeled MVPlsqr show the average numbers of matrix-vector product evaluations required by LSQR, and
the columns denoted by E_lsqr show the relative error in the approximate solution determined by the LSQR algorithm, x_lsqr, i.e.,
E_lsqr = ‖x_lsqr − x_true‖ / ‖x_true‖.
Tables 12–14 show that the method of the present chapter requires significantly fewer matrix-vector product evaluations than repeated application of LSQR. For large-scale problems, the dominating computational effort of these methods is the evaluation of matrix-vector products with A and A^T. As in Example 3.4.2, the number of matrix-vector product evaluations does not change significantly with the fineness of the discretization, i.e., with the size of the matrix A determined by the functions in [41].
Finally, we remark that, while the present example uses the discrepancy principle to determine the parameter s in the TSVD and the number of LSQR iterations, other techniques also can be used for these purposes, such as the methods discussed in [25, 26, 40, 45, 61]. The results would be fairly similar. Thus, the relative performance of the methods displayed in Tables 12–14 is not very sensitive to how the parameter s and the number of iterations are determined, nor to their exact values. □
Table 12: Relative errors and number of matrix-vector product evaluations, δ̃ = 10^{-2}.
Problem    s   MVP  MVPlsqr  E_psvd     E_tsvd     E_s        E_lsqr
shaw       6   20   133      3×10^{-6}  2×10^{-1}  2×10^{-1}  1×10^{-1}
phillips   7   22   152      1×10^{-6}  7×10^{-2}  7×10^{-2}  6×10^{-2}
baart      3   10   71       2×10^{-8}  1×10^{-1}  1×10^{-1}  1×10^{-1}
i_laplace  8   25   175      1×10^{-5}  7×10^{-1}  7×10^{-1}  7×10^{-1}
Table 13: Relative errors and number of matrix-vector product evaluations, δ̃ = 10^{-4}.
Problem    s    MVP  MVPlsqr  E_psvd      E_tsvd     E_s        E_lsqr
shaw       9    27   178      2×10^{-13}  3×10^{-2}  3×10^{-2}  3×10^{-2}
phillips   14   41   279      3×10^{-4}   1×10^{-2}  1×10^{-2}  8×10^{-3}
baart      5    14   99       3×10^{-10}  7×10^{-2}  7×10^{-2}  7×10^{-2}
i_laplace  14   42   285      3×10^{-13}  7×10^{-1}  7×10^{-1}  7×10^{-1}
Table 14: Relative errors and number of matrix-vector product evaluations, δ̃ = 10^{-6}.

Problem    s    MVP  MVPlsqr  E_psvd      E_tsvd     E_s        E_lsqr
shaw       10   30   203      1×10^{-12}  7×10^{-3}  7×10^{-3}  7×10^{-3}
phillips   30   89   601      2×10^{-11}  3×10^{-3}  3×10^{-3}  1×10^{-3}
baart      5    17   117      6×10^{-10}  4×10^{-2}  4×10^{-2}  4×10^{-2}
i_laplace  19   58   393      2×10^{-11}  7×10^{-1}  7×10^{-1}  7×10^{-1}
3.5 Conclusion
Knowledge of the largest singular values and associated singular vectors of a matrix may provide important information about the linear discrete ill-posed problem at hand. However, the computation of the singular value decomposition of a general large matrix can be prohibitively expensive. This chapter illustrates that the computation of a few of the largest singular triplets of a matrix that stems from the discretization of an ill-posed problem may be quite inexpensive. The largest singular triplets generally are the only singular triplets of interest. Similarly, the computed examples show that it can be quite inexpensive to compute a few eigenpairs of largest magnitude of a symmetric matrix of a linear discrete ill-posed problem. Applications to the solution of several linear discrete ill-posed problems with the same matrix and different data vectors show the computation of the TSVD of the matrix to be competitive with sequential solution of the linear discrete ill-posed problems by the LSQR iterative method.
CHAPTER 4
Solution methods for linear discrete ill-posed problems for color image restoration
We will discuss the application of iterative methods based on standard or block Golub–Kahan-type bidiagonalization, combined with Tikhonov regularization, to the restoration of a multi-channel image from an available blur- and noise-contaminated version. Applications include the restoration of color images whose RGB (red, green, and blue) representation uses three channels; see [28, 42]. The methods described can also be applied to the solution of Fredholm integral equations of the first kind in two or more space dimensions and to the restoration of hyper-spectral images. The latter kind of images generalize color images in that they allow more than three “colors”; see, e.g., [48]. For definiteness, we will focus on the restoration of k-channel images that have been contaminated by blur and noise, and we formulate this restoration task as a linear system of equations with k right-hand side vectors, where each spectral band corresponds to one channel.
To simplify our notation, we assume the image to be represented by an array of n × n pixels in each one of the k channels, where 1 ≤ k ≪ n^2. Let b^(i) ∈ R^{n^2} represent the available blur- and noise-contaminated image in channel i, let e^(i) ∈ R^{n^2} describe the noise in this channel, and let x_true^(i) ∈ R^{n^2} denote the desired unknown blur- and noise-free image in channel i. The corresponding quantities for all k channels, b, x_true, e ∈ R^{n^2 k}, are obtained by stacking the vectors b^(i), x_true^(i), e^(i) of each channel. For instance, b = [(b^(1))^T, …, (b^(k))^T]^T.
The degradation model is of the form
(4.0.1) b = H x_true + e
with blurring matrix

H = A_k ⊗ A =
  [ a_{1,1}A  a_{1,2}A  ···  a_{1,k}A
    a_{2,1}A  a_{2,2}A  ···  a_{2,k}A
       ⋮          ⋮               ⋮
    a_{k,1}A  a_{k,2}A  ···  a_{k,k}A ]  ∈ R^{n^2 k × n^2 k}.

Here ⊗ denotes the Kronecker product, the matrix A ∈ R^{n^2 × n^2} represents within-channel blurring, which is assumed to be the same in all channels, and the small matrix A_k ∈ R^{k×k} models cross-channel blurring.
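The block structure of H is easy to verify numerically; a short Python/NumPy check (sizes chosen purely for illustration) confirms that block (i, j) of A_k ⊗ A equals a_{i,j} A:

```python
import numpy as np

rng = np.random.default_rng(2)
n2, k = 9, 3                        # n2 = n^2 pixels per channel (n = 3), k channels
A = rng.standard_normal((n2, n2))   # within-channel blurring (same in all channels)
Ak = rng.standard_normal((k, k))    # cross-channel blurring
H = np.kron(Ak, A)                  # (n2*k) x (n2*k) blurring matrix

# Block (i, j) of H is a_{i,j} * A:
i, j = 1, 2
block = H[i * n2:(i + 1) * n2, j * n2:(j + 1) * n2]
print(np.allclose(block, Ak[i, j] * A))
```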
Sometimes it is convenient to gather the images for the different channels in “block vectors.” Introduce the block vectors B = [b^(1), …, b^(k)] ∈ R^{n^2×k}, X_true = [x_true^(1), …, x_true^(k)] ∈ R^{n^2×k}, and E = [e^(1), …, e^(k)] ∈ R^{n^2×k}.
Using properties of the Kronecker product, the model (4.0.1) can be expressed as
(4.0.2) B = A(X_true) + E,
where the linear operator A is defined by
A : R^{n^2×k} → R^{n^2×k},

(4.0.3) A(X) := A X A_k^T.

Its transpose is given by A^T(X) := A^T X A_k. The model (4.0.2) is said to have cross-channel blurring when
A_k ≠ I_k; when A_k = I_k, there is no cross-channel blurring. In the latter situation, the blurring is said to be within-channel only, and the deblurring problem decouples into k independent deblurring problems. The degradation model (4.0.1) then can be expressed in the form
(4.0.4) B = A X_true + E.
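The equivalence between the stacked model (4.0.1) and the block-vector model (4.0.2) rests on the Kronecker identity (A_k ⊗ A) vec(X) = vec(A X A_k^T). A small Python/NumPy check (sizes are ours, for illustration only) verifies this, together with the transpose rule A^T(X) = A^T X A_k:

```python
import numpy as np

rng = np.random.default_rng(3)
n2, k = 9, 3
A = rng.standard_normal((n2, n2))      # within-channel blur
Ak = rng.standard_normal((k, k))       # cross-channel blur
X = rng.standard_normal((n2, k))       # one column per channel

H = np.kron(Ak, A)
x = X.reshape(-1, order='F')           # stack the channel vectors x^(1),...,x^(k)

# (A_k kron A) vec(X) = vec(A X A_k^T), the operator form (4.0.3):
lhs = (H @ x).reshape(n2, k, order='F')
rhs = A @ X @ Ak.T
print(np.allclose(lhs, rhs))

# The transpose acts as A^T(X) = A^T X A_k:
lhs_t = (H.T @ x).reshape(n2, k, order='F')
rhs_t = A.T @ X @ Ak
print(np.allclose(lhs_t, rhs_t))
```

Column-major stacking (order='F') matches the convention b = [(b^(1))^T, …, (b^(k))^T]^T used above.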
For notational simplicity, we denote in the following both the matrix A in (4.0.4) and the linear operator A in (4.0.2) by A, and we write A(X) as AX. The singular values of a blurring matrix or operator A typically “cluster” at the origin. It follows that the solution (if it exists) of the linear system of equations
(4.0.5) AX = B
is very sensitive to the error E in B. Let B_true denote the (unknown) noise-free block vector associated with
B. The system of equations AX = B_true is assumed to be consistent. We would like to determine an accurate approximation of X_true given B and A. This generally is a difficult computational task due to the error E in
B and the presence of tiny singular values of A. We remedy this difficulty by standard Tikhonov regularization, as discussed in Section 1.2.3. It reduces the sensitivity of the solution of (4.0.5) to the error E in B by replacing
(4.0.5) by a penalized least-squares problem analogous to (1.2.12),
(4.0.6) min_{X ∈ R^{n^2×k}} { ‖AX − B‖_F^2 + µ^{-1} ‖X‖_F^2 },
where µ > 0 is the regularization parameter and ‖·‖_F denotes the Frobenius norm. The normal equations associated with the minimization problem (4.0.6) are given by
(4.0.7) (A^T A + µ^{-1} I) X = A^T B.
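A direct solution of (4.0.7) can be sketched in Python/NumPy for a small dense problem (the helper name and the random test data are ours; for the large-scale image problems of this chapter, the normal equations are of course not formed explicitly):

```python
import numpy as np

def tikhonov_solve(A, B, mu):
    """Solve the normal equations (A^T A + mu^{-1} I) X = A^T B of (4.0.6)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + (1.0 / mu) * np.eye(n), A.T @ B)

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 20))
B = rng.standard_normal((30, 3))      # k = 3 channels, one column each
X = tikhonov_solve(A, B, mu=10.0)

# Stationarity of the Tikhonov functional: A^T (A X - B) + mu^{-1} X = 0
grad = A.T @ (A @ X - B) + (1.0 / 10.0) * X
print(np.allclose(grad, 0.0, atol=1e-8))
```

Since A^T A + µ^{-1} I is symmetric positive definite for every µ > 0, the solution is unique.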
They have the unique solution