LANCZOS AND GOLUB-KAHAN REDUCTION METHODS

APPLIED TO ILL-POSED PROBLEMS

A dissertation submitted to

Kent State University in partial

fulfillment of the requirements for the

degree of Doctor of Philosophy

by

Enyinda N. Onunwor

May, 2018

Dissertation written by

Enyinda N. Onunwor

A.A., Cuyahoga Community College, 1998

B.S., Youngstown State University, 2001

M.S., Youngstown State University, 2003

M.A., Kent State University, 2011

Ph.D., Kent State University, 2018

Approved by Lothar Reichel, Chair, Doctoral Dissertation Committee

Jing Li, Member, Doctoral Dissertation Committee

Jun Li, Member, Doctoral Dissertation Committee

Arden Ruttan, Member, Outside Discipline

Arvind Bansal, Member, Graduate Faculty Representative

Accepted by Andrew Tonge, Chair, Department of Mathematical Sciences

James L. Blank, Dean, College of Arts and Sciences

TABLE OF CONTENTS

LIST OF FIGURES ...... v

LIST OF TABLES ...... vii

ACKNOWLEDGEMENTS ...... x

NOTATION ...... xii

1 Introduction ...... 1

1.1 Overview ...... 1

1.2 Regularization methods ...... 2

1.2.1 Truncated singular value decomposition (TSVD) ...... 2

1.2.2 Truncated eigenvalue decomposition (TEVD) ...... 4

1.2.3 Tikhonov regularization ...... 5

1.2.4 Regularization parameter: the discrepancy principle ...... 7

1.3 Krylov subspace methods ...... 8

1.3.1 The Arnoldi method ...... 9

1.3.2 The symmetric Lanczos process ...... 11

1.3.3 Golub-Kahan bidiagonalization ...... 13

1.3.4 Block Krylov methods ...... 15

1.4 The test problems ...... 16

1.4.1 Descriptions of the test problems ...... 16

2 Reduction methods applied to discrete ill-posed problems ...... 20

2.1 Introduction ...... 20

2.2 Application of the symmetric Lanczos method ...... 21

2.3 Application of the Golub–Kahan reduction method ...... 29

2.4 Computed examples ...... 31

2.5 Conclusion ...... 42

3 Computation of a truncated SVD of a large linear discrete ill-posed problem ...... 44

3.1 Introduction ...... 44

3.2 Symmetric linear discrete ill-posed problems ...... 45

3.3 Nonsymmetric linear discrete ill-posed problems ...... 47

3.4 Computed examples ...... 48

3.5 Conclusion ...... 61

4 Solution methods for linear discrete ill-posed problems for color image restoration ...... 62

4.1 Solution by partial block Golub–Kahan bidiagonalization ...... 66

4.2 The GGKB method and Gauss-type quadrature ...... 72

4.3 Golub–Kahan bidiagonalization for problems with multiple right-hand sides ...... 77

4.4 Computed examples ...... 81

4.5 Conclusion ...... 87

BIBLIOGRAPHY ...... 88

LIST OF FIGURES

1 Behavior of the bounds (2.2.1) (left), (2.2.7) (center), and (2.3.1) (right), with respect to the

iteration index `. The first test is symmetric positive definite, the second is symmetric

indefinite, and the third is unsymmetric. The left-hand side of each inequality is represented

by crosses, the right-hand side by circles...... 31

2 The graphs in the left column display the relative error Rλ,k between the eigenvalues of

the symmetric test problems, and the corresponding Ritz values generated by the Lanczos

process. The right column shows the behavior of Rσˆ ,k for the unsymmetric problems; see

(2.4.1) and (2.4.3)...... 33

3 Distance between the subspace spanned by the first dk/3e eigenvectors (resp. singular vec-

tors) of the symmetric (resp. nonsymmetric) test problems, and the subspace spanned by the

corresponding Lanczos (resp. Golub–Kahan) vectors; see (2.4.2) and (2.4.4)...... 35

4 Distance $\|V_{k,i}^{T} W_{n-i}^{(2)}\|$, $i = 1,2,\ldots,k$, between the subspace spanned by the first i eigenvectors

of the Foxgood (left) and Shaw (right) matrices, and the subspace spanned by the corre-

sponding i Ritz vectors at iteration k = 10...... 36

5 Distance between the subspace spanned by the first dk/2e eigenvectors (resp. singular vec-

tors) of selected symmetric (resp. nonsymmetric) test problems and the subspace spanned

by the corresponding Lanczos (resp. Golub–Kahan) vectors. The index ` ranges from 1 to

either the dimension of the matrix (n = 200) or to the iteration where there is a breakdown

in the factorization process...... 37

6 Distance $\max\{\|V_{k,i}^{T} V_{n-i}^{(2)}\|, \|U_{k,i}^{T} U_{n-i}^{(2)}\|\}$, $i = 1,2,\ldots,k$, between the subspace spanned by the

first i singular vectors of the Heat (left) and Tomo (right) matrices and the subspace spanned

by the corresponding i Golub–Kahan vectors at iteration k = 100...... 38

7 The first four LSQR solutions to the Baart test problem (thin lines) are compared to the

corresponding TSVD solutions (dashed lines) and to the exact solution (thick line). The size

of the problem is n = 200, the noise level is δ = 10−4. The thin and dashed lines are very

close...... 41

8 Convergence history for the LSQR and TSVD solutions to the Tomo example of size n = 225, with

noise level δ = 10^{-2}. The error E_LSQR has a minimum at k = 66, while E_TSVD is

minimal for k = 215...... 42

9 Solution by LSQR and TSVD to the Tomo example of size n = 225, with noise level

δ = 10−2: exact solution (top left), optimal LSQR solution (top right), TSVD solution cor-

responding to the same truncation parameter (bottom left), optimal TSVD solution (bottom

right)...... 43

1 Example 2: Original image (left), blurred and noisy image (right) ...... 85

2 Example 2: Restored image by Algorithm 5 (left), and restored image by Algorithm 6 (right). 85

3 Example 3: Cross-channel blurred and noisy image (left), restored image by Algorithm 6

(right)...... 86

LIST OF TABLES

1 Solution of symmetric linear systems: the errors ELanczos and ETEIG are optimal for truncated

Lanczos iteration and truncated eigenvalue decomposition. The corresponding truncation

parameters are denoted by kLanczos and kTEIG. Three noise levels δ are considered; ` denotes

the number of Lanczos iterations performed...... 38

2 Solution of nonsymmetric linear systems: the errors ELSQR and ETSVD are optimal for LSQR

and TSVD. The corresponding truncation parameters are denoted by kLSQR and kTSVD.

Three noise levels are considered; ` denotes the number of Golub–Kahan iterations performed. 40

1 foxgood test problem...... 49

2 shaw test problem...... 50

3 shaw test problem...... 51

4 phillips test problem...... 52

5 baart test problem...... 53

6 baart test problem...... 53

7 Inverse Laplace transform test problem...... 54

8 Example 3.6: Relative errors and number of matrix-vector products, δ˜ = 10−2. The initial

vector for the first Golub–Kahan bidiagonalization computed by irbla is a unit random vector. 54

9 Example 3.6: Relative errors and number of matrix-vector products, δ˜ = 10−2. The initial

vector for the first Golub–Kahan bidiagonalization computed by irbla is b/kbk...... 54

10 Example 3.6: Relative errors and number of matrix-vector products, δ˜ = 10−4. The initial

vector for the first Golub–Kahan bidiagonalization computed by irbla is b/kbk...... 55

11 Example 3.6: Relative errors and number of matrix-vector products, δ˜ = 10−6. The initial

vector for the first Golub–Kahan bidiagonalization computed by irbla is b/kbk...... 55

12 Relative errors and number of matrix-vector product evaluations, δ˜ = 10−2...... 60

13 Relative errors and number of matrix-vector product evaluations, δ˜ = 10−4...... 60

14 Relative errors and number of matrix-vector product evaluations, δ˜ = 10−6...... 61

1 Results for the phillips test problem ...... 82

2 Results for the baart test problem ...... 83

3 Results for the shaw test problem ...... 83

4 Results for Example 2 ...... 84

To Olivia and Kristof

ACKNOWLEDGEMENTS

This work would not have been possible without the wisdom, support, and tireless assistance of my advisor,

Lothar Reichel. I genuinely appreciate both his patience with me and the guidance he has given me over the years. His invaluable encouragement and counsel have been critical in facilitating the progress I have made to this point. He has truly been a blessing, and he has made a positive impact on my life.

In addition, I extend my undying gratitude to my committee: Jing Li, Jun Li, Arden Ruttan, and Arvind

Bansal. I am tremendously indebted to them for their collective time, effort, and direction.

I would be remiss if I failed to recognize the important contributions made by the following collaborators:

Silvia Gazzola, Giuseppe Rodriguez, Mohamed El Guide, Abdeslem Bentbib, and Khalide Jbilou. A special thanks to Xuebo Yu for helping me debug my codes and for his valuable input.

I honor the memory of my parents, HRH Sir Wobo Weli Onunwor and Dame Nchelem Onunwor. Their legacy of love, strength, determination, support, and faith imbued me with the courage I needed to achieve this objective, and they will forever endure in my spirit and in my work. My sister, Chisa, is one of the most brilliant people I know; her fortitude and determination are unmatched, and I am inspired by her integrity and work ethic. My oldest brother, HRH Nyema Onunwor, sets the example for the rest of us; he helps us maintain a calm demeanor in the face of the challenges we encounter and remains a constant voice of reason.

I offer my deep respect and admiration to my other siblings, Rommy, Acho, and Emenike, for helping me maintain my sanity through this process. Their stimulating conversations and the familial communion we share sustained and comforted me when I was in need of a respite during challenging moments. My thanks to

Dike Echendu for his wisdom and advice. Special thanks to two of my closest friends, Dennis Frank-Ito and

Ian Miller for their mathematical insights and constant encouragement. My cousins Anderson, Blessing,

Charles, Mary-Ann, and Gloria are like siblings to me, and their parents, Dr. Albert and Ezinne Charity

Nnewihe, have acted as my parental figures. I will be eternally grateful to them for their emotional support

and loving guidance. Thanks to Chiso Obiandu for his pep talks and for pushing the right motivational buttons. Special thanks to Dorothy Acheru Agbude, who ensured that this dissertation came to fruition.

I am glad that the battle is over. I extend my thanks to the remainder of my family and friends for their unconditional love and support.

Finally, I offer my deepest gratitude to my caring, loving, and supportive wife, Kinga. Your consistent encouragement, unending patience, and unflagging faith in me through the rough times have sustained me more than words can express. Thank you so much!

NOTATION

Unless stated otherwise, the following notation will be used throughout this dissertation. Standard notation

is used whenever possible.

A an m × n matrix

I_n the n × n identity matrix

k · k the Euclidean vector norm, or the induced operator norm

tr(A) the trace of an n × n matrix A, the sum of its diagonal entries, $\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$

$\|A\|_F$ the Frobenius norm of A, defined by $\|A\|_F = \mathrm{tr}(A^{T}A)^{1/2}$

hu,vi the inner product between vectors u and v

xtrue the exact but unknown true solution

btrue the exact data

e error or noise vector, i.e. the perturbation in the data

e(i) an error-vector in a problem with multiple right-hand sides

ei the i−th standard basis vector of appropriate dimension

AT the transpose of A

A∗ the Hermitian conjugate or Hermitian adjoint of A

A† the Moore-Penrose pseudoinverse of A

A ⊗ B the Kronecker product of matrices A and B

Ai, j the leading principal i × j submatrix of A

R(·) the range or column space

N (·) the nullspace

κ(·) the condition number

λ regularization parameter

λi an eigenvalue

$A = W\Lambda W^{*}$ the spectral factorization of the matrix $A = A^{T}$, where

 $W = [w_1, w_2, \ldots, w_n] \in \mathbb{R}^{n\times n}$ is orthogonal

 $\Lambda = \mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_n] \in \mathbb{R}^{n\times n}$, $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n| \ge 0$

SVD the singular value decomposition (SVD) of a matrix $A \in \mathbb{R}^{m\times n}$, $m \ge n$, is a factorization $A = \hat{U}\hat{\Sigma}\hat{V}^{T}$, where

 $\hat{U} \in \mathbb{R}^{m\times m}$ is orthogonal

 $\hat{V} \in \mathbb{R}^{n\times n}$ is orthogonal, and

 $\hat{\Sigma} = \mathrm{diag}[\hat{\sigma}_1, \ldots, \hat{\sigma}_n] \in \mathbb{R}^{m\times n}$, $\hat{\sigma}_1 \ge \cdots \ge \hat{\sigma}_n \ge 0$

GSVD the generalized singular value decomposition (GSVD) of the matrix pair {A, B}, with $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times n}$, satisfying $m \ge n \ge p$, consists of factorizations of the form $A = U\Sigma X$ and $B = VMX$, where

 $U \in \mathbb{R}^{m\times m}$ with $U^{T}U = I_m$

 $V \in \mathbb{R}^{p\times p}$ with $V^{T}V = I_p$

 $X \in \mathbb{R}^{n\times n}$ is nonsingular

 $\Sigma = \mathrm{diag}[\sigma_1, \ldots, \sigma_p, 1, \ldots, 1] \in \mathbb{R}^{m\times n}$

 $M = [\mathrm{diag}[\mu_1, \ldots, \mu_p], 0, \ldots, 0] \in \mathbb{R}^{p\times n}$, and

 $\sigma_i^{2} + \mu_i^{2} = 1$, for $1 \le i \le p$

L the p × n regularization matrix

L_1 upper bidiagonal regularization matrix, the scaled finite difference approximation of the first derivative operator with first row removed,

$$L_1 = \begin{bmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1 \end{bmatrix} \in \mathbb{R}^{(n-1)\times n}$$

L_2 tridiagonal regularization matrix, the scaled finite difference approximation of the second derivative operator,

$$L_2 = \begin{bmatrix} -1 & 2 & -1 & & & \\ & -1 & 2 & -1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & -1 & 2 & -1 \end{bmatrix} \in \mathbb{R}^{(n-2)\times n}$$

CHAPTER 1

Introduction

1.1 Overview

We are concerned with the solution of large least-squares problems

(1.1.1)   $\min_{x\in\mathbb{R}^{n}} \|Ax - b\|, \qquad A \in \mathbb{R}^{m\times n},\; b \in \mathbb{R}^{m},\; m \ge n,$

with a matrix A whose singular values gradually decay to zero without a significant gap. In particular, A is very ill-conditioned and may be rank-deficient. To simplify the notation, we will assume that m ≥ n, but this restriction can be removed. Least-squares problems with a matrix of this kind are commonly referred to as linear discrete ill-posed problems. They arise, for instance, from the discretization of linear ill-posed problems, such as Fredholm integral equations of the first kind with a continuous kernel. The process of discretization is the transfer of continuous models and equations into discrete counterparts. It is used to derive an approximate problem with finitely many unknowns. The vector b in linear discrete ill-posed problems that arise in applications in science and engineering typically represents data that are contaminated by a measurement error $e \in \mathbb{R}^{m}$. Sometimes we will refer to the vector e as “noise.” Thus,

(1.1.2) b = btrue + e,

where $b_{\mathrm{true}} \in \mathbb{R}^{m}$ represents the unknown error-free vector associated with the available vector b. We will assume that the “noise” vector e in (1.1.2) has normally distributed pseudorandom entries with mean zero and is normalized to correspond to a chosen noise level.

Let A† denote the Moore–Penrose pseudoinverse of A. We would like to determine an approximation of

$x_{\mathrm{true}} = A^{\dagger} b_{\mathrm{true}}$ by computing an approximate solution of (1.1.1). Note that the vector $x = A^{\dagger} b = x_{\mathrm{true}} + A^{\dagger} e$ typically is a useless approximation of $x_{\mathrm{true}}$, because the condition number of A, given by $\kappa(A) = \|A\|\,\|A^{\dagger}\|$, is very large. Throughout this thesis $\|\cdot\|$ denotes the Euclidean vector norm or the spectral matrix norm. Generally, $\|A^{\dagger} e\| \gg \|x_{\mathrm{true}}\|$, so the value of x can be very far from that of $x_{\mathrm{true}}$. Due to the ill-conditioning of A, our goal is to reformulate the problem so that the new solution is less sensitive to perturbations. That is, we regularize the problem so that the solution becomes more stable.

1.2 Regularization methods

The severe ill-conditioning of A makes the naive solution very sensitive to any perturbation of b. This is handled by regularization, i.e., replacing the system (1.1.1) with a nearby system that is less sensitive to the error e in b. We are able to recover a meaningful solution by imposing smoothness on the computed solution. Several regularization methods have been developed over the years, and they are very effective when applied to linear discrete ill-posed problems. We will use two of the most common approaches: truncated iterations (specifically the truncated singular value decomposition and the truncated eigenvalue decomposition) and Tikhonov regularization.

1.2.1 Truncated singular value decomposition (TSVD)

Suppose $A \in \mathbb{R}^{m\times n}$ is the matrix in (1.1.1); then its singular value decomposition (SVD) is the factorization

(1.2.1)   $A = \hat{U}\hat{\Sigma}\hat{V}^{T},$

where $\hat{U} = [\hat{u}_1,\cdots,\hat{u}_m] \in \mathbb{R}^{m\times m}$ and $\hat{V} = [\hat{v}_1,\cdots,\hat{v}_n] \in \mathbb{R}^{n\times n}$ are orthogonal matrices, and $\hat{\Sigma} = \mathrm{diag}[\hat{\sigma}_1,\hat{\sigma}_2,\ldots,\hat{\sigma}_n] \in \mathbb{R}^{m\times n}$ with $\hat{\sigma}_1 \ge \hat{\sigma}_2 \ge \cdots \ge \hat{\sigma}_r > \hat{\sigma}_{r+1} = \cdots = \hat{\sigma}_n = 0$, where $r = \mathrm{rank}(A)$. We call the $\hat{\sigma}_i$ the singular values, while $\hat{u}_i$ and $\hat{v}_i$ are the left and right singular vectors, respectively. Problems whose singular values decay quickly are referred to as severely ill-posed.

The truncated SVD (TSVD) regularization method solves (1.1.1) by replacing A with the closest rank-k approximation $A_k$ of A,

(1.2.2)   $A_k = \sum_{i=1}^{k} \hat{u}_i \hat{\sigma}_i \hat{v}_i^{T}, \qquad k \le r = \mathrm{rank}(A).$

We express the Moore–Penrose pseudoinverse of $A_k$ as $A_k^{\dagger} = \hat{V}\hat{\Sigma}_k^{\dagger}\hat{U}^{T}$, or

(1.2.3)   $A_k^{\dagger} = \sum_{i=1}^{k} \hat{v}_i \hat{\sigma}_i^{-1} \hat{u}_i^{T}.$

When A in (1.1.1) is replaced by Ak, we obtain a new least squares problem:

$$\min_{x\in\mathbb{R}^{n}} \|A_k x - b\|.$$

The solution to this problem is given by $x_k = A_k^{\dagger} b$, and we can express it as

(1.2.4)   $x_k = \sum_{i=1}^{k} \frac{\hat{u}_i^{T} b}{\hat{\sigma}_i}\, \hat{v}_i, \qquad k \le r.$

This is referred to as the truncated SVD (TSVD) solution. The truncation parameter k in (1.2.4) is determined by the index i at which the coefficients $|\hat{u}_i^{T} b|$ begin to level off due to the noise. Since the singular values gradually decay to zero, the small singular values lead to difficulties; several of the smallest nonvanishing singular values in problems of interest to us are tiny. The TSVD reduces the influence of the noise by omitting the right singular vectors corresponding to these tiny singular values. The computed examples reported in this work use the discrepancy principle to determine the regularization parameter k; this is discussed in Section 1.2.4.
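As a concrete illustration, the following MATLAB sketch (MATLAB being the language used for the computed examples in this thesis) forms the TSVD solution (1.2.4) for a small test problem. The function baart is from Regularization Tools [41]; the noise level and truncation index are illustrative choices, not values taken from the reported experiments.

  % Minimal TSVD sketch for a small discrete ill-posed problem; cf. (1.2.4).
  n = 200;
  [A, b_true, x_true] = baart(n);       % test problem from Regularization Tools [41]
  delta = 1e-4;                         % illustrative noise level
  e = randn(size(b_true));
  e = delta*norm(b_true)*e/norm(e);     % scale the noise to the chosen level
  b = b_true + e;                       % contaminated data; cf. (1.1.2)

  [U, S, V] = svd(A);
  s = diag(S);
  k = 4;                                % illustrative truncation parameter
  c = (U(:,1:k)'*b) ./ s(1:k);          % Fourier coefficients divided by singular values
  x_k = V(:,1:k)*c;                     % TSVD solution (1.2.4)

In practice the truncation index k would be chosen, e.g., by the discrepancy principle of Section 1.2.4 rather than fixed in advance.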

1.2.2 Truncated eigenvalue decomposition (TEVD)

When the matrix A ∈ Rn×n is symmetric, it suffices to compute a few of its eigenvalues of largest magnitude and associated eigenvectors. We refer to pairs consisting of the eigenvalues of largest magnitude and asso- ciated eigenvectors of A as eigenpairs of largest magnitude of A. In these situations, the TSVD simplifies to the truncated eigenvalue decomposition.

We introduce the eigenvalue (or spectral) decomposition

(1.2.5) A = W ΛW T ,

n×n where the matrix W = [w1,w2,...,wn] ∈ R has orthonormal columns, and

n×n Λ = diag[λ1,λ2,...,λn] ∈ R .

The eigenvalues λi are assumed to be ordered according to

(1.2.6) |λ1| ≥ |λ2| ≥ ... ≥ |λn|.

Thus, the magnitudes of the eigenvalues are the singular values of A, and the columns of the matrix W, with appropriate signs, are the associated right and left singular vectors.

We define the truncated eigenvalue decomposition (TEVD)

T (1.2.7) Ak = WkΛkWk ,

n×k where Wk = [w1,w2,...,wk] ∈ R and

k×k Λk = diag[λ1,λ2,...,λk] ∈ R

for some $1 \le k \le n$. Thus, $A_k$ is the best rank-k approximation of A in the spectral norm.

Replacing A by Ak in (1.1.1) for a suitable (small) value of k, and solving the reduced problem so obtained,

† often gives a better approximation of xtrue than A b. Thus, substituting (1.2.7) into (1.1.1) and replacing b

T T by WkWk b (i.e., by the orthogonal projection of b onto the range of Wk) and setting y = Wk x, yields the minimization problem

$$\min_{y\in\mathbb{R}^{k}} \|\Lambda_k\, y - W_k^{T} b\|.$$

Assuming that $\lambda_k > 0$, its solution is given by $y_k = \Lambda_k^{-1} W_k^{T} b$, which yields the approximate solution $x_k = A_k^{\dagger} b = W_k \Lambda_k^{-1} W_k^{T} b = W_k y_k$ of (1.1.1). This approach to computing an approximate solution of (1.1.1) is known as the TEVD method.

It is analogous to the TSVD method for nonsymmetric problems.

1.2.3 Tikhonov regularization

A widely used method for solving discrete ill-posed problems is the regularization method due to Tikhonov

[69]. The general form solves (1.1.1) by replacing it with a penalized least squares problem

(1.2.8)   $\min_{x} \left\{\|Ax - b\|^{2} + \lambda \|Lx\|^{2}\right\},$

where $A \in \mathbb{R}^{m\times n}$ and $L \in \mathbb{R}^{p\times n}$ satisfy $m \ge n \ge p \ge 1$. The matrix L is called the regularization matrix. The term $\|Ax - b\|$ measures the goodness of fit, as its size determines how well the regularized solution fits the initial problem. The quantity $\|Lx\|$ measures the regularity of the solution. We assume that L is such that

$$\mathcal{N}(A) \cap \mathcal{N}(L) = \{0\}.$$

The scalar $\lambda \ge 0$ is called the regularization parameter, and it determines how sensitive the solution of the regularized system (1.2.8) is to the error e.

The Tikhonov problem (1.2.8) has two alternative formulations: the normal equations

(1.2.9)   $(A^{T}A + \lambda L^{T}L)\, x = A^{T} b,$

and the stacked least-squares form, which can be solved in a stable way,

(1.2.10)   $\min_{x} \left\| \begin{bmatrix} A \\ \sqrt{\lambda}\, L \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|.$

The Tikhonov minimization problem (1.2.9) is said to be in general form. When the regularization matrix is $L = I_n$, it is in standard form:

(1.2.11)   $(A^{T}A + \lambda I_n)\, x = A^{T} b.$

The Tikhonov solution in standard form is then given by

(1.2.12)   $x_{\lambda} = \arg\min_{x} \left\{\|Ax - b\|^{2} + \lambda \|x\|^{2}\right\}.$

We can rewrite the regularized Tikhonov solution $x_{\lambda}$ in terms of the SVD of A. We do so by substituting the SVD into (1.2.11) and using $I_n = \hat{V}\hat{V}^{T}$ to obtain

$$(\hat{\Sigma}^{2} + \lambda I_n)\, \hat{V}^{T} x_{\lambda} = \hat{\Sigma}\hat{U}^{T} b.$$

Given that $\hat{\Sigma}^{2} + \lambda I_n$ is nonsingular,

(1.2.13)   $x_{\lambda} = \hat{V}(\hat{\Sigma}^{2} + \lambda I_n)^{-1}\hat{\Sigma}\hat{U}^{T} b,$

or

(1.2.14)   $x_{\lambda} = \sum_{i=1}^{n} \varphi_i^{[\lambda]}\, \frac{\hat{u}_i^{T} b}{\hat{\sigma}_i}\, \hat{v}_i,$

where $\varphi_i^{[\lambda]} = \dfrac{\hat{\sigma}_i^{2}}{\hat{\sigma}_i^{2} + \lambda}$ is the standard-form Tikhonov filter factor. This function damps components of the solution that correspond to the small singular values.
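As an illustration of (1.2.13)-(1.2.14), the short MATLAB sketch below forms the standard-form Tikhonov solution through the SVD filter factors. The test problem shaw is from Regularization Tools [41]; the value of lambda is an arbitrary illustrative choice, not one selected by the discrepancy principle.

  % Standard-form Tikhonov solution via SVD filter factors; cf. (1.2.13)-(1.2.14).
  n = 200;
  [A, b_true, x_true] = shaw(n);        % test problem from Regularization Tools [41]
  e = randn(n, 1);
  e = 1e-3*norm(b_true)*e/norm(e);      % illustrative noise
  b = b_true + e;

  [U, S, V] = svd(A);
  s = diag(S);
  lambda = 1e-4;                        % illustrative regularization parameter
  phi = s.^2 ./ (s.^2 + lambda);        % Tikhonov filter factors in (1.2.14)
  x_lambda = V * (phi .* (U'*b) ./ s);  % filtered SVD expansion of the solution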

1.2.4 Regularization parameter: the discrepancy principle

We will now address how to find a dependable and automated method for choosing the regularization param- eter, such as k (for truncated iterations) or λ (for Tikhonov regularization). There are several techniques for choosing this parameter and they include: the discrepancy principle, generalized cross validation (GCV), the

L-curve criterion, and the normalized cumulative periodogram (NCP) method; see [23, 25, 26, 40, 45, 61] for discussions of these and other methods for choosing an appropriate regularization parameter. The parameter choice rule used throughout this work is the discrepancy principle, which was first discussed by Morozov in [54]. It requires that a bound for the norm of the error e in b be known a priori,

$$\|e\| \le \varepsilon.$$

We will apply the discrepancy principle regularization method as follows:

Truncated iterations

For the TSVD, we find the smallest integer k ≥ 0 such that

(1.2.15)   $\|Ax_k - b\| \le \tau\varepsilon,$

where $\tau \ge 1$ is a user-chosen constant independent of $\varepsilon$. The more accurate the available error bound, the closer we can choose $\tau$ to 1. Ideally, we would like to choose k such that $\|Ax_k - b\| = \tau\varepsilon$, but this equality is rarely attained in practice.

The same approach applies to the TEVD. We remark that one can compute $\|Ax_k - b\|$ without evaluating a matrix-vector product with A by observing that

$$\|Ax_k - b\| = \|b - W_k W_k^{T} b\|.$$

It can be shown that xk → xtrue as kek → 0; see, e.g., [23] for a proof in a Hilbert space setting. The proof is for the situation when A is nonsymmetric, i.e., for the truncated singular value decomposition.

Tikhonov regularization

Given that we have a bound ε for the norm of the error vector e, we seek λ so that the residual norm is equal to this value. To accomplish this, we solve the following nonlinear equation in terms of λ,

$$\|Ax_{\lambda} - b\|^{2} = \tau^{2}\|e\|^{2}$$

by Newton’s method for instance.
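A minimal sketch of this parameter selection is given below, assuming that the square matrix A, the data b, and the noise vector e from the earlier setup are available and that the SVD of A can be afforded, so that the residual norm is cheap to evaluate as a function of λ. For brevity the root is found with MATLAB's fzero applied to the logarithm of λ instead of Newton's method.

  % Discrepancy principle for standard-form Tikhonov regularization.
  [U, S, V] = svd(A);
  s = diag(S);
  beta = U'*b;                            % coefficients of b in the left singular basis
  tau = 1.01;                             % safety factor slightly larger than 1
  epsilon = norm(e);                      % assumed known bound for the noise norm
  f = @(t) sum(((exp(t)./(s.^2 + exp(t))).*beta).^2) - (tau*epsilon)^2;  % discrepancy in t = log(lambda)
  t = fzero(f, [-30, 10]);                % bracket assumed to contain the sign change
  lambda = exp(t);
  x_lambda = V*((s./(s.^2 + lambda)).*beta);   % corresponding Tikhonov solution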

1.3 Krylov subspace methods

Linear discrete ill-posed problems like (1.1.1) are commonly solved with the aid of the singular value de- composition (SVD) of A, if it is a small matrix; see, e.g., [40, 56] and references therein. However, it is expensive to compute the SVD of a general large matrix; the computation of the SVD of an n × n matrix requires about 22n3 arithmetic floating-point operations (flops). See, e.g., [35, Chapter 8] for details as well as for flop counts for the situation when m > n. In particular, the SVD of a large general m × n matrix is very expensive to compute. Therefore, large-scale linear discrete ill-posed problems (1.1.1) are sometimes solved by hybrid methods that first reduce a large least-squares problem to a least-squares problem of small size by a Krylov subspace method, and then solve the latter by using the SVD of the reduced matrix so

obtained.

Given the matrix A ∈ Rn×n and the vector b ∈ Rn, the Krylov subspace generated by A and b is defined by

(1.3.1)   $\mathcal{K}_{\ell}(A,b) = \mathrm{span}\{b, Ab, A^{2}b, \ldots, A^{\ell-1}b\}, \qquad \ell \ge 1.$

A Krylov method seeks an approximate solution to (1.1.1) in the space (1.3.1). Krylov subspace methods deal with matrix-vector products of A and not directly with A. As a result, they are very effective when A is very large and sparse. To construct a Krylov sequence, begin with the initial vector, b. We then multiply by A to get the next vector, Ab. This is followed by multiplying that vector by A to get the next vector, A2b, and so on. Hence, the matrix A2 is not explicitly formed, but the matrix-vector product A2b is evaluated as A(Ab), etc. These vectors are not orthogonal and for relatively small values of ` may become nearly linearly de- pendent. We would like to determine an orthonormal basis for a Krylov subspace, as orthonormal bases are easiest to work with. A few well-known Krylov subspace methods generate orthonormal bases. These meth- ods include the Arnoldi method, the Lanczos method and the Golub–Kahan decomposition method.

1.3.1 The Arnoldi method

The Arnoldi method [2] is a widely used Krylov subspace method. It builds an orthonormal basis of the

Krylov subspace K`+1(A,b) for general square, non-symmetric matrices. It is summarized in Algorithm 1.

Application of ` steps of Algorithm 1 yields the Arnoldi decomposition

(1.3.2)   $AV_{\ell} = V_{\ell+1} H_{\ell+1,\ell},$

where the matrix $V_{\ell+1} = [v_1, v_2, \ldots, v_{\ell+1}] \in \mathbb{R}^{n\times(\ell+1)}$ has orthonormal columns such that $v_1 = b/\|b\|$ and $\mathrm{span}\{v_1, v_2, \ldots, v_{\ell+1}\} = \mathcal{K}_{\ell+1}(A,b)$. Also note that $V_{\ell}$ consists of the first $\ell$ columns

Algorithm 1 The Arnoldi Process
1: Input: A, b ≠ 0, ℓ
2: Initialize: v_1 = b/‖b‖
3: for j = 1, 2, ..., ℓ do
4:   w = A v_j
5:   for i = 1, ..., j do
6:     h_{i,j} = ⟨w, v_i⟩
7:     w = w − h_{i,j} v_i
8:   end for
9:   h_{j+1,j} = ‖w‖
10:  if h_{j+1,j} = 0 then Stop
11:  v_{j+1} = w/h_{j+1,j}
12: end for
13: end

of $V_{\ell+1}$. Furthermore, the matrix $H_{\ell+1,\ell}$ is an upper Hessenberg matrix,

(1.3.3)   $H_{\ell+1,\ell} = \begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,\ell} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,\ell} \\ & \ddots & \ddots & \vdots \\ & & h_{\ell,\ell-1} & h_{\ell,\ell} \\ & & & h_{\ell+1,\ell} \end{bmatrix} \in \mathbb{R}^{(\ell+1)\times\ell}.$

We denote the leading ` × ` submatrix of H`+1,` by H` . The vector w in Algorithm 1 is obtained by multiplying the previous Arnoldi vector, v j, by A. It is then orthonormalized against all previous Arnoldi vectors, vi, by the modified Gram-Schmidt iteration. The algorithm terminates in line 10, when h j+1, j = 0.

This situation implies that w ∈ span{v1,v2,···,v j}. As such, span{v1,v2,···,v j} is an invariant subspace of

A, simplifying (1.3.2) to AVj = VjHj.

The eigenvalues of $H_j$, where

(1.3.4)   $H_j = V_j^{T} A V_j \in \mathbb{R}^{j\times j},$

are called the Ritz values for A. The Ritz values are typically good approximations to the extreme eigenvalues of A, especially when A is symmetric.
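For reference, a plain MATLAB transcription of Algorithm 1 might look as follows. It is a direct sketch of the pseudocode above; the tolerance tol in place of the exact test h_{j+1,j} = 0 is an implementation choice, not part of the algorithm as stated.

  function [V, H] = arnoldi(A, b, ell, tol)
  % Arnoldi process with modified Gram-Schmidt; cf. Algorithm 1.
  % Returns V with ell+1 orthonormal columns and the (ell+1)-by-ell upper
  % Hessenberg matrix H such that A*V(:,1:ell) = V*H (unless breakdown occurs).
    if nargin < 4, tol = 0; end
    n = size(A, 1);
    V = zeros(n, ell+1);
    H = zeros(ell+1, ell);
    V(:,1) = b / norm(b);
    for j = 1:ell
      w = A * V(:,j);
      for i = 1:j                        % modified Gram-Schmidt orthogonalization
        H(i,j) = w' * V(:,i);
        w = w - H(i,j) * V(:,i);
      end
      H(j+1,j) = norm(w);
      if H(j+1,j) <= tol                 % breakdown: invariant subspace found
        V = V(:,1:j);  H = H(1:j,1:j);
        return
      end
      V(:,j+1) = w / H(j+1,j);
    end
  end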

1.3.2 The symmetric Lanczos process

The symmetric Lanczos process [47] is a special case of the Arnoldi process when A is symmetric. It has some very nice properties that include significant computational savings. When A is real and symmetric,

then $H_{\ell} = (V_{\ell}^{T} A V_{\ell})^{T} = H_{\ell}^{T}$ is also symmetric, hence tridiagonal. Since $h_{i,j} = 0$ for $i < j-1$, we introduce new notation for the nonzero entries:

$$\alpha_j = h_{j,j} \quad\text{and}\quad \beta_j = h_{j+1,j} = h_{j,j+1}, \qquad j = 1,\ldots,\ell.$$

Then (1.3.3) becomes the tridiagonal matrix

(1.3.5)   $T_{\ell+1,\ell} = \begin{bmatrix} \alpha_1 & \beta_2 & & & \\ \beta_2 & \alpha_2 & \beta_3 & & \\ & \beta_3 & \alpha_3 & \ddots & \\ & & \ddots & \ddots & \beta_{\ell} \\ & & & \beta_{\ell} & \alpha_{\ell} \\ & & & & \beta_{\ell+1} \end{bmatrix} \in \mathbb{R}^{(\ell+1)\times\ell}.$

Consequently, Algorithm 1 takes the following form:

Algorithm 2 The Symmetric Lanczos Process
1: Input: A, b ≠ 0, ℓ
2: Initialize: v_1 = b/‖b‖, β_1 = 0, v_0 = 0
3: for j = 1, 2, ..., ℓ do
4:   y = A v_j − β_j v_{j−1}
5:   α_j = ⟨y, v_j⟩
6:   y = y − α_j v_j
7:   β_{j+1} = ‖y‖
8:   if β_{j+1} = 0 then Stop
9:   v_{j+1} = y/β_{j+1}
10: end

The Lanczos vectors, v j, generated by the algorithm are orthonormal; and we define the matrix V`+1 = [v1,

n×(`+1) v2,...,v`+1] ∈ R . A matrix interpretation of the recursion relations of Algorithm 2 gives the (partial)

Lanczos decomposition

(1.3.6) AV` = V`+1T`+1,`.

It follows from the recursions of the Lanczos method that the columns v j of V`+1 can be expressed as

v j = q j−1(A)b, j = 1,2,...,` + 1

` where q j−1 is a polynomial of exact degree j − 1. Consequently, {v j} j=1 is an orthonormal basis for the

Krylov subspace (1.3.1). In exact arithmetic, the v j’s are orthogonal. However, upon implementation, they quickly lose their orthogonality. We use reorthogonalization schemes in our computed examples to circumvent this issue.
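A compact MATLAB sketch of Algorithm 2 is given below. The full reorthogonalization step is one simple way to address the loss of orthogonality mentioned above; it is an implementation choice rather than part of the basic recursion.

  function [V, T] = lanczos_sym(A, b, ell)
  % Symmetric Lanczos process with full reorthogonalization; cf. Algorithm 2.
  % Returns V with ell+1 orthonormal columns and the (ell+1)-by-ell
  % tridiagonal matrix T such that A*V(:,1:ell) = V*T; cf. (1.3.6).
    n = size(A, 1);
    V = zeros(n, ell+1);
    alpha = zeros(ell, 1);  beta = zeros(ell+1, 1);
    V(:,1) = b / norm(b);
    for j = 1:ell
      y = A * V(:,j);
      if j > 1, y = y - beta(j) * V(:,j-1); end
      alpha(j) = y' * V(:,j);
      y = y - alpha(j) * V(:,j);
      y = y - V(:,1:j) * (V(:,1:j)' * y);   % full reorthogonalization (implementation choice)
      beta(j+1) = norm(y);
      if beta(j+1) == 0, break; end         % breakdown: invariant subspace found
      V(:,j+1) = y / beta(j+1);
    end
    T = diag(alpha) + diag(beta(2:ell), 1) + diag(beta(2:ell), -1);
    T = [T; zeros(1, ell)];  T(ell+1, ell) = beta(ell+1);
  end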

We finally comment on the situation when Algorithm 2 breaks down. This happens when some coefficient

β j+1 vanishes. We then have determined an invariant subspace of A. If this subspace contains all the desired eigenvectors, then we compute an approximation of xtrue in this subspace. Otherwise, we restart Algorithm

2 with an initial vector that is orthogonal to the invariant subspace already found. Since the occurrence of breakdown is rare, we will not dwell on this situation. See, e.g., Saad [64] for a thorough discussion on the

properties of Algorithm 2.

1.3.3 Golub-Kahan bidiagonalization

A large nonsymmetric matrix $A \in \mathbb{R}^{m\times n}$ can be reduced to a small bidiagonal matrix by applying a few steps of Golub–Kahan bidiagonalization (also known as the Lanczos bidiagonalization algorithm). This is described by Algorithm 3.

Algorithm 3 Golub–Kahan Bidiagonalization
1: Input: A, b ≠ 0, ℓ
2: Initialize: β_1 = ‖b‖, p_1 = b/β_1, q = A^T p_1, α_1 = ‖q‖, q_1 = q/α_1
3: for j = 2, ..., ℓ+1 do
4:   p = A q_{j−1} − α_{j−1} p_{j−1}
5:   β_j = ‖p‖
6:   if β_j = 0 then Stop
7:   p_j = p/β_j
8:   q = A^T p_j − β_j q_{j−1}
9:   α_j = ‖q‖
10:  if α_j = 0 then Stop
11:  q_j = q/α_j
12: end

Using the vectors $p_j$ and $q_j$ determined by Algorithm 3, we define the matrices $P_{\ell+1} = [p_1,\ldots,p_{\ell},p_{\ell+1}] \in \mathbb{R}^{m\times(\ell+1)}$ and $Q_{\ell+1} = [q_1,\ldots,q_{\ell},q_{\ell+1}] \in \mathbb{R}^{n\times(\ell+1)}$ with orthonormal columns; $P_{\ell}$ consists of the first $\ell$ columns of $P_{\ell+1}$. These vectors form orthonormal bases for the Krylov subspaces $\mathcal{K}_{\ell}(AA^{T},b)$ and $\mathcal{K}_{\ell}(A^{T}A, A^{T}b)$, respectively. The scalars $\alpha_j$ and $\beta_j$ computed by the algorithm define the lower bidiagonal matrix

(1.3.7)   $\bar{C}_{\ell} = \begin{bmatrix} \alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \beta_3 & \ddots & \\ & & \ddots & \alpha_{\ell} \\ & & & \beta_{\ell+1} \end{bmatrix} \in \mathbb{R}^{(\ell+1)\times\ell}.$

A matrix interpretation of the recursions of Algorithm 3 gives the Golub–Kahan decompositions

(1.3.8)   $AQ_{\ell} = P_{\ell+1}\bar{C}_{\ell}, \qquad A^{T}P_{\ell} = Q_{\ell}C_{\ell}^{T},$

where the leading ` × ` submatrix of C¯` is denoted by C`. We assume ` is chosen small enough so that the decompositions (1.3.8) with the stated properties exist. See [12] for a recent discussion of this decomposi- tion.

If we combine the Golub–Kahan decompositions (1.3.8), we get

(1.3.9)   $A^{T}AQ_{\ell} = Q_{\ell+1}\bar{C}_{\ell}^{T}\bar{C}_{\ell},$

where $\bar{C}_{\ell}^{T}\bar{C}_{\ell}$ is a symmetric tridiagonal matrix. Observe that this decomposition is equivalent to applying the Lanczos process (1.3.6) to the symmetric positive semidefinite matrix $A^{T}A$.
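A MATLAB sketch of Algorithm 3 follows. As with the Lanczos sketch above, the reorthogonalization of the columns of P and Q is an implementation choice added for numerical robustness.

  function [P, Q, C] = golub_kahan(A, b, ell)
  % Golub-Kahan bidiagonalization; cf. Algorithm 3 and (1.3.7)-(1.3.8).
  % Returns P (m-by-(ell+1)), Q (n-by-(ell+1)) with orthonormal columns and the
  % (ell+1)-by-ell lower bidiagonal matrix C such that A*Q(:,1:ell) = P*C.
    [m, n] = size(A);
    P = zeros(m, ell+1);  Q = zeros(n, ell+1);
    alpha = zeros(ell+1, 1);  beta = zeros(ell+1, 1);
    beta(1) = norm(b);  P(:,1) = b / beta(1);
    q = A' * P(:,1);  alpha(1) = norm(q);  Q(:,1) = q / alpha(1);
    for j = 2:ell+1
      p = A * Q(:,j-1) - alpha(j-1) * P(:,j-1);
      p = p - P(:,1:j-1) * (P(:,1:j-1)' * p);   % reorthogonalization (implementation choice)
      beta(j) = norm(p);
      if beta(j) == 0, break; end               % breakdown
      P(:,j) = p / beta(j);
      q = A' * P(:,j) - beta(j) * Q(:,j-1);
      q = q - Q(:,1:j-1) * (Q(:,1:j-1)' * q);   % reorthogonalization
      alpha(j) = norm(q);
      if alpha(j) == 0, break; end              % breakdown
      Q(:,j) = q / alpha(j);
    end
    C = [diag(alpha(1:ell)); zeros(1, ell)] + [zeros(1, ell); diag(beta(2:ell+1))];
  end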

14 1.3.4 Block Krylov methods

There are situations where (1.1.1) is a least-squares problem with the vector b replaced by a matrix $B \in \mathbb{R}^{m\times s}$, $1 \le s \ll m$; see, e.g., [1, 19, 22, 43, 51, 63]. In some of these cases, it is beneficial to use block generalizations of Krylov subspace methods. Then the matrix $A \in \mathbb{R}^{n\times n}$ operates on a group of vectors instead of a single vector. We proceed by discussing an extension of the Arnoldi algorithm, namely the block

Arnoldi algorithm. It is described in Algorithm 4, an adaptation of Algorithm 1.

Algorithm 4 Block Arnoldi Algorithm
1: Input: A ∈ R^{n×n}, B ∈ R^{n×s}, ℓ, and the block size s, where 1 ≤ s ≪ n
2: Compute the QR decomposition B = V_1 H_{1,1}
3: for j = 1, 2, ..., ℓ do
4:   W_j = A V_j
5:   for i = 1, ..., j do
6:     H_{i,j} = V_i^T W_j
7:     W_j = W_j − V_i H_{i,j}
8:   end for
9:   Compute the QR decomposition W_j = V_{j+1} H_{j+1,j}
10: end for
11: end

The blocks $V_i \in \mathbb{R}^{n\times s}$ have orthonormal columns. Furthermore, the $V_i$'s are mutually orthogonal, and they form the matrix $\bar{V}_{\ell} = [V_1,\cdots,V_{\ell}] \in \mathbb{R}^{n\times \ell s}$, whose columns form an orthonormal basis for the block Krylov subspace

(1.3.10)   $\mathcal{K}_{\ell}(A,V_1) = \mathrm{span}\{V_1, AV_1, A^{2}V_1, \ldots, A^{\ell-1}V_1\}, \qquad \ell \ge 1.$

Consequently, a matrix interpretation of the recursion relations of Algorithm 4 gives the block Arnoldi decomposition (a MATLAB sketch follows below)

$$A\bar{V}_{\ell} = \bar{V}_{\ell}\bar{H}_{\ell} + V_{\ell+1}H_{\ell+1,\ell}E_{\ell}^{T},$$

where $\bar{H}_{\ell} \in \mathbb{R}^{\ell s\times \ell s}$ is no longer upper Hessenberg, but block upper Hessenberg, i.e., a matrix with s nonzero subdiagonals; its nonzero block entries are the blocks $H_{i,j} \in \mathbb{R}^{s\times s}$ for $1 \le i, j \le \ell$ together with the upper triangular blocks $H_{j+1,j}$, and $H_{i,j} \equiv 0$ when $i > j + 1$. The matrix $E_{\ell}$ consists of the last s columns of the identity matrix $I_{\ell s} \in \mathbb{R}^{\ell s\times \ell s}$.
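A brief MATLAB transcription of Algorithm 4 is sketched here, assuming for simplicity that no rank deficiency occurs in the QR factorizations:

  function [V, H] = block_arnoldi(A, B, ell)
  % Block Arnoldi process; cf. Algorithm 4. B has s columns.
  % V collects the orthonormal blocks V_1,...,V_{ell+1}; H is block upper Hessenberg.
    [n, s] = size(B);
    V = zeros(n, (ell+1)*s);
    H = zeros((ell+1)*s, ell*s);
    [V(:,1:s), ~] = qr(B, 0);                   % the R factor of B is not needed for this sketch
    for j = 1:ell
      cols_j = (j-1)*s+1 : j*s;
      W = A * V(:, cols_j);
      for i = 1:j
        cols_i = (i-1)*s+1 : i*s;
        H(cols_i, cols_j) = V(:, cols_i)' * W;  % block Gram-Schmidt coefficients
        W = W - V(:, cols_i) * H(cols_i, cols_j);
      end
      [V(:, j*s+1:(j+1)*s), H(j*s+1:(j+1)*s, cols_j)] = qr(W, 0);  % next block and H_{j+1,j}
    end
  end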

15 1.4 The test problems

Most MATLAB codes for determining the discrete ill-posed problems in the computed examples of this

thesis stem from Regularization Tools by Hansen [41]. These linear systems were obtained by discretizing

Fredholm integral equations of the first kind. We assume that the system matrix $A \in \mathbb{R}^{n\times n}$ as well as the exact solution $x_{\mathrm{true}} \in \mathbb{R}^{n}$ are available. If the exact right-hand side is not provided by the code, it is obtained by computing $b_{\mathrm{true}} = Ax_{\mathrm{true}}$.
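The following MATLAB fragment illustrates how such a test problem is typically set up in the computed examples; the function phillips is from Regularization Tools [41], and the noise level delta is an illustrative value.

  % Set up a discretized Fredholm integral equation and contaminated data; cf. (1.1.2).
  n = 200;
  [A, b_true, x_true] = phillips(n);   % test problem from Regularization Tools [41]
  % If a code returns only A and x_true, the error-free data are b_true = A*x_true.
  delta = 1e-3;                        % illustrative noise level
  e = randn(n, 1);
  e = delta * norm(b_true) * e / norm(e);  % normally distributed noise, scaled to the noise level
  b = b_true + e;                      % available contaminated data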

1.4.1 Descriptions of the test problems

baart: The Fredholm integral equation of the first kind

(1.4.1)   $\int_{0}^{\pi} \exp(s\cos(t))\, x(t)\, dt = \frac{2\sinh(s)}{s}, \qquad 0 \le s \le \frac{\pi}{2},$

is discussed by Baart [3]. It has the solution x(t) = sin(t). The integral equation is discretized by a

Galerkin method with piece-wise constant test and trial functions using the function baart from [41].

This gives a nonsymmetric matrix. deriv2: The Fredholm integral equation of the first kind

(1.4.2)   $\int_{0}^{1} K(s,t)\, x(t)\, dt = g(s), \qquad 0 \le s,t \le 1,$

where the kernel K is the Green's function for the second derivative,

$$K(s,t) = \begin{cases} s(t-1), & s < t, \\ t(s-1), & s \ge t. \end{cases}$$

The right-hand side is given by $g(s) = (s^{3} - s)/6$ and the solution is $x(t) = t$. The integral equation is discretized by a Galerkin method using the MATLAB function deriv2 from [41]. The matrix produced

is symmetric and negative definite. This problem is mildly ill-conditioned, i.e., its singular values

decay slowly to zero. foxgood: This is the Fredholm integral equation of the first kind

(1.4.3)   $\int_{0}^{1} \left(s^{2}+t^{2}\right)^{1/2} x(t)\, dt = \frac{1}{3}\left(\left(1 + s^{2}\right)^{3/2} - s^{3}\right), \qquad 0 \le s,t \le 1,$

with solution x(t) = t, originally discussed by Fox and Goodwin [27]. The function foxgood from [41]

is used to determine a discretization by a Nyström method. This gives a symmetric indefinite matrix

that is severely ill-posed and numerically singular.

gravity: A one-dimensional gravity surveying model problem resulting in the first-kind Fredholm integral

equation

(1.4.4)   $\int_{0}^{1} \frac{1}{4}\left(\frac{1}{16} + (s-t)^{2}\right)^{-3/2} x(t)\, dt = g(s), \qquad 0 \le s,t \le 1,$

with solution

$$x(t) = \sin(\pi t) + \tfrac{1}{2}\sin(2\pi t).$$

Discretization is carried out by a Nyström method based on the midpoint quadrature rule using the

function gravity from [41]. The resulting matrix is symmetric positive definite and the exact right-

hand side is computed as btrue = Axtrue.

heat: The inverse heat equation [18] used in this thesis is a Volterra integral equation of the first kind. The

kernel is given by $K(s,t) = k(s-t)$, where

$$k(t) = \frac{1}{2\, t^{3/2}\sqrt{\pi}}\, \exp\left(-\frac{1}{4t}\right).$$

The discretization of the integral equation is done by simple collocation and the midpoint rule with n

points. The matrix produced is a lower-triangular matrix and it is ill-conditioned. An exact solution

is constructed and then the discrete right-hand side is computed as btrue = Axtrue. This is a severely

ill-posed problem. i_laplace: The Fredholm integral equation of the first kind

(1.4.5)   $\int_{0}^{\infty} \exp(-st)\, x(t)\, dt = \frac{16}{(2s+1)^{3}}, \qquad s \ge 0,\; t \ge 0,$

represents the inverse Laplace transform, with the solution $x(t) = t^{2}\exp\left(-\frac{t}{2}\right)$. It is discretized by means of Gauss–Laguerre quadrature using the MATLAB function i_laplace from [41]. The nonsymmetric matrix so obtained is numerically singular.

phillips: We now consider the Fredholm integral equation of the first kind discussed by Phillips [60],

(1.4.6)   $\int_{-6}^{6} K(s,t)\, x(t)\, dt = g(s), \qquad -6 \le s,t \le 6,$

where the solution x(t), kernel K(s,t), and right-hand side g(s) are given by

$$x(t) = \begin{cases} 1 + \cos\left(\frac{\pi t}{3}\right), & |t| < 3, \\ 0, & |t| \ge 3, \end{cases}$$

$$K(s,t) = x(s-t),$$

$$g(s) = (6 - |s|)\left(1 + \frac{1}{2}\cos\left(\frac{\pi s}{3}\right)\right) + \frac{9}{2\pi}\sin\left(\frac{\pi |s|}{3}\right).$$

The integral equation is discretized by a Galerkin method using the MATLAB function phillips from

[41]. The matrix produced is symmetric and indefinite.

shaw: Fredholm integral equation of the first kind discussed by Shaw [66],

(1.4.7)   $\int_{-\pi/2}^{\pi/2} K(s,t)\, x(t)\, dt = g(s), \qquad -\frac{\pi}{2} \le s,t \le \frac{\pi}{2},$

with kernel

$$K(s,t) = (\cos(s) + \cos(t))^{2} \left(\frac{\sin(u)}{u}\right)^{2}, \qquad u = \pi(\sin(s) + \sin(t)),$$

and solution

$$x(t) = 2\exp(-6(t - 0.8)^{2}) + \exp(-2(t + 0.5)^{2}),$$

which defines the right-hand side function g. Discretization is carried out by a Nyström method based on the midpoint quadrature rule using the function shaw from [41]. The resulting matrix is symmetric indefinite and numerically singular. The discrete right-hand side is computed as $b_{\mathrm{true}} = Ax_{\mathrm{true}}$. This problem is severely ill-posed.

CHAPTER 2

Reduction methods applied to discrete ill-posed problems

2.1 Introduction

Consider (1.1.1) with a large symmetric matrix, A ∈ Rn×n. Many solution methods for such large-scale problems first reduce the system of equations to a problem of small size. The symmetric Lanczos method discussed in Section 1.3.2 is a popular reduction method. Application of ` steps of Algorithm 2 to A with initial vector b yields a decomposition of the form (1.3.6).

The Lanczos method determines the diagonal and subdiagonal elements

α1,β2,α2,β3,...,α`,β`+1

of T`+1,` in order. Generically, the subdiagonal entries β j are positive, and then the decomposition (1.3.6) with the stated properties exists.

The solution of (1.1.1) by truncated iteration proceeds by solving

(2.1.1)   $\min_{x\in\mathcal{K}_{\ell}(A,b)} \|Ax - b\| = \min_{y\in\mathbb{R}^{\ell}} \left\|T_{\ell+1,\ell}\, y - \|b\|\, e_1\right\|,$

where the right-hand side is obtained by substituting the decomposition (1.3.6) into the left-hand side and by exploiting the properties of the matrices involved. Here and in the following, $e_j$ denotes the jth axis vector. Let $y_{\ell}$ be a solution of the least squares problem on the right-hand side of (2.1.1). Then $x_{\ell} := V_{\ell} y_{\ell}$ solves the constrained least squares problem on the left-hand side of (2.1.1) and is an approximate solution of (1.1.1). By choosing $\ell$ suitably small, propagation of the error e in b into the computed solution $x_{\ell}$ is reduced. This relies on the fact that the condition number of $T_{\ell+1,\ell}$, given by $\kappa(T_{\ell+1,\ell}) := \|T_{\ell+1,\ell}\|\,\|T_{\ell+1,\ell}^{\dagger}\|$, is an increasing function of $\ell$. A large condition number indicates that the solution $y_{\ell}$ of the right-hand side of (2.1.1) is sensitive to errors in the data and to round-off errors introduced during the computations. We will discuss this and other solution methods below. For overviews and analyses of solution methods for linear discrete ill-posed problems, we refer to [23, 40].
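A minimal MATLAB sketch of this truncated-iteration solver, assuming a partial Lanczos decomposition (1.3.6) has been computed (for instance with the lanczos_sym sketch of Section 1.3.2), solves the projected problem (2.1.1) for a fixed number of steps:

  % Truncated iteration based on the partial Lanczos decomposition (1.3.6).
  ell = 10;                               % illustrative number of Lanczos steps
  [V, T] = lanczos_sym(A, b, ell);        % V is n-by-(ell+1), T is (ell+1)-by-ell
  y = T \ (norm(b) * eye(ell+1, 1));      % least-squares solution of the projected problem (2.1.1)
  x_ell = V(:, 1:ell) * y;                % approximate solution of (1.1.1)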

We will investigate the structure of the matrix (1.3.5) obtained by applying the Lanczos method to a sym- metric matrix whose eigenvalues “cluster” at the origin. We will give upper bounds for the size of the subdiagonal entries. These bounds shed light on the solution subspaces generated by the symmetric Lanc- zos method. In particular, the bounds indicate that the ranges of the matrices V` essentially contain the span of the k = k(`) eigenvectors of A associated with the k eigenvalues of largest magnitude where k(`) is an increasing function of ` and, generally, k(`) < `. This observation suggests that it may not be necessary to compute a partial eigendecomposition of A, but that it suffices to determine a few Lanczos vectors, which is much cheaper. We also will investigate the solution subspaces determined by application of ` steps of

Golub–Kahan bidiagonalization to a nonsymmetric matrix A, whose singular values cluster at the origin.

We find the solution subspaces determined by ` steps of Golub–Kahan bidiagonalization applied to A to essentially contain the spans of the k = k(`) right and left singular vectors of A associated with the k largest singular values, where k = k(`) is an increasing function of ` and, generally, k(`) < `. This suggests that it may not be necessary to compute singular value or partial singular value decompositions of A, but that it suf-

fices to carry out a few steps of Golub–Kahan bidiagonalization, which is much cheaper. The results for the spans of the solution subspaces determined by partial Golub–Kahan bidiagonalization follow from bounds for singular values. These bounds provide an alternative to the bounds shown by Gazzola et al. [30, 31].

Related bounds also are presented by Novati and Russo [56].

2.2 Application of the symmetric Lanczos method

This section discusses the convergence of the subdiagonal and diagonal entries of the matrix T`+1,` in (1.3.6) with increasing dimensions. The proofs use the spectral factorization (1.2.5).

Theorem 2.2.1. Let the matrix $A \in \mathbb{R}^{n\times n}$ be symmetric and positive semidefinite, and let its eigenvalues be ordered according to (1.2.6). Assume that the Lanczos method applied to A with initial vector b does not break down, i.e., that n steps of the method can be carried out. Let $\beta_2, \beta_3, \ldots, \beta_{\ell+1}$ be the subdiagonal entries of the matrix $T_{\ell+1,\ell}$ determined by $\ell$ steps of the Lanczos method; cf. (1.3.6). Define $\beta_{n+1} := 0$. Then

(2.2.1)   $\prod_{j=2}^{\ell+1} \beta_j \le \prod_{j=1}^{\ell} \lambda_j, \qquad \ell = 1,2,\ldots,n.$

Proof. Introduce the monic polynomial $p_{\ell}(t) = \prod_{j=1}^{\ell}(t - \lambda_j)$ defined by the $\ell$ largest eigenvalues of A. Using the spectral factorization (1.2.5), we obtain

$$\|p_{\ell}(A)\| = \|p_{\ell}(\Lambda)\| = \max_{\ell+1 \le j \le n} |p_{\ell}(\lambda_j)| \le |p_{\ell}(0)| = \prod_{j=1}^{\ell} \lambda_j,$$

where the inequality follows from the fact that all $\lambda_j$ are nonnegative. Hence,

(2.2.2)   $\|p_{\ell}(A)b\| \le \|b\| \prod_{j=1}^{\ell} \lambda_j.$

Application of n steps of the symmetric Lanczos method gives the decomposition $AV_n = V_n T_n$, where $T_n \in \mathbb{R}^{n\times n}$ is symmetric and tridiagonal, and $V_n \in \mathbb{R}^{n\times n}$ is orthogonal with $V_n e_1 = b/\|b\|$. We have

(2.2.3)   $p_{\ell}(A)b = V_n\, p_{\ell}(T_n)\, V_n^{T} b = V_n\, p_{\ell}(T_n)\, e_1 \|b\|.$

This relation gives the equality below,

(2.2.4)   $\|p_{\ell}(A)b\| = \|p_{\ell}(T_n) e_1\| \, \|b\| \ge \|b\| \prod_{j=2}^{\ell+1} \beta_j.$

The inequality above follows by direct computation. Specifically, one can show by induction on $\ell$ that

$$\|p_{\ell}(T_n) e_1\| \ge |e_{\ell+1}^{T} p_{\ell}(T_n) e_1| = \prod_{j=2}^{\ell+1} \beta_j.$$

For $\ell = 1$ the result is trivial. Assume that it is valid for $1 \le \ell < n$. Then

$$e_{\ell+2}^{T} p_{\ell+1}(T_n) e_1 = e_{\ell+2}^{T} (T_n - \lambda_{\ell+1} I)\, p_{\ell}(T_n) e_1 = \phi_{\ell+2}^{T} p_{\ell}(T_n) e_1, \quad\text{with}$$

$$\phi_{\ell+2} = \beta_{\ell+2} e_{\ell+1} + (\alpha_{\ell+2} - \lambda_{\ell+1}) e_{\ell+2} + \beta_{\ell+3} e_{\ell+3}.$$

Since $p_{\ell}(T_n)$ is $(2\ell+1)$-banded, we obtain

$$\phi_{\ell+2}^{T} p_{\ell}(T_n) e_1 = \beta_{\ell+2}\, e_{\ell+1}^{T} p_{\ell}(T_n) e_1 = \beta_{\ell+2} \prod_{j=2}^{\ell+1} \beta_j = \prod_{j=2}^{\ell+2} \beta_j.$$

Combining (2.2.2) and (2.2.4) shows the theorem.

In practice, the bound (2.2.1) is often quite sharp; we will give a numerical illustration of this in Section 2.4.

Moreover, we can easily derive bounds of the form

(2.2.5)   $\beta_{j+1} \le k_j \lambda_j, \qquad j = 1,\ldots,n-1, \quad\text{with}\quad k_j := \frac{\prod_{i=1}^{j-1} \lambda_i}{\prod_{i=1}^{j-1} \beta_{i+1}} \ge 1.$

However, if the bound (2.2.1) is not sharp, then $k_j \gg 1$, resulting in a meaningless estimate (2.2.5). A result analogous to Theorem 2.2.1 for nonsymmetric matrices A, with the Lanczos method replaced by the Arnoldi method, has been shown by Novati and Russo [56] and Gazzola et al. [30, 31]. It should be emphasized that the bounds proved in [30, 31, 56] are similar to (2.2.5), but assume $b_{\mathrm{true}}$ as starting vector for the Arnoldi algorithm, involve constants whose values are not explicitly known, and are only valid for moderately to severely ill-posed problems; see, e.g., Hansen [40] for this classification of ill-posed problems. The restriction to symmetric matrices allows us to give a bound which does not explicitly depend on the starting vector of the Lanczos method.

Corollary 2.2.2. Let A ∈ Rn×n be symmetric and positive semidefinite. Assume that the eigenvalues of A

“cluster” at the origin and that the Lanczos method applied to A with initial vector b does not break down.

Also assume that, for all $j > s$, $\beta_j \le C \min_{1\le i\le s} \beta_i$ for a constant C independent of j and s. Then both the diagonal and subdiagonal entries $\alpha_j$ and $\beta_j$ of the tridiagonal Lanczos matrix $T_{\ell+1,\ell}$, cf. (1.3.6), approach zero as j increases and is large enough.

Proof. We first remark that when we let the index j increase, we also may have to increase ` in (1.3.6). The fact that the β j’s approach zero as j increases follows from (2.2.1) and the clustering of the eigenvalues at zero. We turn to the diagonal entries α j of the tridiagonal Lanczos matrix Tn. This matrix is similar to A.

Therefore its eigenvalues cluster at the origin, which is the only cluster point. Application of Gershgorin disks to the rows of Tn, using the fact that the off-diagonal entries are “tiny” for large row numbers, shows that the corresponding diagonal entries are accurate approximations of the eigenvalues of Tn. These entries therefore have to approach zero as j increases.

We remark that the decrease of the subdiagonal entries β j of T`+1,` to zero with increasing j follows from the clustering of the eigenvalues of A; it is not necessary that they cluster at the origin. This can be seen by replacing the matrix A in Corollary 2.2.2 by A + cIn for some constant c ∈ R. This substitution adds c to the diagonal entries of T`+1,` in (1.3.6), and it is employed with a suitable parameter c > 0 to perform

Lavrentiev-type regularization; cf. [40].

The assumption in Theorem 2.2.1 that n steps of the Lanczos method can be carried out simplifies the proof, but is not essential. We state the corresponding result when the Lanczos method breaks down at step k < n.

Corollary 2.2.3. Let the matrix A ∈ Rn×n be symmetric and positive semidefinite, and let its eigenvalues be ordered according to (1.2.6). Assume that the Lanczos method applied to A with initial vector b breaks

down at step k. Then

$$\prod_{j=2}^{\ell+1} \beta_j \le \prod_{j=1}^{\ell} \lambda_j, \qquad \ell = 1,2,\ldots,k,$$

where $\beta_{k+1} := 0$.

Proof. Application of k steps of the Lanczos method to the matrix A with initial vector b gives the decom- position

AVk = VkTk,

where $V_k = [v_1, v_2, \ldots, v_k] \in \mathbb{R}^{n\times k}$ has orthonormal columns with $V_k e_1 = b/\|b\|$ and

$$T_k = \begin{bmatrix} \alpha_1 & \beta_2 & & & \\ \beta_2 & \alpha_2 & \beta_3 & & \\ & \beta_3 & \alpha_3 & \ddots & \\ & & \ddots & \ddots & \beta_k \\ & & & \beta_k & \alpha_k \end{bmatrix} \in \mathbb{R}^{k\times k}$$

is symmetric and tridiagonal with positive subdiagonal entries. The inequality (2.2.2) holds for $\ell = 1,2,\ldots,n$; however, the relation (2.2.3) has to be replaced by

(2.2.6)   $p_{\ell}(A)b = V_k\, p_{\ell}(T_k)\, e_1 \|b\|, \qquad 1 \le \ell < k.$

This relation can be shown by induction on `. Indeed, for ` = 1, one immediately has

p1(A)b = (A − λ1I)Vke1kbk = (AVk − λ1Vk)e1kbk = Vk(Tk − λ1I)e1kbk = Vk p1(Tk)e1kbk.

Assuming that (2.2.6) holds for $\ell < k - 1$, for $\ell + 1$ one gets

p`+1(A)b = (A − λ`+1I)p`(A)b = (A − λ`+1I)Vk p`(Tk)e1kbk

= Vk(Tk − λ`+1I)p`(Tk)e1kbk = Vk p`+1(Tk)e1kbk.

Analogously to (2.2.4), we obtain

$$\|p_{\ell}(A)b\| = \|p_{\ell}(T_k)e_1\|\,\|b\| \ge \|b\| \prod_{j=2}^{\ell+1} \beta_j,$$

and the corollary follows.

We turn to symmetric indefinite matrices. For notational simplicity, we will assume that the Lanczos method does not break down, but this requirement can be relaxed similarly as in Corollary 2.2.3.

n n×n Theorem 2.2.4. Let the eigenvalues {λ j} j=1 of the symmetric matrix A ∈ R be ordered according to

(1.2.6). Assume that the Lanczos method applied to A with initial vector b does not break down. Then

(2.2.7)   $\prod_{j=2}^{\ell+1} \beta_j \le \prod_{j=1}^{\ell}\left(|\lambda_{\ell+1}| + |\lambda_j|\right), \qquad \ell = 1,2,\ldots,n-1.$

Proof. Let p`(t) be the monic polynomial of the proof of Theorem 2.2.1. Then just like in that proof

$$\|p_{\ell}(A)\| = \|p_{\ell}(\Lambda)\| = \max_{\ell+1\le j\le n} |p_{\ell}(\lambda_j)|.$$

It follows from the ordering (1.2.6) of the eigenvalues that the interval [−|λ`+1|,|λ`+1|] contains all the eigenvalues λ`+1,λ`+2,...,λn. Therefore,

$$\max_{\ell+1\le j\le n} |p_{\ell}(\lambda_j)| \le \max_{-|\lambda_{\ell+1}|\le t\le |\lambda_{\ell+1}|} |p_{\ell}(t)| \le \prod_{j=1}^{\ell}\left(|\lambda_{\ell+1}| + |\lambda_j|\right).$$

The inequality (2.2.7) now follows similarly as the proof of the analogous inequality (2.2.1).

Assume that the eigenvalues of A cluster at the origin. Then Theorem 2.2.4 shows that the factors |λ`+1| +

|λ j| decrease to zero as ` and j, with 1 ≤ j ≤ `, increase. Furthermore, the more Lanczos steps are taken, the tighter the bound for the product of the subdiagonal elements of the matrix T`+1,`. Sharper bounds for the product of subdiagonal entries of T`+1,` can be obtained if more information about the spectrum of A is available. For instance, if all but a few eigenvalues of A are known to be nonnegative, then only the factors with the negative eigenvalues have to be modified as in Theorem 2.2.4, resulting in improved bounds for products of the β j. Simpler, but cruder, bounds than (2.2.7) also can be derived. The following is an example.

n n×n Corollary 2.2.5. Let the eigenvalues {λ j} j=1 of the symmetric matrix A ∈ R be ordered according to

(1.2.6). Assume that the Lanczos method applied to A with initial vector b does not break down. Then,

(2.2.8)   $\prod_{j=2}^{\ell+1} \beta_j \le \prod_{k=1}^{\ell} 2|\lambda_k|, \qquad \ell = 1,2,\ldots,n-1.$

Proof. The result follows from the observation that |λ`+1| ≤ |λk| for 1 ≤ k ≤ `.

Introduce the set of ε-pseudoeigenvectors of A ∈ Rn×n:

(2.2.9)   $\mathcal{V}_{\varepsilon} := \{x \in \mathbb{R}^{n} \text{ unit vector} : \exists\, \lambda \in \mathbb{R} \text{ such that } \|Ax - \lambda x\| \le \varepsilon\}.$

The λ-values associated with ε-pseudoeigenvectors are ε-pseudoeigenvalues of A; see, e.g., Trefethen and

Embree [72] for an insightful treatment of pseudospectra of matrices and operators.

T Substituting the decomposition A = VnTnVn into (2.2.9) and applying Theorem 2.2.4 show that, for a given

ε > 0 and for j sufficiently large, the Lanczos vectors v j are ε-pseudoeigenvectors of A associated with eigenvalues close to zero. Indeed, by (1.3.6) we get

Av j = AV`e j = V`+1T`+1,`e j = α jv j + β jv j−1 + β j+1v j+1 .

Since $\alpha_j$ and $\beta_j$ approach 0 as j increases, we can conclude that the $v_j$ are ε-pseudoeigenvectors for j large.

Let w j denote the jth column of the matrix W in (1.2.5), i.e., let w j be the jth eigenvector of A. Therefore,

k ` the space span{w j} j=1 is essentially contained in span{v j} j=1 for k = k(`) ≤ ` sufficiently small. The notion of “essentially contained” will be made precise and illustrated in Section 2.4.

k ` The above observation about the subspaces span{w j} j=1 and span{v j} j=1 for k = k(`) has implications for computations. One of the most popular methods for solving linear discrete ill-posed problems is the trun- cated singular value decomposition (TSVD); see Section 1.2.1. The truncated eigenvalue decomposition for symmetric matrices is analogous to the TSVD; see Section 1.2.2. It is based on expressing an approximate

k solution of (1.1.1) as a linear combination of the first few eigenvectors, say {w j} j=1, of A; cf. (1.2.5). The

` computation of these eigenvectors is more expensive than the determination of the Lanczos vectors {v j} j=1 for a reasonable k = k(`) ≤ `, because typically several Lanczos decompositions with different initial vec- tors have to be computed in order to determine the desired eigenvectors; see, e.g., Baglama et al. [4] and

Saad [65] for discussions on methods for computing a few eigenpairs of a large matrix. Since the span of

` k the Lanczos vectors {v j} j=1 essentially contains the set of the eigenvectors {w j} j=1, there is generally no need to compute the latter. This is illustrated by numerical examples in Section 2.4.

It is sometimes beneficial to determine an approximate solution of (1.1.1) in a shifted Krylov subspace

(2.2.10)   $\mathcal{K}_{\ell}(A,Ab) = \mathrm{span}\{Ab, A^{2}b, \ldots, A^{\ell}b\}$

instead of in the standard Krylov subspace (1.3.1). This is discussed and illustrated in [15, 20, 55]. Let

(2.2.11) AV˘` = V˘`+1T˘`+1,`,

n×(`+1) where V˘`+1 = [v˘1,v˘2,...,v˘`+1] ∈ R has orthonormal columns with v˘1 = Ab/kAbk, V˘` = [v˘1,v˘2,...,

n×` v˘`] ∈ R , and the tridiagonal matrix T˘`+1,` is of the same form as (1.3.5). Then, analogously to (2.1.1), we

formally obtain

$$\min_{x\in\mathcal{K}_{\ell}(A,Ab)} \|Ax - b\|^{2} = \min_{y\in\mathbb{R}^{\ell}} \left\|\breve{T}_{\ell+1,\ell}\, y - \breve{V}_{\ell+1}^{T} b\right\|^{2} + \left\|(I - \breve{V}_{\ell+1}\breve{V}_{\ell+1}^{T})\, b\right\|^{2}.$$

` Let y˘` ∈ R denote the solution of the minimization problem on the right-hand side of the above relation.

Then x˘` := V˘`y˘` solves the constrained least squares problem on the left-hand side of the above equation and is an approximate solution of (1.1.1). Computed examples show the vector x˘` to generally approximate the desired solution xtrue more accurately than the solution x` of the minimization problem (2.1.1). In the computed examples of Section 2.4, we therefore compute the vectors x˘1,x˘2,... .

We note that the analysis in this section is independent of the initial vector b in the Krylov subspace (1.3.1), except for how this vector affects the occurrence of breakdown. In particular, our analysis carries over to shifted Krylov subspaces of the form (2.2.10).

2.3 Application of the Golub–Kahan reduction method

A nonsymmetric matrix A ∈ Rm×n can be reduced to a small bidiagonal matrix by a few steps of Golub–

Kahan bidiagonalization; see Section 1.3.3. This reduction method is the basis for the popular LSQR al- gorithm [59] for the solution of least-squares problems (1.1.1) where the vector b ∈ Rm can be written as

(1.1.2). Application of `  n steps of Golub–Kahan bidiagonalization to A with initial vector b gives the decompositions (1.3.8). Throughout this section, α j and β j refer to entries of the matrix (1.3.7).

The LSQR method applied to the solution of (1.1.1) solves in step $\ell$ the minimization problem

$$\min_{x\in\mathcal{K}_{\ell}(A^{T}A,\,A^{T}b)} \|Ax - b\| = \min_{y\in\mathbb{R}^{\ell}} \left\|\bar{C}_{\ell}\, y - \|b\|\, e_1\right\|,$$

where the right-hand side is obtained by substituting (1.3.8) into the left-hand side. Denote the solution of the right-hand side by $y_{\ell}'$. Then the $\ell$th step of LSQR yields the solution $x_{\ell}' = Q_{\ell} y_{\ell}'$ of the left-hand side, which is an approximate solution of (1.1.1).
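A compact MATLAB sketch of this projected least-squares step is given below; it uses the golub_kahan function sketched in Section 1.3.3 rather than the recursive LSQR updates of [59], so it is an illustration of the projection, not of the LSQR implementation itself.

  % Approximate solution of (1.1.1) from ell steps of Golub-Kahan bidiagonalization.
  ell = 15;                                % illustrative number of bidiagonalization steps
  [P, Q, C] = golub_kahan(A, b, ell);      % C is (ell+1)-by-ell lower bidiagonal; cf. (1.3.7)
  y = C \ (norm(b) * eye(ell+1, 1));       % projected least-squares problem
  x_ell = Q(:, 1:ell) * y;                 % approximate LSQR-type solution after ell steps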

The decomposition (1.3.9) is a Lanczos decomposition, and it allows us to apply Theorem 2.2.1.

m×n Corollary 2.3.1. Let A ∈ R have the singular values σ1 ≥ σ2 ≥ ··· ≥ σn ≥ 0, and assume that the

Golub–Kahan bidiagonalization method applied to A with initial vector b does not break down. Then

(2.3.1)   $\prod_{j=2}^{\ell+1} \alpha_j \beta_j \le \prod_{j=1}^{\ell} \sigma_j^{2}, \qquad \ell = 1,2,\ldots,m-1,$

where the α j and β j are entries of the bidiagonal matrix (1.3.7).

Proof. The subdiagonal entries of the matrix $\bar{C}_{\ell}^{T}\bar{C}_{\ell}$ in (1.3.9) are $\alpha_j\beta_j$ and the eigenvalues of $A^{T}A$ are $\sigma_j^{2}$.

The result therefore follows from Theorem 2.2.1.

The above corollary shows that if the singular values σ j cluster at zero for large j, then so do the products

α jβ j of the entries of the matrix (1.3.7). Bounds related to (2.3.1) have been shown by Gazzola et al.

[31].

The computation of the SVD (1.2.1) is feasible for problems of small to moderate size, but expensive for large-scale problems. The computational expense for large-scale problems can be reduced by computing the the partial singular value decomposition of A, {Uˆk,Vˆk,Σˆ k}, instead of {Uˆ ,Vˆ ,Σˆ }; see [6, 8, 58] and references therein for suitable numerical methods. The computation of Uˆk, Vˆk, and Σˆ k generally requires that several

Golub–Kahan decompositions (1.3.8) with different initial vectors be evaluated.

Corollary 2.3.1 indicates why it may not be necessary to compute the matrices Uˆk, Vˆk, and Σˆ k. The columns

T 2 of the matrix Vˆ in (1.2.1) are eigenvectors of A A and the σˆ j are eigenvalues. It is a consequence of

Corollary 2.3.1, and the fact that the singular values σˆ j of A cluster at the origin, that the columns q j of

T the matrix Q` in (1.3.8) for small j are accurate approximations of eigenvectors of A A. This follows from an argument analogous to the discussion in Section 2.2 based on Corollaries 2.2.2 and 2.2.5. Therefore, it generally is not necessary to compute the partial singular value decomposition {Uˆk,Vˆk,Σˆ k} of A. Instead, it suffices to determine a partial Golub–Kahan bidiagonalization (1.3.8), which is cheaper. This is illustrated in the following section.

2.4 Computed examples

To investigate the properties discussed in the previous sections, we applied the symmetric Lanczos and

Golub–Kahan bidiagonalization methods to a set of test matrices whose singular values cluster at the origin.

The numerical experiments were carried out using MATLAB R2014a in double-precision arithmetic, that is, with about 15 significant decimal digits.

The symmetric test matrices are listed in Table 1, while the nonsymmetric ones are in Table 2. Among the symmetric matrices, one example is negative definite (Deriv2), one is positive definite (Gravity), and the others are indefinite. All matrices except one are from the Regularization Tools package [41]. The

Lotkin test matrix was generated by the gallery function, which is available in the standard MATLAB distribution. All test matrices are of order 200 × 200 except when explicitly stated otherwise.

Figure 1: Behavior of the bounds (2.2.1) (left), (2.2.7) (center), and (2.3.1) (right), with respect to the iteration index `. The first test matrix is symmetric positive definite, the second is symmetric indefinite, and the third is unsymmetric. The left-hand side of each inequality is represented by crosses, the right-hand side by circles.

Figure 1 displays, in logarithmic scale, the values taken by each side of inequalities (2.2.1), (2.2.7), and

(2.3.1), with the number of iterations, `, ranging from 1 to the index corresponding either to a breakdown of the algorithm or to the last nonzero value of both inequality sides. The graphs show that the bounds provided by Theorems 2.2.1 and 2.2.4, and by Corollary 2.3.1, are quite sharp.
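For readers who wish to reproduce such a comparison, the next MATLAB fragment evaluates both sides of (2.2.1) for a symmetric positive definite test matrix; gravity is from Regularization Tools [41], and lanczos_sym refers to the sketch in Section 1.3.2. Products are accumulated through their logarithms to avoid underflow, an implementation detail.

  % Compare both sides of the bound (2.2.1) for a symmetric positive definite matrix.
  n = 200;
  [A, b_true] = gravity(n);                 % symmetric positive definite test matrix [41]
  ell_max = 30;
  [V, T] = lanczos_sym(A, b_true, ell_max); % partial Lanczos decomposition (1.3.6)
  beta = diag(T, -1);                       % subdiagonal entries beta_2,...,beta_{ell_max+1}
  lambda = sort(eig(A), 'descend');         % eigenvalues ordered as in (1.2.6)
  lhs = cumsum(log10(abs(beta(1:ell_max))));  % log10 of prod_{j=2}^{ell+1} beta_j
  rhs = cumsum(log10(lambda(1:ell_max)));     % log10 of prod_{j=1}^{ell} lambda_j
  plot(1:ell_max, lhs, 'x', 1:ell_max, rhs, 'o');
  xlabel('\ell');  ylabel('log_{10} of the products');
  legend('product of \beta_j', 'product of \lambda_j');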

Now we illustrate that the subspaces $\mathcal{R}(\breve V_k)$ generated by the Lanczos method (2.2.11) essentially contain subspaces of eigenvectors of $A$ associated with the eigenvalues of largest magnitude. We also discuss the convergence of the largest eigenvalues of the matrices $\breve T_k$ in (2.2.11) to eigenvalues of $A$ of largest magnitude. Here $\breve T_k \in \mathbb{R}^{k\times k}$ is the matrix obtained by neglecting the last row of the matrix $\breve T_{k+1,k} \in \mathbb{R}^{(k+1)\times k}$ defined by (2.2.11) with $\ell$ replaced by $k$. The Lanczos method is applied for $n$ steps or until breakdown occurs, that is, until a subdiagonal element of $\breve T_k$ is smaller than $10^{-12}$; $\ell$ denotes the number of steps performed by the method. The initial column of the matrices $\breve V_k$ is $Ab_{\rm true}/\|Ab_{\rm true}\|$.

Let $\{\breve\lambda_i^{(k)}\}_{i=1}^{k}$ denote the eigenvalues of the matrix $\breve T_k$. We compare the eigenvalues $\breve\lambda_i^{(k)}$ of largest magnitude to the corresponding eigenvalues $\lambda_i$ of the matrix $A$. All eigenvalues are ordered according to decreasing magnitude. For each Lanczos step $k$, we compute the relative difference

(2.4.1) \[ R_{\lambda,k} := \max_{i=1,2,\ldots,\lceil k/3\rceil} \frac{|\breve\lambda_i^{(k)} - \lambda_i|}{|\lambda_i|}. \]

Thus, we evaluate the maximum relative difference over the $\lceil k/3\rceil$ eigenvalues of largest modulus; $\lceil\eta\rceil$ denotes the integer closest to $\eta \in \mathbb{R}$. The graphs for $R_{\lambda,k}$, for $k = 1,2,\ldots,n$, are displayed in the left column of Figure 2 for each of the 5 symmetric test matrices.
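A minimal sketch of how the quantities (2.4.1) can be computed is given below. It uses a standard symmetric Lanczos recursion with full reorthogonalization and the shaw test problem from [41]; it illustrates the computation and is not the code used to produce Figure 2. The ceiling function is used for $\lceil k/3\rceil$.

    % Sketch: relative error (2.4.1) between Ritz values and eigenvalues.
    n = 200;
    [A, btrue, ~] = shaw(n);              % assumes Regularization Tools
    lam = eig(A);
    [~, p] = sort(abs(lam), 'descend');
    lam = lam(p);                         % eigenvalues by decreasing magnitude

    v = A*btrue;  v = v/norm(v);          % initial vector A*b_true/||A*b_true||
    V = v;  alpha = [];  beta = [];
    kmax = 60;  Rlam = nan(kmax,1);
    for k = 1:kmax
        w = A*V(:,k);
        if k > 1, w = w - beta(k-1)*V(:,k-1); end
        alpha(k) = V(:,k)'*w;
        w = w - alpha(k)*V(:,k);
        w = w - V*(V'*w);                 % full reorthogonalization
        Tk = diag(alpha);                 % tridiagonal matrix T_k
        if k > 1
            Tk = Tk + diag(beta(1:k-1),1) + diag(beta(1:k-1),-1);
        end
        ritz = eig(Tk);
        [~, q] = sort(abs(ritz), 'descend');
        ritz = ritz(q);
        m = ceil(k/3);
        Rlam(k) = max(abs(ritz(1:m) - lam(1:m)) ./ abs(lam(1:m)));
        beta(k) = norm(w);
        if beta(k) < 1e-12, break, end    % breakdown
        V(:,k+1) = w/beta(k);
    end
    semilogy(1:kmax, Rlam, 'o-')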

We turn to a comparison of subspaces. For each $k$, let $\breve T_k = \breve W_k\breve\Lambda_k\breve W_k^T$ be the spectral factorization of $\breve T_k$, where

\[ \breve\Lambda_k = \mathrm{diag}[\breve\lambda_1^{(k)},\breve\lambda_2^{(k)},\ldots,\breve\lambda_k^{(k)}], \qquad \breve W_k = [\breve w_1^{(k)},\breve w_2^{(k)},\ldots,\breve w_k^{(k)}], \]

and introduce the matrix $V_{k,i} = [v_1^{(k)},v_2^{(k)},\ldots,v_i^{(k)}]$ consisting of the first $i$ columns of $\breve V_k\breve W_k$. The columns of $V_{k,i}$ are the Ritz vectors of $A$ associated with the $i$ Ritz values of largest magnitude, $\breve\lambda_1^{(k)},\breve\lambda_2^{(k)},\ldots,\breve\lambda_i^{(k)}$.

Partition the matrix containing the eigenvectors of $A$, cf. (1.2.5), according to $W = [W_i^{(1)}\; W_{n-i}^{(2)}]$, where $W_i^{(1)} \in \mathbb{R}^{n\times i}$ contains the first $i$ eigenvectors, and $W_{n-i}^{(2)} \in \mathbb{R}^{n\times(n-i)}$ the remaining ones. The columns of $W_i^{(1)}$ and $W_{n-i}^{(2)}$ span orthogonal subspaces.

Figure 2: The graphs in the left column display the relative error $R_{\lambda,k}$ between the eigenvalues of the symmetric test problems, and the corresponding Ritz values generated by the Lanczos process. The right column shows the behavior of $R_{\hat\sigma,k}$ for the unsymmetric problems; see (2.4.1) and (2.4.3).

We compute, for $k = 1,2,\ldots,\ell$, the quantities

(2.4.2) \[ R_{w,k} := \max_{i=1,2,\ldots,\lceil k/3\rceil} \|V_{k,i}^T W_{n-i}^{(2)}\|. \]

The norm $\|V_{k,i}^T W_{n-i}^{(2)}\|$ measures the distance between the subspaces $\mathcal{R}(V_{k,i})$ and $\mathcal{R}(W_i^{(1)})$; see, e.g., [35]. Thus, $R_{w,k}$ is small when $\mathrm{span}\{w_j\}_{j=1}^{\lceil k/3\rceil}$ is approximately contained in $\mathrm{span}\{v_j^{(k)}\}_{j=1}^{k}$, that is, when the solution subspace generated by the Lanczos vectors essentially contains the space generated by the first $\lceil k/3\rceil$ eigenvectors. The graphs in the left column of Figure 3 show $R_{w,k}$, for $k = 1,2,\ldots,\ell$, for the symmetric test matrices. The distances between subspaces $\|V_{k,i}^T W_{n-i}^{(2)}\|$ are displayed in Figure 4, for $k = 10$ and $i = 1,\ldots,k$, for two symmetric test matrices.
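The distance measure used in (2.4.2) can be illustrated independently of the Lanczos process: when $V_1$ has orthonormal columns and $W = [W_1\; W_2]$ is orthogonal, $\|V_1^TW_2\|$ equals the sine of the largest principal angle between $\mathcal{R}(V_1)$ and $\mathcal{R}(W_1)$. The small self-contained MATLAB check below uses random subspaces purely for illustration; the function subspace returns the largest principal angle.

    % Sketch: ||V1'*W2|| = sin(largest principal angle between range(V1), range(W1)).
    n = 50;  i = 5;
    W  = orth(randn(n));            % orthogonal matrix; W1 = first i columns
    W1 = W(:,1:i);  W2 = W(:,i+1:end);
    V1 = orth(randn(n,i));          % another i-dimensional subspace
    d1 = norm(V1'*W2);              % distance used in (2.4.2)
    d2 = sin(subspace(V1, W1));     % sine of largest principal angle
    fprintf('distance = %.3e, sin(theta_max) = %.3e\n', d1, d2);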

A few comments on the left-hand side graphs of Figures 2 and 3 are in order. The left graphs of Figure 3 show that the span of the first $\lceil k/3\rceil$ eigenvectors of $A$ is numerically contained in the span of the first $k$ Lanczos vectors already for quite small values of $k$. We remark that this is not true if we compare the spaces spanned by the first $k$ eigenvectors of $A$ and by its first $k$ Lanczos vectors. Graphs that compare the span of the first $\lceil k/2\rceil$ eigenvectors of $A$ with the span of the first $k$ Lanczos vectors look similar to the graphs shown, but display slower convergence; see the graphs in the left column of Figure 5. Thus, $k$ has to be larger in order for the first $\lceil k/2\rceil$ eigenvectors of $A$ to be numerically in the span of the first $k$ Lanczos vectors. Figure 2 shows excellent agreement between the first $\lceil k/3\rceil$ Ritz values of $A$ and the corresponding eigenvalues already for small $k$. The convergence of the first $\lceil k/2\rceil$ Ritz values to the corresponding eigenvalues is somewhat slower than the convergence displayed.

We use the Lanczos decomposition (2.2.11) in our illustrations because this decomposition gives approximate solutions of (1.1.1) of higher quality than the decompositions (1.3.6). Analogues of Figures 2 and 3 based on the Lanczos decomposition (1.3.6) look essentially the same as the figures shown. Finally, we remark that the Lanczos decomposition (2.2.11) is computed with reorthogonalization. Without reorthogonalization the convergence illustrated by Figures 2 and 3 does not hold.

Figure 3: Distance between the subspace spanned by the first $\lceil k/3\rceil$ eigenvectors (resp. singular vectors) of the symmetric (resp. nonsymmetric) test problems, and the subspace spanned by the corresponding Lanczos (resp. Golub–Kahan) vectors; see (2.4.2) and (2.4.4).

Figure 4: Distance $\|V_{k,i}^TV_{n-i}^{(2)}\|$, $i = 1,2,\ldots,k$, between the subspace spanned by the first $i$ eigenvectors of the Foxgood (left) and Shaw (right) matrices, and the subspace spanned by the corresponding $i$ Ritz vectors at iteration $k = 10$.

We turn to nonsymmetric matrices $A$. The Lanczos method is replaced by the Golub–Kahan method (1.3.8) and the spectral factorization by the singular value decomposition (1.2.1). The index $\ell$ denotes either the order $n$ of the matrix or the step at which an element of the bidiagonal projected matrix $\bar C_\ell$ is less than $10^{-12}$, i.e., the step at which a breakdown happens. The graphs in the right-hand side column of Figure 2 show the relative differences

(2.4.3) \[ R_{\hat\sigma,k} := \max_{i=1,2,\ldots,\lceil k/3\rceil} \frac{|\breve\sigma_i^{(k)} - \hat\sigma_i|}{|\hat\sigma_i|} \]

between the singular values $\{\breve\sigma_i^{(k)}\}_{i=1}^{k}$ of $\bar C_k$ and those of $A$. The graphs are similar to those in the left-hand side column, except for the example Tomo, which displays slow convergence; this behavior is probably linked to the fact that the Tomo test problem is much less ill-conditioned than the other test problems.
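A minimal sketch of the corresponding computation for a nonsymmetric matrix follows. A few steps of Golub–Kahan bidiagonalization with reorthogonalization are carried out, and the singular values of the projected bidiagonal matrix $\bar C_j$ are compared with the largest singular values of $A$, as in (2.4.3). The baart problem from [41] and the initial vector $b/\|b\|$ are used for illustration only; this is not the code behind Figure 2.

    % Sketch: Golub-Kahan bidiagonalization and the comparison (2.4.3).
    n = 200;
    [A, b, ~] = baart(n);                 % assumes Regularization Tools
    sig = svd(A);                         % singular values, decreasing order
    kmax = 20;
    P = b/norm(b);  Q = [];
    alpha = zeros(kmax,1);  beta = zeros(kmax+1,1);
    Rsig = nan(kmax,1);
    for j = 1:kmax
        r = A'*P(:,j);
        if j > 1
            r = r - beta(j)*Q(:,j-1);
            r = r - Q*(Q'*r);             % reorthogonalize
        end
        alpha(j) = norm(r);  Q(:,j) = r/alpha(j);
        s = A*Q(:,j) - alpha(j)*P(:,j);
        s = s - P*(P'*s);                 % reorthogonalize
        beta(j+1) = norm(s);
        Cbar = zeros(j+1, j);             % lower bidiagonal matrix (1.3.7)
        for i = 1:j
            Cbar(i,i)   = alpha(i);
            Cbar(i+1,i) = beta(i+1);
        end
        sv = svd(Cbar);
        m  = ceil(j/3);
        Rsig(j) = max(abs(sv(1:m) - sig(1:m)) ./ sig(1:m));
        if beta(j+1) < 1e-12, break, end  % breakdown
        P(:,j+1) = s/beta(j+1);
    end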

Figure 5: Distance between the subspace spanned by the first $\lceil k/2\rceil$ eigenvectors (resp. singular vectors) of selected symmetric (resp. nonsymmetric) test problems and the subspace spanned by the corresponding Lanczos (resp. Golub–Kahan) vectors. The index $\ell$ ranges from 1 to either the dimension of the matrix ($n = 200$) or to the iteration where there is a breakdown in the factorization process.

Let $\hat U$ and $\hat V$ be the orthogonal matrices in the singular value decomposition (1.2.1) of $A$, and partition these matrices similarly as we did for symmetric matrices $A$, i.e., $\hat U = [\hat U_i^{(1)}\; \hat U_{n-i}^{(2)}]$ and $\hat V = [\hat V_i^{(1)}\; \hat V_{n-i}^{(2)}]$, where the submatrices $\hat U_i^{(1)}$, $\hat V_i^{(1)}$ contain the first $i$ singular vectors and $\hat U_{n-i}^{(2)}$, $\hat V_{n-i}^{(2)}$ the remaining $n-i$ ones. To investigate the convergence of subspaces, we substitute the singular value decomposition $\bar C_k = \breve U_{k+1}\breve\Sigma_{k+1,k}\breve V_k^T$ into (1.3.8) and consider

(2.4.4) \[ R_{(\hat u,\hat v),k} := \max_{i=1,2,\ldots,\lceil k/3\rceil} \left\{ \|V_{k,i}^T\hat V_{n-i}^{(2)}\|,\; \|U_{k,i}^T\hat U_{n-i}^{(2)}\| \right\}, \]

where $V_{k,i}$ and $U_{k,i}$ are made up of the first $i$ columns of $Q_k\breve V_k$ and $P_{k+1}\breve U_{k+1}$, respectively. Then $R_{(\hat u,\hat v),k}$ measures the distance between subspaces determined by the singular vectors of $A$ and those defined by vectors computed with the Golub–Kahan method.

The quantities $R_{(\hat u,\hat v),k}$ are displayed, for $k = 1,2,\ldots,\ell$, in the right column of Figure 3. Figure 5 depicts graphs for the quantities $R_{\hat v,k}$ and $R_{(\hat u,\hat v),k}$ with the maximum computed over the first $\lceil k/2\rceil$ vectors for four test problems. Figure 6 shows the value taken by $\max\{\|V_{k,i}^T\hat V_{n-i}^{(2)}\|, \|U_{k,i}^T\hat U_{n-i}^{(2)}\|\}$, $i = 1,2,\ldots,100$, which expresses the distance between spaces spanned by singular vectors and vectors determined by the Golub–Kahan method.

We now compare the performances of different regularization methods. The test problems from [41] define both a matrix A and a solution xtrue; the solution of the Lotkin example is the same as for the Shaw example.

Figure 6: Distance $\max\{\|V_{k,i}^T\hat V_{n-i}^{(2)}\|, \|U_{k,i}^T\hat U_{n-i}^{(2)}\|\}$, $i = 1,2,\ldots,k$, between the subspace spanned by the first $i$ singular vectors of the Heat (left) and Tomo (right) matrices and the subspace spanned by the corresponding $i$ Golub–Kahan vectors at iteration $k = 100$.

Table 1: Solution of symmetric linear systems: the errors $E_{\rm Lanczos}$ and $E_{\rm TEIG}$ are optimal for truncated Lanczos iteration and truncated eigenvalue decomposition. The corresponding truncation parameters are denoted by $k_{\rm Lanczos}$ and $k_{\rm TEIG}$. Three noise levels $\delta$ are considered; $\ell$ denotes the number of Lanczos iterations performed.

    noise        matrix     ℓ     E_Lanczos    k_Lanczos   E_TEIG       k_TEIG
    δ = 10^-6    Deriv2    200    2.0×10^-2       49       2.1×10^-2      199
                 Foxgood    24    6.8×10^-4        6       6.3×10^-4        6
                 Gravity    46    1.2×10^-3       15       1.2×10^-3       16
                 Phillips  200    5.8×10^-4       22       5.8×10^-4       32
                 Shaw       19    1.9×10^-2       10       1.9×10^-2       10
    δ = 10^-4    Deriv2    200    1.2×10^-1       10       1.1×10^-1       51
                 Foxgood    24    1.4×10^-2        3       4.5×10^-3        4
                 Gravity    45    1.5×10^-2        7       6.3×10^-3       12
                 Phillips  200    4.8×10^-3       12       3.9×10^-3       15
                 Shaw       19    4.7×10^-2        7       3.4×10^-2        9
    δ = 10^-2    Deriv2    200    3.1×10^-1        3       2.3×10^-1       12
                 Foxgood    24    7.7×10^-2        2       2.9×10^-2        2
                 Gravity    45    8.0×10^-2        3       3.3×10^-2        7
                 Phillips  200    4.3×10^-2        6       2.2×10^-2        8
                 Shaw       19    1.2×10^-1        7       9.1×10^-2        7

The error-free data vector is defined by $b_{\rm true} := Ax_{\rm true}$ and the contaminated data vector is given by (1.1.2) with

\[ e := \tilde e\,\frac{\delta\,\|b_{\rm true}\|}{\sqrt{n}}, \]

where the random vector $\tilde e \in \mathbb{R}^n$ models Gaussian noise with mean zero and variance one and $\delta$ is a chosen noise level. In our experiments we let $\delta = 10^{-6}, 10^{-4}, 10^{-2}$.
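In MATLAB, the contaminated data vector can be generated as in the sketch below; the function shaw from [41] stands in for any of the test problems, and one noise level is shown.

    % Sketch: adding scaled Gaussian noise of relative level delta to b_true.
    n = 200;  delta = 1e-4;
    [A, btrue, xtrue] = shaw(n);          % assumes Regularization Tools
    ehat = randn(n,1);                    % Gaussian noise, mean 0, variance 1
    e = ehat * (delta*norm(btrue)/sqrt(n));   % then ||e|| is approximately delta*||b_true||
    b = btrue + e;                        % contaminated data vector, cf. (1.1.2)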

We measure the accuracy attainable by each regularization method by the relative error

(2.4.5) \[ E_{\rm method} = \frac{\|x_{k_{\rm method}} - x_{\rm true}\|}{\|x_{\rm true}\|} = \min_{k=1,2,\ldots,\ell} \frac{\|x_k - x_{\rm true}\|}{\|x_{\rm true}\|}, \]

which is obtained by choosing the value $k = k_{\rm method}$ that minimizes the error for the method under consideration.
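The error (2.4.5) for, e.g., the truncated eigenvalue decomposition can be evaluated as in the following sketch, which assumes that A, b, and xtrue have been generated as in the noise sketch above; the loop simply forms each truncated solution and records the smallest relative error and the corresponding truncation index.

    % Sketch: optimal truncation error (2.4.5) for the TEVD of a symmetric A.
    [W, Lambda] = eig(A);
    lam = diag(Lambda);
    [~, p] = sort(abs(lam), 'descend');
    W = W(:,p);  lam = lam(p);
    ell = 100;  err = zeros(ell,1);
    for k = 1:ell
        xk = W(:,1:k) * ((W(:,1:k)'*b) ./ lam(1:k));   % truncated eigenvalue solution
        err(k) = norm(xk - xtrue)/norm(xtrue);
    end
    [Eteig, kteig] = min(err);            % minimal error and optimal truncation index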

Table 1 reports the results obtained by comparing truncated Lanczos iteration (2.1.1) to the truncated eigenvalue decomposition for symmetric test problems. The minimal errors (2.4.5) obtained by applying the Lanczos method and the truncated eigenvalue decomposition method, denoted by $E_{\rm Lanczos}$ and $E_{\rm TEIG}$, are reported in the fourth and sixth columns, respectively. The truncation parameters that produce the minimal errors are listed in the fifth and seventh columns. The third column shows how many Lanczos iterations were executed; an entry smaller than 200 indicates that breakdown occurred. Both the errors and the truncation parameters are averages over 20 realizations of the random noise. Three noise levels $\delta$ were considered. The results in Table 1 suggest that, for the test problems considered, the truncated Lanczos projection method produces solutions essentially equivalent to those obtained by truncated eigenvalue decomposition, with a cheaper algorithm, since the number of iterations required is sometimes far smaller than the number of eigenvalues required.

Table 2 reports results obtained for nonsymmetric linear discrete ill-posed problems (1.1.1). Here the LSQR method is compared to truncated singular value decomposition (TSVD). The table confirms the conclusions deduced from Table 1.

Table 2: Solution of nonsymmetric linear systems: the errors $E_{\rm LSQR}$ and $E_{\rm TSVD}$ are optimal for LSQR and TSVD. The corresponding truncation parameters are denoted by $k_{\rm LSQR}$ and $k_{\rm TSVD}$. Three noise levels are considered; $\ell$ denotes the number of Golub–Kahan iterations performed.

    noise        matrix     ℓ     E_LSQR       k_LSQR     E_TSVD       k_TSVD
    δ = 10^-6    Baart      10    5.1×10^-2       6       5.1×10^-2        6
                 Heat      196    5.3×10^-3      54       5.4×10^-3       74
                 Lotkin     18    3.1×10^-1      10       3.1×10^-1       10
                 Tomo      195    7.6×10^-3     195       7.6×10^-3      195
                 Wing        7    3.3×10^-1       5       3.3×10^-1        5
    δ = 10^-4    Baart      10    7.7×10^-2       5       7.7×10^-2        5
                 Heat      196    1.5×10^-2      26       1.5×10^-2       37
                 Lotkin     18    4.3×10^-1       7       4.3×10^-1        7
                 Tomo      195    2.1×10^-2     195       2.3×10^-2      195
                 Wing        7    4.5×10^-1       4       4.5×10^-1        4
    δ = 10^-2    Baart      10    1.5×10^-1       3       1.5×10^-1        3
                 Heat      196    9.4×10^-2      13       9.8×10^-2       21
                 Lotkin     18    4.5×10^-1       3       4.5×10^-1        3
                 Tomo      195    1.9×10^-1      48       2.0×10^-1      180
                 Wing        7    6.0×10^-1       2       6.0×10^-1        2

Figure 7: The first four LSQR solutions to the Baart test problem (thin lines) are compared to the corresponding TSVD solutions (dashed lines) and to the exact solution (thick line). The size of the problem is $n = 200$, the noise level is $\delta = 10^{-4}$. The thin and dashed lines are very close.

Figure 7 displays the first four regularized solutions produced by the LSQR and TSVD methods when applied to solve the Baart test problem with a noise-contaminated vector b. The noise level is δ = 10−4.

The approximate solutions determined by the LSQR and TSVD methods can be seen to approach each other when the number of iterations k or the truncation parameter k is increased from one to four.

The Tomo test problem arises from the discretization of a 2D tomography problem. Its numerical solution displays some interesting features. It is clear from Table 2 that, when the noise level is large, LSQR produces an approximate solution after $k_{\rm LSQR}$ steps that is of essentially the same quality as approximate solutions determined by TSVD with a truncation parameter $k_{\rm TSVD}$ that is much larger than $k_{\rm LSQR}$. To better understand this behavior, we consider an image of size $15\times 15$ pixels. This gives rise to a minimization problem (1.1.1) with a matrix $A \in \mathbb{R}^{225\times 225}$. Figure 8 shows the relative errors $E_{\rm LSQR}$ and $E_{\rm TSVD}$ as functions of the parameter $k$. LSQR is seen to give much faster convergence to $x_{\rm true}$. The best attainable approximate solutions by LSQR and TSVD are displayed in Figure 9. The LSQR method yields the best approximation of $x_{\rm true}$ at step $k_{\rm LSQR} = 66$. The upper right plot displays this computed solution; the image $x_{\rm true}$ is shown in the upper left plot. For comparison, the lower left plot of Figure 9 shows the TSVD solution for $k = 66$. This restoration is seen to be of poor quality. The best approximate solution determined by TSVD has truncation index $k_{\rm TSVD} = 216$; it can be seen to be of about the same quality as the best LSQR solution. The results for nonsymmetric problems agree with the ones presented by Hanke [38].

Figure 8: Convergence history for the LSQR and TSVD solutions to the Tomo example of size $n = 225$, with noise level $\delta = 10^{-2}$. The error $E_{\rm LSQR}$ has a minimum at $k = 66$, while $E_{\rm TSVD}$ is minimal for $k = 215$.

2.5 Conclusion

This chapter shows that the largest eigenvalues (in magnitude) of symmetric matrices and the largest singular values of nonsymmetric matrices that are defined by linear discrete ill-posed problems are well approximated by the corresponding eigenvalues and singular values of projected problems determined by a few steps of the Lanczos or Golub–Kahan bidiagonalization methods, respectively. The same holds for the corresponding eigenvectors and singular vectors. This suggests that it often suffices to use a partial Lanczos decomposition or a partial Golub–Kahan bidiagonalization, which are cheaper to compute than partial eigenvalue or singular value decompositions, to determine a solution. Computed examples provide illustrations.

Figure 9: Solution by LSQR and TSVD to the Tomo example of size $n = 225$, with noise level $\delta = 10^{-2}$: exact solution (top left), optimal LSQR solution (top right), TSVD solution corresponding to the same truncation parameter (bottom left), optimal TSVD solution (bottom right).

CHAPTER 3

Computation of a truncated SVD of a large linear discrete ill-posed problem

3.1 Introduction

The need to compute the largest or a few of the largest singular values and, generally, also the associated right and left singular vectors of a large matrix of a linear discrete ill-posed problem arises in a variety of applications, including the approximate minimization of the generalized cross validation function for determining the amount of regularization [26], the solution of large-scale discrete ill-posed problems with two constraints on the computed solution [50], and the solution of large-scale discrete ill-posed problems with a nonnegativity constraint on the solution [10]. This chapter will focus on the solution of minimization problems (1.1.1) with the aid of the truncated SVD (TSVD). We will refer to the triplets made up of the largest singular values and associated right and left singular vectors of a matrix A as the largest singular triplets of A.

We will illustrate that the largest singular triplets of a large matrix A, that stems from the discretization of a linear ill-posed problem, typically, can be computed inexpensively by implicitly restarted Golub–Kahan bidiagonalization methods such as those described in [6–9, 44]. This is true, in particular, in the common situation when the largest singular values are fairly well separated. Computed examples show the number of matrix-vector product evaluations required with the matrices A and AT by these methods to be only a small multiple (larger than one) of the number needed to compute a partial Golub–Kahan bidiagonalization. This behavior is suggested by results shown in Chapter 2. Typically, only a few of the largest singular triplets of A are required to determine a useful approximation of xtrue. The computation of these triplets is much cheaper than the computation of the (full) SVD of the matrix. We remark that in the applications mentioned above it is convenient or necessary to use the largest singular triplets rather than a partial Golub–Kahan bidiagonalization of the matrix.

Many methods have been proposed for the solution of large-scale linear discrete ill-posed problems (1.1.1).

For instance, several iterative methods are available and they do not require the computation of the largest singular triplets of the matrix A; see, e.g., [16, 21, 23, 31, 40, 52, 53, 55] and references therein. However, knowledge of a few of the largest singular values and associated singular vectors often provides valuable insight into the properties of the problem being solved. We will show that the computation of a few of the largest singular triplets generally is quite inexpensive.

3.2 Symmetric linear discrete ill-posed problems

Let the matrix $A \in \mathbb{R}^{n\times n}$ in (1.1.1) be symmetric. We are interested in smooth approximate solutions of (1.1.1). These typically can be represented as a linear combination of some of the first eigenvectors $w_1, w_2, w_3, \ldots$ of $A$. The last eigenvectors generally represent discretizations of highly oscillatory functions. They model “noise” and should not be part of the computed approximate solution. A few of the eigenpairs associated with eigenvalues of largest magnitude of many symmetric matrices that stem from the discretization of linear discrete ill-posed problems can be computed efficiently by implicitly restarted Lanczos methods such as those described in [4, 5, 13].

Recall the truncated eigenvalue decomposition (TEVD) from (1.2.7), with $k$ replaced by $s$,

\[ A_s = W_s\Lambda_sW_s^T, \]

where $W_s = [w_1,w_2,\ldots,w_s] \in \mathbb{R}^{n\times s}$ and

\[ \Lambda_s = \mathrm{diag}[\lambda_1,\lambda_2,\ldots,\lambda_s] \in \mathbb{R}^{s\times s} \]

for some $1 \le s \ll n$. We now turn to the computation of the matrices $W_s$ and $\Lambda_s$. The most popular approaches to compute a few extreme eigenvalues and associated eigenvectors of a large symmetric matrix are based on the symmetric Lanczos process, which is displayed by Algorithm 2. Assume for the moment that the input parameter $\ell$ in Algorithm 2 is small enough so that the algorithm does not break down, i.e., $\beta_{j+1} > 0$ for $1 \le j \le \ell$. The scalars $\alpha_j$ and $\beta_j$ determined by Algorithm 2 then define the symmetric tridiagonal matrix (1.3.5). The vectors $v_j$ generated by the algorithm are orthonormal and define the matrix $V_\ell = [v_1,v_2,\ldots,v_\ell] \in \mathbb{R}^{n\times\ell}$. A matrix interpretation of the recursion relations of Algorithm 2 gives the (partial) Lanczos decomposition (1.3.6). We will use Theorem 2.2.1, but note that the choice of initial unit vector $v_1$ does not have to be the same as in Algorithm 2.

Since the matrix $A$ defines a linear discrete ill-posed problem, its eigenvalues $\lambda_j$ “cluster” at the origin for large $j$. Therefore, by (2.2.1) or (2.2.8), the off-diagonal entries $\beta_j$ of the matrix (1.3.5) also “cluster” at zero for $j$ large. We used this property in Chapter 2 to show that, for sufficiently large $j$, the vectors $v_j$ generated by Algorithm 2 are accurate approximations of eigenvectors associated with eigenvalues close to the origin. Computed examples in Chapter 2 illustrate that for several common linear discrete ill-posed test problems, the space $\mathrm{span}\{v_1,v_2,\ldots,v_\ell\}$ essentially contains the space $\mathrm{span}\{w_1,w_2,\ldots,w_{\lceil\ell/3\rceil}\}$ already for quite small values of $\ell$. As before, $\lceil\eta\rceil$ denotes the smallest integer bounded below by $\eta \ge 0$.

We would like to determine the first few eigenvalues, ordered according to (1.2.6), and associated eigenvectors of a large symmetric matrix of a linear discrete ill-posed problem. The fact that the vectors $v_j$ determined by Algorithm 2 are accurate approximations of eigenvectors for $j$ large enough suggests that only a few iterations with an implicitly restarted symmetric Lanczos method, such as the methods described in [4, 5, 17, 68], are required. These methods compute a sequence of Lanczos decompositions of the form (1.3.6) with different initial vectors. Let $v_1$ be the initial vector in the presently available Lanczos decomposition (1.3.6). An initial vector for the next Lanczos decomposition is determined by applying a polynomial filter $q(A)$ to $v_1$ to obtain the initial vector $q(A)v_1/\|q(A)v_1\|$ of the next Lanczos decomposition. The computation of $q(A)v_1$ is carried out without evaluating additional matrix-vector products with $A$. The implementations [4, 5, 17, 68] use different polynomials $q$. It is the purpose of the polynomial filter $q(A)$ to damp components of unwanted eigenvectors in the vector $v_1$. We are interested in damping components of eigenvectors associated with eigenvalues of small magnitude. The implicitly restarted symmetric Lanczos method was first described in [17, 68]. We will use the implementation [4, 5] of the implicitly restarted symmetric block Lanczos method with block size one in the computations reported in Section 3.4.

We remark that there are several reasons for solving linear discrete ill-posed problems (1.1.1) with a sym- metric matrix A by the TEVD method. One of them is that the singular values of A, i.e., the magnitude of the eigenvalues of A, provide important information about the matrix A and thereby about properties of the linear discrete ill-posed problem (1.1.1). For instance, the decay rate of the singular values with increasing index is an important property of a linear discrete ill-posed problem. Moreover, the truncated eigendecom- position (1.2.7) may furnish an economical storage format for the important part of the matrix A. Large matrices A that stem from the discretization of a Fredholm integral equation of the first kind are generally dense; however, it often suffices to store only a few of its largest eigenpairs. Storage of these eigenpairs typically requires much less computer memory than storage of the matrix A.

3.3 Nonsymmetric linear discrete ill-posed problems

This section is concerned with the solution of linear discrete ill-posed problems (1.1.1) with a large nonsymmetric matrix $A \in \mathbb{R}^{m\times n}$. Such a matrix can be reduced to a small matrix by application of a few steps of Golub–Kahan bidiagonalization. This is described by Algorithm 3 in Section 1.3.3.

The connection between the Golub–Kahan bidiagonalization and the Lanczos decomposition (1.3.6) is applied in Section 2.3 to show Corollary 2.3.1. The initial unit vector $p_1$ in the Golub–Kahan bidiagonalization (1.3.8) is not required to be $b/\|b\|$. It follows from Corollary 2.3.1 and the fact that the singular values of $A$ cluster at the origin, that the columns $q_j$ of the matrix $Q_\ell$ in (1.3.8) for large $j$ are accurate approximations of eigenvectors of $A^TA$, as discussed in Section 2.3. This suggests that the implicitly restarted Golub–Kahan bidiagonalization methods described in [6–9], for many matrices $A$ that stem from the discretization of a linear ill-posed problem, only require fairly few matrix-vector product evaluations to determine a truncated singular value decomposition (1.2.4) with $s$ fairly small. Illustrations are presented in the following section.

We remark that the solution of linear discrete ill-posed problems (1.1.1) with a nonsymmetric matrix by the TSVD method is of interest for the same reasons as the TEVD is attractive to use for the solution of linear discrete ill-posed problems with a symmetric matrix $A$; see the end of Section 3.2 for a discussion.

3.4 Computed examples

The main purpose of the computed examples is to illustrate that a few of the largest singular triplets of a large matrix A (or a few of the largest eigenpairs when A is symmetric) can be computed quite inexpensively when A defines a linear discrete ill-posed problem (1.1.1). All computations are carried out using MATLAB

R2012a with about 15 significant decimal digits. A Sony computer running Windows 10 with 4 GB of

RAM was used. MATLAB codes for determining the discrete ill-posed problems in the computed examples stem from Regularization Tools by Hansen [41]. When not explicitly stated otherwise, the matrices A are obtained by discretizing a Fredholm integral equation of the first kind, and are square and of order n = 500.

For some examples finer discretizations, resulting in larger matrices, are used.

The first few examples illustrate that the number of matrix-vector product evaluations required to compute the k eigenpairs of largest magnitude or the k largest singular triplets of a large matrix obtained by the discretization of an ill-posed problem is a fairly small multiple of k. The computations for a symmetric matrix A can be organized in two ways: We use the code irbleigs described in [4, 5] or the code irbla presented in [7]. The former code has an input parameter that specifies whether the k largest or the k smallest eigenvalues are to be computed. Symmetric semidefinite matrices A may only require one call of irbleigs to determine the desired eigenpairs, while symmetric indefinite matrices require at least one call for computing a few of the largest eigenvalues and associated eigenvectors and at least one call for computing a few of the smallest eigenvalues and associated eigenvectors. We have found that the irbla code, which determines a few of the largest singular values and associated singular vectors, can be competitive with irbleigs for symmetric indefinite matrices because it is possible to compute all the required eigenpairs (i.e., singular triplets) with only one call of irbla.

We use the MATLAB code irbleigs [5] with block size one to compute the $k$ eigenvalues of largest magnitude of a large symmetric matrix $A$. The code carries out $\lceil 2.5k\rceil$ Lanczos steps between restarts; i.e., a sequence of Lanczos decompositions (1.3.6) with $\ell = \lceil 2.5k\rceil$ is computed with different initial vectors $v_1$ until the $k$ desired eigenvalues and associated eigenvectors have been determined with specified accuracy. The default criterion for accepting computed approximate eigenpairs is used, i.e., a computed approximate eigenpair $\{\tilde\lambda_j,\tilde w_j\}$, with $\|\tilde w_j\| = 1$, is accepted as an eigenpair of $A$ if

(3.4.1) \[ \|A\tilde w_j - \tilde w_j\tilde\lambda_j\| \le \varepsilon\,\eta(A), \qquad j = 1,2,\ldots,k, \]

where $\eta(A)$ is an easily computable approximation of $\|A\|$ and $\varepsilon = 10^{-6}$. The irbleigs code uses a computed approximation of the largest singular value of $A$ as $\eta(A)$.
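The acceptance test (3.4.1) is easy to reproduce for any set of computed approximate eigenpairs. The sketch below checks it for eigenpairs returned by MATLAB's built-in eigs, which is used here merely as a readily available stand-in for irbleigs; normest provides the estimate $\eta(A)$ of $\|A\|$.

    % Sketch: checking the acceptance criterion (3.4.1) for k approximate
    % eigenpairs. eigs is only a stand-in for irbleigs.
    n = 500;  k = 10;  epsilon = 1e-6;
    [A, ~, ~] = foxgood(n);                 % assumes Regularization Tools
    [Wk, Dk] = eigs(A, k, 'lm');            % k eigenpairs of largest magnitude
    etaA = normest(A);                      % estimate of ||A||
    res  = sqrt(sum((A*Wk - Wk*Dk).^2, 1)); % residual norms ||A*w_j - w_j*lambda_j||
    accepted = all(res <= epsilon*etaA);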

Table 1: foxgood test problem.

    Number of desired    Size of the largest    Number of
    eigenpairs k         tridiagonal matrix     matrix-vector products
         5                   ⌈2.5k⌉                    24
        10                   ⌈2.5k⌉                    32
        15                   ⌈2.5k⌉                    32
        20                   ⌈2.5k⌉                    40
        25                   ⌈2.5k⌉                    50

Example 3.4.1. We illustrate the performance of the irbleigs method [5] when applied to the foxgood test problem (1.4.3), a discretization of the Fredholm integral equation of the first kind. Table 1 displays the average number of matrix-vector product evaluations required by irbleigs over 1000 runs, rounded to the closest integer, when applied as described above to compute the $k$ eigenpairs of largest magnitude for $k = 5, 10, \ldots, 25$. The number of matrix-vector product evaluations is seen to grow about linearly with $k$ for the larger $k$-values. Since irbleigs chooses the initial vector in the first Lanczos decomposition computed to be a unit random vector, the number of matrix-vector product evaluations may vary somewhat between different calls of irbleigs.

We remark that the choice of $\ell = \lceil 2.5k\rceil$ steps with the Lanczos method between restarts is somewhat arbitrary, and so is the choice of block size one. While the exact number of matrix-vector product evaluations depends on these choices, the linear growth of the number of matrix-vector products computed with the number of desired eigenpairs can be observed for many choices of $\ell$ and block sizes. □

In the following examples we use the MATLAB code irbla [7], which implements a restarted Golub–Kahan block bidiagonalization method. We set the block size to one. In order to determine the $k$ largest singular triplets, irbla determines a sequence of Golub–Kahan decompositions (1.3.8) with $\ell$ chosen to be $\lceil 1.5k\rceil$ or smaller for different initial vectors $p_1$ until the $k$ largest singular triplets have been computed with desired accuracy. The default stopping criterion is used, which is analogous to (3.4.1). The initial vector $p_1$ in the first Golub–Kahan bidiagonalization (1.3.8) determined by irbla is a unit random vector. The number of matrix-vector product evaluations therefore may vary somewhat between different calls of irbla. The numbers of matrix-vector product evaluations with $A$ (and with $A^T$ when $A$ is nonsymmetric) reported in the tables are averages over 1000 runs, rounded to the closest integer.

When solving linear discrete ill-posed problems (1.1.1) by truncated iteration using Golub–Kahan bidiagonalization, one generally uses the initial vector $p_1 = b/\|b\|$; see, e.g., [31, 40]. This suggests that it could be beneficial to use this vector as initial vector in the first Golub–Kahan bidiagonalization computed by irbla. It turns out that choosing $p_1$ in this manner, instead of as a random unit vector, changes the number of matrix-vector product evaluations required only very little. We will illustrate this below.
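The quantities computed by irbla, namely the $k$ largest singular triplets, can also be obtained with MATLAB's built-in svds, which is used in the sketch below only as a readily available stand-in for the restarted Golub–Kahan code of [7]; the sanity check against the full SVD is feasible only for matrices of moderate size.

    % Sketch: computing the k largest singular triplets of A.
    n = 500;  k = 10;
    [A, b, ~] = shaw(n);                  % assumes Regularization Tools
    [Uk, Sk, Vk] = svds(A, k);            % k largest singular triplets
    s_full = svd(A);                      % full set of singular values (moderate n only)
    max_rel_err = max(abs(diag(Sk) - s_full(1:k)) ./ s_full(1:k));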

Table 2: shaw test problem.

    Number of desired    Size of the largest    Number of
    eigenpairs k         bidiagonal matrix      matrix-vector products
         5                   ⌈1.5k⌉                    19
        10                   ⌈1.5k⌉                    30
        15                   ⌈1.5k⌉                    46
        20                   ⌈1.5k⌉                    60
        25                   ⌈1.5k⌉                    76

Example 3.4.2. We now consider the shaw test problem (1.4.7). Table 2 shows the number of matrix-vector product evaluations required for computing the $k$ eigenpairs of largest magnitude using the code irbla for $k = 5, 10, \ldots, 25$ when the number of steps between restarts is $\lceil 1.5k\rceil$. The number of matrix-vector products is seen to grow about linearly with $k$ for $k \ge 10$. Thus, the computational effort required is quite small.

Table 2 displays averages over 1000 runs with random initial unit vectors $p_1$ for the first Golub–Kahan bidiagonalization computed. When instead using $p_1 = b/\|b\|$, the number of matrix-vector products is unchanged for $k = 10, 15, 20, 25$, and is reduced to 16 for $k = 5$. Thus, the effect of changing the initial vector in irbla is small.

Table 3: shaw test problem.

    Number of desired    Size of the largest    Number of
    eigenpairs k         bidiagonal matrix      matrix-vector products
         5                   k + 3                     19
        10                   k + 2                     24
        15                   k + 2                     34
        20                   k + 2                     44
        25                   k + 1                     52

The number of required matrix-vector product evaluations depends on the number of steps, $\ell$, with the Golub–Kahan bidiagonalization method between restarts. We found that choosing $\ell$ very small may increase the required number of matrix-vector product evaluations with $A$ and $A^T$. Moreover, choosing a large $\ell$-value does not always result in a reduced number of matrix-vector product evaluations. This is illustrated by Table 3, in which the number of bidiagonalization steps between restarts is smaller than for Table 2, and so is the number of matrix-vector product evaluations required to determine the desired singular triplets.

The number ` of bidiagonalization steps between restarts that requires the smallest number of matrix-vector product evaluations is difficult to determine a priori. The important observation is that for many choices of

` the number of matrix-vector product evaluations with A and AT is quite small. This makes it possible to compute a few of the largest singular triplets of a large matrix fairly inexpensively.

Table 3 displays averages over 1000 runs with random initial unit vectors $p_1$ for the first Golub–Kahan bidiagonalization computed. When instead using $p_1 = b/\|b\|$, the number of matrix-vector product evaluations is unchanged for $k = 10, 15, 20, 25$, and is increased to 22 for $k = 5$. □

Table 4: phillips test problem.

    Number of desired    Size of the largest    Number of
    eigenpairs k         bidiagonal matrix      matrix-vector products
         5                   ⌈1.5k⌉                    22
        10                   ⌈1.5k⌉                    36
        15                   ⌈1.5k⌉                    46
        20                   ⌈1.5k⌉                    60
        25                   ⌈1.5k⌉                    76

Example 3.4.3. We now consider the phillips test problem (1.4.6). Table 4 displays the number of matrix-vector product evaluations required to compute the $k$ eigenpairs of largest magnitude (i.e., the $k$ largest singular triplets) for $k = 5, 10, \ldots, 25$ by irbla. A random unit vector $p_1$ is used as initial vector for the first Golub–Kahan bidiagonalization computed by irbla, and the numbers of matrix-vector product evaluations are averages over 1000 runs. The number of matrix-vector product evaluations is seen to grow about linearly with $k$ for $k \ge 10$. If instead $p_1 = b/\|b\|$ is used as initial vector for the first Golub–Kahan bidiagonalization computed by irbla, the number of matrix-vector product evaluations is the same for $k = 5, 10, \ldots, 25$. Thus, the choice of initial vector is not important.

The number of required matrix-vector product evaluations is quite insensitive to how finely the integral equation (1.4.6) is discretized, for all fine enough discretizations. For instance, when the integral equation is discretized by a Galerkin method using the MATLAB function phillips from [41] to obtain a matrix $A \in \mathbb{R}^{5000\times 5000}$ and the initial vector for the first Golub–Kahan bidiagonalization computed by irbla is chosen to be $p_1 = b/\|b\|$, the number of required matrix-vector product evaluations is the same as for $A \in \mathbb{R}^{500\times 500}$.

We conclude that the computational expense to compute a truncated singular value decomposition of $A$ is modest also for large matrices. □

The following examples are concerned with nonsymmetric matrices. The k largest singular triplets are computed with the irbla code [7] using block size one.

Example 3.4.4. This example uses the baart test problem (1.4.1). Table 5 shows the number of required matrix-vector product evaluations to grow roughly linearly with the number of desired largest singular triplets. Both the initial vector $p_1 = b/\|b\|$ and the average over 1000 runs with a random unit initial vector $p_1$ for the initial Golub–Kahan bidiagonalization computed by irbla yield the same entries in the last column.

Table 5: baart test problem.

    Number of desired     Size of the largest    Number of
    singular triplets k   bidiagonal matrix      matrix-vector products
         5                    ⌈1.5k⌉                    16
        10                    ⌈1.5k⌉                    30
        15                    ⌈1.5k⌉                    46
        20                    ⌈1.5k⌉                    60
        25                    ⌈1.5k⌉                    76

Table 6: baart test problem.

    Number of desired     Size of the largest    Number of
    singular triplets k   bidiagonal matrix      matrix-vector products
         5                    k + 2                     14
        10                    k + 1                     22
        15                    k + 1                     32
        20                    k + 1                     42
        25                    k + 1                     52

Table 6 is analogous to Table 5. Only the number of steps $\ell$ between restarts of Golub–Kahan bidiagonalization differs. It is smaller in Table 6 than in Table 5, and so is the required number of matrix-vector product evaluations with $A$ and $A^T$. Also in Table 6 the number of matrix-vector products needed is seen to grow about linearly with $k$. The initial vector $p_1 = b/\|b\|$ and the average over 1000 runs with a random unit initial vector $p_1$ for the first Golub–Kahan bidiagonalization computed by irbla yield the same entries in the last column of Table 6. □

Example 3.4.5. This example uses the i_laplace test problem (1.4.5). Table 7 displays the number of matrix-vector product evaluations required to compute the $k$ largest singular triplets. This number is seen to grow about linearly with $k$. Therefore, the computational effort is quite small. The initial vector $p_1 = b/\|b\|$ and the average over 1000 runs with a random unit initial vector $p_1$ for the initial Golub–Kahan bidiagonalization computed by irbla yield the same entries in the last column of Table 7. □

Table 7: Inverse Laplace transform test problem.

    Number of desired     Size of the largest    Number of
    singular triplets k   bidiagonal matrix      matrix-vector products
         5                    ⌈1.5k⌉                    22
        10                    ⌈1.5k⌉                    30
        15                    ⌈1.5k⌉                    46
        20                    ⌈1.5k⌉                    60
        25                    ⌈1.5k⌉                    76

Table 8: Example 3.4.6: Relative errors and number of matrix-vector products, $\tilde\delta = 10^{-2}$. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is a unit random vector.

    Problem     s    MVP    E_psvd        E_tsvd       E_s
    shaw        7    22     8.23×10^-15   5.06×10^-2   5.06×10^-2
    phillips    7    22     7.77×10^-9    2.53×10^-2   2.53×10^-2
    baart       3    10     7.94×10^-8    1.67×10^-1   1.67×10^-1
    i_laplace   7    22     3.67×10^-7    2.24×10^-1   2.24×10^-1

Table 9: Example 3.4.6: Relative errors and number of matrix-vector products, $\tilde\delta = 10^{-2}$. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is $b/\|b\|$.

    Problem     s    MVP    E_psvd        E_tsvd       E_s
    shaw        7    22     8.29×10^-15   4.97×10^-2   4.97×10^-2
    phillips    7    22     9.25×10^-10   2.50×10^-2   2.50×10^-2
    baart       3    10     1.15×10^-9    1.68×10^-1   1.68×10^-1
    i_laplace   7    22     1.00×10^-8    2.24×10^-1   2.24×10^-1

Example 3.4.6. This example compares the quality of approximate solutions of linear discrete ill-posed problems computed by truncated singular value decomposition. The decompositions are determined as described in this chapter as well as by computing the full SVD (1.2.1) with the MATLAB function svd. It is the aim of this example to show that the decompositions computed in these ways yield approximations of xtrue of the same quality. We use the discrepancy principle (1.2.15) to determine the truncation index s.

Table 10: Example 3.4.6: Relative errors and number of matrix-vector products, $\tilde\delta = 10^{-4}$. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is $b/\|b\|$.

    Problem     s    MVP    E_psvd        E_tsvd       E_s
    shaw        9    28     5.01×10^-14   3.21×10^-2   3.21×10^-2
    phillips   12    42     3.33×10^-8    4.23×10^-3   4.23×10^-3
    baart       4    12     1.71×10^-10   1.15×10^-1   1.15×10^-1
    i_laplace  12    40     5.46×10^-13   1.71×10^-1   1.71×10^-1

Table 11: Example 3.4.6: Relative errors and number of matrix-vector products, $\tilde\delta = 10^{-6}$. The initial vector for the first Golub–Kahan bidiagonalization computed by irbla is $b/\|b\|$.

    Problem     s    MVP    E_psvd        E_tsvd       E_s
    shaw       10    30     1.22×10^-12   1.94×10^-2   1.94×10^-2
    phillips   27    82     2.47×10^-13   7.99×10^-4   7.99×10^-4
    baart       5    16     1.67×10^-12   5.26×10^-2   5.26×10^-2
    i_laplace  18    54     4.32×10^-12   1.43×10^-1   1.43×10^-1

The number of matrix-vector product evaluations used to compute approximations of $x_{\rm true}$ depends on the number of singular triplets required to satisfy the discrepancy principle. The latter number typically increases as the error in the vector $b$ in (1.1.1) decreases, because then the least-squares problem has to be solved more accurately. In realistic applications, one would first compute the $\ell$ largest singular triplets, for some user-chosen value of $\ell$, and if it turns out that additional singular triplets are required to satisfy the discrepancy principle, then one would determine the next, say, $\ell$ largest singular triplets, etc. This approach may require the evaluation of somewhat more matrix-vector products than if all required largest singular triplets were computed together. Conversely, if only the $\ell/2$ largest singular triplets turn out to be needed to satisfy the discrepancy principle, then the number of matrix-vector products can be reduced by initially computing only these triplets instead of the $\ell$ largest singular triplets.

To reduce the influence on the number of matrix-vector products required by the somewhat arbitrary choice of the initial number of largest singular triplets to be computed, we proceed as follows. We first compute the SVD (1.2.1) using the MATLAB function svd and then determine the smallest truncation index $s$ so that the discrepancy principle (1.2.15) holds. We denote the approximation of $x_{\rm true}$ so determined by $x_{\rm tsvd}$. Then we use irbla to compute the $s$ largest singular triplets of $A$. Several tables compare the quality of the so computed approximations of $x_{\rm true}$ in different ways. The quantity

\[ E_{\rm psvd} = \frac{\|x_s - x_{\rm tsvd}\|}{\|x_{\rm tsvd}\|} \]

shows the relative difference between the approximate solution $x_{\rm tsvd}$ and the approximate solution $x_s$ determined by computing a truncated singular value decomposition (1.2.4) of the matrix $A$ using the irbla method with the number of Golub–Kahan bidiagonalization steps between restarts set to $\lceil 1.5s\rceil$. We also display the relative difference

\[ E_{\rm tsvd} = \frac{\|x_{\rm tsvd} - x_{\rm true}\|}{\|x_{\rm true}\|}, \]

which shows how well the approximate solution determined by the full singular value decomposition approximates the desired solution $x_{\rm true}$. The analogous relative difference

\[ E_s = \frac{\|x_s - x_{\rm true}\|}{\|x_{\rm true}\|} \]

shows how well the approximate solution determined by the truncated singular value decomposition approximates $x_{\rm true}$.
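A sketch of the comparison just described is given below. The truncation index $s$ is taken as the smallest index for which the TSVD solution satisfies the discrepancy principle (1.2.15) with $\tau = 1$; the matrix A, the contaminated vector b, the exact solution xtrue, and the noise vector e are assumed to be available (e.g., generated as in the noise sketch of Section 2.4), and the full SVD is used here, as in the text, to fix $s$.

    % Sketch: smallest truncation index s satisfying the discrepancy
    % principle, and the error E_tsvd. Assumes A, b, xtrue, e exist; tau = 1.
    tau = 1;  delta_bound = norm(e);            % ||e|| is assumed known
    [U, S, V] = svd(A);
    sv = diag(S);
    x = zeros(size(xtrue));  s = 0;
    for j = 1:numel(sv)
        x = x + (U(:,j)'*b / sv(j)) * V(:,j);   % add the next TSVD term
        if norm(b - A*x) <= tau*delta_bound
            s = j;  break
        end
    end
    xtsvd = x;
    Etsvd = norm(xtsvd - xtrue)/norm(xtrue);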

Tables 8-11 report results for three noise levels $\tilde\delta = \|e\|/\|b_{\rm true}\|$ ($\tilde\delta = 10^{-t}$ for $t = 2,4,6$) and $\tau = 1$ in (1.2.15). The error vector $e$ models white Gaussian noise. Thus, given the vector $b_{\rm true}$, which is generated by the MATLAB function that determines the matrix $A$, a vector $e$ that models white Gaussian noise is added to $b_{\rm true}$ to obtain the error-contaminated vector $b$; cf. (1.1.2). The vector $e$ is scaled to correspond to a prescribed noise level $\tilde\delta$. We generate this additive noise vector $e$ with

\[ e := \tilde e\,\frac{\tilde\delta\,\|b_{\rm true}\|}{\sqrt{n}}, \]

where $\tilde e \in \mathbb{R}^{500}$ is a random vector whose entries are from a normal distribution with mean zero and variance one.

For Tables 9-11, the initial vector for the first Golub–Kahan bidiagonalization computed by irbla is chosen to be $p_1 = b/\|b\|$. This choice is quite natural, because we would like to solve least-squares problems (1.1.1) with data vector $b$. The table columns with heading “MVP” display the number of matrix-vector product evaluations required by irbla to compute the truncated singular value decomposition required to satisfy the discrepancy principle. Table 8 shows averages over 1000 runs with random unit initial vectors $p_1$ for the first Golub–Kahan bidiagonalization computed by irbla. The entries of this table and Table 9 are quite close. The closeness of corresponding entries also can be observed for smaller noise levels. We therefore do not display tables analogous to Table 8 for the smaller noise levels.

Tables 8-11 show that the computed approximate solutions determined by using irbla give as good approximations of $x_{\rm true}$ as the approximate solutions $x_{\rm tsvd}$ computed with the aid of the full SVD (1.2.1), while being much cheaper to evaluate. □

Example 3.4.7. The LSQR iterative method for the solution of the minimization problem (1.1.1) determines partial bidiagonalizations of $A$ with initial vector $b/\|b\|$. These bidiagonalizations are closely related to the bidiagonalizations (1.3.8); see, e.g., [12, 35] for details. In the $i$th step, LSQR computes an approximate solution $x_i$ in the Krylov subspace $\mathcal{K}_i(A^TA, A^Tb) = \mathrm{span}\{A^Tb, (A^TA)A^Tb, \ldots, (A^TA)^{i-1}A^Tb\}$. This approximate solution satisfies

\[ \|Ax_i - b\| = \min_{x \in \mathcal{K}_i(A^TA, A^Tb)} \|Ax - b\|. \]

The number of steps $i$ is determined by the discrepancy principle, i.e., $i$ is chosen to be the smallest integer such that the computed approximate solution $x_i$ satisfies

\[ \|Ax_i - b\| \le \tau\delta; \]

cf. (1.2.15). Details on the application of LSQR with the discrepancy principle are discussed in, e.g., [23, 31, 40].
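With MATLAB's built-in lsqr, the discrepancy-principle stopping rule can be imposed through the relative residual tolerance, since lsqr terminates once $\|Ax_i - b\|/\|b\|$ drops below the supplied tolerance. The sketch below is only a convenient stand-in for the reorthogonalized LSQR implementation used in the experiments, so iteration counts may differ; A, b, and the noise vector e are assumed available.

    % Sketch: LSQR stopped by the discrepancy principle ||A*x_i - b|| <= tau*delta.
    tau = 1;  delta = norm(e);                       % noise norm assumed known
    [x_lsqr, flag, relres, iter] = lsqr(A, b, tau*delta/norm(b), 500);
    % flag = 0 indicates that the tolerance was met; iter is the number of LSQR steps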

This example considers the situation when there are several data vectors, i.e., we would like to solve

(3.4.2) \[ \min_{x^{(j)}\in\mathbb{R}^n} \|Ax^{(j)} - b^{(j)}\|, \qquad j = 1,2,\ldots,\ell. \]

We let the matrix $A \in \mathbb{R}^{500\times 500}$ be determined by one of the functions in Regularization Tools [41] already used above. This function also determines the error-free data vector $b_{\rm true}^{(1)} \in \mathbb{R}^{500}$. The remaining error-free data vectors $b_{\rm true}^{(j)} \in \mathbb{R}^{500}$ are obtained by choosing discretizations $x_{\rm true}^{(j)}$, $j = 2,3,\ldots,\ell$, of functions of the form $\alpha\sin(\beta t)+\gamma$, $\alpha\cos(\beta t)+\gamma$, $\alpha\arctan(\beta t)+\gamma$, and $\alpha t^2+\beta t+\gamma$, where $\alpha$, $\beta$, and $\gamma$ are randomly generated scalars, and letting $b_{\rm true}^{(j)} = Ax_{\rm true}^{(j)}$. A “noise” vector $e^{(j)} \in \mathbb{R}^{500}$ with normally distributed random entries with zero mean is added to each data vector $b_{\rm true}^{(j)}$ to determine an error-contaminated data vector $b^{(j)}$; see (1.1.2). The noise vectors $e^{(j)}$ are scaled to correspond to a specified noise level. This is simulated with

\[ e^{(j)} := \tilde e^{(j)}\,\frac{\tilde\delta\,\|b_{\rm true}^{(j)}\|}{\sqrt{n}}, \]

where $\tilde\delta$ is the noise level, and $\tilde e^{(j)} \in \mathbb{R}^n$ is a vector whose elements are normally distributed random numbers with mean zero and variance one.

Assume that the data vectors are available sequentially. Then the linear discrete ill-posed problems (3.4.2) can be solved one by one by

(A) applying the LSQR iterative method to each one of the $\ell$ linear discrete ill-posed problems (3.4.2). The iterations for each system are terminated by the discrepancy principle. Since the data vectors $b^{(j)}$ are distinct, each one of these vectors requires that a new partial Golub–Kahan bidiagonalization be computed, and by

(B) computing one TSVD of the matrix $A$ by the irbla method, and then using this decomposition to determine an approximate solution of each one of the problems (3.4.2). The discrepancy principle is applied to compute the approximate solution of each least-squares problem. Thus, we determine the parameter $s$ in (1.2.4) as large as possible so that the discrepancy principle (1.2.15) holds with $\tau = 1$. A sketch of both approaches is given below.
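The following sketch outlines both approaches for $\ell$ data vectors collected as the columns of a matrix Bmat, with known noise norms delta(j), tau = 1, and a known number s of singular triplets; svds again stands in for irbla, MATLAB's built-in lsqr stands in for the reorthogonalized LSQR used in the experiments, and the truncation index for each right-hand side is chosen here as the smallest one satisfying the discrepancy principle, as in Example 3.4.6. It is a schematic comparison, not the code used for Tables 12-14.

    % Sketch of approaches (A) and (B). Assumes A, Bmat, delta (vector of
    % noise norms), and s are available; tau = 1.
    tau = 1;  ell = size(Bmat, 2);
    X_A = zeros(size(A,2), ell);  X_B = zeros(size(A,2), ell);

    % Approach (A): one LSQR run per data vector.
    for j = 1:ell
        X_A(:,j) = lsqr(A, Bmat(:,j), tau*delta(j)/norm(Bmat(:,j)), 500);
    end

    % Approach (B): one partial SVD, reused for every data vector.
    [Us, Ss, Vs] = svds(A, s);  sv = diag(Ss);
    for j = 1:ell
        c = (Us'*Bmat(:,j)) ./ sv;                   % TSVD coefficients
        x = zeros(size(A,2), 1);
        for i = 1:s
            x = x + c(i)*Vs(:,i);
            if norm(Bmat(:,j) - A*x) <= tau*delta(j), break, end
        end
        X_B(:,j) = x;
    end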

Seed methods furnish another solution approach that is commonly applied when seeking to solve several linear systems of equations with the same matrix and different right-hand sides that stem from the discretization of well-posed problems, such as Dirichlet boundary value problems for elliptic partial differential equations; see, e.g., [1, 63, 67] and references therein. A Golub–Kahan bidiagonalization-based seed method applied to the solution of the $\ell > 1$ least-squares problems (3.4.2) could proceed as follows. First one computes a partial Golub–Kahan bidiagonalization for the least-squares problem with one of the data vectors, say $b^{(1)}$, and then uses this bidiagonalization to solve all the remaining least-squares problems (3.4.2). The Golub–Kahan bidiagonalization is determined by the matrix $A$ and the initial vector $p_1 = b^{(1)}/\|b^{(1)}\|$. We will not apply this technique in this example but will explore a variation of it in Section 4.3.

Tables 12-14 compare the number of matrix-vector products required by the approaches (A) and (B) for $\ell = 10$ and noise-contaminated data vectors $b^{(j)}$ corresponding to the noise levels $\tilde\delta = 10^{-2}$, $\tilde\delta = 10^{-4}$, and $\tilde\delta = 10^{-6}$. For each noise level, 100 noise-contaminated data vectors $b^{(j)}$ are generated for each $1 \le j \le \ell$. The initial Golub–Kahan bidiagonalization determined by irbla uses the initial vector $p_1 = b^{(1)}/\|b^{(1)}\|$. The tables display averages over the realizations of the noise-contaminated data vectors $b^{(j)}$. As in Example 3.4.5, the number of matrix-vector product evaluations required by the method of this chapter depends on the initial choice of the number of singular triplets to be computed. To simplify the comparison, we assume this number to be known. The competitiveness of the method of this chapter is not significantly affected by this assumption; it is straightforward to compute more singular triplets if the initial number of computed triplets is found to be too small to satisfy the discrepancy principle. To avoid round-off errors introduced during the computations with LSQR delaying convergence, Golub–Kahan bidiagonalization is carried out with reorthogonalization; see [40] for a discussion of the role of round-off errors on the convergence.

The columns of Tables 12-14 are labeled similarly as in the previous examples. The error in the computed solutions is the maximum of the errors for each one of the least-squares problems (3.4.2). The columns labeled $\mathrm{MVP}_{\rm lsqr}$ show the average numbers of matrix-vector product evaluations required by LSQR, and the columns denoted by $E_{\rm lsqr}$ show the relative error in the approximate solution determined by the LSQR algorithm, $x_{\rm lsqr}$, i.e.,

\[ E_{\rm lsqr} = \frac{\|x_{\rm lsqr} - x_{\rm true}\|}{\|x_{\rm true}\|}. \]

Tables 12-14 show the method of the present chapter to require significantly fewer matrix-vector product evaluations than repeated application of LSQR. For large-scale problems, the dominating computational effort of these methods is the evaluation of matrix-vector products with $A$ and $A^T$. As in Example 3.4.2, the number of matrix-vector product evaluations does not change significantly with the fineness of the discretization, i.e., with the size of the matrix $A$ determined by the functions in [41].

Finally, we remark that, while the present example uses the discrepancy principle to determine the parameter s in the TSVD and the number of LSQR iterations, other techniques also can be used for these purposes, such as methods discussed in [25, 26, 40, 45, 61]. The results would be fairly similar. Thus, the relative performance of the methods displayed in Tables 12-14 is not very sensitive to how the parameter s and the number of iterations are determined nor to their exact values. 2

Table 12: Relative errors and number of matrix-vector product evaluations, $\tilde\delta = 10^{-2}$.

    Problem     s    MVP   MVP_lsqr   E_psvd     E_tsvd    E_s       E_lsqr
    shaw        6    20    133        3×10^-6    2×10^-1   2×10^-1   1×10^-1
    phillips    7    22    152        1×10^-6    7×10^-2   7×10^-2   6×10^-2
    baart       3    10     71        2×10^-8    1×10^-1   1×10^-1   1×10^-1
    i_laplace   8    25    175        1×10^-5    7×10^-1   7×10^-1   7×10^-1

Table 13: Relative errors and number of matrix-vector product evaluations, $\tilde\delta = 10^{-4}$.

    Problem     s    MVP   MVP_lsqr   E_psvd     E_tsvd    E_s       E_lsqr
    shaw        9    27    178        2×10^-13   3×10^-2   3×10^-2   3×10^-2
    phillips   14    41    279        3×10^-4    1×10^-2   1×10^-2   8×10^-3
    baart       5    14     99        3×10^-10   7×10^-2   7×10^-2   7×10^-2
    i_laplace  14    42    285        3×10^-13   7×10^-1   7×10^-1   7×10^-1

Table 14: Relative errors and number of matrix-vector product evaluations, $\tilde\delta = 10^{-6}$.

    Problem     s    MVP   MVP_lsqr   E_psvd     E_tsvd    E_s       E_lsqr
    shaw       10    30    203        1×10^-12   7×10^-3   7×10^-3   7×10^-3
    phillips   30    89    601        2×10^-11   3×10^-3   3×10^-3   1×10^-3
    baart       5    17    117        6×10^-10   4×10^-2   4×10^-2   4×10^-2
    i_laplace  19    58    393        2×10^-11   7×10^-1   7×10^-1   7×10^-1

3.5 Conclusion

Knowledge of the largest singular values and associated singular vectors of a matrix may provide important information about the linear discrete ill-posed problem at hand. However, the computation of the singular value decomposition of a general large matrix can be prohibitively expensive. This chapter illustrates that the computation of a few of the largest singular triplets of a matrix that stems from the discretization of an ill-posed problem may be quite inexpensive. The largest singular triplets generally are the only singular triplets of interest. Similarly, the computed examples show that it can be quite inexpensive to compute a few eigenpairs of largest magnitude of a symmetric matrix of a linear discrete ill-posed problem. Applications to the solution of several linear discrete ill-posed problems with the same matrix and different data vectors show the computation of the TSVD of the matrix to be competitive with sequential solution of the linear discrete ill-posed problems by the LSQR iterative method.

CHAPTER 4

Solution methods for linear discrete ill-posed problems for color image restoration

We will discuss the application of iterative methods based on standard or block Golub–Kahan-type bidiagonalization, combined with Tikhonov regularization, to the restoration of a multi-channel image from an available blur- and noise-contaminated version. Applications include the restoration of color images whose RGB (red, green, and blue) representation uses three channels; see [28, 42]. The methods described can also be applied to the solution of Fredholm integral equations of the first kind in two or more space dimensions and to the restoration of hyper-spectral images. The latter kind of images generalize color images in that they allow more than three “colors”; see, e.g., [48]. For definiteness, we will focus on the restoration of $k$-channel images that have been contaminated by blur and noise, and we formulate this restoration task as a linear system of equations with $k$ right-hand side vectors, where each spectral band corresponds to one channel.

To simplify our notation, we assume the image to be represented by an array of $n\times n$ pixels in each one of the $k$ channels, where $1 \le k \ll n^2$. Let $b^{(i)} \in \mathbb{R}^{n^2}$ represent the available blur- and noise-contaminated image in channel $i$, let $e^{(i)} \in \mathbb{R}^{n^2}$ describe the noise in this channel, and let $x_{\rm true}^{(i)} \in \mathbb{R}^{n^2}$ denote the desired unknown blur- and noise-free image in channel $i$. The corresponding quantities for all $k$ channels, $b, x_{\rm true}, e \in \mathbb{R}^{n^2k}$, are obtained by stacking the vectors $b^{(i)}, x_{\rm true}^{(i)}, e^{(i)}$ of each channel. For instance, $b = [(b^{(1)})^T,\ldots,(b^{(k)})^T]^T$.

The degradation model is of the form

(4.0.1) \[ b = Hx_{\rm true} + e \]

with blurring matrix

\[ H = A_k \otimes A = \begin{bmatrix} a_{1,1}A & a_{1,2}A & \cdots & a_{1,k}A \\ a_{2,1}A & a_{2,2}A & \cdots & a_{2,k}A \\ \vdots & \vdots & & \vdots \\ a_{k,1}A & a_{k,2}A & \cdots & a_{k,k}A \end{bmatrix} \in \mathbb{R}^{n^2k\times n^2k}. \]

Here $\otimes$ denotes the Kronecker product, the matrix $A \in \mathbb{R}^{n^2\times n^2}$ represents within-channel blurring, which is assumed to be the same in all channels, and the small matrix $A_k \in \mathbb{R}^{k\times k}$ models cross-channel blurring. Sometimes it is convenient to gather the images for the different channels in “block vectors.” Introduce the block vectors $B = [b^{(1)},\ldots,b^{(k)}] \in \mathbb{R}^{n^2\times k}$, $X_{\rm true} = [x_{\rm true}^{(1)},\ldots,x_{\rm true}^{(k)}] \in \mathbb{R}^{n^2\times k}$, and $E = [e^{(1)},\ldots,e^{(k)}] \in \mathbb{R}^{n^2\times k}$.

Using properties of the Kronecker product, the model (4.0.1) can be expressed as

(4.0.2) \[ B = \mathcal{A}(X_{\rm true}) + E, \]

where the linear operator $\mathcal{A}$ is defined by

(4.0.3) \[ \mathcal{A} : \mathbb{R}^{n^2\times k} \to \mathbb{R}^{n^2\times k}, \qquad \mathcal{A}(X) := AXA_k^T. \]

Its transpose is given by $\mathcal{A}^T(X) := A^TXA_k$. The model (4.0.2) is said to have cross-channel blurring when $A_k \ne I_k$; when $A_k = I_k$, there is no cross-channel blurring. In the latter situation, the blurring is said to be within-channel only, and the deblurring problem decouples into $k$ independent deblurring problems. The degradation model (4.0.1) then can be expressed in the form

(4.0.4) \[ B = AX_{\rm true} + E. \]
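The identity behind (4.0.1)-(4.0.4) is $\mathrm{vec}(AXA_k^T) = (A_k\otimes A)\,\mathrm{vec}(X)$. The small self-contained MATLAB check below defines the operator (4.0.3) and its transpose as function handles and verifies the equivalence with the Kronecker-product matrix on random data; the dimensions are tiny and purely illustrative.

    % Sketch: the channel-coupled blurring operator (4.0.3), its transpose,
    % and its equivalence with H = kron(Ak, A) acting on vec(X).
    n2 = 16;  k = 3;                        % tiny, illustrative dimensions
    A  = randn(n2);  Ak = randn(k);
    Aop  = @(X) A*X*Ak';                    % A(X)   = A X Ak^T
    AopT = @(X) A'*X*Ak;                    % A^T(X) = A^T X Ak
    X = randn(n2, k);
    H = kron(Ak, A);
    err = norm(H*X(:) - reshape(Aop(X), [], 1));   % ~ machine precision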

For notational simplicity, we denote in the following both the matrix $A$ in (4.0.4) and the linear operator $\mathcal{A}$ in (4.0.2) by $A$, and we write $\mathcal{A}(X)$ as $AX$. The singular values of a blurring matrix or operator $A$ typically “cluster” at the origin. It follows that the solution (if it exists) of the linear system of equations

(4.0.5) \[ AX = B \]

is very sensitive to the error $E$ in $B$. Let $B_{\rm true}$ denote the (unknown) noise-free block vector associated with $B$. The system of equations $AX = B_{\rm true}$ is assumed to be consistent. We would like to determine an accurate approximation of $X_{\rm true}$ given $B$ and $A$. This generally is a difficult computational task due to the error $E$ in $B$ and the presence of tiny singular values of $A$. We remedy this by using the standard Tikhonov solution discussed in Section 1.2.3. It reduces the sensitivity of the solution of (4.0.5) to the error $E$ in $B$ by replacing (4.0.5) by a penalized least-squares problem analogous to (1.2.12),

(4.0.6) \[ \min_{X\in\mathbb{R}^{n^2\times k}} \left\{ \|AX - B\|_F^2 + \mu^{-1}\|X\|_F^2 \right\}, \]

where $\mu > 0$ is the regularization parameter and $\|\cdot\|_F$ denotes the Frobenius norm. The normal equations associated with the minimization problem (4.0.6) are given by

(4.0.7) \[ (A^TA + \mu^{-1}I)X = A^TB. \]

They have the unique solution

(4.0.8) \[ X_\mu = \left(A^TA + \mu^{-1}I\right)^{-1}A^TB \]

for any $\mu > 0$. The size of $\mu$ determines how sensitive $X_\mu$ is to the error in $B$ and how close $X_\mu$ is to the desired solution $X_{\rm true}$. Later, in Section 4.1, we will comment on the use of the regularization parameter $\mu^{-1}$ in (4.0.6) instead of $\lambda$ as done in Sections 1.2.3 and 1.2.4. The computation of an accurate approximation $X_\mu$ of $X_{\rm true}$ requires that a suitable value of $\mu$ be used. We will use the discrepancy principle, which was discussed in Section 1.2.4, to determine $\mu$ in the computed examples reported in Section 4.4. We assume that a bound $\varepsilon > 0$ for $\|E\|_F$ is available and prescribe that $\mu > 0$ be chosen so that the solution (4.0.8) of (4.0.6) satisfies

(4.0.9) \[ \|B - AX_\mu\|_F = \eta\varepsilon, \]

where η > 1 is a user-specified constant independent of ε. A zero-finder can be applied to determine a

µ-value such that the associated Tikhonov solution (4.0.8) satisfies (4.0.9). We will discuss how an approx- imate solution of (4.0.6) can be computed by first evaluating a partial block Golub–Kahan bidiagonalization

(BGKB) of A and then solving (4.0.6) in a subspace so defined. Alternatively, we may reduce A to a small bidiagonal matrix with the aid of global Golub–Kahan bidiagonalization (GGKB), which also is a block method, and then apply the connection between GGKB and Gauss-type quadrature rules to determine upper and lower bounds for the left-hand side of (4.0.9). This allows the computation of a suitable value of µ in a simple manner. This approach has previously been applied in [11]; the GGKB method was first described in [70]. The BGKB and GGKB block methods are compared to the application of Golub–Kahan bidiagonal- ization (with block size one) in two ways. One approach applies Golub–Kahan bidiagonalization with initial vector b(1) and generates a solution subspace that is large enough to solve all systems of equations

(4.0.10)  A x^{(i)} = b^{(i)},    i = 1, ..., k,

with Tikhonov regularization. The other approach is to simply solve each one of the k systems of equations

(4.0.10) independently with Golub–Kahan bidiagonalization and Tikhonov regularization, i.e., by using the algorithm described in [16] k times.
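To make the parametrization with µ^{−1} and the discrepancy principle (4.0.9) concrete, the following Python sketch solves (4.0.6) for a small synthetic operator and selects µ by a simple logarithmic bisection on the discrepancy. It is only an illustration under stated assumptions: the matrix A, the data B, and the noise bound eps_bound are synthetic placeholders, and the generic zero-finder stands in for the methods developed below.

import numpy as np

rng = np.random.default_rng(0)

# Small synthetic stand-ins for the blurring operator and the data (assumptions).
n2, k = 50, 3
A = rng.standard_normal((n2, n2)) @ np.diag(0.9 ** np.arange(n2))  # decaying singular values
X_true = rng.standard_normal((n2, k))
E = 1e-3 * rng.standard_normal((n2, k))
B = A @ X_true + E
eps_bound = np.linalg.norm(E, 'fro')   # assumed-known bound for ||E||_F
eta = 1.1

def tikhonov_solution(mu):
    """Solve (A^T A + mu^{-1} I) X = A^T B, i.e., formula (4.0.8)."""
    return np.linalg.solve(A.T @ A + np.eye(A.shape[1]) / mu, A.T @ B)

def discrepancy_sq(mu):
    """phi(mu) = ||B - A X_mu||_F^2; it decreases as mu increases."""
    return np.linalg.norm(B - A @ tikhonov_solution(mu), 'fro') ** 2

# Bisection (on a logarithmic scale) for phi(mu) = (eta * eps)^2, cf. (4.0.9).
target = (eta * eps_bound) ** 2
lo, hi = 1e-8, 1e8
for _ in range(100):
    mid = np.sqrt(lo * hi)
    if discrepancy_sq(mid) > target:
        lo = mid     # residual too large: more fidelity needed, increase mu
    else:
        hi = mid
mu_star = np.sqrt(lo * hi)
print(mu_star, np.sqrt(discrepancy_sq(mu_star)), eta * eps_bound)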

4.1 Solution by partial block Golub–Kahan bidiagonalization

Introduce for µ > 0 the function

(4.1.1)  φ(µ) = ‖B − AX_µ‖_F².

Substituting (4.0.8) into (4.1.1) and using the identity

(4.1.2)  I − A(A^T A + µ^{−1} I)^{−1} A^T = (µAA^T + I)^{−1}

shows that (4.1.1) can be written as

(4.1.3)  φ(µ) = tr(B^T f_µ(AA^T) B)

with

(4.1.4)  f_µ(t) = (µt + 1)^{−2}.

The determination of a value of the regularization parameter µ > 0 such that (4.0.9) holds generally requires the function φ to be evaluated for several µ-values. Each evaluation of φ is very expensive for large-scale

problems. We therefore approximate the expression B^T f_µ(AA^T) B by a simpler one, which we determine with a few steps of block Golub–Kahan bidiagonalization as follows. Introduce the QR factorization B = P_1 R_1, where P_1 ∈ R^{n²×k} has orthonormal columns and R_1 ∈ R^{k×k} is upper triangular. Then ℓ steps of the

BGKB method applied to A with initial block vector P1 gives the decompositions

(4.1.5)  A Q̃_ℓ^{(k)} = P̃_{ℓ+1}^{(k)} C̄_ℓ^{(k)},    A^T P̃_ℓ^{(k)} = Q̃_ℓ^{(k)} C_ℓ^{(k)T},

where the matrices P̃_ℓ^{(k)} = [P_1, ..., P_ℓ] ∈ R^{n²×ℓk}, P̃_{ℓ+1}^{(k)} = [P_1, ..., P_{ℓ+1}] ∈ R^{n²×(ℓ+1)k}, and Q̃_ℓ^{(k)} = [Q_1, ..., Q_ℓ] ∈ R^{n²×ℓk} have orthonormal columns, and

              ⎡ L_1                      ⎤
              ⎢ R_2   L_2                ⎥
C̄_ℓ^{(k)} :=  ⎢        R_3   ⋱           ⎥  ∈ R^{k(ℓ+1)×kℓ}
              ⎢              ⋱    L_ℓ    ⎥
              ⎣                  R_{ℓ+1} ⎦

is lower block bidiagonal with lower triangular diagonal blocks L_j ∈ R^{k×k} and upper triangular blocks R_j ∈ R^{k×k}. Moreover, C_ℓ^{(k)} is the leading kℓ×kℓ submatrix of C̄_ℓ^{(k)}, and P̃_ℓ^{(k)} is the leading n²×ℓk submatrix of P̃_{ℓ+1}^{(k)}. In case A denotes the operator 𝒜 defined by (4.0.3), the expressions A Q̃_ℓ^{(k)} and A^T P̃_ℓ^{(k)} on the left-hand sides of (4.1.5) should be replaced by [𝒜(Q_1), ..., 𝒜(Q_ℓ)] and [𝒜^T(P_1), ..., 𝒜^T(P_ℓ)], respectively.
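A minimal Python sketch of the recursion behind (4.1.5) is given below. It assumes a dense matrix A, a modest block size k, and full-rank QR factors at every step; reorthogonalization and the operator form 𝒜 are deliberately omitted, so this is an illustration of the structure rather than the implementation used in the experiments.

import numpy as np

def bgkb(A, B, ell):
    """ell steps of block Golub-Kahan bidiagonalization, cf. (4.1.5).

    Returns R_1 from B = P_1 R_1, the block columns P_1..P_{ell+1} and
    Q_1..Q_ell, the lower triangular diagonal blocks L_j, and the upper
    triangular subdiagonal blocks R_2..R_{ell+1} of the matrix C_bar."""
    n, k = B.shape
    P1, R1 = np.linalg.qr(B)                     # B = P_1 R_1
    Ps, Qs, Ls, Rs = [P1], [], [], []
    Q_prev, R_prev = np.zeros((n, k)), np.zeros((k, k))
    for _ in range(ell):
        W = A.T @ Ps[-1] - Q_prev @ R_prev.T     # A^T P_j = Q_{j-1} R_j^T + Q_j L_j^T
        Qj, S = np.linalg.qr(W)                  # W = Q_j L_j^T, so L_j = S^T is lower triangular
        Lj = S.T
        Z = A @ Qj - Ps[-1] @ Lj                 # A Q_j = P_j L_j + P_{j+1} R_{j+1}
        Pnext, Rnext = np.linalg.qr(Z)
        Qs.append(Qj); Ls.append(Lj); Ps.append(Pnext); Rs.append(Rnext)
        Q_prev, R_prev = Qj, Rnext
    return R1, Ps, Qs, Ls, Rs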

When the block size is k = 1, the decompositions (4.1.5) simplify to the decompositions (1.3.8). The decompositions (4.1.5) differ from the ones described by Golub et al. [33], who compute an upper block bidiagonal matrix. In our discussion, we will assume that ℓ is small enough so that the triangular matrices L_j, j = 1, ..., ℓ, and R_j, j = 2, ..., ℓ+1, are nonsingular. It follows from (4.1.5) that the range of the matrix P̃_ℓ^{(k)} is the block Krylov subspace

K_ℓ(AA^T, B) = range[P_1, AA^T P_1, (AA^T)² P_1, ..., (AA^T)^{ℓ−1} P_1].

Similarly, the range of the matrix Q̃_ℓ^{(k)} is the block Krylov subspace

K_ℓ(A^T A, A^T B) = range[A^T P_1, A^T A A^T P_1, (A^T A)² A^T P_1, ..., (A^T A)^{ℓ−1} A^T P_1].

Multiplying the rightmost equation in (4.1.5) by A from the left yields

AA^T P̃_ℓ^{(k)} = P̃_{ℓ+1}^{(k)} C̄_ℓ^{(k)} C_ℓ^{(k)T}.

Therefore,

P̃_ℓ^{(k)T} AA^T P̃_ℓ^{(k)} = C_ℓ^{(k)} C_ℓ^{(k)T}.

This suggests that f_µ(AA^T) may be approximated by evaluating f_µ(C_ℓ^{(k)} C_ℓ^{(k)T}), which is much easier to compute than f_µ(AA^T) when A is large. Let E_1 denote the block vector of appropriate dimensions with blocks of size k×k, with the first block equal to I_k and all other blocks equal to 0. It follows from results by Golub and Meurant [34] on the symmetric block Lanczos method that the expression

(4.1.6)  G_ℓ f_µ = R_1^T E_1^T f_µ(C_ℓ^{(k)} C_ℓ^{(k)T}) E_1 R_1

can be interpreted as an ℓ-block Gauss quadrature rule for the approximation of B^T f_µ(AA^T) B, i.e.,

G_ℓ f = B^T f(AA^T) B    ∀ f ∈ P_{2ℓ−1},

where P_{2ℓ−1} denotes the set of all polynomials of degree at most 2ℓ−1; see also [24] for related discussions.

We therefore approximate (4.1.3) by

(4.1.7)  φ_ℓ(µ) = tr(G_ℓ f_µ)

and let the regularization parameter be the solution of

(4.1.8)  φ_ℓ(µ) = η²ε².
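Since only the small matrices C_ℓ^{(k)} and R_1 enter (4.1.6)–(4.1.7), φ_ℓ(µ) can be evaluated cheaply for many µ-values. The sketch below assembles C_ℓ^{(k)} from the blocks produced by the bgkb routine above and evaluates φ_ℓ; the block indexing follows the structure of C̄_ℓ^{(k)} displayed earlier and is otherwise an assumption of this illustration.

import numpy as np

def assemble_C(Ls, Rs, ell, k):
    """Leading k*ell x k*ell submatrix C_ell^(k) of the lower block bidiagonal matrix."""
    C = np.zeros((k * ell, k * ell))
    for j in range(ell):
        C[j*k:(j+1)*k, j*k:(j+1)*k] = Ls[j]          # diagonal block L_{j+1}
        if j + 1 < ell:
            C[(j+1)*k:(j+2)*k, j*k:(j+1)*k] = Rs[j]  # subdiagonal block R_{j+2} = Rs[j]
    return C

def phi_ell(mu, C, R1, k):
    """phi_ell(mu) = tr(G_ell f_mu) with f_mu(t) = (mu*t + 1)^{-2}, cf. (4.1.6)-(4.1.7)."""
    M = C @ C.T
    m = M.shape[0]
    E1R1 = np.zeros((m, k)); E1R1[:k, :] = R1        # block vector E_1 times R_1
    T = np.linalg.solve(mu * M + np.eye(m), E1R1)
    T = np.linalg.solve(mu * M + np.eye(m), T)       # f_mu(C C^T) E_1 R_1
    return np.trace(E1R1.T @ T)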

The following result shows that φ_ℓ(µ) is decreasing and convex. This makes it convenient to compute the solution µ_ℓ of (4.1.8) by Newton's method; see below.

Proposition 4.1.1. The functions φ(µ) and φ_ℓ(µ), defined by (4.1.3) and (4.1.7) for µ > 0, respectively, satisfy

φ′(µ) < 0,    φ″(µ) > 0,    φ_ℓ′(µ) < 0,    φ_ℓ″(µ) > 0.

Proof. The derivative of φ(µ) is given by

φ′(µ) = −2 tr(B^T (µAA^T + I)^{−3} AA^T B).

It follows from the identity

(4.1.9)  (µAA^T + I)^{−1} A = A(µA^T A + I)^{−1}

that

φ′(µ) = −2 tr(B^T A(µA^T A + I)^{−3} A^T B).

Substituting the spectral factorization A^T A = SΛS^T, SS^T = I, into the above expression and letting W = [w_1, ..., w_k] = S^T A^T B yields

φ′(µ) = −2 tr(W^T (µΛ + I)^{−3} W) = −2 Σ_{j=1}^{k} w_j^T (µΛ + I)^{−3} w_j < 0.

Thus, φ(µ) is a decreasing function of µ. Turning to the second derivative, we have

φ″(µ) = 6 tr(B^T AA^T (µAA^T + I)^{−4} AA^T B),

and can proceed similarly as above to show that φ″(µ) > 0.

The derivative of φ_ℓ(µ) is given by

(4.1.10)  φ_ℓ′(µ) = −2 tr(R_1^T E_1^T C_ℓ^{(k)} (µ C_ℓ^{(k)T} C_ℓ^{(k)} + I)^{−3} C_ℓ^{(k)T} E_1 R_1),

where we again use the identity (4.1.9) with A substituted by C_ℓ^{(k)}. The stated properties of φ_ℓ′(µ) and φ_ℓ″(µ) can be shown by substituting the spectral factorization of C_ℓ^{(k)T} C_ℓ^{(k)} into (4.1.10).

Since φ_ℓ(µ) is decreasing and convex, Newton's method converges monotonically and quadratically to the solution µ_ℓ of (4.1.8) for any initial approximate solution µ_init < µ_ℓ. This makes it easy to implement the Newton method. For instance, we may use µ_init = 0 when φ_ℓ and its derivative are suitably defined at µ = 0; see [16] for a detailed discussion of the case when the block size is one. We note that the function µ → φ_ℓ(1/µ), which corresponds to replacing the regularization term µ^{−1}‖X‖_F² in (4.0.6) by µ‖X‖_F², is not guaranteed to be convex. Therefore, Newton's method has to be safeguarded when applied to the solution of φ_ℓ(1/µ) = η²ε². This is the reason for considering Tikhonov regularization of the form (4.0.6).
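For completeness, a sketch of the Newton iteration for (4.1.8) is given below. It evaluates φ_ℓ and φ_ℓ′ through the spectral factorization of C_ℓ^{(k)} C_ℓ^{(k)T}, which is cheap because the matrix is small; the stopping tolerance and the absence of a safeguard for the case in which (4.1.8) has no solution are simplifying assumptions of this illustration.

import numpy as np

def newton_mu(C, R1, k, eps_bound, eta=1.1, mu0=0.0, tol=1e-12, maxit=50):
    """Newton's method for phi_ell(mu) = (eta*eps)^2, cf. (4.1.8)."""
    lam, S = np.linalg.eigh(C @ C.T)                 # C C^T = S diag(lam) S^T
    E1R1 = np.zeros((C.shape[0], k)); E1R1[:k, :] = R1
    W2 = (S.T @ E1R1) ** 2                           # squared entries of S^T E_1 R_1
    target = (eta * eps_bound) ** 2
    mu = mu0
    for _ in range(maxit):
        d = mu * lam[:, None] + 1.0
        phi = np.sum(W2 / d ** 2)                    # phi_ell(mu)
        dphi = -2.0 * np.sum(W2 * lam[:, None] / d ** 3)   # phi_ell'(mu) < 0
        mu_new = mu - (phi - target) / dphi          # monotone increase from mu0 < mu_ell
        if abs(mu_new - mu) <= tol * max(abs(mu_new), 1.0):
            return mu_new
        mu = mu_new
    return mu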

Proposition 4.1.2. Let P_{N(M)} denote the orthogonal projector onto the null space N(M) of the matrix M. Then

φ(0) = tr(B^T B),    lim_{µ→∞} φ(µ) = tr(B^T P_{N(AA^T)} B),

φ_ℓ(0) = tr(B^T B),    lim_{µ→∞} φ_ℓ(µ) = tr(R_1^T E_1^T P_{N(C_ℓ^{(k)} C_ℓ^{(k)T})} E_1 R_1).

Proof. The value at zero and the limit of φ follow from (4.1.3). The expression (4.1.6) and the definition of the upper triangular matrix R_1 yield

φ_ℓ(0) = tr(R_1^T E_1^T f_0(C_ℓ^{(k)} C_ℓ^{(k)T}) E_1 R_1) = tr(R_1^T R_1) = tr(B^T B).

The last step uses the property B = P_1 R_1. The result for φ_ℓ(µ) as µ → ∞ follows similarly as for φ.

Let the regularization parameter µ_ℓ be computed by Newton's method. We then determine the corresponding approximate solution by projecting the normal equations (4.0.7) with µ = µ_ℓ onto a smaller space determined by the decompositions (4.1.5). We seek to determine an approximate solution of the form

(4.1.11)  X_{µ_ℓ} = Q̃_ℓ^{(k)} Y_{µ_ℓ},    Y_{µ_ℓ} ∈ R^{kℓ×k},

by solving the normal equations (4.0.7) with µ = µ_ℓ by a Galerkin method,

(4.1.12)  (Q̃_ℓ^{(k)})^T (A^T A + µ_ℓ^{−1} I) Q̃_ℓ^{(k)} Y_{µ_ℓ} = (Q̃_ℓ^{(k)})^T A^T B,

which simplifies to

(4.1.13)  (C̄_ℓ^{(k)T} C̄_ℓ^{(k)} + µ_ℓ^{−1} I) Y_{µ_ℓ} = C̄_ℓ^{(k)T} E_1 R_1.

We compute the solution Y_{µ_ℓ} by solving a least-squares problem for which (4.1.13) are the normal equations,

(4.1.14)  min_{Y ∈ R^{kℓ×k}} ‖ [C̄_ℓ^{(k)}; µ_ℓ^{−1/2} I] Y − [E_1 R_1; 0] ‖_F²,

where the two blocks are stacked vertically.

Our reason for computing the solution of (4.1.14) instead of (4.1.13) is that solving the least-squares problem is less sensitive to errors for small values of µ_ℓ > 0.
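In floating point arithmetic, the stacked form (4.1.14) can be handed to a standard dense least-squares solver; a brief sketch follows, where lstsq stands in for a structured solver and the dimensions follow the notation above.

import numpy as np

def solve_projected(C_bar, R1, mu, k):
    """Solve (4.1.14): min_Y || [C_bar; mu^{-1/2} I] Y - [E_1 R_1; 0] ||_F."""
    m, n = C_bar.shape                           # m = k*(ell+1), n = k*ell
    rhs_top = np.zeros((m, k)); rhs_top[:k, :] = R1
    stacked_mat = np.vstack([C_bar, np.eye(n) / np.sqrt(mu)])
    stacked_rhs = np.vstack([rhs_top, np.zeros((n, k))])
    Y, *_ = np.linalg.lstsq(stacked_mat, stacked_rhs, rcond=None)
    return Y                                     # then X = Q_tilde_ell Y, cf. (4.1.11)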

Proposition 4.1.3. Let µ_ℓ solve (4.1.8), and let Y_{µ_ℓ} solve (4.1.12). Then the associated approximate solution X_{µ_ℓ} = Q̃_ℓ^{(k)} Y_{µ_ℓ} of (4.0.6) satisfies

‖AX_{µ_ℓ} − B‖_F² = tr(R_1^T E_1^T f_{µ_ℓ}(C̄_ℓ^{(k)} C̄_ℓ^{(k)T}) E_1 R_1).

Proof. Using the expression (4.1.11) for X_{µ_ℓ} and applying (4.1.5) shows that

AX_{µ_ℓ} − B = A Q̃_ℓ^{(k)} Y_{µ_ℓ} − B = P̃_{ℓ+1}^{(k)} C̄_ℓ^{(k)} Y_{µ_ℓ} − P_1 R_1 = P̃_{ℓ+1}^{(k)} (C̄_ℓ^{(k)} Y_{µ_ℓ} − E_1 R_1),

where we recall that B = P1R1. It follows from (4.1.13) that

P̃_{ℓ+1}^{(k)} (C̄_ℓ^{(k)} Y_{µ_ℓ} − E_1 R_1) = P̃_{ℓ+1}^{(k)} (C̄_ℓ^{(k)} (C̄_ℓ^{(k)T} C̄_ℓ^{(k)} + µ_ℓ^{−1} I)^{−1} C̄_ℓ^{(k)T} − I) E_1 R_1.

The identity (4.1.2) with A replaced by C̄_ℓ^{(k)} now yields

‖AX_{µ_ℓ} − B‖_F² = tr(R_1^T E_1^T f_{µ_ℓ}(C̄_ℓ^{(k)} C̄_ℓ^{(k)T}) E_1 R_1).

Algorithm 5 The BGKB-Tikhonov method
1: Input: A, B, k, ε, η ≥ 1
2: Compute the QR factorization B = P_1 R_1
3: for ℓ = 1, 2, ... until ‖AX_{µ_ℓ} − B‖_F ≤ ηε do
4:     Determine Q̃_ℓ^{(k)}, P̃_{ℓ+1}^{(k)}, and C̄_ℓ^{(k)} by the BGKB algorithm
5:     Determine µ_ℓ by solving φ_ℓ(µ) = η²ε² via Newton's method
6: end for
7: Compute Y_{µ_ℓ} by solving (4.1.14)
8: Compute X_{µ_ℓ} = Q̃_ℓ^{(k)} Y_{µ_ℓ}

4.2 The GGKB method and Gauss-type quadrature

We discuss the application of the GGKB method to compute an approximate solution of (4.0.6) and review how the method can be used to compute inexpensive upper and lower bounds for the discrepancy (4.0.9).

These bounds help us to determine the regularization parameter. This approach of solving (4.0.6) and

determining bounds for the discrepancy has recently been described in [11], where further details can be found. Application of ℓ steps of the GGKB method to A with initial block vector B determines the lower bidiagonal matrix C̄_ℓ described in (1.3.7), as well as the matrices

U_{ℓ+1}^{(k)} = [U_1, U_2, ..., U_{ℓ+1}] ∈ R^{n²×(ℓ+1)k},    V_ℓ^{(k)} = [V_1, V_2, ..., V_ℓ] ∈ R^{n²×ℓk}

with block columns U_i, V_j ∈ R^{n²×k}, where U_1 = s_1 B and s_1 > 0 is a scaling factor. Introduce the inner product

⟨F, G⟩ = tr(F^T G),    F, G ∈ R^{n²×k}.

We have ‖F‖_F = ⟨F, F⟩^{1/2}. The block columns U_1, ..., U_{ℓ+1} are orthonormal with respect to this inner product, and so are the block columns V_1, ..., V_ℓ. Thus,

⟨U_i, U_j⟩ = ⟨V_i, V_j⟩ = 1 if i = j,  and 0 if i ≠ j.

We assume that ℓ is small enough so that all nontrivial entries of the matrix C̄_ℓ are positive. This is the generic situation. We denote the leading ℓ×ℓ submatrix of C̄_ℓ by C_ℓ. The matrices determined satisfy

(4.2.1)  A [V_1, V_2, ..., V_ℓ] = U_{ℓ+1}^{(k)} (C̄_ℓ ⊗ I_k),

(4.2.2)  A^T [U_1, U_2, ..., U_ℓ] = V_ℓ^{(k)} (C_ℓ^T ⊗ I_k).
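A minimal Python sketch of the GGKB recursion underlying (4.2.1)–(4.2.2) follows. Each n²×k block is treated as a single "global" vector and normalized in the Frobenius norm; breakdown checks, reorthogonalization, and the operator form 𝒜 are omitted, so this is an illustration rather than the implementation used in the experiments.

import numpy as np

def ggkb(A, B, ell):
    """ell steps of global Golub-Kahan bidiagonalization.

    Returns the block columns U_1..U_{ell+1} and V_1..V_ell together with the
    (ell+1) x ell lower bidiagonal matrix C_bar of (4.2.1)-(4.2.2)."""
    Us = [B / np.linalg.norm(B, 'fro')]          # U_1 = s_1 B with s_1 = 1/||B||_F
    Vs, alphas, betas = [], [], []
    V_prev, beta_prev = np.zeros_like(B), 0.0
    for _ in range(ell):
        W = A.T @ Us[-1] - beta_prev * V_prev
        alpha = np.linalg.norm(W, 'fro'); V = W / alpha
        Z = A @ V - alpha * Us[-1]
        beta = np.linalg.norm(Z, 'fro'); U_next = Z / beta
        Vs.append(V); alphas.append(alpha); Us.append(U_next); betas.append(beta)
        V_prev, beta_prev = V, beta
    C_bar = np.zeros((ell + 1, ell))
    for j in range(ell):
        C_bar[j, j] = alphas[j]                  # diagonal entries of C_bar
        C_bar[j + 1, j] = betas[j]               # subdiagonal entries of C_bar
    return Us, Vs, C_bar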

When A is the operator 𝒜 defined by (4.0.3), one should replace A[V_1, V_2, ..., V_ℓ] and A^T[U_1, U_2, ..., U_ℓ] on the left-hand sides of (4.2.1) and (4.2.2) by the expressions [𝒜(V_1), 𝒜(V_2), ..., 𝒜(V_ℓ)] and [𝒜^T(U_1), 𝒜^T(U_2), ..., 𝒜^T(U_ℓ)], respectively. Consider the functions (of µ):

(4.2.3)  G_ℓ f_µ = ‖B‖_F² e_1^T (µ C_ℓ C_ℓ^T + I_ℓ)^{−2} e_1,

(4.2.4)  R_{ℓ+1} f_µ = ‖B‖_F² e_1^T (µ C̄_ℓ C̄_ℓ^T + I_{ℓ+1})^{−2} e_1.

The function (4.2.3) can be interpreted as an ℓ-point Gauss quadrature rule for the approximation of the expression φ(µ) defined by (4.1.3); see, e.g., [11, 25]. Similarly, the function (4.2.4) may be regarded as an (ℓ+1)-point Gauss–Radau quadrature rule with a fixed node at the origin for the approximation of (4.1.3).

The remainder formulas for Gauss and Gauss–Radau quadrature, together with the observations that the derivatives of even order of the function (4.1.4) are positive and the derivatives of odd order are negative, yield the lower and upper bounds

(4.2.5)  G_ℓ f_µ ≤ φ(µ) ≤ R_{ℓ+1} f_µ;

see [11] for details. We determine a suitable value of µ and an associated approximate solution of (4.0.6) as follows. For ℓ ≥ 2, we seek to solve the nonlinear equation

(4.2.6)  G_ℓ f_µ = ε²

for µ > 0 by Newton's method. One can show similarly as in Section 4.1 that the function µ → G_ℓ f_µ is decreasing and convex. Therefore, assuming that a solution µ = µ_ℓ of (4.2.6) exists and that the initial approximate solution µ_init ≥ 0 is smaller than µ_ℓ, Newton's method converges quadratically and monotonically to µ_ℓ. If there is no solution, then we increase ℓ. Generally, equation (4.2.6) has a solution already for small values of ℓ.
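With C_ℓ and C̄_ℓ from the GGKB sketch above, the quadrature quantities (4.2.3)–(4.2.4) and the Newton iteration for (4.2.6) involve only tiny matrices. The sketch below evaluates the Gauss and Gauss–Radau values and the Newton update; it is illustrative only, and in particular it does not test whether (4.2.6) has a solution for the current ℓ.

import numpy as np

def gauss_and_radau(mu, C_bar, norm_B):
    """Return (G_ell f_mu, R_{ell+1} f_mu), cf. (4.2.3)-(4.2.4)."""
    def quad(M):
        m = M.shape[0]
        e1 = np.zeros(m); e1[0] = 1.0
        t = np.linalg.solve(mu * M + np.eye(m), e1)
        t = np.linalg.solve(mu * M + np.eye(m), t)   # (mu M + I)^{-2} e_1
        return norm_B ** 2 * t[0]
    C = C_bar[:-1, :]                                # leading ell x ell submatrix C_ell
    return quad(C @ C.T), quad(C_bar @ C_bar.T)

def newton_ggkb(C_bar, norm_B, eps_bound, mu0=0.0, maxit=50, tol=1e-12):
    """Newton's method for G_ell f_mu = eps^2, cf. (4.2.6)."""
    C = C_bar[:-1, :]
    lam, S = np.linalg.eigh(C @ C.T)
    w2 = (norm_B * S[0, :]) ** 2                     # squared entries of ||B||_F S^T e_1
    mu = mu0
    for _ in range(maxit):
        d = mu * lam + 1.0
        g = np.sum(w2 / d ** 2) - eps_bound ** 2
        dg = -2.0 * np.sum(w2 * lam / d ** 3)
        mu_new = mu - g / dg
        if abs(mu_new - mu) <= tol * max(abs(mu_new), 1.0):
            return mu_new
        mu = mu_new
    return mu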

If the solution µ_ℓ of (4.2.6) satisfies

(4.2.7)  R_{ℓ+1} f_{µ_ℓ} ≤ η²ε²,

then it follows from (4.2.5) that there is a solution X_{µ_ℓ} of (4.0.6) such that

ε ≤ ‖B − AX_{µ_ℓ}‖_F ≤ ηε.

If (4.2.7) does not hold for µ_ℓ, then we carry out one more GGKB step and solve (4.2.6) with ℓ replaced by ℓ + 1. Generally, the bound (4.2.7) can be satisfied already for small values of ℓ, because for any µ > 0, we generically have the inequalities

G_{ℓ−1} f_µ < G_ℓ f_µ < φ(µ) < R_{ℓ+1} f_µ < R_ℓ f_µ;

see [49] for a proof. Assume now that (4.2.7) holds for µ = µ_ℓ. We then compute the approximate solution

(4.2.8)  X_{µ_ℓ,ℓ} = V_ℓ^{(k)} (y_{µ_ℓ} ⊗ I_k)

of (4.0.6), where y_{µ_ℓ} solves

(4.2.9)  (C̄_ℓ^T C̄_ℓ + µ_ℓ^{−1} I_ℓ) y = d_1 C̄_ℓ^T e_1,    d_1 = ‖B‖_F.

These are the normal equations associated with the least-squares problem

(4.2.10)  min_{y ∈ R^ℓ} ‖ [µ_ℓ^{1/2} C̄_ℓ; I_ℓ] y − d_1 µ_ℓ^{1/2} e_1 ‖_F,

where the two blocks are stacked vertically.

We compute y_{µ_ℓ} by solving this least-squares problem instead of the normal equations (4.2.9) because this is beneficial numerically, in particular when µ_ℓ > 0 is small.

Proposition 4.2.1. Let µ_ℓ solve (4.2.6) and let y_{µ_ℓ} solve (4.2.10). Then the associated approximate solution (4.2.8) of (4.0.6) satisfies

‖AX_{µ_ℓ,ℓ} − B‖_F² = R_{ℓ+1} f_{µ_ℓ}.

Proof. The representation (4.2.8) and (4.2.1) show that

AX_{µ_ℓ,ℓ} = U_{ℓ+1}^{(k)} (C̄_ℓ ⊗ I_k)(y_{µ_ℓ} ⊗ I_k) = U_{ℓ+1}^{(k)} (C̄_ℓ y_{µ_ℓ} ⊗ I_k).

Using the above expression gives

‖AX_{µ_ℓ,ℓ} − B‖_F² = ‖U_{ℓ+1}^{(k)} (d_1 e_1 ⊗ I_k) − U_{ℓ+1}^{(k)} (C̄_ℓ y_{µ_ℓ} ⊗ I_k)‖_F²
                    = ‖U_{ℓ+1}^{(k)} ((d_1 e_1 − C̄_ℓ y_{µ_ℓ}) ⊗ I_k)‖_F²
                    = ‖d_1 e_1 − C̄_ℓ y_{µ_ℓ}‖_2²,

where we recall that d_1 = ‖B‖_F, and where the last equality follows from the orthonormality of the block columns U_1, ..., U_{ℓ+1} with respect to the inner product ⟨·,·⟩. We now express y_{µ_ℓ} with the aid of (4.2.9), and apply the identity (4.1.2) with A replaced by C̄_ℓ, to obtain

‖AX_{µ_ℓ,ℓ} − B‖_F² = d_1² ‖e_1 − C̄_ℓ(C̄_ℓ^T C̄_ℓ + µ_ℓ^{−1} I_ℓ)^{−1} C̄_ℓ^T e_1‖_2²
                    = d_1² e_1^T (µ_ℓ C̄_ℓ C̄_ℓ^T + I_{ℓ+1})^{−2} e_1
                    = R_{ℓ+1} f_{µ_ℓ}.

The following algorithm outlines the main steps for computing µ_ℓ and X_{µ_ℓ,ℓ} that satisfy (4.0.9).

Algorithm 6 The GGKB-Tikhonov method
1: Input: A, B, k, ε, η ≥ 1
2: Let U_1 := B/‖B‖_F
3: for ℓ = 1, 2, ... until ‖AX_{µ_ℓ,ℓ} − B‖_F ≤ ηε do
4:     Determine U_{ℓ+1}^{(k)}, V_ℓ^{(k)}, C_ℓ, and C̄_ℓ by the GGKB algorithm
5:     Determine µ_ℓ that satisfies G_ℓ f_µ = ε² with Newton's method
6: end for
7: Determine y_{µ_ℓ} by solving (4.2.10)
8: Compute X_{µ_ℓ,ℓ} = V_ℓ^{(k)} (y_{µ_ℓ} ⊗ I_k)
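The back-transformation in step 8 of Algorithm 6 only combines the GGKB block columns, since (y ⊗ I_k) scales the j-th block column of V_ℓ^{(k)} by y_j. A short sketch, assuming V_blocks is the list of block columns returned by the GGKB routine above:

import numpy as np

def assemble_solution(V_blocks, y):
    """Form X = V_ell^(k) (y ⊗ I_k), cf. (4.2.8)."""
    X = np.zeros_like(V_blocks[0])
    for yj, Vj in zip(y, V_blocks):
        X += yj * Vj                    # weighted sum of the block columns V_1..V_ell
    return X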

4.3 Golub–Kahan bidiagonalization for problems with multiple right-hand sides

We may consider (4.0.5) as k linear discrete ill-posed problems that have the same matrix A and different right-hand side vectors b^{(1)}, ..., b^{(k)}; cf. (4.0.10). The solution of linear systems of equations with multiple right-hand sides that might not be known simultaneously and a matrix that stems from the discretization of a well-posed problem has received considerable attention in the literature; see, e.g., [19, 22, 43, 51, 64] and references therein. However, the solution of linear discrete ill-posed problems with multiple right-hand sides that might not be available simultaneously has not. The method described in this section is based on the analysis and numerical experience reported in [32], where it is shown that it often suffices to apply only a few steps of (standard) Golub–Kahan bidiagonalization (GKB) to the matrix A of a linear discrete ill-posed problem to gain valuable information about the subspaces spanned by the right and left singular vectors of A associated with the dominant singular values. Consider the first system of (4.0.10),

(4.3.1)  A x^{(1)} = b^{(1)},

where the right-hand side is the sum of an unknown error-free vector b_true^{(1)} and an error vector e^{(1)}. Thus, b^{(1)} = b_true^{(1)} + e^{(1)}. A bound ‖e^{(1)}‖ ≤ ε^{(1)} is assumed to be known. Let x_true^{(1)} denote the first column of the matrix X_true in (4.0.4). We seek to compute an approximation of x_true^{(1)} by using (standard) partial Golub–Kahan bidiagonalization (GKB) of A with initial vector b^{(1)}. To explain some properties of the bidiagonalization

computed, recall the SVD of A, (1.2.1),

A = Û Σ̂ V̂^T, where Û, V̂ ∈ R^{n²×n²} are orthogonal matrices and

Σ̂ = diag[σ̂_1, σ̂_2, ..., σ̂_{n²}] ∈ R^{n²×n²},    σ̂_1 ≥ σ̂_2 ≥ ... ≥ σ̂_r > σ̂_{r+1} = ... = σ̂_{n²} = 0.

Here r is the rank of A. Let 1 ≤ s ≤ r, and let Û_s and V̂_s consist of the first s columns of Û and V̂, respectively. Moreover, Σ̂_s denotes the leading s×s principal submatrix of Σ̂. This gives the best rank-s approximation

A_s = Û_s Σ̂_s V̂_s^T

of A in the spectral and Frobenius norms. The computation of the full SVD (1.2.1) is too expensive for large-scale problems without a particular structure to be practical. The computation of a partial GKB is much cheaper. Application of ℓ steps of GKB yields the decompositions (1.3.8), where the matrices P_{ℓ+1} = [p_1, ..., p_ℓ, p_{ℓ+1}] ∈ R^{n²×(ℓ+1)} and Q_{ℓ+1} = [q_1, ..., q_ℓ, q_{ℓ+1}] ∈ R^{n²×(ℓ+1)} have orthonormal columns, and P_ℓ consists of the first ℓ columns of P_{ℓ+1}. Further, C̄_ℓ ∈ R^{(ℓ+1)×ℓ} is lower bidiagonal, and C_ℓ is the leading ℓ×ℓ submatrix of C̄_ℓ. We apply reorthogonalization of the columns of P_{ℓ+1} and Q_ℓ to secure their numerical orthogonality.

As we saw in Chapter 2, for sufficiently many steps ℓ, the spaces range(P_{ℓ+1}) and range(Q_ℓ) contain to high accuracy the subspaces range(Û_s) and range(V̂_s), respectively, for s ≥ 1 fixed and not too large. Computed examples in Section 2.4 indicate that it often suffices to choose ℓ ≤ 3s. This result suggests that we can use the same decomposition (1.3.8) for several right-hand side vectors b^{(j)}. This is due to the fact that the partial SVD of A is independent of the right-hand side vectors. Consider the Tikhonov regularization problem

(4.3.2)  min_{x ∈ range(Q_ℓ)} {‖Ax − b^{(1)}‖_2² + µ‖x‖_2²} = min_{y ∈ R^ℓ} {‖C̄_ℓ y − P_{ℓ+1}^T b^{(1)}‖_2² + µ‖y‖_2²},

where x = Q_ℓ y. We determine the regularization parameter µ > 0 so that the computed solution y_µ satisfies the discrepancy principle

(4.3.3)  ‖C̄_ℓ y_µ − P_{ℓ+1}^T b^{(1)}‖_2 = ηε^{(1)}.

If no such µ-value exists, then we increase ℓ by one and try to solve (4.3.3) with ℓ replaced by ℓ+1 in (4.3.2) and (4.3.3). The small least-squares problem on the right-hand side of (4.3.2) is solved by first expressing it in a form analogous to (4.2.10); see [16] for discussions on the solution of (4.3.2) and on properties of the computed solution. We remark that the vector P_{ℓ+1}^T b^{(1)} can be simplified to e_1 ‖b^{(1)}‖_2. The solution y_µ of (4.3.2) determines the approximate solution x_µ^{(1)} = Q_ℓ y_µ of (4.3.1). We turn to the problem

(4.3.4)  A x^{(2)} = b^{(2)}

and compute an approximate solution by solving (4.3.2) with the vector b^{(1)} replaced by b^{(2)}. The vector P_{ℓ+1}^T b^{(2)} has to be explicitly computed. Therefore it is important that the columns of the matrix P_{ℓ+1} are numerically orthonormal; hence we carry out reorthogonalization. If no µ > 0 can be determined so that (4.3.3) can be satisfied with b^{(1)} replaced by b^{(2)}, then we carry out one more step of Golub–Kahan bidiagonalization (1.3.8); otherwise, we compute the solution y_µ of (4.3.2) with the available decomposition. Let µ be such that the discrepancy principle holds. Then we obtain the approximate solution x_µ^{(2)} = Q_ℓ y_µ of (4.3.4). We proceed in the same manner to solve A x^{(i)} = b^{(i)} for i = 3, 4, ..., k. We will compare this algorithm and Algorithms 5 and 6 to the following "trivial" method that is based on solving each one of the linear discrete ill-posed problems (4.0.10) independently with the aid of (standard) Golub–Kahan bidiagonalization. Thus, we apply Algorithm 7 with block size one to each one of the k linear discrete ill-posed problems (4.0.10) independently. We refer to this scheme as Algorithm 8; it is summarized below.

Algorithm 7 The GKB-Tikhonov method
1: Input: A, k, b^{(1)}, b^{(2)}, ..., b^{(k)}, ε^{(1)}, ε^{(2)}, ..., ε^{(k)}, η ≥ 1.
2: Let p_1 := b^{(1)}/‖b^{(1)}‖_2.
3: Compute AQ_ℓ = P_{ℓ+1} C̄_ℓ, A^T P_ℓ = Q_ℓ C_ℓ^T.
4: for i = 1, 2, ..., k do
5:     Compute min_{y_µ ∈ R^ℓ} {‖C̄_ℓ y_µ − P_{ℓ+1}^T b^{(i)}‖_2² + µ‖y_µ‖_2²}.
6:     if ‖C̄_ℓ y_µ − P_{ℓ+1}^T b^{(i)}‖_2 > ηε^{(i)} then
7:         ℓ := ℓ + 1.
8:         Return to step 5.
9:     end if
10:    Compute x_µ^{(i)} = Q_ℓ y_µ.
11: end for

Algorithm 8
1: Input: A, k, b^{(1)}, b^{(2)}, ..., b^{(k)}, ε^{(1)}, ε^{(2)}, ..., ε^{(k)}, η ≥ 1.
2: for i = 1, 2, ..., k do
3:     Let p_1 := b^{(i)}/‖b^{(i)}‖_2.
4:     Compute AQ_ℓ = P_{ℓ+1} C̄_ℓ, A^T P_ℓ = Q_ℓ C_ℓ^T.
5:     Compute min_{y_µ ∈ R^ℓ} {‖C̄_ℓ y_µ − P_{ℓ+1}^T b^{(i)}‖_2² + µ‖y_µ‖_2²}.
6:     if ‖C̄_ℓ y_µ − P_{ℓ+1}^T b^{(i)}‖_2 > ηε^{(i)} then
7:         ℓ := ℓ + 1.
8:         Return to step 5.
9:     end if
10:    Compute x_µ^{(i)} = Q_ℓ y_µ.
11: end for

We expect Algorithm 8 to require the most matrix-vector product evaluations of the methods in our comparison because it computes a new partial standard Golub–Kahan bidiagonalization for each one of the vectors b^{(j)}, j = 1, ..., k. Moreover, this method does not benefit from the fact that on many modern computers the evaluation of matrix-block-vector products with a large matrix A does not require much more time than the evaluation of a matrix-vector product with a single vector for small block sizes; see, e.g., [29] for discussions on this and related issues.
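The idea behind Algorithm 7, reusing one partial GKB of A for all right-hand sides and only re-solving the small projected Tikhonov problems, can be sketched as follows. The sketch fixes ℓ in advance and determines µ by a simple bisection on the discrepancy, whereas Algorithm 7 grows ℓ adaptively; these simplifications, and the dense test matrix, are assumptions of the illustration.

import numpy as np

def gkb(A, b, ell):
    """ell steps of Golub-Kahan bidiagonalization with starting vector b."""
    n = b.size
    P = np.zeros((n, ell + 1)); Q = np.zeros((n, ell))
    C_bar = np.zeros((ell + 1, ell))
    P[:, 0] = b / np.linalg.norm(b)
    beta = 0.0
    for j in range(ell):
        w = A.T @ P[:, j] - (beta * Q[:, j - 1] if j > 0 else 0.0)
        alpha = np.linalg.norm(w); Q[:, j] = w / alpha
        z = A @ Q[:, j] - alpha * P[:, j]
        beta = np.linalg.norm(z); P[:, j + 1] = z / beta
        C_bar[j, j], C_bar[j + 1, j] = alpha, beta
    return P, Q, C_bar

def solve_all(A, bs, eps_list, ell, eta=1.1):
    """Solve A x^(i) = b^(i), i = 1..k, reusing one decomposition built from b^(1)."""
    P, Q, C_bar = gkb(A, bs[0], ell)
    xs = []
    for b, eps_i in zip(bs, eps_list):
        d = P.T @ b                                    # cf. the projected problem (4.3.2)
        lo, hi = 1e-12, 1e12                           # bracket for the bisection in mu
        for _ in range(100):
            mu = np.sqrt(lo * hi)
            y = np.linalg.solve(C_bar.T @ C_bar + mu * np.eye(ell), C_bar.T @ d)
            if np.linalg.norm(C_bar @ y - d) > eta * eps_i:
                hi = mu                                # residual too large: less regularization
            else:
                lo = mu
        xs.append(Q @ y)
    return xs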

4.4 Computed examples

This section provides some numerical results to show the performance of Algorithms 5–8 when applied to the solution of linear discrete ill-posed problems with the same matrix and different right-hand sides. The first example applies these algorithms to the solution of linear discrete ill-posed problems with several right-hand sides defined by matrices that stem from Regularization Tools by Hansen [41], while the latter examples discuss the restoration of RGB images that have been contaminated by blur and noise. All computations were carried out using the MATLAB environment on a Pentium(R) Dual-Core CPU T4200 computer with 4 GB of RAM. The computations were done with approximately 15 decimal digits of relative accuracy.

Example 4.4.1. We would like to solve linear discrete ill-posed problems (4.0.10) with the matrix A ∈ R^{702×702} determined by the function phillips in Regularization Tools [41]. The matrix is a discretization of a Fredholm integral equation of the first kind that describes a convolution. The function phillips also determines the error-free data vector b_true^{(1)} ∈ R^{702} and the associated error-free solution x_true^{(1)} ∈ R^{702}. The other error-free data vectors b_true^{(i)} ∈ R^{702}, i = 2, ..., k, are obtained by setting x_true^{(i)} = x_true^{(i−1)} + y/2 for i = 2, ..., k, where y is a vector obtained by discretization of a function of the form α cos(βt) + γ, where α, β, and γ are scalars. For the present example, we let α = 1/2, β = 1/3, and γ = 1/4. The error-free right-hand sides are obtained by letting b_true^{(i)} = A x_true^{(i)} for i = 2, ..., k. A noise vector e^{(i)} ∈ R^{702} with normally distributed random entries with zero mean is added to each data vector b_true^{(i)} to obtain the error-contaminated data vectors b^{(i)}, i = 1, ..., k, in (4.0.10). The error vectors e^{(i)} are scaled to correspond to a specified noise level. This is simulated with

e^{(i)} := δ_e ‖b_true^{(i)}‖_2 e^{(i)},

where δ_e is the noise level, and the vector e^{(i)} ∈ R^{702} has normally distributed random entries with mean zero and variance one. When the data vectors b^{(i)}, i = 1, ..., k, are available sequentially, the linear discrete ill-posed problems (4.0.10) can be solved one by one by Algorithm 7 or 8. If the data vectors are available simultaneously, then Algorithms 5 and 6 also can be used to solve (4.0.10). The latter algorithms require that the noise level for each discrete ill-posed problem (4.0.10) is about the same. This is a reasonable assumption for many applications. Table 1 compares the number of matrix-vector product evaluations and the CPU time required by Algorithms 5–8 for k = 10 and noise-contaminated data vectors b^{(i)} corresponding to the noise levels δ_e = 10^{−2} and δ_e = 10^{−3}. For the discrepancy principle, we chose η = 1.1. The displayed relative error in the computed solutions is the maximum error over the k problems (4.0.10). The number of matrix-vector products (MVP) shown is the number of matrix-vector product evaluations with A and A^T with a single vector. Thus, each iteration step of Algorithms 5 and 6 adds 2k matrix-vector product evaluations to the count. The number of matrix-vector product evaluations does not give an accurate idea of the computing time required; we therefore also present timings for the algorithms. Tables 2 and 3 are analogous to Table 1. They differ from the latter in that the matrix A ∈ R^{702×702} is determined by the function baart for Table 2 and the function shaw for Table 3. The vectors b^{(i)} are determined analogously as for the phillips test problem. Both functions are from [41] and compute discretizations of Fredholm integral equations of the first kind.

Noise level | Method      | MVP | Relative error  | CPU-time (sec)
10^{−3}     | Algorithm 5 | 100 | 1.46 × 10^{−2}  | 3.87
            | Algorithm 6 | 200 | 1.31 × 10^{−2}  | 7.63
            | Algorithm 7 |  16 | 2.28 × 10^{−2}  | 1.52
            | Algorithm 8 | 162 | 1.43 × 10^{−2}  | 13.22
10^{−2}     | Algorithm 5 |  80 | 2.54 × 10^{−2}  | 3.08
            | Algorithm 6 | 120 | 2.61 × 10^{−2}  | 4.67
            | Algorithm 7 |  10 | 2.52 × 10^{−2}  | 1.01
            | Algorithm 8 | 140 | 2.60 × 10^{−2}  | 11.50

Table 1: Results for the phillips test problem

Noise level | Method      | MVP | Relative error  | CPU-time (sec)
10^{−3}     | Algorithm 5 |  40 | 4.27 × 10^{−2}  | 1.61
            | Algorithm 6 |  80 | 5.62 × 10^{−2}  | 3.31
            | Algorithm 7 |   8 | 5.20 × 10^{−2}  | 0.81
            | Algorithm 8 |  80 | 5.46 × 10^{−2}  | 7.11
10^{−2}     | Algorithm 5 |  40 | 5.02 × 10^{−2}  | 1.51
            | Algorithm 6 |  60 | 7.36 × 10^{−2}  | 2.57
            | Algorithm 7 |   8 | 5.77 × 10^{−2}  | 0.83
            | Algorithm 8 |  62 | 6.78 × 10^{−2}  | 5.63

Table 2: Results for the baart test problem

Noise level | Method      | MVP | Relative error  | CPU-time (sec)
10^{−3}     | Algorithm 5 | 100 | 5.20 × 10^{−2}  | 3.88
            | Algorithm 6 | 200 | 4.42 × 10^{−2}  | 7.57
            | Algorithm 7 |  14 | 3.98 × 10^{−2}  | 1.30
            | Algorithm 8 | 184 | 4.72 × 10^{−2}  | 14.93
10^{−2}     | Algorithm 5 |  40 | 1.82 × 10^{−1}  | 1.52
            | Algorithm 6 | 100 | 1.55 × 10^{−1}  | 4.36
            | Algorithm 7 |  10 | 1.27 × 10^{−1}  | 0.98
            | Algorithm 8 | 100 | 1.55 × 10^{−1}  | 8.84

Table 3: Results for the shaw test problem

Tables 1–3 show Algorithm 7 to require the fewest matrix-vector product evaluations and to give approximate solutions of comparable or higher quality than the other algorithms. Algorithms 6 and 8 require about the same number of matrix-vector product evaluations, but the former algorithm demands less CPU time because it implements a block method.

Example 4.4.2. This example illustrates the performance of Algorithms 5–8 when applied to the restoration of 3-channel RGB color images that have been contaminated by blur and noise. The corrupted image is stored in a block vector B with three columns. The desired (and assumed unavailable) image is stored in the block vector X_true with three columns. The blur-contaminated, but noise-free, image associated with X_true is stored in the block vector B_true. The block vector E represents the noise in B, i.e., B := B_true + E. We define the noise level

ν = ‖E‖_F / ‖B_true‖_F.

To determine the effectiveness of our solution methods, we evaluate the relative error

Relative error = ‖X_true − X_{µ_ℓ}‖_F / ‖X_true‖_F,

where X_{µ_ℓ} denotes the computed restoration. We consider within-channel blurring only. Hence the blurring matrix A_3 in (4.0.3) is the 3×3 identity matrix. The blurring matrix A in (4.0.3), which describes the blurring within each channel, models Gaussian blur and is determined by the following Gaussian PSF,

h_σ(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)).
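For illustration, the PSF above can be sampled on a small integer grid as in the sketch below; the normalization of the stencil is a common additional step and is not part of the formula. The experiments themselves use the blur function from [41] rather than this code.

import numpy as np

def gaussian_psf(half_bandwidth, sigma):
    """Sample h_sigma(x, y) on the grid {-r, ..., r}^2 with r = half_bandwidth."""
    r = np.arange(-half_bandwidth, half_bandwidth + 1)
    X, Y = np.meshgrid(r, r)
    h = np.exp(-(X**2 + Y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return h / h.sum()          # normalize so the blur preserves the mean intensity

psf = gaussian_psf(half_bandwidth=4, sigma=2.0)   # the parameter values used in this example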

The blurring matrix A is a symmetric block Toeplitz matrix with Toeplitz blocks. It is generated with the

MATLAB function blur from [41]. This function has two parameters, the half-bandwidth of the Toeplitz blocks r and the variance σ of the Gaussian PSF. For this example we let σ = 2 and r = 4. The original

(unknown) RGB image X_true ∈ R^{256×256×3} is the papav256 image from MATLAB. It is shown on the left-hand side of Figure 1. The associated blurred and noisy image B = AX_true + E is shown on the right-hand side of the figure. The noise level is ν = 10^{−3}. Given the contaminated image B, we would like to recover an approximation of the original image X_true. Table 4 compares the number of matrix-vector product evaluations, the computing time, and the relative errors in the computed restorations. We use the discrepancy principle with η = 1.1 to determine the regularization parameter. The restoration obtained with

Algorithm 5 for noise level ν = 10^{−3} is shown on the left-hand side of Figure 2. The discrepancy principle for this algorithm and this noise level is satisfied after ℓ = 70 steps of the BGKB method. This corresponds to 3 × 2 × 70 matrix-vector product evaluations. The restoration determined by Algorithm 6 is shown on the right-hand side of Figure 2. The GGKB method requires ℓ = 66 steps to satisfy the discrepancy principle.

Noise level | Method      | MVP | Relative error  | CPU-time (sec)
10^{−3}     | Algorithm 5 | 420 | 4.44 × 10^{−2}  | 9.52
            | Algorithm 6 | 396 | 4.46 × 10^{−2}  | 6.30
            | Algorithm 7 |  68 | 2.81 × 10^{−1}  | 3.76
            | Algorithm 8 | 628 | 4.16 × 10^{−2}  | 15.02
10^{−2}     | Algorithm 5 | 102 | 6.72 × 10^{−2}  | 1.88
            | Algorithm 6 |  90 | 6.71 × 10^{−2}  | 1.38
            | Algorithm 7 |  18 | 2.88 × 10^{−1}  | 0.77
            | Algorithm 8 | 124 | 6.57 × 10^{−2}  | 2.53

Table 4: Results for Example 2

Figure 1: Example 2: Original image (left), blurred and noisy image (right)

Figure 2: Example 2: Restored image by Algorithm 5 (left), and restored image by Algorithm 6 (right).

Figure 3: Example 3: Cross-channel blurred and noisy image (left), restored image by Algorithm 6 (right).

Algorithm 7 is the fastest, but yields restorations of lower quality than the other algorithms for this example.

Example 4.4.3. The previous example illustrated the restoration of an image that has been contaminated by noise and within-channel blur, but not by cross-channel blur. This example shows the restoration of an image that has been contaminated by noise, within-channel blur, and cross-channel blur. We use the same within-channel blur as in Example 2. The cross-channel blur is defined by the cross-channel blur matrix

   0.7 0.2 0.1        A3 =   0.25 0.5 0.25       0.15 0.1 0.75

from [42]. The blurred and noisy image is represented by B = A X_true A_3^T + E, where the noise level is

ν = 10^{−3}. It is shown on the left-hand side of Figure 3. We restore this image using Algorithm 6 with η = 1.1. The computation of the restored image, shown on the right-hand side of Figure 3, requires ℓ = 132 steps of the GGKB algorithm. The discrepancy principle yields the regularization parameter µ = 4.62 × 10^3, and the relative error in the restoration is 4.61 × 10^{−2}.

4.5 Conclusion

This chapter discusses four approaches to the solution of linear discrete ill-posed problems with multiple right-hand sides. Algorithm 8 is clearly the least attractive of the algorithms considered. The relative merits of the other algorithms depend on how accurately the noise level is known, on whether the noise contamination of all data vectors b^{(i)}, i = 1, ..., k, corresponds to about the same noise level, and on the computer architecture.

BIBLIOGRAPHY

[1] A. M. ABDEL-REHIM, R. B. MORGAN, AND W. WILCOX, Improved seed methods for symmetric positive definite linear equations with multiple right-hand sides, Numer. Linear Algebra Appl., 21 (2014), pp. 453–471.

[2] W. E. ARNOLDI, The principle of minimized iteration in the solution of the matrix eigenvalue problem, Quart. Appl. Math., 9 (1951), pp. 17–29.

[3] M. L. BAART, The use of auto-correlation for pseudo-rank determination in noisy ill-conditioned linear least-squares problems, IMA J. Numer. Anal., 2 (1982), pp. 241–247.

[4] J. BAGLAMA, D. CALVETTI, AND L. REICHEL, irbl: An implicitly restarted block Lanczos method for large-scale Hermitian eigenproblems, SIAM J. Sci. Comput., 24 (2003), pp. 1650–1677.

[5] J. BAGLAMA, D. CALVETTI, AND L. REICHEL, Algorithm 827: irbleigs: A MATLAB program for computing a few eigenpairs of a large sparse Hermitian matrix, ACM Trans. Math. Software, 29 (2003), pp. 337–348.

[6] J. BAGLAMA AND L. REICHEL, Augmented implicitly restarted Lanczos bidiagonalization methods, SIAM J. Sci. Comput., 27 (2005), pp. 19–42.

[7] J. BAGLAMA AND L. REICHEL, Restarted block Lanczos bidiagonalization methods, Numer. Algorithms, 43 (2006), pp. 251–272.

[8] J. BAGLAMA AND L. REICHEL, An implicitly restarted block Lanczos bidiagonalization method using Leja shifts, BIT Numer. Math., 53 (2013), pp. 285–310.

[9] J. BAGLAMA, L. REICHEL, AND B. LEWIS, irlba: Fast partial SVD by implicitly restarted Lanczos bidiagonalization, R package, version 2.0.0; see https://cran.r-project.org/package=irlba.

[10] Z.-Z. BAI, A. BUCCINI, K. HAYAMI, L. REICHEL, J.-F. YIN, AND N. ZHENG, A modulus-based iterative method for constrained Tikhonov regularization, J. Comput. Appl. Math., 319 (2017), pp. 1–13.

[11] A. H. BENTBIB, M. EL GUIDE, K. JBILOU, AND L. REICHEL, Global Golub–Kahan bidiagonalization applied to large discrete ill-posed problems, J. Comput. Appl. Math., 322 (2017), pp. 46–56.

[12] Å. BJÖRCK, Numerical Methods in Matrix Computations, Springer, New York, 2015.

[13] A. BREUER, New filtering strategies for implicitly restarted Lanczos iteration, Electron. Trans. Numer. Anal., 45 (2016), pp. 16–32.

[14] D. CALVETTI, P. C. HANSEN, AND L. REICHEL, L-curve curvature bounds via Lanczos bidiagonalization, Electron. Trans. Numer. Anal., 14 (2002), pp. 134–149.

[15] D. CALVETTI, B. LEWIS, AND L. REICHEL, On the choice of subspace for iterative methods for linear discrete ill-posed problems, International Journal of Applied Mathematics and Computer Science, 11 (2001), pp. 1069–1092.

[16] D. CALVETTI AND L. REICHEL, Tikhonov regularization of large linear problems, BIT, 43 (2003), pp. 263–283.

[17] D. CALVETTI, L. REICHEL, AND D. C. SORENSEN, An implicitly restarted Lanczos method for large symmetric eigenvalue problems, Electron. Trans. Numer. Anal., 2 (1994), pp. 1–21.

[18] A. S. CARASSO, Determining surface temperatures from interior observations, SIAM J. Appl. Math., 42 (1982), pp. 558–574.

[19] T. F. CHAN AND W. L. WAN, Analysis of projection methods for solving linear systems with multiple right-hand sides, SIAM J. Sci. Comput., 18 (1997), pp. 1698–1721.

[20] L. DYKES, F. MARCELLÁN, AND L. REICHEL, The structure of iterative methods for symmetric linear discrete ill-posed problems, BIT Numerical Mathematics, 54 (2014), pp. 129–145.

[21] L. DYKES AND L. REICHEL, A family of range restricted iterative methods for linear discrete ill-posed problems, Dolomites Research Notes on Approximation, 6 (2013), pp. 27–36.

[22] A. EL GUENNOUNI, K. JBILOU, AND H. SADOK, A block version of BiCGSTAB for linear systems with multiple right-hand sides, Electron. Trans. Numer. Anal., 16 (2003), pp. 129–142.

[23] H. W. ENGL, M. HANKE, AND A. NEUBAUER, Regularization of Inverse Problems, Kluwer, Dordrecht, 1996.

[24] C. FENU, D. MARTIN, L. REICHEL, AND G. RODRIGUEZ, Block Gauss and anti-Gauss quadrature with application to networks, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 1655–1684.

[25] C. FENU, L. REICHEL, AND G. RODRIGUEZ, GCV for Tikhonov regularization via global Golub–Kahan decomposition, Numer. Linear Algebra Appl., 23 (2016), pp. 467–484.

[26] C. FENU, L. REICHEL, G. RODRIGUEZ, AND H. SADOK, GCV for Tikhonov regularization by partial SVD, BIT Numer. Math., in press.

[27] L. FOX AND E. T. GOODWIN, The numerical solution of non-singular linear integral equations, Philos. Trans. Roy. Soc. Lond. Ser. A, Math. Phys. Eng. Sci., 245:902 (1953), pp. 501–534.

[28] N. P. GALATSANOS, A. K. KATSAGGELOS, R. T. CHIN, AND A. D. HILLARY, Least squares restoration of multichannel images, IEEE Trans. Signal Proc., 39 (1991), pp. 2222–2236.

[29] K. GALLIVAN, M. HEATH, E. NG, B. PEYTON, R. PLEMMONS, J. ORTEGA, C. ROMINE, A. SAMEH, AND R. VOIGT, Parallel Algorithms for Matrix Computations, SIAM, Philadelphia, 1990.

[30] S. GAZZOLA, P. NOVATI, AND M. R. RUSSO, Embedded techniques for choosing the parameter in Tikhonov regularization, Numer. Linear Algebra Appl., 21 (2014), pp. 796–812.

[31] S. GAZZOLA, P. NOVATI, AND M. R. RUSSO, On Krylov projection methods and Tikhonov regularization, Electron. Trans. Numer. Anal., 44 (2015), pp. 83–123.

[32] S. GAZZOLA, E. ONUNWOR, L. REICHEL, AND G. RODRIGUEZ, On the Lanczos and Golub–Kahan reduction methods applied to discrete ill-posed problems, Numer. Linear Algebra Appl., 23 (2016), pp. 187–204.

[33] G. H. GOLUB, F. T. LUK, AND M. L. OVERTON, A block Lanczos method for computing the singular values and corresponding singular vectors of a matrix, ACM Trans. Math. Software, 7 (1981), pp. 149–169.

[34] G. H. GOLUB AND G. MEURANT, Matrices, Moments and Quadrature with Applications, Princeton University Press, Princeton, 2010.

[35] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.

[36] G. H. GOLUB AND U. VON MATT, Quadratically constrained least squares and quadratic problems, Numer. Math., 59 (1991), pp. 561–580.

[37] M. HANKE, Conjugate Gradient Type Methods for Ill-Posed Problems, Longman, Harlow, 1995.

[38] M. HANKE, On Lanczos based methods for the regularization of discrete ill-posed problems, BIT Numerical Mathematics, 41 (2001), pp. 1008–1018.

[39] P. C. HANSEN, Discrete Inverse Problems: Insight and Algorithms, SIAM, Philadelphia, PA, 2010.

[40] P. C. HANSEN, Rank-Deficient and Discrete Ill-Posed Problems, SIAM, Philadelphia, PA, 1998.

[41] P. C. HANSEN, Regularization tools version 4.0 for Matlab 7.3, Numer. Algorithms, 46 (2007), pp. 189–194.

[42] P. C. HANSEN, J. NAGY, AND D. P. O'LEARY, Deblurring Images: Matrices, Spectra, and Filtering, SIAM, Philadelphia, 2006.

[43] K. JBILOU, H. SADOK, AND A. TINZEFTE, Oblique projection methods for linear systems with multiple right-hand sides, Electron. Trans. Numer. Anal., 20 (2005), pp. 119–138.

[44] Z. JIA AND D. NIU, A refined harmonic Lanczos bidiagonalization method and an implicitly restarted algorithm for computing the smallest singular triplets of large matrices, SIAM J. Sci. Comput., 32 (2010), pp. 714–744.

[45] S. KINDERMANN, Convergence analysis of minimization-based noise level-free parameter choice rules for linear ill-posed problems, Electron. Trans. Numer. Anal., 38 (2011), pp. 233–257.

[46] S. KINDERMANN, Discretization independent convergence rates for noise level-free parameter choice rules for the regularization of ill-conditioned problems, Electron. Trans. Numer. Anal., 40 (2013), pp. 58–81.

[47] C. LANCZOS, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research of the National Bureau of Standards, 45 (1950), pp. 255–282.

[48] F. LI, M. K. NG, AND R. J. PLEMMONS, Coupled segmentation and denoising/deblurring for hyperspectral material identification, Numer. Linear Algebra Appl., 19 (2012), pp. 15–17.

[49] G. LOPEZ LAGOMASINO, L. REICHEL, AND L. WUNDERLICH, Matrices, moments, and rational quadrature, Linear Algebra Appl., 429 (2008), pp. 2540–2554.

[50] D. R. MARTIN AND L. REICHEL, Minimization of functionals on the solution of a large-scale discrete ill-posed problem, BIT Numer. Math., 53 (2013), pp. 153–173.

[51] J. MENG, P.-Y. ZHU, AND H.-B. LI, A block GCROT(m,k) method for linear systems with multiple right-hand sides, J. Comput. Appl. Math., 255 (2014), pp. 544–554.

[52] K. MORIKUNI AND K. HAYAMI, Convergence of inner-iteration GMRES methods for rank-deficient least squares problems, SIAM J. Matrix Anal. Appl., 36 (2015), pp. 225–250.

[53] K. MORIKUNI, L. REICHEL, AND K. HAYAMI, FGMRES for linear discrete ill-posed problems, Appl. Numer. Math., 75 (2014), pp. 175–187.

[54] V. A. MOROZOV, On the solution of functional equations by the method of regularization, Soviet Math. Dokl., 7 (1966), pp. 414–417.

[55] A. NEUMAN, L. REICHEL, AND H. SADOK, Implementations of range restricted iterative methods for linear discrete ill-posed problems, Linear Algebra Appl., 436 (2012), pp. 3974–3990.

[56] S. NOSCHESE AND L. REICHEL, A modified TSVD method for discrete ill-posed problems, Numer. Linear Algebra Appl., 21 (2014), pp. 813–822.

[57] D. P. O'LEARY AND J. A. SIMMONS, A bidiagonalization-regularization procedure for large scale discretizations of ill-posed problems, SIAM J. Sci. Statist. Comput., 2 (1981), pp. 474–489.

[58] E. ONUNWOR AND L. REICHEL, On the computation of a truncated SVD of a large linear discrete ill-posed problem, Numerical Algorithms, 75 (2017), pp. 359–380.

[59] C. C. PAIGE AND M. A. SAUNDERS, An algorithm for sparse linear equations and sparse least squares, ACM Transactions on Mathematical Software, 8 (1982), pp. 43–71.

[60] D. L. PHILLIPS, A technique for the numerical solution of certain integral equations of the first kind, J. ACM, 9 (1962), pp. 84–97.

[61] L. REICHEL AND G. RODRIGUEZ, Old and new parameter choice rules for discrete ill-posed problems, Numer. Algorithms, 63 (2013), pp. 65–87.

[62] A. RUHE, Implementation aspects of band Lanczos algorithms for computation of eigenvalues of large sparse symmetric matrices, Mathematics of Computation, 33 (1979), pp. 680–687.

[63] Y. SAAD, On the Lanczos method for solving symmetric linear systems with several right-hand sides, Math. Comp., 48 (1987), pp. 651–662.

[64] Y. SAAD, Iterative Methods for Sparse Linear Systems, 2nd ed. SIAM: Philadelphia, 2003.

[65] Y. SAAD, Numerical Methods for Large Eigenvalue Problems, revised edition, SIAM, Philadelphia, 2011.

[66] C. B. SHAW,JR., Improvements of the resolution of an instrument by numerical solution of an integral equation, J. Math. Anal. Appl., 37 (1972), pp. 83–112.

[67] K. M. SOODHALTER, Two recursive GMRES-type methods for shifted linear systems with general preconditioning, Electron. Trans. Numer. Anal., 45 (2016), pp. 499–523.

[68] D. C. SORENSEN, Implicit application of polynomial filters in a k-step Arnoldi method, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 357–385.

[69] A. N. TIKHONOV, Solution of incorrectly formulated problems and the regularization method, Soviet Math. Dokl., 4 (1963), pp. 1035–1038.

[70] F. TOUTOUNIAN AND S. KARIMI, Global least squares method (Gl-LSQR) for solving general linear systems with several right-hand sides, Appl. Math. Comput., 178 (2006), pp. 452–460.

[71] L. N. TREFETHEN AND D. BAU, Numerical Linear Algebra, SIAM, Philadelphia, 1998.

[72] L. N. TREFETHEN AND M. EMBREE, Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators, Princeton University Press, Princeton, 2005.
