PARAMETER SELECTION RULES FOR ILL-POSED PROBLEMS

A dissertation submitted

to Kent State University

in partial fulfillment of the requirements

for the degree of Doctor of Philosophy

by

Yonggi Park

December 2019

c Copyright

All rights reserved

Except for previously published materials Dissertation written by

Yonggi Park

B.S., University of Central Oklahoma, 2005

M.S., University of Central Oklahoma, 2009

M.S., University of Central Florida, 2013

Ph.D., Kent State University, 2019

Approved by

Lothar Reichel , Co-Chair, Doctoral Dissertation Committee

Alessandro Buccini , Co-Chair, Doctoral Dissertation Committee

Jing Li , Members, Doctoral Dissertation Committee

Jun Li

Kambiz Ghazinour

Austin Melton Jr

Accepted by

Andrew Tonge , Chair, Department of Mathematical Sciences

James L. Blank , Dean, College of Arts and Sciences TABLE OF CONTENTS

TABLE OF CONTENTS ...... iii

LIST OF FIGURES ...... v

LIST OF TABLES ...... vi

ACKNOWLEDGEMENTS ...... vii

NOTATION ...... ix

1 Introduction ...... 1

1.1 Overview ...... 1

1.2 Regularization methods ...... 2

1.2.1 Tikhonov regularization ...... 2

1.2.2 Truncated generalized singular value decomposition (TGSVD) ...... 4

1.2.3 Regularization parameter: the discrepancy principle ...... 5

1.3 Krylov subspace methods ...... 6

1.3.1 Golub-Kahan bidiagonalization ...... 7

1.4 Bregman algorithm ...... 9

1.4.1 The nonstationary modified linearized Bregman algorithm ...... 10

1.5 The test problems ...... 16

1.5.1 Descriptions of the test problems ...... 17

2 Parameter Determination for Tikhonov Regularization Problems in General

Form ...... 21

2.1 A GSVD-based COSE method ...... 22

2.2 Large-scale problems ...... 25

2.3 Numerical example ...... 28

iii 3 Comparison of A-Posteriori Parameter Choice Rules for Linear Discrete Ill-

Posed Problems ...... 41

3.1 The singular value decomposition ...... 43

3.2 Bidiagonalization and quadrature ...... 44

3.2.1 Bidiagonalization ...... 44

3.2.2 Quadrature rules ...... 45

3.3 Computed examples ...... 46

4 Numerical aspects of the Nonstationary Modified Linearized Bregman algo-

rithm ...... 51

4.1 Landweber iteration ...... 51

4.2 Numerical aspects of the NMLB algorithm ...... 53

4.2.1 The number of iterations ...... 55

4.2.2 The residual norm ...... 56

4.2.3 The relative restoration error ...... 59

4.2.4 The choice of δ ...... 59

4.2.5 Final considerations ...... 63

5 Conclusion ...... 64

BIBLIOGRAPHY ...... 66

iv LIST OF FIGURES

1 Test problem Gravity, from [46], with m = n = 40, L = L1 given by (1.5), and ν = 10−2 ...... 35

2 the numerical experiment reported in Figure 1 ...... 36

−2 opt 3 Test problem Phillips, from [46], with m = n = 500, ν = 10 , L = L2, k4 = 7, opt k5 = 5, k4 = 8, k5 =6 ...... 36 −2 opt 4 Test problem Shaw from [46] with m = n = 500, ν = 10 , L = L2, k4 = 6, opt k5 = 4, k4 = 4, k5 =4 ...... 37 5 Exact and computed approximate solutions for the numerical examples of Figures 3

(left) and 4 (right) ...... 37

−2 opt 6 Test problem Deriv2 from [46] with m = n = 500, ν = 10 , L = L2, k4 = 15, opt k5 = 3, k4 = 13, k5 =2...... 38 7 Test problem Deriv2, from [46], with the “sin” solution, m = n = 500, ν = 10−2, opt opt L = L2, k4 = 7, k5 = 5, k4 = 1, k5 =7 ...... 38 8 Exact and computed approximate solutions for the numerical examples of Figures 6

and7 ...... 39

9 Test problem Tomo, from [46], with m = n = 1024, ν = 10−2, L defined by (2.13), opt opt k4 = 386, k5 = 386, k4 = 360, k5 = 360 ...... 39 10 Exact and computed approximate solutions for the numerical example of Figure 9 . 40

11 Number of iterations required to reach convergence for different choices of µ and q . 57

12 Norm of the residual at the final iteration for different choices of µ and q ...... 58

13 RRE obtained for several choices of µ and q. The different graphs display the RRE

versus µ ...... 60

14 RRE for different choices of µ and q ...... 61

v LIST OF TABLES

1 Discretized ill-conditioned test problems used in the numerical experiments . . . . . 29

2 Percentage of numerical experiments that lead to a regularized solution xk such that

(2.12) holds for ρ = 2 (ρ = 5), for TSVD with L = L1 given by (1.5) and different values of φ in (2.11) ...... 30

3 Percentage of numerical experiments that lead to a regularized solution xk such that

(2.12) holds for ρ = 10 (ρ = 100), for TSVD with L = L1 given by (1.5) and different values of φ in (2.11) ...... 31

4 Percentage of numerical experiments that lead to a regularized solution xk such that

(2.12) holds for ρ = 2 (ρ = 5), for TSVD with L = L2 given by (1.6) and different values of φ in (2.11) ...... 32

5 Percentage of numerical experiments that lead to a regularized solution xk such that

(2.12) holds for ρ = 10 (ρ = 100), for TSVD with L = L2 given by (1.6) and different values of φ in (2.11) ...... 33

6 2000 by 2000 : Modified Discrepancy Principle (MD) vs. Discrepancy Principle (D)

using the SVD to determine the regularization parameter λ ...... 48

7 2000 by 2000 : Modified Discrepancy Principle (MD) vs. Discrepancy Principle (D)

using bidiagonalization to determine the regularization parameter λ ...... 49

8 Number of iterations for the baart test problem for two values of δ ...... 62

9 Number of iterations for the heat test problem for two values of δ ...... 62

vi ACKNOWLEDGEMENTS

This work would not have been possible without the wisdom, support, and tireless assistance of my advisor, Lothar Reichel. I genuinely appreciate both his patience with me and the guidance he has given me over the years. His invaluable encouragement and counsel have been critical in facilitating the progress I have made to this point. He has truly been a blessing, and he has made a positive impact on my life. Also, I would like to express my sincere gratitude to my advisor Alessandro

Buccini for the continuous support of my Ph.D study and research, for his patience, motivation, enthusiasm, and immense knowledge. His guidance helped me in all the time of research and writing of this thesis. I could not have imagined having a better advisor and mentor for my Ph.D study.

In addition, I extend my undying gratitude to my committee: Jing Li, Jun Li, and Kambiz

Ghazinour. I am tremendously indebted to them for their collective time, effort, and direction.

I would be remiss if I failed to recognize the important contributions made by the following collaborators : Giuseppe Rodriguez, Xuebo Yu. A special thanks to Dr. Giuseppe Rodriguez for the first paper, Parameter Determination for Tikhonov Regularization Problems in General Form, because of providing key contribution. A Special thanks to Dr. Alessandro Buccini for second paper, Numerical aspects of the nonstationary modified linearized Bregman algorithm, because of providing key contribution. A special thanks to Dr. Xuebo Yu for helping me debug my codes and for his valuable input. Another special thanks to Dr. Kambiz Ghazinour and Dr. Austin Melton for becoming my committee members for my defense by spending their precious time.

I honor my parents, Byungyell Park and Sunnim Kim. Their legacy of love, strength, determi- nation, support, and faith imbued me with the courage I needed to achieve this objective, and they will forever endure in my spirit and in my work.

Their stimulating conversations and the familial communion we share sustained and comforted me when I was in need of a respite during challenging moments.

Special thanks to two of my closest friends, Michael Kuian and Hessah Alqahtani for their mathematical insights and constant encouragement.

I will be eternally grateful to them for their emotional support and loving guidance.

vii Thanks to SeongO Chae for his pep talks and for pushing the right motivational buttons.

I am glad that the battle is over. I extend my thanks to the remainder of my family and friends for their unconditional love and support.

Finally, I offer my deepest gratitude to my caring, loving, and supportive wife, Jiyoung. Your consistent encouragement, unending patience, and unflagging faith in me through the rough times have sustained me more than words can express. Thank you so much!

viii NOTATION

Unless stated otherwise, the following notation will be used throughout this dissertation. Standard

notation is used whenever possible.

A an m × n matrix

In n × n identity matrix

k · k the Euclidean vector norm, or the induced operator norm

n X tr(A) the trace of an n × n matrix A is the summation of the diagonal entries, tr(A) = aii i=1

T 1/2 kAkF the Frobenius norm of A defined by kAkF = tr(A A)

hu, vi the inner product between vectors u and v

xexact the exact but unknown true solution

bexact the exact data

e error or noise vector, i.e. the perturbation in the data

ei the i-th standard basis vector of appropriate dimension

AT the transpose of A

A∗ the Hermitian conjugate or Hermitian adjoint of A

A† the Moore-Penrose pseudoinverse of A

A ⊗ B the Kronecker product of matrices A and B

Ai,j the leading principal i × j submatrix of A

R(·) the range or column space

N (·) the nullspace

ix κ(·) the condition number

λ regularization parameter

λi an eigenvalue

A = W˜ ΛW˜ ∗ spectral factorization of the matrix A = AT

SVD singular value decomposition of a matrix

GSVD generalized singular value decomposition

L the p × n regularization matrix

L1 upper bidiagonal regularization matrix, the scaled finite difference approximations of the first derivative operator with first row removed

L2 tridiagonal regularization matrix, the scaled finite difference approximations of the second derivative operator with first row removed

x CHAPTER 1

Introduction

1.1 Overview

We are concerned with the solution of large least-squares problems

m×n m min kAx − bk2,A ∈ R , b ∈ R , m ≥ n, (1.1) x∈Rn with a matrix A, whose singular values gradually decay to zero without a significant gap. Through- out this thesis k · k2 denotes the Euclidean vector norm or the spectral matrix norm. In particular, A is very ill-conditioned and may be rank-deficient. To simplify the notation, we will assume that m ≥ n, but this restriction can be removed. Least-squares problems with a matrix of this kind are commonly referred to as linear discrete ill-posed problems [67]. They arise, for instance, from the discretization of linear ill-posed problems, such as Fredholm integral equations of the first kind with a continuous kernel [40], Z 1 K(s, t)f(t)dt = g(s) (1.2) 0 Here, both the kernel K and the right-hand side g are known functions, while f is the unknown function.

The process of discretization consists of transferring the continuous models and equations into their discrete counterparts. It is used to derive an approximate problem with finitely many un- knowns. The vector b in linear discrete ill-posed problems that arise in applications in science

m and engineering typically represents data that are contaminated by a measurement error e ∈ R . Sometimes we will refer to the vector e as “noise”. Thus,

b = bexact + e, (1.3)

m where bexact ∈ R represents the unknown error-free vector associated with the available vector b. We will assume that this “noise” vector e in (1.3) has normally distributed pseudorandom entries with mean zero and is normalized to correspond to a chosen noise level.

1 Let A† denote the Moore–Penrose pseudoinverse of A. We would like to determine an approxi-

† mation of xexact = A bexact by computing an approximate solution of (1.1). Note that the vector

† † x = A b = xexact + A e typically is not an accurate approximation of xexact because the condition

† † number of A, given by κ(A) = kAk2kA k2, is very large. Generally, kA ek2  kxexactk2, so the value of x can be very far from that of xexact. Due to the ill-conditioning of A, our goal is to reformulate the problem so that the new solution is less sensitive to perturbations. That is, we regularize the problem so that solution becomes more stable.

1.2 Regularization methods

The severe ill-conditioning of A makes the naive solution very sensitive to any perturbation in b.

This is handled by regularization, i.e. replacing the system (1.1) with a nearby system that is less sensitive to the error e in b. Several regularization methods have been developed over the years and they are very effective when utilized to solve linear discrete ill-posed problems. We will consider three of the most common methods : truncated iterations (specifically the truncated singular value decomposition and the truncated eigenvalue decomposition), Tikhonov regularization, and Bregman iteration.

1.2.1 Tikhonov regularization

A widely used method for solving discrete ill-posed problems is the regularization method due to

Tikhonov [77]. The general form solves (1.1) by replacing it with a penalized least squares problem

min kAx − bk2 + λkLxk2 , (1.4) x 2 2

p×n where L ∈ R . The matrix L is called regularization matrix, and we are interested in the situation when L is a fairly general matrix. Many commonly applied regularization matrices are rectangular, and both the cases p ≤ n and p > n arise in applications.

Common choices of the matrix L, when A stems from a uniform discretization of a Fredholm integral equation defined on an interval, are the bidiagonal rectangular matrix   1 −1      1 −1  L =   ∈ (n−1)×n (1.5) 1  . .  R  .. ..      1 −1

2 and the tridiagonal rectangular matrix   −1 2 −1      −1 2 −1  L =   ∈ (n−2)×n (1.6) 2  . . .  R  ......      −1 2 −1

The quantity kAx − bk2 in (1.4) measures goodness-of-fit, as its size determines how the regu- larized solution fits the initial problem. The quantity kLxk2 measures the regularity of the solution. The parameter determination approach of this dissertation can be applied to any regularization

p×n matrix L ∈ R that satisfies N (A) ∩ N (L) = {0}, (1.7) where N (M) denotes the null space of the matrix M.

λ ≥ 0 is called the regularization parameter, and it determines how well the solution xλ of (1.4) approximates xexact and how sensitive xλ is to the error e in the available data vector b. The Tikhonov problem has two alternative formulations, with the normal equation

(AT A + λLT L)x = AT b (1.8) and the linear system (1.8) can be solved stably by solving the equivalent least square problem     A b min √ x − . (1.9) x λL 0

When (1.7) holds, the Tikhonov minimization problem (1.4) has the unique solution

T T −1 T xλ = (A A + λL L) A b (1.10) for any λ > 0, where the superscript T denotes transposition.

Assume for the moment that the norm kek2 > 0 is known and that the (unavailable) linear system of equations

Ax = bexact (1.11) is consistent.

m×n p×n Assume that the matrices A ∈ R and L ∈ R in (1.4) satisfy (1.7) and m ≥ n ≥ p, with m small enough to make the evaluation of the GSVD of the matrix pair (A, L) feasible. Then the

3 GSVD furnishes decompositions of the form   Σ 0     −1 −1 A = U   Z ,L = V M 0 Z , (1.12) 0 In−p

m×n p×p where the matrices U ∈ R and V ∈ R have orthonormal columns, In−p is identity matrix, n×n and Z ∈ R is nonsingular, and the diagonal matrices

p×p p×p Σ = diag[σ1, σ2, . . . , σp] ∈ R ,M = diag[µ1, µ2, . . . , µp] ∈ R have nonnegative diagonal entries ordered according to

0 = σ1 = ··· = σp−` < σp−`+1 ≤ · · · ≤ σp ≤ 1, 1 ≥ µ1 ≥ · · · ≥ µp ≥ 0.

The GSVD (1.12) allows us to express the Tikhonov solution (1.10) in the form

p T n X σi u b X x = i z + (uT b) z . (1.13) λ σ2 + λµ2 i i i i=1 i i i=p+1 The requirement (1.7) secures the existence of the nonsingular matrix Z. We may assume

p×n that the regularization matrix L ∈ R in (1.4) satisfies n ≥ p. If n < p, we compute its QR p×n n×n factorization L = QR, where Q ∈ R has orthonormal columns and R ∈ R is upper triangular, and replace L by R in (1.4).

A discussion on the computation of the GSVD is provided by Bai [3]; see also Golub and Van

Loan [39]. Here the inequality m ≥ n is not imposed. We required this inequality above for ease of exposition. The computation of the GSVD of a pair of matrices of moderate size is quite expensive.

A simplification of the computations that reduces the count of arithmetic floating point operations is described in [28]. Recently, a modification of the decomposition (1.12) aimed to make an analogue of the matrix Z better conditioned has been discussed in [27].

1.2.2 Truncated generalized singular value decomposition (TGSVD)

Truncated GSVD (TGSVD) is a popular regularization method for the solution of discrete ill- posed problems (1.1) when a regularization matrix L 6= I is used; see, e.g., Hansen [44, 45]. Let

U = [u1,..., un] and Z = [z1,..., zn] be the matrices in (1.12). Substituting the decomposition (1.12) of A and L into (1.1) yields the simple minimization problem  

Σ 0 min   y − U T b , (1.14) y∈Rn   0 In−p 2

4 T −1 where y = [y1, y2, . . . , yn] = Z x. We remark that the regularization matrix L affects both the diagonal entries of Σ and the matrix U.

The TGSVD method restricts the solution of the minimization problem (1.14) to vectors y whose p − k first entries, y1, y2, . . . , yp−k, vanish. These components are associated with the p − k smallest diagonal elements of Σ. The parameter k is a discrete regularization parameter.

We obtain the solution of the so restricted minimization problem

T " uT b T # p−k+1 up b T T yk = 0,..., 0, ,..., , up+1b,..., un b , σp−k+1 σp which defines the approximate solution

p T n X ui b X T xk = Zyk = zi + (ui b) zi (1.15) σi i=p−k+1 i=p+1 of the least-squares problem (1.1), where 1 ≤ k ≤ `. The approximate solution xk only depends on the k largest diagonal entries of Σ. The last sum in the right-hand side represents the solution component in N (L).

1.2.3 Regularization parameter: the discrepancy principle

We will now address how to find a dependable and automated method for choosing the regulariza- tion parameter, such as k (for truncated iterations) or λ (for Tikhonov regularization). There are several techniques for choosing this parameter and they include: the discrepancy principle, gener- alized cross validation (GCV), the L-curve criterion, and the normalized cumulative periodogram

(NCP) method; see [31, 33, 34, 45, 55, 71] for discussions of these and other methods for choosing an appropriate regularization parameter. Here we would like to discuss the discrepancy principle introduced by Morozov in [60]. It requires that a bound for the error e in b be known a priori

kek2 ≤  (1.16)

We will apply the discrepancy principle regularization method as follows.

Truncated iterations

For the TGSVD, we find the smallest integer k ≥ 0 such that

kAxk − bk2 ≤ τ, (1.17)

5 where τ ≥ 1 is a user-chosen constant independent of . The more accurate the estimate of our available error bound, the closer we can choose τ to 1. Ideally, we would like to choose k such that kAxk − bk2 = τ but this is rarely satisfied in practice. It can be shown that xk → xexact as kek2 → 0; see [30, 31] for an accessible proof.

Tikhonov regularization

Now we seek λ so that the residual norm is equal to this value. To accomplish this, we solve the following nonlinear equation in terms of λ,

2 2 2 kAxλ − bk2 = τ kek2 (1.18) by Newton’s method for instance. Here τ > 1 is a user-specified constant independent of kek2; see [31,44] for discussions on this parameter choice method. It easily can be shown that there is a unique positive value of λ such that the solution (1.10) of (1.4) satisfies (1.18) for reasonable values of kek2; see below.

1.3 Krylov subspace methods

Linear discrete ill-posed problems like (1.1) are commonly solved with the aid of the singular value decomposition (SVD) of A, if it is a small matrix; see, e.g., [45,63] and references therein. However, it is expensive to compute the SVD of a general large matrix; the computation of the SVD of an n×n matrix requires about 22n3 arithmetic floating-point operations (flops); see, e.g., [39, Chapter 8] for details as well as for flop counts for the situation when m > n. In particular, the SVD of a large general m × n matrix is very expensive to compute. Therefore, large-scale linear discrete ill- posed problems (1.1) are sometimes solved by hybrid methods that first reduce a large least-squares problem to a least-squares problem of small size by a Krylov subspace method, and then solve the latter by using the SVD of the reduced matrix so obtained.

n×n n Given the matrix A ∈ R and the vector b ∈ R , the Krylov subspace generated by A and b is defined by

2 `−1 K`(A, b) = span{b,Ab,A b,...,A b}, ` ≥ 1. (1.19)

A Krylov method seeks an approximate solution to (1.1) in the space (1.19). Krylov subspace methods need only the computations of matrix-vector products with A and do not require any other matrix. As a result, they are very effective when A is very large and sparse. To construct a Krylov

6 sequence, begin with the initial vector, b. We then multiply by A to get the next vector, Ab. This is followed by multiplying that vector by A to get the next vector, A2b, and so on. Hence, the matrix

A2 is not explicitly formed, but the matrix-vector product A2b is evaluated as A(Ab), etc. These vectors are not orthogonal and for relatively small values of ` may become nearly linearly dependent.

We would like to determine an orthonormal basis for a Krylov subspace, as orthonormal bases are easiest to work with. A few well-known Krylov subspace methods generate orthonormal bases.

These methods include the Arnoldi method, the Lanczos method. The Golub–Kahan decomposition method constructs the space;

T T T T T T (`−1) T K`(A A, A b) = span{A b, (A A)A b,..., (A A) A b}, ` ≥ 1.

We discuss this method in details in the following.

1.3.1 Golub-Kahan bidiagonalization

m×n A large nonsymmetric matrix A ∈ R can be approximated by a small by applying a few steps of the Golub–Kahan bidiagonalization (also known as the Lanczos bidiagonal- ization algorithm). This is described by Algorithm 1

7 Algorithm 1: Golub–Kahan Bidiagonalization

1: Input : A, b 6= 0, `

2: Output : σj, ρj, U˜`+1 = [p1,..., p`, p`+1], U`+1 = [q1,..., q`, q`+1]

T 3: Initialize : σ1 = kbk2, p1 = b/σ1, q = A p1, ρ1 = kqk, q1 = q/ρ1

4: for j = 2, . . . , ` + 1 do

5: p = Aqj−1 − ρj−1pj−1

6: σj = kpk

7: if σj = 0 then

8: Stop

9: end if

10: pj = p/σj

T 11: q = A pj − σjqj−1

12: ρj = kqk

13: if ρj = 0 then

14: Stop

15: end if

16: qj = q/ρj

17: end for

Using the vectors pj and qj determined by Algorithm 1, we define the matrices U˜`+1 =

m×(`+1) n×(`+1) [p1,..., p`, p`+1] ∈ R and U`+1 = [q1,..., q`, q`+1] ∈ R with orthonormal columns. T T T These vectors form orthonormal bases for the Krylov subspaces K`(AA , b) and K`(A A, A b), respectively. The scalars ρj and σj computed by the algorithm define the lower bidiagonal matrix

  ρ1 0      σ ρ   2 2   . .   .. ..    (`+1)×` B`+1,` :=   ∈ R . (1.20)    σ`−1 ρ`−1       σ ρ   ` `    0 σ`+1

8 A matrix interpretation of the recursions of Algorithm 1 gives the Golub–Kahan decompositions

˜ T ˜ T AU` = U`+1B`+1,`,A U` = U`B`,`, (1.21) where the leading ` × ` submatrix of B`+1,` is denoted by B`,`. We assume ` is chosen small enough so that the decompositions (1.21) with the stated properties exist. See [6] for a recent discussion of this decomposition.

If we combine the Golub–Kahan decompositions (1.21), we get

T T A AU` = U`+1B`+1,`B`+1,`, (1.22)

T where B`+1,`B`+1,` is a symmetric tridiagonal matrix. Observe that this decomposition is equivalent to applying the Lanczos process to the symmetric positive semidefinite matrix AT A.

1.4 Bregman algorithm

We will consider systems of equations of the form

b = Ax. (1.23)

We will consider the nonstationary modified linearized Bregman (NMLB) algorithm proposed by

Huang et al. [52]. This method is a variant of the modified linearized Bregman (MLB) algorithm described by Cai et al. [16] and is designed to yield faster convergence than the latter. The MLB algorithm is an iterative method for solving the minimization problem   1 2 arg min µ kxk1 + kxk2 : Ax = b , (1.24) x∈Rn 2δ where µ > 0 and 0 < δ < 1/ρ(AT A) are user-supplied constants. Throughout this thesis ρ(M) denotes the spectral radius of the square matrix M, and k · k1 and k · k2 stand for the `1 and `2 vector norms, respectively. In the following, we will refer to µ as the regularization parameter. [11] compares these a-posteriori rules for determining µ when applied to the solution of many linear discrete ill-posed problesms with different amounts of error in the data. The MLB algorithm, which is reviewed in Section 1.4.1, is typically applied when the desired solution xexact is known to be

“sparse,” i.e., to have many zero entries, and we would like to determine an approximation of xexact with the same property. Sparse solutions may be desirable when m  n or when in some basis, such as a framelet system of generator, xexact is known to be sparse.

9 The purpose of the `1-norm in the minimization problem (1.24) is to force the computed solution to be sparse, i.e., to have many vanishing components. The parameter µ ≥ 0 determines the amount of shrinkage. Its choice is important for the performance of the solution methods. This is illustrated in Section 4.2. The `2-norm in (1.24) makes the minimization problem strictly convex. Denote the iterates determined by the MLB algorithm by x1, x2, x3,... . Since we are not interested in the solution A†b of the available system Ax = b, we terminate the iterations before an accurate approximation of this solution has been determined. Specifically, we terminate the iterations as soon as an iterate that satisfies the discrepancy principle has been found, i.e., as soon as

Axk − bε ≤ τε, (1.25) 2 where ε is a bound for the error in bε, which is assumed to be known. Thus,

kek2 ≤ ε. (1.26)

The parameter τ in (1.25) is a user-supplied constant larger than one, that is independent of ε; see, e.g., [31] for details on the discrepancy principle.

In many applications the desired vector xexact represents a signal that is sparse in a suitable basis, such as in the framelet domain. Tight frames have been used in many applications, see, e.g., [12, 16, 17], because many signals of interest have a sparse representation in the framelet domain. We will provide details about the transformation of (1.23) to the framelet domain in

Section 1.4.1.

This chapter is structured as follows: in Section 1.4.1 we recall the main results on the conver- gence of the NMLB algorithm, and Section 4.1 discusses the choice of the parameter δ in (1.24) in the situation when µ = 0. Section 4.2 is concerned with the choice of several parameters, includ- ing δ and the regularization parameter µ, required by the NMLB algorithm. Numerical examples illustrate the performance of the algorithm for different choices of these parameters.

1.4.1 The nonstationary modified linearized Bregman algorithm

This section collects the main results in [52]. We first derive the NMLB algorithm from the linearized

Bregman (LB) algorithm. Then we summarize its theoretical properties and, finally, describe how to combine the LB algorithm with tight frames.

10 m×n Let A ∈ R , with m ≤ n, be a surjective matrix, i.e., all its singular values are positive. We will comment on below how the situation when A has singular values that are numerically vanishing can be handled.

The aim of linearized Bregman iteration is to find an approximation of the solution of (1.23) of minimal `1 norm, i.e., one seeks to solve

min {ksk1 : As = b} . (1.27) s∈Rn Note that this minimization problem is not guaranteed to have a unique solution. The iterations of the LB algorithm can be written as   zk+1 = zk + AT (b − Ask), (1.28) k+1 k+1  s = δSµ(z ),

0 0 for k = 0, 1,... with s = z = 0. Here Sµ(x) denotes the soft-thresholding operator,

Sµ(x) := sign(x)(|x| − µ)+, where all the operations are element-wise and (x)+ := max{0, x} denotes the non-negative part of x ∈ R. The iterations (1.28) can be easily implemented. They require only matrix-vector multiplica- tions, vector additions, scalar multiplication of vectors, and soft-thresholding. Applications of the

LB algorithm include basis pursuit problems, which arise in compressed sensing; see [15,65]. In this, as well as in many other applications of the LB algorithm, the matrix A is sparse or structured, and matrix-vector products can be evaluated cheaply. The algorithm is designed for the approximate solution of problems (1.27) for which the desired solution, xexact, is sparse. It is shown in [14, 25]

k that the limit of the sequence {s }k generated by (1.28) converges to a solution of (1.27). When the matrix A is ill-conditioned, i.e., when the ratio of the largest to smallest singular values of A is large, convergence of the sequence s1, s2,... generated by the LB algorithm may be very slow. Therefore, it may be necessary to carry out many iterations (1.28) until an accurate approximation of xexact has been found. To alleviate this difficulty, Cai et al. [16] proposed the use

m×m of a preconditioner P ∈ R in (1.28). This yields the MLB algorithm,   zk+1 = zk + AT P (b − Ask), (1.29) k+1 k+1  s = δSµ(z ),

11 for k = 0, 1,... , with s0 = z0 = 0.

m×n T −1 Theorem 1 ([16]). Assume that A ∈ R , m ≤ n, is surjective, let P = (AA ) , and let 0 < δ < 1 be a fixed constant. Then the sequence s1, s2,... , generated by the MLB algorithm

(1.29) converges to a solution of (1.24) for any µ > 0. Furthermore, as µ → ∞, the limit of

1 2 the sequence s , s ,... converges to the solution of (1.27) that is closest to the minimal `2-norm solution among all solutions of (1.27).

The main difficulty with the iterations described by the above theorem is that when the matrix

A is ill-conditioned, the preconditioner P = (AAT )−1 may be of very large norm. This may cause numerical difficulties. Moreover, in some applications of interest, the matrix A is rank deficient and then this preconditioner is not defined.

To avoid these difficulties, Cai et al. [16] generalized Theorem 1 to allow the preconditioner P to be an arbitrary symmetric positive definite matrix. This extension is described in the following

m×m theorem. We need the following definition. Let the matrix M ∈ R be symmetric positive definite. Then k · kM denotes the vector norm induced by the matrix M, i.e.,

T 1/2 m kvkM = (v Mv) , v ∈ R .

m×m Theorem 2 ([16]). Let P ∈ R be a symmetric positive definite matrix and assume that 0 < δ < 1/ρ(AT PA). Then the sequence s1, s2,... generated by the iterations (1.29) converges to the unique solution of

 1  arg min µ kxk + kxk2 : x = arg min kAx − bk . x 1 2δ 2 x P

Furthermore, as µ → ∞, the limit of the sequence s1, s2,... converges to the solution of

n o arg min kxk : x = arg min kAx − bk (1.30) x 1 x P of minimal `2-norm among all solutions of (1.30).

Inspired by Tikhonov regularization, Cai et al. [16] considered the application of preconditioners of the form

P = (AAT + αI)−1, (1.31)

12 where α > 0 is a fixed user-specified parameter. With this preconditioner the iterations (1.29) can be written as   zk+1 = zk + AT (AAT + αI)−1(b − Ask), (1.32) k+1 k+1  s = δSµ(z ), for k = 0, 1,... , where s0 = z0 = 0. Theorem 2 yields that the iterates s1, s2,... generated by

(1.32) converge to the unique solution of   1 2 arg min µ kxk + kxk : x = arg min kAx − bk T −1 . x 1 2δ 2 x (AA +αI)

Huang et al. [52] observed that the iterates (1.32) can be sensitive to the choice of α > 0, i.e., the quality of the computed solution may deteriorate significantly when α is chosen slightly off an optimal value. Determining an accurate estimate of the optimal α-value can be difficult, and is not possible in most applications. To circumvent this difficulty, Huang et al. [52] replaced the parameter

α in (1.31) by a sequence of parameter values α0, α1,... , similarly to a strategy suggested in [42]. In other words, the parameter α in (1.32) is changed in each iteration. This defines a nonstationary preconditioning approach. Since ρ(AT PA) < 1 for all α > 0, Huang et al. [52] let δ = 1 in (1.32).

Summarizing, the iterations become  k+1 k T T −1 k  z = z + A (AA + αkI) (b − As ), (1.33) k+1 k+1  s = Sµ(z ), for k = 0, 1,... , where s0 = z0 = 0. This scheme is in [52] referred to as the NMLB algorithm.

The following convergence results are shown by Huang et al. [52].

1 2 Theorem 3. Assume that αk → α¯ as k → ∞ for some 0 < α¯ < ∞. Let s , s ,... denote the iterates determined by (1.33). Then, as k increases, the sk converge to the unique solution of   1 2 arg min µ ksk + ksk : s = arg min kAs − bk T −1 . (1.34) s 1 2 2 s (AA +¯αI)

Furthermore, as µ → ∞, the limit of the iterates sk as k → ∞ is the solution of (1.30), with P given by (1.31) and α replaced by α¯, of minimal `2-norm.

The parameterα ¯ in the above theorem has to be positive for theoretical purposes. It is “tiny” and a lower bound for the αk in the computed examples of Section 4.2. In these examples the αk are a decreasing function of k, and the iterations are terminated well before αk is close toα ¯.

13 Huang et al. [52] illustrate that the iterates determined by the NMLB algorithm with a suitable decreasing parameter sequence α0, α1,... are less sensitive to the choice of the parameters αk than the iterates generated by the MLB algorithm (1.32) are to the choice of the single parameter

α. Moreover, Huang et al. [52] found that the NMLB algorithm may determine more accurate approximations of xexact than the MLB algorithm. The application of the preconditioner P defined by (1.31) is attractive when the matrix AAT is not too large. Then we can explicitly form this matrix, compute the Choleski factorization of

AAT + αI, and use the latter when evaluating matrix-vector products with P . When AAT is large, the preconditioner should be chosen so that it approximates (AAT + αI)−1 in a suitable manner. For instance, in image restoration applications, the matrix A often is a large square block-

Toeplitz-Toeplitz-block matrix. It may then be attractive to approximate the preconditioner (1.31) by a matrix of the form (CCT + αI)−1, where C is a block-circulant-circulant-block matrix that approximates A. Techniques for determining such preconditioners are described in, e.g., [26,62,64].

A recent discussion on how to determine approximations of the preconditioner (1.31) and numerical illustrations are provided by Cai et al. [13]; see Chapter 5 for further comments.

n In many applications the desired solution, xexact, is not sparse in the canonical basis for R , but it is sparse in a framelet system of generator. Framelets are frames with local support. We will review how to combine tight frames and the NMLB algorithm. Applications of tight frames are described, e.g., in [16, 52]. Computed examples with tight frames are presented in Section 4.2.

First, we define tight frames:

r×n n Definition 1. Let W ∈ R with n ≤ r. The set of the rows of W is a tight frame for R if n ∀x ∈ R it holds r 2 X T 2 kxk2 = (wj x) , (1.35) j=1

n T where wj ∈ R is the jth row of W (written as a column vector), i.e., W = [w1, w2,..., wr] . The matrix W is referred to as an analysis operator and W T as a synthesis operator.

Equation (1.35) is equivalent to the perfect reconstruction formula

x = W T y, y = W x.

14 In other words

W is a tight frame ⇔ W T W = I.

Note that, in general, WW T 6= I, unless r = n and the frames are orthogonal.

One of the interesting properties of tight frames is that many signals that arise in applications have a sparse representation in the framelet domain. Since the NMLB algorithm seeks to compute a sparse solution, we would like to modify (1.23) so that the unknowns are framelet coefficients. Let

W denote an analysis operator. Inserting W T W = I into (1.23) and ignoring e in the right-hand side yields the system of equations

AW T W x = b.

Let K = AW T and y = W x. Then the above equation can be expressed as

Ky = b. (1.36)

The entries of the unknown vector y are framelet coefficients of the solution. In many applications the vector y is very sparse. Transformation to the framelet domain allows us to take advantage of the sparsity of solutions computed by the NMLB algorithm, even when the desired solution xexact

n is not sparse in the canonical basis for R . Thus, we first apply the NMLB algorithm to (1.36) to determine the framelet coefficient vector y, and then compute an approximation of xexact by applying the synthesis operator W T to y. Note that, generally, the matrix W is very sparse and, therefore, the evaluation of matrix-vector products with W and W T is very cheap. It follows that the computational cost of the transformation of (1.23) to the framelet domain and back typically is negligible. Note that the preconditioner P is not affected by the transformation to the framelet domain; we have

T −1 T T T −1 (KK + αkI) = (AW (AW ) + αkI)

T −1 = (AA + αkI) .

We turn to the stopping criterion for the NMLB algorithm. From Theorem 3, we know that the limit point of the iterates determined by the NMLB algorithm is a solution of (1.34). However, when the vector b is contaminated by noise and the matrix A is very ill-conditioned (i.e., the ratio of the largest and smallest singular values of A is very large), solutions of (1.34) are not meaningful

15 approximations of xexact. A fairly accurate approximation of xexact often can be determined by terminating the iterations with the NMLB algorithm before convergence is achieved. Huang et al. [52] employed the discrepancy principle to determine when to terminate the iterations (1.33).

Assume that a fairly sharp bound (1.26) for the norm of the error in the data vector b is available.

We then terminate the iterations with the NMLB algorithm when the discrepancy principle (1.17) is satisfied, or equivalently, as soon as an iterate sk satisfies

AW T sk − b ≤ τε. (1.37) 2

Algorithm 2 summarizes the computations of the NMLB method applied to (1.36).

Algorithm 2: The NMLB Algorithm

m×n m 1: Input : A ∈ R , b ∈ R , {αk}k such that αk → α¯ with 0 < α¯ < ∞, µ > 0, τ > 1, and r×n W ∈ R an analysis operator

2: Output : regularized solution x∗

3: z0 = 0, s0 = 0, k = 0

4: repeat

5: k = k + 1

k k−1 T T −1 T k−1 6: z = z + WA (AA + αk−1I) (b − AW s )

k k 7: s = Sµ(z )

8: T k until AW s − b 2 ≤ τε 9: x∗ = W T sk

1.5 The test problems

Most MATLAB codes for determining the discrete ill-posed problems in the computed examples of this thesis stem from Regularization Tools by Hansen [46]. These linear systems were obtained by discretizing Fredholm integral equations of the first kind. We assume that the system matrix

n×n n A ∈ R as well as the exact solution xexact ∈ R are available. If not accessible, the discrete right-hand side is obtained by computing bexact = Axexact.

16 1.5.1 Descriptions of the test problems

baart: The Fredholm integral equation of the first kind

Z π sin(s) π exp(s cos(t))x(t)dt = 2 , 0 ≤ s ≤ , (1.38) 0 s 2 is discussed by Baart [1]. It has the solution x(t) = sin(t). The integral equation is discretized

by a Galerkin method with piece-wise constant test and trial functions using the function baart

from [46]. This gives a nonsymmetric matrix.

deriv2: The Fredholm integral equation of the first kind

Z 1 K(s, t)x(t)dt = g(s), 0 ≤ s, t ≤ 1, (1.39) 0 where the kernel K is Green’s function for the second derivative   s(t − 1), s < t, K(s, t) =  t(s − 1), s ≥ t.

The right-hand side is given by g(s) = (s3 − s)/6 and the solution is x(t) = t. The integral

equation is discretized by a Galerkin method using the MATLAB function baart from [46]. The

matrix produced is symmetric and negative definite. This problem is mildly ill-conditioned,

i.e., its singular values decay slowly to zero. foxgood: This is the Fredholm integral equation of the first kind

Z 1 1 2 2 1 2 3 3 s + t 2 x(t)dt = (1 + s ) 2 − s , 0 ≤ s, t ≤ 1, (1.40) 0 3 with solution x(t) = t, originally discussed by Fox and Goodwin [35]. The function foxgood

from [46] is used to determine a discretization by a Nystr¨ommethod. This gives a symmetric

indefinite matrix that is severely ill-posed and numerically singular.

gravity: A one-dimensional gravity surveying model problem resulting in the first-kind Fredholm in-

tegral equation

− 3 Z 1 1  1  2 + (s − t)2 x(t)dt = g(s), 0 ≤ s, t ≤ 1, (1.41) 0 4 16 and 1 x(t) = sin(πt) + sin(2πt). 2

17 Discretization is carried out by a Nystr¨ommethod based on the midpoint quadrature rule

using the function gravity from [46]. The resulting matrix is symmetric positive definite and

the exact right-hand side is computed as bexact = Axexact.

heat: The inverse heat equation [21] used in this thesis is a Volterra integral equation of the first

kind. The kernel is given by K(s, t) = k(s − t), where

1  1  k(t) = √ exp − . 2t3/2 π 4t

The discretization of the integral equation is done by simple collocation and the midpoint

rule with n points. The matrix produced is a lower-triangular matrix and it is ill-conditioned.

An exact solution is constructed and then the discrete right-hand side is computed as bexact =

Axexact. This is a severely ill-posed problem. i laplace: The Fredholm integral equation of the first kind

Z ∞ 16 exp(−st)x(t)dt = 3 , s ≥ 0, t ≥ 0, (1.42) 0 (2s + 1)

2 t  represents the inverse Laplace transform, with the solution x(t) = t exp − 2 . It is discretized by means of Gauss–Laguerre quadrature using the MATLAB function i laplace from [46]. The

nonsymmetric matrix so obtained is numerically singular.

phillips: We now consider the Fredholm integral equation of the first kind discussed by Phillips [68],

Z 6 K(s, t)x(t)dt = g(t), −6 ≤ s, t ≤ 6, (1.43) −6

where the solution x(t), kernel K(s, t), and right-hand side g(s) are given by  πt   1 + cos 3 , |t| < 3, x(t) =  0, |t| ≥ 3, K(s, t) = x(s − t),  1 πs 9 π|s| g(s) = (6 − |s|) 1 + cos + sin . 2 3 2π 3

The integral equation is discretized by a Galerkin method using the MATLAB function phillips

from [46]. The matrix produced is symmetric and indefinite.

18 shaw: Fredholm integral equation of the first kind discussed by Shaw [?],

π Z 2 π π K(s, t)x(t)dt = g(s), − ≤ s, t ≤ , (1.44) π 2 2 − 2 with kernel sin(π(sin(s) + sin(t)))2 K(s, t) = (cos(s) + cos(t))2 π(sin(s) + sin(t)) and solution

x(t) = 2 exp(−6(t − 0.8)2) + exp(−2(t + 0.5)2),

which define the right-hand side function g. Discretization is carried out by a Nystr¨ommethod

based on the midpoint quadrature rule using the function shaw from [46]. The resulting matrix

is symmetric indefinite and numerically singular. The discrete right-hand side is computed

as bexact = Axexact. This problem is severely ill-posed.

hilbert: A matrix H with elements

−1 Hij = (i + j + 1)

for i, j = 1, 2, ..., n

lotkin: A Hilbert matrix with the first row replaced by ones. It is unsymmetric, ill conditioned,

totally positive, and has many small, negative eigenvalues. The inverse has integer entries

and is known explicitly.

prolate: finite segments An of the singly infinite symmetric Toeplits matrix

  a11 a12 ...     Ainf = a a ... (1.45)  21 22   . . .  . . ..

1 1 with a0 = 2w, ak = (sin 2πwk)/πk for k = 1, 2, ..., and 0 < w < 2 . The choice w = 4 leads 1 1 −1 1 for example to a = [ 2 , π , 0, 3π , 0, 5π , 0, ...].

tomo: a 2-D tomography problem in which the elements of the right-hand b side are line integrals

along straight rays penetrating a rectangular domain, which is discretized into N 2 cells, each

cell with its own intensity stored as the elements of the solution vector x. We use the standard

19 ordering where the elements in x corresponding to a stacking of the columns of the image.

The elements of the coefficient matrix A are given by   `ij, pixelj ∈ rayi aij =  0, else,

where `ij is the length of the ith ray in pixel j. The exact solution is identical to the exact image in the test problem blur. The rays are placed randomly inside the domain, the number

of rays can be specified, and the coefficient matrix A is stored in sparse format. blur: Image deblurring test problem in connection with the degradation of digital images by atmo-

spheric turbulence blur, modelled by a 2-D Gaussian point-spread function:

1 x2 + y2 h(x, y) = e(− ) 2πσ2 2σ2

The matrix A is a symmetric N 2 × N 2 doubly Toeplitz matrix, stored in sparse format, and

given by A = (2πσ2)−1T ⊗ T , where T is an N × N symmetric banded Toeplitz matrix.

Only elements within a distance band - 1 from the diagonal are stored; i.e., band is the half-

bandwidth of the matrix T. The parameter σ controls the shape of the Gaussian point spread

function and thus the amount of smoothing (the larger the σ, the wider the function, and

the less ill posed the problem). The vector x is a columnwise stacked version of a simple test

image, while b holds a columnwise stacked version of the blurrred image; i.e, b = Ax.

20 CHAPTER 2

Parameter Determination for Tikhonov Regularization Problems in General Form

This chapter is concerned with the situation when the discrepancy principle cannot be applied. In this case, it can be quite difficult to determine a suitable value of λ.

We are interested in the common situation when no estimate of kek2 is available. Parameter choice methods for this situation are commonly referred to as “heuristic”, because they may fail in certain situations; see [31]. A large number of heuristic parameter choice methods have been proposed in the literature due to the importance of being able to determine a suitable value of the regularization parameter when the discrepancy principle cannot be used; see, e.g., [4, 8, 18, 22, 33,

34, 47, 51, 55, 56, 71, 72]. These methods include the L-curve criterion, generalized cross validation, and the quasi-optimality principle.

Most heuristics parameter choice methods have been developed for the situation when the regularization matrix L in (1.4) is the identity matrix. We are concerned with the situation when

p×n L ∈ R is a fairly general matrix such that (1.7) holds. It is the purpose of this chapter to extend the Comparison of Solutions Estimator (COSE) method for determining a suitable value for the regularization parameter for Tikhonov regularization problems in standard form described in [51] to Tikhonov regularization problems in general form (1.4).

Section 2.1 describes the COSE method for Tikhonov minimization problems (1.4) that are small enough to allow the computation of the Generalized Singular Value Decomposition (GSVD) of the matrix pair (A, L). The availability of this decomposition makes it easy to solve the Tikhonov minimization problem (1.4) and determine a value of the regularization parameter λ > 0 such that the norm of the residual error kAxλ − bk2 achieves a prescribed value. Moreover, knowledge of the GSVD allows the inexpensive computation of a regularized approximate solution of (1.1) with the aid of the Truncated Generalized Singular Value Decomposition (TGSVD) (1.14); see, e.g., Hansen [44, 45]. Let k ≥ 1 be the truncation index of the TGSVD method and denote the

21 associated approximate solution of (1.1) by xk; see Section 2.1 for details on the definition of xk. Define the associated residual vector

rk = b − Axk.

We consider rk an error-vector and determine the value of the regularization parameter λ = λk in (1.4) so that the associated Tikhonov solution

T T −1 T xλk = (A A + λkL L) A b satisfies (1.17) with τkek2 replaced by krkk2. We then compute the smallest k-value, denoted by kmin, that minimizes k → kxk − xλ k2 and use xk or xλ as approximations of xexact. k min kmin

Computed examples reported in [51] show this approach to compute an approximation of xexact to be competitive with other available methods when L = I.

Section 2.2 is concerned with Tikhonov minimization problems (1.4) with matrices A and L that are too large to make the computation of the GSVD of the matrix pair (A, L) attractive or feasible. The matrices A and L are then first reduced to small or medium size, before the GSVD of the reduced matrices is computed. Several reduction methods are available in the literature; see, e.g., [36,50,54,57,75]. We discuss two methods for reducing Tikhonov regularization problems

(1.4) with large matrices A and L to a Tikhonov regularization problem with small matrices. The methods differ in their handling of N (L). Section 2.3 describes a few computed examples.

2.1 A GSVD-based COSE method

When the GSVD (1.12) of the matrix pair (A, L) is available, the TGSVD and Tikhonov solutions

(1.15) and (1.13), respectively, are inexpensive to evaluate for different values of the regularization parameters k and λ. This is the basis of the COSE method. Introduce the residual norms associated with the TGSVD solutions xk defined in (1.15),

T ρk = kAxk − UU bk2, k = 1, 2, . . . , `. (2.1)

For each k = 1, 2, . . . , `, we determine a Tikhonov solution (1.13) that corresponds to the residual

T error norm ρk. We use the orthogonal projector UU in (2.1) to achieve better performance for inconsistent problems (1.1) and problems with m  n. It follows from (1.12) that the range of A is a subset of the range of U. Therefore,

2 T 2 T 2 kAxk − bk2 = kAxk − UU bk2 + k(I − UU )bk2,

22 and no choice of the regularization parameter k can reduce the last term in the right-hand side.

Using the GSVD (1.12) and lettingτ ˜ = 1/λ, we can express the equation

T 2 2 kAxλ − UU bk2 = ρk as a zero-finding problem for the function

p 4 T 2 X µj (uj b) f(˜τ) = − ρ2. (2.2) (σ2τ˜ + µ2)2 k j=1 j j It can easily be shown that this function has a unique zero. Newton’s method applied to determine this zero can be expressed as    −1 p 4 T 2 p 2 4 T 2 1 X µj (uj b) 2 X σj µj (uj b) τ˜q+1 =τ ˜q +  − ρ  ·   . 2 (σ2τ˜ + µ2)2 k (σ2τ˜ + µ2)3 j=1 j q j j=1 j q j

We remark that when L = I and µj = 1 for all j, and the σj are the singular values of A, the above iterations simplify to those used in [51]. Letτ ˜∗ > 0 denote the computed zero of (2.2). Then

−1 λk =τ ˜∗ gives the value of the regularization parameter for the Tikhonov solution (1.13). Evaluate for k = 1, 2, . . . , ` the quantities

δk = kxλk − xkk2, (2.3) and let kmin denote the index of the minimizer of the sequence δ1, δ2, . . . , δ`. In case the minimizer is not unique, we let kmin be the smallest minimizer. When kmin = 1, 2, we also consider the smallest minimizer of the sequence δ3, δ4, . . . , δ`. If the latter has index k > 3, then we set kmin = k. Our reason for doing this is that the sequence of the δk may exhibit a false local minimum at the very beginning, due to the fact that, e.g., the underregularized vectors xλ1 and x1 may be close to each other, without λ1 being a suitable choice of the regularization parameter λ.

We may use either the TGSVD solution xk or the Tikhonov solution xλ as approximations min kmin of the desired solution xexact of (1.1). When the system (1.11) is consistent and m = n in (1.12), the quantity

T ρk = kAxλ − UU bk2 (2.4) min kmin typically furnishes a quite accurate estimate of the norm of the error e in the data vector b.

The regularization parameters kmin and λkmin determined by the following algorithm typically are appropriate also for inconsistent problems (1.1).

23 Algorithm 3: Comparison of solution estimator for L 6= I.

m×n p×n m 1: Input : Matrices A ∈ R , L ∈ R , data vector b ∈ R

2: Output : TGSVD truncation index kmin, Tikhonov regularization parameter λkmin .

3: Compute the GSVD (1.12) of the matrix pair (A, L)

4: Compute the TGSVD solutions (1.15) xk, k = 1, . . . , `

5: for k = 1, . . . , ` do

T 6: Compute residual norm ρk = kAxk − UU bk2

7: Compute Tikhonov regularization parameter λk by determining the zero of the function (2.2)

8: Compute the Tikhonov solution (1.13)

p n X σj X x = (uT b)x + (uT b)x λk σ2 + λ µ2 j j j j j=1 j k j j=p+1

9: Compute δk = kxλk − xkk2

10: end for

11: kmin = arg mink=1,...,` δk

12: if kmin ≤ 2 then

13: k2 = arg mink=kmin+1,...,kmax δk

14: if k2 > kmin + 1 then

15: kmin = k2

16: end if

17: end if

Algorithm 3 computes the value kmin of the truncation index for the TGSVD method as well as the corresponding value λkmin of the regularization parameter for Tikhonov regularization. In the numerical experiments, we display these parameters as well as the estimate (2.4) of the norm of the error in the data vector b furnished by the computed Tikhonov solution. Computed examples presented in Section 2.3 show this estimate to be quite accurate for many computed examples when the system (1.11) is consistent.

24 2.2 Large-scale problems

It is prohibitively expensive to compute the GSVD of a pair of large matrices. Tikhonov regulariza- tion problems (1.4) with large matrices A and L have to be reduced to problems of small size before the COSE method of Section 2.1 can be applied. Many reduction methods have been described in the literature; see, e.g., [36, 75]. We will show examples with a method described in [50], that first reduces A by applying `  min{m, n} steps of Algorithm 1 with initial vector b.

We assume that ` is chosen small enough so that the decompositions (1.21) with the stated properties exist. The value of ` used in computations, generally, is not large; in particular, `  n.

We seek a solution of (1.4) in the subspace range(U`). Thus, we solve the Tikhonov minimization problem

2 2 min{kAU`y − bk2 + λkLU`yk2}. (2.5) y∈R`

It follows from (1.7) that this problem has a unique solution for any λ > 0.

Introduce the QR factorization

LU` = Q`R`, (2.6)

p×` `×` i.e., the matrix Q` ∈ R has orthonormal columns and R` ∈ R is upper triangular, or upper trapezoidal in case rank(LU`) < `. Here we assume that ` ≤ p. Using the factorization (2.6) and the decompositions (1.21), we can express (2.5) as

2 2 min{kB`+1,`y − kbke1 k2 + λkR`yk2}, (2.7) y∈R`

T where e1 = [1, 0,..., 0] denotes the first axis vector. The matrices in (2.7) are small and the COSE method of Section 2.1 can be applied to this reduced Tikhonov minimization problem. The overall procedure is described in Algorithm 4.

We remark that when an initial number of bidiagonalization steps ` is chosen, and subsequently is increased to be able to compute a more accurate approximation of the desired solution xexact, the QR factorization (2.6) has to be updated.

      R` r` L =   (2.8) U` unew Q` q`   0 rm

25 where

˜rλi unew = ||˜rλi || T ˜rλi = (I − UiUi )rλi T 2 T T rλi = (A AU` + λ L Q`R`)yλi − A b ˜q = q` ` rm q` = Lunew − r`

T r` = Q` (Lunew) rm = ||q`||. Daniel et al. [23] describe efficient formulas for this purpose.

Algorithm 4: COSE for L 6= I for large-scale problems based on partial Golub–Kahan bidiag-

onalization.

m×n p×n m 1: Input : Matrices A ∈ R , L ∈ R , data vector b ∈ R , number of steps `steps, max

number of steps Nmax

2: Output : GK/TGSVD and Tikhonov parameters `min, λ`min , and corresponding regularized

solutions x` , xλ min `min

3: p1 = b/kbk2

4: ` = 0

5: repeat

6: ` = ` + `steps

7: Perform `steps steps of Golub–Kahan (GK) bidiagonalization, obtaining matrices Ue`+1, U`,

and B`+1,`

8: Compute the compact QR factorization of LU`

`+1 9: be = kbk2e1 ∈ R

10: Apply Algorithm 3 to the projected least squares problem min kB`+1,`y − bek2 with

regularization matrix R, obtaining the truncation index `min, the Tikhonov parameter

λ` , and the regularized solutions y` , yλ min min `min

11: until (`min < 0.75 · `) or (` > Nmax) or (GK stops for breakdown)

12: The TGSVD truncation index is `min, the Tikhonov parameter is λ`min

13: x` = U`y` , xλ = U`yλ min min `min `min

26 The solution method described above does not consider N (L) in the choice of the solution subspace. The following, alternate, approach explicitly determines a solution component in N (L).

This component is not damped by L. The approach is applicable when N (L) is known and has fairly small dimension, and guarantees that certain solution features represented by N (L) are not damped.

It has previously been applied in several direct and iterative solution methods [2, 50, 59]. Let the ˘ n×s orthonormal columns of the matrix Ws ∈ R span N (L) and introduce the QR factorization,

AW˘ s = Q˘sR˘s,

˘ n×s ˘ s×s where Qs ∈ R has orthonormal columns and Rs ∈ R is upper triangular. Due to (1.7), the matrix R˘s is nonsingular. Introduce the orthogonal projectors

T ⊥ T T ⊥ T P ˘ = W˘ sW˘ ,P = I − W˘ sW˘ ,P ˘ = Q˘sQ˘ ,P = I − Q˘sQ˘ . Ws s W˘ s s Qs s Q˘s s

⊥ ⊥ Then, using that I = P ˘ + P and P AP ˘ = 0, we obtain Ws W˘ s Q˘s Ws

2 2 ⊥ ⊥ 2 kAx − bk = kP ˘ Ax − P ˘ bk + kP Ax − P bk 2 Qs Qs 2 Q˘s Q˘s 2 ⊥ 2 ⊥ ⊥ ⊥ 2 = kP ˘ AP ˘ x − (P ˘ b − P ˘ AP x)k + kP AP x − P bk . Qs Ws Qs Qs W˘ s 2 Q˘s W˘ s Q˘s 2

Substitution into (1.4) gives

min_{x∈R^n} { ‖P_{Q̆_s} A P_{W̆_s} x − (P_{Q̆_s} b − P_{Q̆_s} A P_{W̆_s}^⊥ x)‖_2^2 + ‖P_{Q̆_s}^⊥ A P_{W̆_s}^⊥ x − P_{Q̆_s}^⊥ b‖_2^2 + λ ‖Lx‖_2^2 }.

Let y = W̆_s^T x. Then

‖P_{Q̆_s} A P_{W̆_s} x − (P_{Q̆_s} b − P_{Q̆_s} A P_{W̆_s}^⊥ x)‖_2 = ‖R̆_s y − (Q̆_s^T b − Q̆_s^T A P_{W̆_s}^⊥ x)‖_2.   (2.9)

Since R̆_s is nonsingular, we may, for any P_{W̆_s}^⊥ x, determine y ∈ R^s so that the expression in the right-hand side of (2.9) vanishes. This determines the component W̆_s y in N(L) of the solution of (1.4). The solution component in N(L)^⊥ is P_{W̆_s}^⊥ x, where x solves

min_{x∈R^n} { ‖P_{Q̆_s}^⊥ A P_{W̆_s}^⊥ x − P_{Q̆_s}^⊥ b‖_2^2 + λ ‖L P_{W̆_s}^⊥ x‖_2^2 }.

We solve this projected minimization problem as described above, i.e., we apply ℓ steps of Golub–Kahan bidiagonalization to the matrix P_{Q̆_s}^⊥ A P_{W̆_s}^⊥. This yields (an approximation of) the component in N(L)^⊥ of the solution x of (1.4), which allows us to determine y ∈ R^s such that (2.9) vanishes and gives the solution component in N(L). We remark that since P_{Q̆_s}^⊥ A P_{W̆_s}^⊥ = P_{Q̆_s}^⊥ A, we may omit the projector P_{W̆_s}^⊥. The matrix P_{Q̆_s}^⊥ A, of course, does not have to be explicitly formed. This splitting of the solution of (1.4) into components in N(L) and N(L)^⊥ is attractive when the dimension of N(L) is not large. See Algorithm 5 for a summary of the method.

Algorithm 5: COSE for L ≠ I for large-scale problems based on partial Golub–Kahan bidiagonalization and the availability of a basis for the null space of L.

1: Input: matrices A ∈ R^{m×n}, L ∈ R^{p×n}, Ŭ ∈ R^{n×s} whose columns are a basis for N(L), data vector b ∈ R^m, number of steps r_steps
2: Output: GK/TGSVD and Tikhonov parameters r_min, λ_{r_min}, and corresponding regularized solutions x_{r_min}, x_{λ_{r_min}}
3: Orthonormalize the basis for N(L) by the QR factorization W̆ M = Ŭ
4: Compute the QR factorization Q̆ R̆ = A W̆
5: Compute the projection Ă = A − Q̆ (Q̆^T A)
6: Compute the projection b̆ = b − Q̆ (Q̆^T b)
7: Apply Algorithm 4 with input Ă, L, b̆, and number of steps r_steps, obtaining the truncation index r_min, the Tikhonov parameter λ_{r_min}, and the regularized solutions z_{r_min}, z_{λ_{r_min}}. The matrix Ă, of course, is not explicitly formed.
8: Solve the triangular linear system R̆ y = Q̆^T (b − A z_{r_min})
9: z = W̆ y
10: The TGSVD truncation index is r_min, the Tikhonov parameter is λ_{r_min}
11: x_{r_min} = z_{r_min} + z,  x_{λ_{r_min}} = z_{λ_{r_min}} + z
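The outer steps of Algorithm 5 involve only a skinny QR factorization, a few projections, and one small triangular solve. The MATLAB sketch below carries out these steps on a small example; the call to Algorithm 4 in step 7 is replaced by a plain Tikhonov solve with a fixed λ, only to keep the sketch self-contained, so the regularization parameter is not determined as in the algorithm.

    % Sketch of the null-space splitting of Algorithm 5 on a small example.
    rng(0);
    m = 60;  n = 50;
    A = randn(m, n);  xexact = linspace(0, 1, n)';  b = A*xexact;
    L = diff(eye(n));                  % first-difference operator; N(L) = constants
    Ubreve = ones(n, 1);               % basis of N(L)

    [W, ~]   = qr(Ubreve, 0);          % step 3: orthonormal basis of N(L)
    [Qb, Rb] = qr(A*W, 0);             % step 4
    Aproj = A - Qb*(Qb'*A);            % step 5 (never formed in Algorithm 4 proper)
    bproj = b - Qb*(Qb'*b);            % step 6

    % Stand-in for step 7: a fixed-lambda Tikhonov solve instead of Algorithm 4.
    lambda = 1e-2;
    z = [Aproj; sqrt(lambda)*L] \ [bproj; zeros(n-1, 1)];

    y = Rb \ (Qb'*(b - A*z));          % step 8: component in N(L)
    x = z + W*y;                       % steps 9 and 11
    relative_error = norm(x - xexact)/norm(xexact)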

2.3 Numerical example

In this section we investigate the performance of the proposed methods by means of selected ill-conditioned test problems, listed in Table 1. Most of them are contained in Hansen’s Regularization

Tools [46], except for the matrices Hilbert, Lotkin, and Prolate, which are constructed with the gallery function of MATLAB. Each problem from [46] is associated to a model solution xexact; for the gallery examples, we use the solution of the problem Baart from [46]. MATLAB functions that implement the algorithms described in [66], as well as algorithms from [51], are available at

the authors’ home pages; see, e.g., http://bugs.unica.it/~gppe/soft/.

Table 1: Discretized ill-conditioned test problems used in the numerical experiments.

    Baart     Deriv2(2)   Foxgood    Gravity    Heat(1)
    Hilbert   Lotkin      Phillips   Prolate    Shaw

For each test problem, we first determine the noise-free data vector as b_exact = A x_exact; then the associated perturbed data vector b is obtained by

b = b_exact + (ν/√n) ‖b_exact‖_2 w,

where w is a vector whose components are normally distributed with zero mean and unit variance, and ν is the noise level.
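In MATLAB, this perturbation can be generated as in the short sketch below; the Hilbert matrix and a constant model solution are used only to make the fragment self-contained. Since E‖w‖_2 is about √n, the scaling gives ‖b − b_exact‖_2 ≈ ν ‖b_exact‖_2.

    % Perturb exact data by white Gaussian noise of relative level nu.
    n      = 40;
    A      = hilb(n);                 % example ill-conditioned matrix
    xexact = ones(n, 1);              % example model solution
    nu     = 1e-2;                    % noise level
    bexact = A*xexact;
    w      = randn(n, 1);
    b      = bexact + (nu/sqrt(n))*norm(bexact)*w;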

Figure 1 illustrates the performance of Algorithm 3. We consider the test problem Gravity from [46] with m = n = 40. The regularization matrix is chosen to be the discrete approximation

of the first derivative, L_1, defined in (1.5), and the noise level is ν = 10^{-2}. For each value of the TGSVD truncation parameter k = 1, 3, ..., 11, we plot the exact solution x_exact, the TGSVD solution x_k (1.15), and the associated Tikhonov solution x_{λ_k} (1.13). The last vector is obtained by minimizing the function (2.2) by Newton’s method.

In this numerical example, as it happens in the majority of cases, the vectors xk and xλk are closest to each other when they best approximate the solution xexact. The minimum of the quantities δk (2.3) is achieved for k = 5, which also produces the least Euclidean norm error. This is shown by Figure 2, which displays the values of the error

‖x_k − x_exact‖_2   (2.10)

and δ_k as functions of k. To compare Algorithm 3 to other well-known methods for the estimation of the truncation parameter in TGSVD, we replicate for the new algorithm Experiment 4.1 from [51]. For each test problem in Table 1, we construct two “square” discretizations, of size n = 40 and n = 100, respectively, and two “rectangular” ones, of size 80 × 40 and 200 × 100. The noise level is set to the values ν = 10^{-3}, 10^{-2}, 10^{-1}, which are compatible with real-world applications, and each normally distributed noise vector w is generated 10 times. This procedure produces 600 square linear systems and 600 rectangular consistent systems.

Table 2: Percentage of numerical experiments that lead to a regularized solution x_k such that (2.12) holds for ρ = 2 (ρ = 5), for TSVD with L = L_1 given by (1.5) and different values of φ in (2.11).

                          square        rectangular systems
    method                systems       φ = 0         φ = 1         φ = 10
    COSE                  17% ( 2%)     19% ( 4%)     19% ( 4%)     22% ( 5%)
    L-corner [47]         36% (18%)     39% (22%)     79% (59%)     84% (62%)
    Res L-curve [73]      51% (30%)     63% (35%)     81% (64%)     83% (68%)
    Regińska [70]         58% (30%)     32% (11%)     56% (36%)     46% (28%)
    ResReg [71]           36% ( 4%)     27% ( 5%)     56% (36%)     46% (28%)
    Quasiopt [46]         44% (22%)     34% (10%)     31% (10%)     27% ( 9%)
    GCV [46]              49% (41%)     25% (13%)     43% (18%)     43% (21%)
    Extrapolation [8]     61% (13%)     58% (19%)     56% (36%)     46% (28%)
    Discrepancy [31]      23% ( 1%)     41% ( 3%)     69% (43%)     68% (49%)

To investigate the behavior of the methods also for inconsistent linear systems, we introduce a vector q that is orthogonal to the range of the over-determined matrix A. A multiple φ of q is then added to the right-hand side of the consistent system, to obtain

b_exact = A x_exact + φ q.   (2.11)

By repeating the above process with φ = 1 and φ = 10, we construct two sets of 600 rectangular inconsistent linear systems.
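A vector q orthogonal to the range of an over-determined matrix A can be obtained by projecting a random vector onto the orthogonal complement of the range, as in the following MATLAB sketch; the normalization of q is our assumption, since the text only requires orthogonality.

    % Construct a vector q orthogonal to range(A) for an over-determined A.
    rng(1);
    m = 80;  n = 40;
    A = randn(m, n);
    [Q, ~] = qr(A, 0);                % orthonormal basis of range(A)
    w = randn(m, 1);
    q = w - Q*(Q'*w);                 % component of w orthogonal to range(A)
    q = q/norm(q);                    % normalize (assumed; the size is set via phi)
    orthogonality_check = norm(A'*q)  % close to machine precision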

Let kbest be the truncation index that gives the least Euclidean error norm

E_best = ‖x_{k_best} − x_exact‖_2 = min_k ‖x_k − x_exact‖_2.

In Table 2 we record the percentage of numerical experiments in which Algorithm 3, as well as a set of competing methods, produced an error larger than a certain multiple, ρ > 1, of the best error

E_best. The methods considered besides COSE are well known; we give some references to the particular implementation we used: L-corner [47], Residual L-curve [73], Regińska criterion [70],

Restricted Regińska criterion [71], Quasi-optimality [46], Generalized Cross Validation (GCV) [46], and Extrapolation [8]. The discrepancy principle selects the smallest index k such that

‖A x_k − b‖_2^2 ≤ (1.3 ν ‖b‖_2)^2 + φ^2.

The first entry of columns 2 to 5 of Table 2 reports, for each method considered and with L defined by (1.5), the percentage of numerical experiments that lead to a regularized solution x_k such that

‖x_k − x_exact‖_2 > ρ E_best   (2.12)

for ρ = 2, while the second entry (in parentheses) displays the same quantity in the case ρ = 5.

Table 3: Percentage of numerical experiments that lead to a regularized solution x_k such that (2.12) holds for ρ = 10 (ρ = 100), for TSVD with L = L_1 given by (1.5) and different values of φ in (2.11).

                          square        rectangular systems
    method                systems       φ = 0         φ = 1         φ = 10
    COSE                   1% ( 0%)      1% ( 0%)      2% ( 0%)      2% ( 0%)
    L-corner [47]         16% (12%)     20% (15%)     45% (25%)     49% (35%)
    Res L-curve [73]      23% ( 7%)     29% (17%)     53% (30%)     58% (39%)
    Regińska [70]         20% ( 3%)      6% ( 0%)     24% ( 1%)     16% ( 0%)
    ResReg [71]            0% ( 0%)      2% ( 0%)     24% ( 1%)     16% ( 0%)
    Quasiopt [46]         16% ( 2%)      5% ( 0%)      5% ( 0%)      4% ( 0%)
    GCV [46]              39% (35%)     12% ( 6%)     10% ( 1%)     12% ( 0%)
    Extrapolation [8]      5% ( 1%)      4% ( 0%)     24% ( 1%)     16% ( 0%)
    Discrepancy [31]       0% ( 0%)      1% ( 0%)     37% (36%)     47% (45%)

Table 3 reports the same results for the factors ρ = 10 and ρ = 100. Both tables show that the

COSE approach is extremely effective in approximating the TGSVD regularization parameter. In particular, it is the only method, among the tested ones, to produce trustworthy estimates both for consistent and inconsistent problems. From this point of view, only the quasi-optimality criterion gives comparable results.

Similar remarks can be deduced from Tables 4 and 5, which reproduce analogous measurements of Tables 2 and 3 with the regularization matrix L2 (1.6). We conclude that the performance of the COSE method with L 6= I is similar to that reported in [51] for Tikhonov regularization problems in standard form, that is, with L = I.

We turn to numerical experiments that illustrate the behavior of the COSE method when applied to large-scale problems, i.e., we discuss the performance of Algorithms 4 and 5. The first

algorithm constructs a linear space of small dimension by the Golub–Kahan process, onto which the original problem is projected before applying Algorithm 3. No information about the null space of the regularization matrix L is required. Algorithm 5, on the other hand, requires a user to provide a basis of this null space. The availability of this basis makes it possible to decompose the given problem into a large-scale problem, whose solution is orthogonal to N(L) and which is solved by Algorithm 4, and a small problem which furnishes the component of the solution in the null space.

Table 4: Percentage of numerical experiments that lead to a regularized solution x_k such that (2.12) holds for ρ = 2 (ρ = 5), for TSVD with L = L_2 given by (1.6) and different values of φ in (2.11).

                          square        rectangular systems
    method                systems       φ = 0         φ = 1         φ = 10
    COSE                  21% ( 4%)     18% ( 5%)     18% ( 5%)     17% ( 5%)
    L-corner [47]         63% (46%)     65% (51%)     78% (66%)     82% (70%)
    Res L-curve [73]      60% (44%)     75% (57%)     83% (72%)     82% (69%)
    Regińska [70]         31% (15%)     22% ( 8%)     32% (17%)     24% (11%)
    ResReg [71]           23% ( 6%)     19% ( 1%)     32% (17%)     24% (11%)
    Quasiopt [46]         34% (18%)     23% ( 9%)     21% ( 8%)     17% ( 6%)
    GCV [46]              63% (59%)     22% (16%)     33% (18%)     24% (11%)
    Extrapolation [8]     42% (19%)     33% ( 8%)     33% (17%)     24% (11%)
    Discrepancy [31]      22% ( 3%)     24% ( 1%)     51% (39%)     55% (48%)

Figure 3 is concerned with the solution of the test problem Phillips from [46] of size 500×500,

with noise level ν = 10^{-2} and L = L_2. The plot on the left shows the Euclidean error norm produced by Algorithms 4 and 5 when k increases. The optimal values for the two methods are

k_4^{opt} = 7 and k_5^{opt} = 5, respectively. The graphs on the right display the behavior of the quantity δ_k

(2.3), which is minimized by k_4 = 8 and k_5 = 6. The errors obtained with these parameter values are very close to the optimal error, and the approximate solutions determined by Algorithms 4 and 5 are close to the model solution, as the graphs on the left of Figure 5 show.

Figure 4 illustrates a case when Algorithm 4 fails. The test problem is Shaw from [46]; the noise level and regularization matrix are the same as above. In this case, the trend of the δk is quite oscillatory and a false minimum at k4 = 4 produces an over-regularized solution; see the graph on

the right of Figure 5. On the contrary, Algorithm 5 returns the optimal solution.

Table 5: Percentage of numerical experiments that lead to a regularized solution x_k such that (2.12) holds for ρ = 10 (ρ = 100), for TSVD with L = L_2 given by (1.6) and different values of φ in (2.11).

                          square        rectangular systems
    method                systems       φ = 0         φ = 1         φ = 10
    COSE                   1% ( 0%)      3% ( 0%)      3% ( 0%)      3% ( 0%)
    L-corner [47]         40% (32%)     48% (36%)     60% (42%)     61% (45%)
    Res L-curve [73]      39% (23%)     52% (32%)     61% (39%)     59% (40%)
    Regińska [70]          4% ( 0%)      4% ( 0%)      7% ( 0%)      0% ( 0%)
    ResReg [71]            1% ( 0%)      0% ( 0%)      7% ( 0%)      0% ( 0%)
    Quasiopt [46]         10% ( 0%)      4% ( 0%)      3% ( 0%)      0% ( 0%)
    GCV [46]              57% (54%)     13% ( 5%)      8% ( 0%)      0% ( 0%)
    Extrapolation [8]      5% ( 2%)      1% ( 0%)      7% ( 0%)      1% ( 0%)
    Discrepancy [31]       0% ( 0%)      0% ( 0%)     38% (36%)     46% (45%)

Algorithm 4 produces an incorrect solution also for the problem Deriv2 from [46]; see Figure 6. Here, there is a different problem: the projected Krylov space does not contain a suitable approximation of the solution, as is evidenced by the slowly decaying error curve, whose minimum is well approximated by the algorithm. The resulting solution is under-regularized, as the left plot of Figure 8 shows, while Algorithm 5 gives an accurate approximation.

In the above example the model solution is only approximately in N (L2). In Figure 7 we analyze the case of a nontrivial solution which is exactly contained in the null space. We consider the model solution xexact with components

x_i = sin( 4π(i − 1)/n ),   i = 1, ..., n,

and choose L = (16π^2/n^2) I + L_2. Both Algorithms 4 and 5 yield accurate approximations of x_exact for this problem, and the computed solutions are graphically indistinguishable from x_exact, as the right plot in Figure 8 shows.

When A in (1.1) is obtained by discretizing a Fredholm integral equation in a uniform manner

on a square, such as in image restoration, the regularization matrix

L = [ I_n ⊗ L_1 ; L_1 ⊗ I_n ] ∈ R^{(2n^2−2n)×n^2},   (2.13)

is commonly used; see, e.g., [19, 54]. Here I_n denotes the identity matrix of order n, L_1 is the matrix (1.5) of size (n − 1) × n, and ⊗ stands for the Kronecker product. Many other regularization matrices have been proposed in the literature; see, e.g., [7, 24, 53, 63, 74].
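For moderate n, the matrix (2.13) is easily assembled with sparse Kronecker products in MATLAB, as sketched below; the sign convention of the first-difference stencil is immaterial here and is chosen arbitrarily.

    % Assemble the two-dimensional regularization matrix (2.13) from the
    % one-dimensional first-difference operator L1 of size (n-1) x n.
    n  = 32;
    L1 = sparse(1:n-1, 1:n-1, -1, n-1, n) + sparse(1:n-1, 2:n, 1, n-1, n);
    In = speye(n);
    L  = [kron(In, L1); kron(L1, In)];   % size (2n^2 - 2n) x n^2, very sparse
    size(L)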

To conclude, we consider the regularization matrix (2.13), and apply it to the solution of an image restoration problem, namely, the test problem Tomo from [46]. We fix the input parameter N to 32; this generates a linear system with m = n = 1024. The noise level is ν = 10^{-2}.

The graphs for the errors and δ_k-values are shown in Figure 9. Due to the very slow decay of the singular values of the coefficient matrix, we fixed the maximum number of iterations to 400. Both

Algorithm 4 and Algorithm 5 are able to correctly estimate the optimal regularization parameter within this range. This is confirmed by the plots of the solutions, reported in Figure 10.

[Figure 1 shows six panels, for k = 1, 3, 5, 7, 9, 11, with δ_1 = 7.19e-01, δ_3 = 2.71e-01, δ_5 = 1.08e-01, δ_7 = 2.06e-01, δ_9 = 4.24e-01, δ_11 = 3.42e+00.]

Figure 1: Test problem Gravity, from [46], with m = n = 40, L = L_1 given by (1.5), and ν = 10^{-2}. The thick graphs represent the exact solution, the thin graphs show TGSVD solutions x_k for k = 1, 3, ..., 11, and the dashed graphs are the corresponding Tikhonov solutions x_{λ_k}. Each plot reports the value of δ_k (2.3). The minimal δ_k, as well as the best approximation of x_exact in the Euclidean norm, are achieved for k = 5.


Figure 2: The thick dashed graph represents the error (2.10) for the numerical experiment reported in Figure 1, and the thin graph represents the values of δk (2.3), for k = 1, 2,..., 20. The minima of both graphs are attained for k = 5.

Figure 3: Test problem Phillips, from [46], with m = n = 500, ν = 10^{-2}, L = L_2, k_4^{opt} = 7, k_5^{opt} = 5, k_4 = 8, k_5 = 6. The errors (2.10) are plotted on the left and the δ_k on the right for k = 1, 2, ....

Figure 4: Test problem Shaw from [46] with m = n = 500, ν = 10^{-2}, L = L_2, k_4^{opt} = 6, k_5^{opt} = 4, k_4 = 4, k_5 = 4. The errors (2.10) are displayed on the left and the δ_k on the right for k = 1, 2, ....

Figure 5: Exact and computed approximate solutions for the numerical examples of Figures 3 (left) and 4 (right).

Figure 6: Test problem Deriv2 from [46] with m = n = 500, ν = 10^{-2}, L = L_2, k_4^{opt} = 15, k_5^{opt} = 3, k_4 = 13, k_5 = 2. The errors (2.10) are shown on the left and the δ_k on the right for k = 1, 2, ....

Figure 7: Test problem Deriv2, from [46], with the “sin” solution, m = n = 500, ν = 10^{-2}, L = L_2, k_4^{opt} = 7, k_5^{opt} = 5, k_4 = 1, k_5 = 7. The errors (2.10) are depicted on the left and the δ_k on the right for k = 1, 2, ....

Figure 8: Exact and computed approximate solutions for the numerical examples of Figures 6 and 7.

Figure 9: Test problem Tomo, from [46], with m = n = 1024, ν = 10^{-2}, L defined by (2.13), k_4^{opt} = 386, k_5^{opt} = 386, k_4 = 360, k_5 = 360. The errors (2.10) are shown on the left and the δ_k on the right for k = 1, 2, ....

Figure 10: Exact and computed approximate solutions for the numerical example Tomo.

CHAPTER 3

Comparison of A-Posteriori Parameter Choice Rules for Linear Discrete Ill-Posed Problems

Recall the discrepancy principle equation (1.18)

‖A x_λ − b‖_2^2 = τ^2 ‖e‖_2^2.

A modification of the discrepancy principle, proposed independently by Gfrerer and Raus, can be used to determine λ as well. Analysis of this modification in an infinite-dimensional Hilbert space setting suggests that it will determine a value of λ that yields an approximate solution of higher quality than the approximate solution obtained when using the (standard) discrepancy principle to compute λ. This chapter compares these a-posteriori rules for determining λ when applied to the solution of many linear discrete ill-posed problems with different amounts of error in the data.

The discrepancy principle (1.18) is a non-linear equation for λ as a function of ε (= ‖e‖_2) > 0. It has a unique solution λ = λ(ε) for most reasonable values of ε. A proof in an infinite-dimensional

Hilbert space setting that

x_{λ(ε)} → x_exact

as ε ↘ 0 can be found, e.g., in [31], where x_{λ(ε)} = (A^T A + λ(ε) L^T L)^{-1} A^T b. We remark that the quality of the computed solution x_{λ(ε)} is sensitive to the accuracy of ε defined by (1.3): when ε ≫ ‖e‖, the regularization parameter λ determined by (1.18) is unnecessarily large, and ε ≪ ‖e‖ results in a value of λ that is too small. The sensitivity of λ and x_λ to inaccuracies in an available estimate of ‖e‖ has been investigated by Hämarik et al. [41], who proposed alternatives to the discrepancy principle when only a poor estimate of ‖e‖ is known. In the present chapter, we will assume that a fairly accurate estimate of ‖e‖ is available. Such an estimate may be known for the problem at hand or can be determined by a denoising method; see, e.g., [5, 9, 49] and references therein for a variety of such methods. The difference between the original and denoised signals can be used as an estimate of the noise in the original signal. This is illustrated in [50].

Introduce the function

φ_2(λ) := λ^2 b^T (A A^T + λ I)^{-2} b.   (3.1)

Equation (1.18) for λ can be expressed as

φ_2(λ) = τ^2 ε^2;   (3.2)

see, e.g., [20, 43] for details. The function φ_2 is monotonically increasing with λ. It may be beneficial to replace λ by ν = 1/λ before solving the equation (3.2) by Newton’s method; see [20, 43] for discussions. We will denote the solution of (3.2) by λ_2 and the associated approximation of x_exact determined by Tikhonov regularization by x_{λ_2}. In the following variation of the discrepancy principle, which is referred to as the modified discrepancy principle by Hämarik et al. [41], the function (3.1) is replaced by

φ_3(λ) := λ^3 b^T (A A^T + λ I)^{-3} b   (3.3)

and equation (3.2) is replaced by

φ_3(λ) = τ^2 ε^2.   (3.4)

We denote the solution of (3.4) by λ_3. This approach to determine the regularization parameter was first proposed by Gfrerer [37] and Raus [69]. Analysis in an infinite-dimensional Hilbert space setting by Gfrerer [37] suggests that the solution x_{λ_3} should be a more accurate approximation of x_exact than x_{λ_2}. Discussions of the modified discrepancy principle can be found in, e.g., Engl et al. [31, Section 5.1], Hanke and Hansen [43], Hansen [44, Section 7.3], and Neubauer [61].

The function φ_3 is monotonically increasing with λ. Therefore (3.4) has a unique solution for reasonable values of ε. We may compute it, e.g., by Newton’s method.

Proposition 1. Let λ_j be the unique solution of φ_j(λ) = τ^2 ε^2 for j ∈ {2, 3}. Then λ_3 ≥ λ_2. Generally, the inequality is strict.

The proposition shows that the modified discrepancy principle typically regularizes more than the (standard) discrepancy principle. We present a proof at the end of the following section after having introduced the singular value decomposition (SVD) of the matrix A.

It is the purpose of the present chapter to compare the quality of the solutions xλ2 and xλ3 when solving linear discrete ill-posed problems by Tikhonov regularization.

This chapter is organized as follows. Section 3.1 defines the singular value decomposition

(SVD) of A. Substitution of this decomposition into (3.2) and (3.4) makes the evaluation of these functions easy and fast for each value of λ. However, the computation of the SVD of a large matrix is expensive. We therefore discuss in Section 3.2 how to use, instead of the SVD of A, a small matrix that is computed by carrying out a few steps of Golub–Kahan bidiagonalization applied to

A. Section 3.3 contains computed examples.

3.1 The singular value decomposition

Introduce the SVD of the matrix A ∈ R^{m×n}. For notational simplicity, we assume that m ≥ n, but this restriction easily can be removed. Thus,

A = U Σ V^T,   (3.5)

where the matrices

U = [u_1, u_2, ..., u_m] ∈ R^{m×m}   and   V = [v_1, v_2, ..., v_n] ∈ R^{n×n}

have orthonormal columns u_j and v_j, respectively, and

Σ = diag[σ_1, σ_2, ..., σ_n] ∈ R^{m×n}.

The σj are referred to as singular values and satisfy σ1 ≥ σ2 ≥ ... ≥ σn ≥ 0. We refer to [39] for details and properties of the SVD.

Let

b̂ = [b̂_1, b̂_2, ..., b̂_m]^T := U^T b.

Substituting the SVD (3.5) into (3.1) and (3.3) gives, for λ > 0,

φ_p(λ) = λ^p b̂^T (Σ Σ^T + λ I)^{-p} b̂ = sum_{j=1}^{n} b̂_j^2 / (σ_j^2/λ + 1)^p + sum_{j=n+1}^{m} b̂_j^2   (3.6)

for p ∈ {2, 3}. The right-hand side of (3.6) can easily and inexpensively be evaluated for many different values of λ > 0. This makes fast solution of (3.2) or (3.4) possible.

Having solved (3.2) or (3.4) for λ = λ_2 or λ = λ_3, respectively, we compute the associated solutions x_{λ_2} or x_{λ_3} of (1.4) with L = I by substituting (3.5) into (1.10) with L = I and letting

λ = λ_2 or λ = λ_3 in

x_λ = sum_{j=1}^{n} ( σ_j b̂_j / (σ_j^2 + λ) ) v_j.

Proof of Proposition 1. The function

p → b̂_j^2 / (σ_j^2/λ + 1)^p

is decreasing, since σ_j^2/λ + 1 ≥ 1. Therefore, for fixed λ > 0, φ_3(λ) ≤ φ_2(λ). The inequality is strict if b̂_j σ_j ≠ 0 for some j. The function

λ → b̂_j^2 / (σ_j^2/λ + 1)^p

also is increasing. In order for φ_2(λ_2) = φ_3(λ_3), we must have λ_3 ≥ λ_2. The inequality is strict if b̂_j σ_j ≠ 0 for at least one index 1 ≤ j ≤ n. □
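Once the SVD is available, (3.2) and (3.4) are scalar root-finding problems for λ, and x_λ follows from the formula above. The MATLAB sketch below solves both equations with fzero in the variable log λ (a choice we make for robustness) and uses a synthetic matrix with a prescribed SVD only to keep the example self-contained.

    % Determine lambda from (3.2) (p = 2) and (3.4) (p = 3) via the SVD,
    % then form the corresponding Tikhonov solutions.
    rng(0);
    n = 200;
    [U, ~] = qr(randn(n));  [V, ~] = qr(randn(n));
    sigma  = logspace(0, -8, n)';             % prescribed singular values
    A      = U*diag(sigma)*V';                % U, sigma, V form an SVD of A
    xexact = ones(n, 1);
    bexact = A*xexact;
    e      = 1e-2*norm(bexact)/sqrt(n)*randn(n, 1);
    b      = bexact + e;
    eps2   = norm(e)^2;                       % ||e||_2^2, assumed known
    tau    = 1.01;

    bhat = U'*b;
    phi  = @(lam, p) sum(bhat.^2 ./ (sigma.^2/lam + 1).^p);   % cf. (3.6), m = n

    for p = [2, 3]
        f   = @(t) phi(exp(t), p) - tau^2*eps2;
        lam = exp(fzero(f, [log(1e-20), log(1e20)]));   % phi_p increases with lambda
        x   = V*(sigma.*bhat./(sigma.^2 + lam));
        fprintf('p = %d: lambda = %.2e, relative error = %.2e\n', ...
                p, lam, norm(x - xexact)/norm(xexact));
    end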

3.2 Bidiagonalization and quadrature

This section outlines an approach to solve large-scale Tikhonov regularization problems (1.4) with

L = I. Details of this approach are described in [20]. It uses the connection between partial

Golub–Kahan bidiagonalization of the matrix A and certain Gauss-type quadrature rules that can be used to bound quantities of interest when determining the regularization parameter λ.

3.2.1 Bidiagonalization

The range of U_ℓ is the Krylov subspace

K_ℓ(A^T A, A^T b) = span{ A^T b, (A^T A) A^T b, ..., (A^T A)^{ℓ−1} A^T b }.   (3.7)

We seek to compute an approximate solution x_{λ,ℓ} = U_ℓ y_{λ,ℓ} of (1.4) in this subspace. Applying a Galerkin method to the normal equations associated with (1.4) with L = I yields

U_ℓ^T (A^T A + λ I) U_ℓ y_{λ,ℓ} = U_ℓ^T A^T b,

which, by using the decompositions (1.21), can be expressed as

(B_{ℓ+1,ℓ}^T B_{ℓ+1,ℓ} + λ I) y_{λ,ℓ} = B_{ℓ+1,ℓ}^T e_1 ‖b‖_2,   (3.8)

where e_1 = [1, 0, ..., 0]^T denotes the first axis vector. The solution of (3.8) by rewriting the equations as an equivalent least-squares problem is described in [20].
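For a fixed λ, the small system (3.8) can be solved reliably by rewriting it as the damped least-squares problem min_y ‖[B_{ℓ+1,ℓ}; √λ I_ℓ] y − [‖b‖_2 e_1; 0]‖_2. The MATLAB fragment below sketches this; the bidiagonal matrix is generated artificially here, but in practice it would come from the partial bidiagonalization (e.g., the gk_bidiag sketch above).

    % Solve (3.8) for a given lambda as a damped least-squares problem.
    ell    = 10;
    B      = [diag(rand(ell,1)); zeros(1,ell)] + [zeros(1,ell); diag(rand(ell,1))];
    normb  = 1;                                  % ||b||_2
    lambda = 1e-3;

    rhs = [normb*eye(ell+1, 1); zeros(ell, 1)];  % [ ||b||_2 * e_1 ; 0 ]
    y   = [B; sqrt(lambda)*eye(ell)] \ rhs;      % y_{lambda,ell}
    % The approximate Tikhonov solution is x_{lambda,ell} = U_ell * y.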

It remains to determine how many bidiagonalization steps, ℓ, to carry out. The computed solution x_{λ,ℓ} cannot satisfy the discrepancy or modified discrepancy principles when ℓ is too small,

while letting ℓ be large may make the application of Golub–Kahan bidiagonalization unnecessarily expensive. We will determine upper and lower bounds for the functions (3.1) and (3.3) with the aid of quadrature rules that can be evaluated by using the connection between Gauss-type quadrature and the decompositions (1.21). This approach for computing upper and lower bounds for the function (3.1) has previously been described in [20].

3.2.2 Quadrature rules

We review the technique used in [20] for computing bounds for the function (3.1). This method also can be applied to bound the function (3.3). We refer to [20] for details. Extensions and many references can be found in [32, 38].

Consider the spectral factorization

A A^T = W̃ Λ W̃^T,

where Λ = diag[λ̃_1, λ̃_2, ..., λ̃_m] ∈ R^{m×m} and the matrix W̃ ∈ R^{m×m} is orthogonal. Substitution into (3.1) or (3.3) yields

φ_p(λ) = b^T W̃ (λ^{-1} Λ + I)^{-p} W̃^T b = sum_{j=1}^{m} β̂_j^2 / (λ^{-1} λ̃_j + 1)^p,   (3.9)

where [β̂_1, β̂_2, ..., β̂_m]^T := W̃^T b. The sum in (3.9) can be expressed as a Stieltjes integral

φ_p(λ) = ∫_0^∞ (λ^{-1} λ̃ + 1)^{-p} dω(λ̃)   (3.10)

with a piecewise constant distribution function ω with jump discontinuities of height β̂_j^2 at the eigenvalues λ̃_j; dω is the associated measure. We will approximate the integral (3.10) by Gauss-type quadrature rules. One can show that

G_{ℓ,p}(λ) := ‖b‖_2^2 e_1^T (λ^{-1} B_{ℓ,ℓ} B_{ℓ,ℓ}^T + I_ℓ)^{-p} e_1

is an ℓ-node Gauss quadrature rule for approximating the integral (3.10) and

R_{ℓ+1,p}(λ) := ‖b‖_2^2 e_1^T (λ^{-1} B_{ℓ+1,ℓ} B_{ℓ+1,ℓ}^T + I_{ℓ+1})^{-p} e_1

is an (ℓ + 1)-node Gauss–Radau quadrature rule with a fixed node at the origin for approximating the same integral; see, e.g., [20, 32, 38] for details.

Since the derivatives of the integrand in (3.10) (as a function of λ̃) of odd order are negative on the interval of integration and the derivatives of even order are positive, the remainder formulas for the error in Gauss and Gauss–Radau quadrature show that, generically,

G_{ℓ,p}(λ) < φ_p(λ) < R_{ℓ+1,p}(λ),   λ > 0,  p ∈ {2, 3}.   (3.11)

The quadrature errors of the rules G_{ℓ,p}(λ) and R_{ℓ+1,p}(λ) decrease as ℓ increases; see [58]. Thus, we can compute upper and lower bounds for the integral (3.10) of desired accuracy by using the decompositions

(1.21) with ℓ chosen sufficiently large. Following [20], we increase ℓ until we can determine a value of λ, denoted by λ_ℓ, that satisfies

ε^2 ≤ G_{ℓ,p}(λ_ℓ)   and   R_{ℓ+1,p}(λ_ℓ) ≤ ε^2 α^2   (3.12)

for some constant α > 1 independent of λ and ℓ. It follows from (3.11) that

ε^2 < φ_p(λ_ℓ) < ε^2 α^2.

For many linear discrete ill-posed problems, the required number of bidiagonalization steps, ℓ, is quite small. This is illustrated in the following section. Having determined λ_ℓ as described, we compute the corresponding approximate solution x_{λ_ℓ,ℓ} in the Krylov subspace (3.7) as outlined in

Section 3.2.1. Here we only note that the dominating computational expense for determining x_{λ_ℓ,ℓ} is the evaluation of the decompositions (1.21). Note that it is not necessary to compute an SVD of any matrix.
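Since G_{ℓ,p}(λ) and R_{ℓ+1,p}(λ) only involve the small matrices B_{ℓ,ℓ} and B_{ℓ+1,ℓ}, they can be evaluated cheaply for many trial values of λ. A MATLAB sketch of their evaluation is given below; it computes e_1^T M^{-p} e_1 by p linear solves rather than by forming M^{-p}.

    function [G, R] = gauss_radau_bounds(B, normb, lam, p)
    % Gauss and Gauss-Radau bounds for phi_p(lambda), cf. (3.11).
    % B is the (ell+1) x ell bidiagonal matrix and normb = ||b||_2.
    ell = size(B, 2);
    MG  = B(1:ell,:)*B(1:ell,:)'/lam + eye(ell);     % lambda^{-1} B_{l,l} B_{l,l}^T + I
    MR  = B*B'/lam + eye(ell+1);                     % lambda^{-1} B_{l+1,l} B_{l+1,l}^T + I
    G   = normb^2 * e1Mpe1(MG, p);
    R   = normb^2 * e1Mpe1(MR, p);
    end

    function val = e1Mpe1(M, p)
    % Computes e_1' * M^(-p) * e_1 by p linear solves.
    v = zeros(size(M, 1), 1);  v(1) = 1;
    for i = 1:p
        v = M \ v;
    end
    val = v(1);
    end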

3.3 Computed examples

This section presents computed examples with several of the linear discrete ill-posed problems that are available in the MATLAB package Regularization Tools by Hansen [46]. All problems are discrete ill-posed problems; many are discretizations of Fredholm integral equations of the first kind. The problems are described in [46]. Further discussions on some of the problems can be found in [1, 35, 68, 76].

The discretized problems have matrices A ∈ R^{2000×2000}. The codes in the Regularization Tools provide the “exact solution” x_exact, which is used to compute the “exact right-hand side” b_exact :=

A x_exact. The error e in b (cf. (1.3)) is Gaussian with zero mean and the variance chosen to

correspond to a specified noise level ‖e‖_2/‖b_exact‖_2 ∈ {10^{-1}, 10^{-2}, 10^{-3}}. For each problem and

each noise level we generate 10 random noise vectors e. The tables report averages of the relative restoration error

RRE := ‖x_computed − x_exact‖_2 / ‖x_exact‖_2   (3.13)

achieved for the 10 noise vectors of specified noise level, as well as standard deviations (SD), when using the discrepancy principle and the modified discrepancy principle to determine the regularization parameter. Table 6 shows results for the situation described in Section 3.1 when the SVD of A is computed, while Table 7 displays the corresponding results when the solution is computed by first carrying out a few bidiagonalization steps as described in Section 3.2. The parameters τ and α in (1.18) and (3.12), respectively, are set to 1.01.

Table 6 shows the average relative error to be smaller when the parameter λ is determined by the discrepancy principle than when it is determined by the modified discrepancy principle for all problems and all noise levels. The standard deviation of the relative error is for some problems slightly larger for the discrepancy principle than for the modified discrepancy principle. The table shows results for matrices A of size 2000 × 2000. The computations are carried out by using the

SVD of A as described in Section 3.1. Results similar to those of Table 6 are obtained for smaller and larger matrices A. The table indicates that there is no reason to use the modified discrepancy principle when an accurate estimate of the norm of the error e is available.

Table 7 is analogous to Table 6 and shows results for the situation when the regularization parameter λ is computed by the method of Section 3.2. The table shows the average relative error to be smaller when the parameter λ is determined by the discrepancy principle than when it is determined by the modified discrepancy principle for all problems and all noise levels. For some problems, the standard deviation of the relative error is somewhat larger when the discrepancy principle is used. The solution method reduces the matrix A in the large Tikhonov regularization problem (1.4) with L = I to a small bidiagonal matrix B_{ℓ+1,ℓ}; see (1.21) and (1.20). Table 7 reports for each problem the average value of ℓ over the noise realizations. We remark that the computation of an approximation of x_exact in a Krylov subspace of dimension ℓ entails regularization in addition to the regularization furnished by choosing a regularization parameter λ > 0. Table 7 shows results for matrices A of size 2000 × 2000. The table shows the discrepancy principle to yield a smaller average relative error (3.13) than the modified discrepancy principle. Analogous results are

Table 6: 2000 by 2000: Modified Discrepancy Principle (MD) vs. Discrepancy Principle (D) using the SVD to determine the regularization parameter λ. The table shows the average relative restoration error (avg. RRE) and its standard deviation (SD).

    noise     problem    avg. RRE                   SD
    level                MD           D             MD           D
    10^{-3}   baart      1.1×10^{-1}  1.1×10^{-1}   4.5×10^{-3}  5.3×10^{-3}
              foxgood    1.0×10^{-2}  7.5×10^{-3}   2.5×10^{-3}  2.6×10^{-3}
              shaw       4.9×10^{-2}  4.6×10^{-2}   1.2×10^{-3}  1.9×10^{-3}
              gravity    1.3×10^{-2}  1.0×10^{-2}   2.2×10^{-3}  2.0×10^{-3}
              deriv2     3.7×10^{-1}  1.4×10^{-1}   2.3×10^{-1}  5.6×10^{-3}
              heat       3.0×10^{-2}  2.3×10^{-2}   2.2×10^{-3}  1.3×10^{-3}
              phillips   9.0×10^{-3}  6.3×10^{-3}   9.3×10^{-4}  1.0×10^{-3}
    10^{-2}   baart      1.6×10^{-1}  1.5×10^{-1}   1.1×10^{-2}  1.4×10^{-2}
              foxgood    2.9×10^{-2}  1.6×10^{-2}   8.8×10^{-3}  7.6×10^{-3}
              shaw       7.7×10^{-2}  6.3×10^{-2}   1.7×10^{-2}  1.5×10^{-2}
              gravity    2.9×10^{-2}  2.1×10^{-2}   4.9×10^{-3}  5.8×10^{-3}
              deriv2     3.4×10^{-1}  2.0×10^{-1}   1.9×10^{-1}  1.3×10^{-2}
              heat       8.4×10^{-2}  6.4×10^{-2}   7.9×10^{-3}  6.8×10^{-3}
              phillips   2.3×10^{-2}  1.7×10^{-2}   1.7×10^{-3}  3.7×10^{-3}
    10^{-1}   baart      2.7×10^{-1}  2.3×10^{-1}   6.2×10^{-2}  4.7×10^{-2}
              foxgood    5.5×10^{-2}  3.2×10^{-2}   2.1×10^{-2}  1.5×10^{-2}
              shaw       1.8×10^{-1}  1.3×10^{-1}   8.8×10^{-2}  3.3×10^{-2}
              gravity    7.1×10^{-2}  5.0×10^{-2}   1.3×10^{-2}  1.4×10^{-2}
              deriv2     4.1×10^{-1}  3.1×10^{-1}   1.5×10^{-1}  2.7×10^{-2}
              heat       7.1×10^{-1}  1.7×10^{-1}   7.9×10^{-1}  1.7×10^{-2}
              phillips   5.9×10^{-2}  4.1×10^{-2}   8.5×10^{-3}  8.2×10^{-3}

Table 7: 2000 by 2000: Modified Discrepancy Principle (MD) vs. Discrepancy Principle (D) using bidiagonalization to determine the regularization parameter λ. The table shows the average number of bidiagonalization steps (avg. ℓ), the average relative error (avg. RE), and its standard deviation (SD).

    noise     problem    avg. ℓ         avg. RE                    SD
    level                MD     D       MD           D             MD           D
    10^{-3}   baart      4.9    5.0     1.2×10^{-1}  1.0×10^{-1}   8.8×10^{-3}  3.4×10^{-2}
              foxgood    4      4       1.1×10^{-2}  8.0×10^{-3}   2.6×10^{-3}  2.3×10^{-3}
              shaw       8      8       4.9×10^{-2}  4.2×10^{-2}   1.1×10^{-3}  1.4×10^{-3}
              gravity    9      9.1     1.6×10^{-2}  1.3×10^{-2}   1.1×10^{-3}  1.6×10^{-3}
              deriv2     14.8   15.2    1.5×10^{-1}  1.4×10^{-1}   4.5×10^{-3}  4.4×10^{-3}
              heat       21     22      3.3×10^{-2}  2.3×10^{-2}   1.6×10^{-3}  1.5×10^{-3}
              phillips   9.8    10.6    1.1×10^{-2}  6.8×10^{-3}   9.8×10^{-4}  1.5×10^{-3}
    10^{-2}   baart      4      4       1.6×10^{-1}  1.5×10^{-1}   3.4×10^{-3}  7.1×10^{-3}
              foxgood    3      3       2.9×10^{-2}  1.9×10^{-2}   5.2×10^{-3}  7.1×10^{-3}
              shaw       6      6       1.0×10^{-1}  9.3×10^{-2}   8.7×10^{-3}  1.0×10^{-2}
              gravity    7      7       3.5×10^{-2}  2.7×10^{-2}   1.9×10^{-3}  2.4×10^{-3}
              deriv2     8      8.1     2.4×10^{-1}  2.2×10^{-1}   5.8×10^{-3}  9.2×10^{-3}
              heat       14     15      9.9×10^{-2}  7.2×10^{-2}   3.4×10^{-3}  4.4×10^{-3}
              phillips   7.2    7.5     2.8×10^{-2}  2.2×10^{-2}   2.4×10^{-3}  2.3×10^{-3}
    10^{-1}   baart      3      3       3.0×10^{-1}  2.7×10^{-1}   2.0×10^{-2}  2.3×10^{-2}
              foxgood    3      3       7.9×10^{-2}  3.9×10^{-2}   1.2×10^{-2}  1.3×10^{-2}
              shaw       5      5       1.6×10^{-1}  1.4×10^{-1}   1.1×10^{-2}  2.1×10^{-2}
              gravity    5      5       8.3×10^{-2}  6.0×10^{-2}   7.7×10^{-3}  7.0×10^{-3}
              deriv2     4      4       3.7×10^{-1}  3.5×10^{-1}   7.8×10^{-3}  8.2×10^{-3}
              heat       9      9       2.5×10^{-1}  2.1×10^{-1}   1.1×10^{-2}  1.1×10^{-2}
              phillips   4.9    6.4     9.0×10^{-2}  4.3×10^{-2}   1.0×10^{-2}  6.9×10^{-3}

obtained for Tikhonov regularization problems (1.4) with L = I and matrices A of different sizes.

CHAPTER 4

Numerical aspects of the Nonstationary Modified Linearized Bregman algorithm

Among the many methods described in the literature, the Bregman algorithm has attracted a great deal of attention and has been widely investigated. Recently, a nonstationary preconditioned version of this algorithm, referred to as the nonstationary modified linearized

Bregman algorithm, was proposed. The aim of this chapter is to discuss numerical aspects of this algorithm and to compare computed results with known theoretical properties. We also discuss the effect of several parameters required by the algorithm on the computed solution.

4.1 Landweber iteration

This section seeks to shed light on how the parameter δ > 0 in (1.28) and (1.32) affects the rate of convergence of the iterates. To simplify the analysis, we set µ = 0. Then the soft-thresholding operator Sµ becomes the identity operator, and the iterations (1.28) and (1.32) turn into Landweber iteration

s^{k+1} = s^k + δ A^T (b − A s^k),   k = 0, 1, ...,   (4.1)

and preconditioned Landweber iteration

s^{k+1} = s^k + δ A^T (A A^T + α I)^{-1} (b − A s^k),   k = 0, 1, ...,   (4.2)

respectively. The parameter α is assumed to be positive. Analyses of these iterations related to our analysis below can be found in, e.g., Elfving et al. [29] and Engl et al. [31].
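The two iterations are easily compared on a small example; the MATLAB sketch below runs both with the step lengths discussed in Propositions 2 and 3 below, for a fixed number of iterations (our simplification). On this consistent, full-rank example the preconditioned iterates approach the solution much faster.

    % Landweber (4.1) versus preconditioned Landweber (4.2).
    rng(0);
    m = 100;  n = 80;
    A = randn(m, n);  sstar = randn(n, 1);  b = A*sstar;
    alpha = 0.1;  kmax = 200;
    rho   = norm(A)^2;                        % rho(A'*A) = ||A||_2^2

    s1 = zeros(n, 1);  s2 = zeros(n, 1);
    delta1 = 1/rho;                           % step length of Proposition 2
    delta2 = 1 + alpha/rho;                   % step length (4.5)
    M = A*A' + alpha*eye(m);

    for k = 1:kmax
        s1 = s1 + delta1*A'*(b - A*s1);          % iteration (4.1)
        s2 = s2 + delta2*A'*(M \ (b - A*s2));    % iteration (4.2)
    end
    fprintf('Landweber error: %.2e,  preconditioned: %.2e\n', ...
            norm(s1 - sstar), norm(s2 - sstar));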

Let the matrix M ∈ R^{m×m} be symmetric. Then its eigenvalues are real and we may choose the eigenvectors to be orthogonal. We will refer to the eigenvectors associated with the largest eigenvalues as the largest eigenvectors.

Proposition 2. Let s^0 = 0 and assume that 0 < δ < 2/ρ(A^T A). Then the iterates (4.1) converge

to the solution s^* of minimal ℓ_2-norm of the least-squares problem

min_{s∈R^n} ‖A s − b‖_2.   (4.3)

Let {u_j}_{j=1}^{n} denote the set of orthonormal eigenvectors of A^T A and express the difference s^k − s^* in terms of these eigenvectors,

s^k − s^* = sum_{j=1}^{n} γ_j^k u_j,   γ_j^k ∈ R.   (4.4)

Then the choice δ = 1/ρ(A^T A) makes nonvanishing coefficients γ_j^k associated with the largest eigenvectors u_j converge to zero faster as k increases than nonvanishing coefficients γ_j^k associated with other eigenvectors.

Proof. The convergence of the iterates (4.1) when 0 < δ < 2/ρ(A^T A) is well known. It follows by substituting the spectral factorization of A^T A into (4.1). The fact that all iterates live in the range of A^T makes them orthogonal to the null space of A. Therefore, they converge to the solution of

minimal Euclidean norm; see, e.g., [29, 31] for details. The rate of convergence of the coefficients γ_j^k as k increases follows by studying how the components in the right-hand side of (4.4) are damped during the iterations.

Proposition 3. Let s^0 = 0, α > 0, and assume that 0 < δ < 2(1 + α/ρ(A^T A)). Then the iterates

(4.2) converge to the solution s^* of the minimization problem (4.3) of minimal ℓ_2-norm. Consider the differences (4.4) with the iterates s^k defined by (4.2). Then the choice

δ = 1 + α/ρ(A^T A)   (4.5)

makes nonvanishing coefficients γ_j^k in (4.4) associated with the largest eigenvectors u_j converge to zero faster as k increases than nonvanishing coefficients γ_j^k associated with other eigenvectors.

Proof. By (4.2), the iterates s^k, k = 1, 2, ..., live in the range of A^T. Therefore, if they converge, then they converge to the solution of (4.3) of minimal Euclidean norm. The convergence of the sequence s^k, k = 1, 2, ..., can be established similarly as in the proof of Proposition 2, i.e., by

investigating how the error e^k = s^k − s^* is damped during the iterations. We have

e^{k+1} = e^k − δ A^T (A A^T + α I)^{-1} A e^k.   (4.6)

Using the identity

A^T (A A^T + α I)^{-1} A = (A^T A + α I)^{-1} A^T A

and the spectral factorization

A^T A = W̃ Λ W̃^T,   Λ = diag[λ_1, λ_2, ..., λ_n],   W̃ = [w̃_1, w̃_2, ..., w̃_n],

we obtain from (4.6) that

ẽ^{k+1} = (I − δ (Λ + α I)^{-1} Λ) ẽ^k,   ẽ^j := W̃^T e^j.   (4.7)

The observation that t → t/(α + t) is an increasing function of t ≥ 0 shows convergence of the errors e^k to zero as k increases when 0 < δ < 2(1 + α/ρ(A^T A)). The rate of convergence of the

coefficients γ_j^k in the expansion (4.4) to zero as k increases follows by studying the components of the errors in (4.7).

We remark that it is easy to show that the vector s∗ of Proposition 3 also is the solution of minimal Euclidean norm of

min_{s∈R^n} ‖A s − b‖_{(A A^T + α I)^{-1}},

which is the expression in (1.34).

We are interested in damping the largest eigenvectors in the difference sk − s∗ of Proposition

3, because these eigenvectors are the most important components of x_exact; the smallest eigenvectors model noise and generally should not be included in the computed approximation of x_exact. Proposition 3 suggests that when the parameter α is not “tiny” and an estimate of ρ(A^T A) that is not “huge” is available, a value of δ based on an estimate of the right-hand side of (4.5) should be used, because this may result in faster convergence of the iterates than δ = 1.

Thus, Proposition 3 indicates that δ should be chosen larger than unity for the iterations (4.2) to achieve a higher rate of convergence. While the proposition does not apply to the iterates (1.33) with µ > 0, we, nevertheless, would expect the latter iterates to converge faster for δ > 1 than for

δ = 1, at least for some problems and when µ is not too large. Computed examples reported in the following section illustrate that this indeed is the case.

4.2 Numerical aspects of the NMLB algorithm

This section discusses the performance of the NMLB algorithm when applied to the solution of a few linear discrete ill-posed problems from Regularization Tools by Hansen [46]. In particular, we are interested in studying the influence of user-specified parameters on the computed solution.

Following Huang et al. [52], we choose the sequence of parameters

α_k = α_0 q^k + 10^{-15}   (4.8)

for the preconditioners P in (1.33), where α_0 > 0 and 0 < q < 1. Thus, α_k → ᾱ = 10^{-15} as k → ∞.

We set α0 = 0.5 in all experiments. This leaves us with the determination of the parameters µ, q, and δ.

We use the discrepancy principle as a stopping criterion with τ = 1.01 in (1.25) and (1.37).

The maximum number of allowed iterations is set to 7000. We will investigate the number of iterations required to satisfy the discrepancy principle as a function of µ, q, and δ. Also the relative restoration error (RRE), defined by

RRE(x) = ‖x − x_exact‖_2 / ‖x_exact‖_2,

is studied as a function of these parameters. Finally, we will consider the norm of the residual at

the final iteration, i.e., ‖A x^{k*} − b‖_2, where k* denotes the number of iterations carried out by the NMLB algorithm.

We use the same tight frame system as Huang et al. [52], i.e., the system of linear B-splines.

This system is formed by a low-pass filter W_0 ∈ R^{n×n} and two high-pass filters W_1 ∈ R^{n×n} and W_2 ∈ R^{n×n}, whose corresponding masks are

w^{(0)} = (1/4)(1, 2, 1),   w^{(1)} = (√2/4)(1, 0, −1),   w^{(2)} = (1/4)(−1, 2, −1).

The analysis operator W is derived from these masks and by imposing reflexive boundary conditions.

These boundary conditions are such that W^T W = I. We obtain

W_0 = (1/4) [ 3  1  0  ...  0
              1  2  1
                 .  .  .
                    1  2  1
              0  ...  0  1  3 ],

W_1 = (√2/4) [ −1  1  0  ...  0
               −1  0  1
                   .  .  .
                      −1  0  1
                0  ...  0  −1  1 ],

and

W_2 = (1/4) [ 1  −1  0  ...  0
              −1  2  −1
                  .  .  .
                     −1  2  −1
              0  ...  0  −1  1 ].

Thus,

W = [ W_0
      W_1
      W_2 ].

The matrix W is very sparse. Therefore, the evaluation of matrix-vector products with W and W^T is inexpensive.
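The three filter matrices are easily assembled as sparse matrices directly from their stencils. The MATLAB sketch below does this and checks the tight-frame property W^T W = I numerically; the constant-stencil helper and the explicit boundary corrections reproduce the matrices displayed above.

    % Assemble the linear B-spline framelet analysis operator W = [W0; W1; W2]
    % with reflexive boundary conditions and verify that W'*W = I.
    n = 200;
    e = ones(n, 1);
    T = @(a, b, c) spdiags([a*e, b*e, c*e], -1:1, n, n);   % tridiagonal helper

    W0 = T(1, 2, 1)/4;            W0(1,1) = 3/4;         W0(n,n) = 3/4;
    W1 = T(-1, 0, 1)*sqrt(2)/4;   W1(1,1) = -sqrt(2)/4;  W1(n,n) = sqrt(2)/4;
    W2 = T(-1, 2, -1)/4;          W2(1,1) = 1/4;         W2(n,n) = 1/4;
    W  = [W0; W1; W2];

    tight_frame_error = norm(W'*W - speye(n), 'fro')   % close to machine precision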

Since we would like to investigate the performance of the NMLB algorithm for many different choices of µ, q, and δ, we choose the dimensions m and n in all examples to be fairly small; specifically, we set n = m = 200. The error e in b is modeled by white Gaussian noise and we refer to the ratio

σ = ‖e‖_2 / ‖A x_exact‖_2

as noise level. We use the test problems baart, phillips, and heat from Regularization Tools [46].

They are discretizations of Fredholm integral equations of the first kind. In all examples, the desired solution x_exact has a sparse representation in terms of the framelet system used. All computations are carried out using MATLAB 8.6 (R2015a) on a laptop computer with an

Intel(R) Core(TM) i5-3337U CPU @ 1.80 GHz and 16 GB of memory. The floating-point precision is 2 · 10^{-16}. Following Huang et al. [52], we let δ = 1 in subsections 4.2.1–4.2.3. The influence of the value of δ on the convergence rate is illustrated in subsection 4.2.4.

4.2.1 The number of iterations

We first discuss the number of iterations required by the NMLB algorithm to reach convergence

(i.e., to satisfy the discrepancy principle). This is of particular importance since, if too many iterations are carried out, then the algorithm becomes unstable. Consider the sequence αk defined by (4.8). Let σmax and σmin denote the largest and smallest singular values, respectively, of A.

Thus, σ_max^2 = ρ(A^T A). Then the condition number of the preconditioner P = (A A^T + α_k I)^{-1} is given by

κ_2(P) = (σ_max^2 + α_k) / (σ_min^2 + α_k).

We are interested in the situation when A is severely ill-conditioned, i.e., when σ_max ≫ σ_min.

Assume for the moment that A is scaled so that σ_max = 1. Then we obtain

κ_2(A A^T + α_k I) ≈ (1 + α_k)/α_k → (1 + 10^{-15})/10^{-15} ≈ 10^{15}   as k → ∞.

Thus, if many iterations are carried out, i.e., if k is large, then the matrix A A^T + α_k I becomes very

ill-conditioned. In this situation, the evaluation of P u for a vector u ∈ R^m may suffer from severely propagated error stemming from round-off errors that are introduced during the computations.

Figure 11 displays the number of iterations required for the NMLB algorithm to satisfy the discrepancy principle. Visual inspection of the graphs shows the number of iterations to increase with µ. Moreover, we can observe that, the larger q is, the more iterations are needed. The latter

is to be expected, since for large q-values and modest k the preconditioner (A A^T + α_k I)^{-1} is a poor approximation of the matrix (A A^T)^†.

These observations show that µ should not be chosen too large, because a large µ-value may lead to the NMLB algorithm requiring a large number of iterations to satisfy the discrepancy principle. This makes the algorithm expensive and, moreover, unstable. The poor performance of the NMLB algorithm is evident in Figure 11. We can observe that, starting from a certain µ-value, the number of iterations sharply increases as µ increases. We remark that this behavior is less evident for the phillips test problem, because the matrix A of this problem is less ill-conditioned than the matrices of the other problems.

4.2.2 The residual norm

We turn to the behavior of the norm of the residual at the final iteration. This analysis tells us when, in practical applications, the discrepancy principle is able to effectively terminate the iterations. In theory, it follows from Theorem 3 that the discrepancy principle should effectively stop the algorithm after finitely many iterations, independently of the choice of µ and q. However, if the norm of the residual does not decrease fast enough, then the NMLB algorithm fails to converge due to numerical instability.

56 103 103

102

102

101

10-5 10-4 10-3 10-2 10-1 100 101 102 10-5 10-4 10-3 10-2 10-1 100 101 102

(a) (b)

103 103

102 102

101

101

10-5 10-4 10-3 10-2 10-1 100 101 102 10-5 10-4 10-3 10-2 10-1 100 101 102

(c) (d)

103

102

102

101

101

10-5 10-4 10-3 10-2 10-1 100 101 102 10-5 10-4 10-3 10-2 10-1 100 101 102

(e) (f)

103

103

102

102

101

101

10-5 10-4 10-3 10-2 10-1 100 101 102 10-5 10-4 10-3 10-2 10-1 100 101 102

(g) (h)

103

102

101

100

10-5 10-4 10-3 10-2 10-1 100 101 102

(i) Figure 11: Number of iterations required to reach convergence for different choices of µ and q. The different graphs represent the number of iterations versus µ. The yellow graph is for q = 0.6, the red graph for q = 0.8, and the blue graph for q = 0.9. Panels (a)-(c) report results for the baart test problem, panels (d)-(f) for the heat test problem, and panels (g)-(i) for the phillips test problem. The panels (a), (d), and (g) show results for the noise level σ = 10−3, the panels (b), (e), and (h) for σ = 10−2, and the panels (c), (f), and (i) for σ = 10−1.

Figure 12: Norm of the residual at the final iteration for different choices of µ and q. The different graphs display the norm of the residual at the final iteration versus µ. The yellow graph is for q = 0.6, the red graph for q = 0.8, and the blue graph for q = 0.9. Panels (a)-(c) report results for the baart test problem, panels (d)-(f) for the heat test problem, and panels (g)-(i) for the phillips test problem. The panels (a), (d), and (g) show results for the noise level σ = 10^{-3}, the panels (b), (e), and (h) for σ = 10^{-2}, and the panels (c), (f), and (i) for σ = 10^{-1}.

Figure 12 shows the norm of the residual at the last iteration for different values of µ and q. We observe that for small values of µ, the norm of the residual behaves as expected, i.e., it is constant and equal to τε. This implies that the iterations were terminated by the discrepancy principle.

However, if µ is too large, then we can see that, especially for small noise levels, the discrepancy principle is not able to stop the iterations. This is due to the large number of iterations performed and the consequent ill-conditioning of the preconditioner with small αk > 0. Severely propagated round-off errors prevent the NMLB algorithm from terminating.

4.2.3 The relative restoration error

We would like to analyze the behavior of the RRE as a function of the parameters µ and q. Figure 13 shows the RRE obtained for different choices of µ and q. We observe that for small values of µ, the RRE is almost constant. In fact, for µ small, the NMLB algorithm essentially becomes a nonstationary preconditioned Landweber iteration method; see Section 4.1. As µ increases, the

RRE starts to decrease until a minimum is reached. This behavior is particularly evident for the test problem heat. When µ becomes large, the error increases with µ and this increase can be very sharp. This effect is due to the fact that the NMLB algorithm is unstable for large values of

µ. Figure 14 displays magnifications of the graphs of Figure 13 around the value of µ that gives the smallest RRE.

4.2.4 The choice of δ

Let s^0 = z^0 = 0. This subsection illustrates how the iterates s^1, s^2, ..., defined by

z^{k+1} = z^k + A^T (A A^T + α_k I)^{-1} (b − A s^k),
s^{k+1} = δ S_µ(z^{k+1}),                                   (4.9)

for k = 0, 1, ..., depend on the choice of δ. Huang et al. [52] let δ = 1. This choice secures convergence. However, the analysis in Section 4.1 suggests that a larger value of δ may give faster convergence. The computations reported in this subsection show that this indeed may be the case, i.e., the iterates (4.9) for δ > 1 display faster convergence than the iterates (1.33). We will illustrate this with a few representative computations. In all computations for this subsection, the noise level is σ = 1 · 10^{-2}.
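A compact MATLAB sketch of the iteration (4.9), with S_µ(t) = sign(t) max(|t| − µ, 0) applied componentwise and the discrepancy principle as stopping rule, is given below. It works directly with the variables in (4.9) and a synthetic test matrix, and assumes the noise norm is known; the framelet transform and the other ingredients of the full NMLB algorithm (1.33) are deliberately omitted, so this is only an illustration of the recursion and its stopping rule.

    % Sketch of the iteration (4.9) with discrepancy-principle stopping.
    rng(0);
    n = 200;
    [U, ~] = qr(randn(n));  [V, ~] = qr(randn(n));
    A = U*diag(logspace(0, -6, n))*V';          % synthetic ill-conditioned matrix
    sexact = zeros(n, 1);  sexact(20:40) = 1;   % sparse model solution
    bexact = A*sexact;
    e = 1e-2*norm(bexact)/sqrt(n)*randn(n, 1);
    b = bexact + e;

    mu = 1e-3;  q = 0.8;  alpha0 = 0.5;  delta = 1;  tau = 1.01;
    Smu = @(t) sign(t).*max(abs(t) - mu, 0);    % soft-thresholding

    z = zeros(n, 1);  s = zeros(n, 1);
    for k = 0:7000
        if norm(A*s - b) <= tau*norm(e), break; end   % discrepancy principle
        alphak = alpha0*q^k + 1e-15;                  % cf. (4.8)
        z = z + A'*((A*A' + alphak*eye(n)) \ (b - A*s));
        s = delta*Smu(z);
    end
    fprintf('iterations: %d, RRE: %.2e\n', k, norm(s - sexact)/norm(sexact));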

We first consider the baart test problem. For µ = 6.9 · 10^{-4} and µ = 4.8 · 10^{-2}, the iterations

(4.9) for δ = 1 and δ = 1.5 are terminated by the discrepancy principle for all q-values reported in

59 106

1030 105

1025 104

1020 103

1015 102

1010 101

105 100

100 10-1

10-5 10-4 10-3 10-2 10-1 100 101 102 10-5 10-4 10-3 10-2 10-1 100 101 102

(a) (b)

108

103 107

106 102

105

101 104

3 10 100

102

10-1

101

100 10-2

10-1 10-5 10-4 10-3 10-2 10-1 100 101 102 10-5 10-4 10-3 10-2 10-1 100 101

(c) (d)

5 104 10

4 103 10

103 102

102 101

101 100

100 10-1

10-1 10-2

10-5 10-4 10-3 10-2 10-1 100 101 102 10-5 10-4 10-3 10-2 10-1 100 101 102

(e) (f)

0.018

0.016

0.014 10-1

0.012

0.01

0.008

0.006

0.004

10-2

10-5 10-4 10-3 10-2 10-1 100 101 102 10-5 10-4 10-3 10-2 10-1 100 101 102

(g) (h)

100

10-1

10-5 10-4 10-3 10-2 10-1 100 101 102

(i) Figure 13: RRE obtained for several choices of µ and q. The different graphs display the RRE versus µ. The yellow graph is for q = 0.6, the red graph for q = 0.8, and the blue graph for q = 0.9. Panels (a)-(c) report results for the baart test problem, panels (d)-(f) for the heat test problem, and panels (g)-(i) for the phillips test problem. The panels (a), (d), and (g) show results for the noise level σ = 10−3, the panels (b), (e), and (h) for σ = 10−2, and the panels (c), (f), and (i) for σ = 10−1.

Figure 14: RRE for different choices of µ and q. The panels are magnifications of the panels of Figure 13 around the optimal value of µ, i.e., the µ-value that gives the smallest RRE. The different graphs show the RRE versus µ. The yellow graph is for q = 0.6, the red graph for q = 0.8, and the blue graph for q = 0.9. Panels (a)-(c) report results for the baart test problem, panels (d)-(f) for the heat test problem, and panels (g)-(i) for the phillips test problem. The panels (a), (d), and (g) show results for the noise level σ = 10^{-3}, the panels (b), (e), and (h) for σ = 10^{-2}, and the panels (c), (f), and (i) for σ = 10^{-1}.

Table 8: Number of iterations for the baart test problem for two values of δ.

    δ     µ               q = 0.99   q = 0.95   q = 0.90   q = 0.85   q = 0.80
    1.0   6.9 · 10^{-4}      98         44         28         21         17
          4.8 · 10^{-2}      93         43         27         20         16
    1.5   6.9 · 10^{-4}      74         37         24         18         15
          4.8 · 10^{-2}      71         36         24         18         14

Table 9: Number of iterations for the heat test problem for two values of δ.

    δ     µ               q = 0.99   q = 0.95   q = 0.90   q = 0.85   q = 0.80
    1.0   6.9 · 10^{-4}      14         11          9          8          7
          4.8 · 10^{-2}      24         18         14         11         10
    1.5   6.9 · 10^{-4}       9          8          7          6          5
          4.8 · 10^{-2}      16         13         10          9          8

Table 8; the errors in the computed approximate solutions are essentially independent of q. Table

8 shows the number of iterations required to satisfy the discrepancy principle for several q-values and two δ-values. As expected, the number of iterations decreases as q is decreased for both values of δ. We also see that for fixed µ and q, the number of iterations required is smaller for the larger value of δ. Indeed, we have observed the number of iterations to decrease as δ increases until δ is too large. The value of δ that gives the least number of iterations depends on the problem.

We turn to the heat test problem for the same values of µ, q, and δ, and the same noise level.

The errors in the computed approximate solutions are essentially independent of q also for this test problem. Table 9 shows the number of iterations required to satisfy the discrepancy principle.

Similarly as in Table 8, the number of iterations decreases as q decreases for fixed δ, and decreases as δ increases for fixed q.

Letting δ > 1 does not reduce the number of iterations required to satisfy the discrepancy principle for the phillips test problem. Hence, the choice of δ that results in the least number of iterations is problem dependent. We conclude that δ = 1 is a safe choice, but the number of iterations may be reduced by choosing a larger δ-value for some problems.

4.2.5 Final considerations

We observed the value of the parameter µ to be important for the performance of the NMLB algorithm. This parameter affects both the rate of convergence and the quality of the computed approximate solution. In particular, if µ is chosen too large, then the NMLB algorithm slows down to the point of becoming unstable. Moreover, if the value of µ is far from the value that results in the smallest RRE, the NMLB algorithm may determine an approximation of x_exact of poor quality. The value of the parameter q has a lesser effect on the quality of the computed approximation of x_exact. However, this parameter affects the rate of convergence. Using a large q-value is both advantageous and disadvantageous. The main advantage is that the NMLB algorithm is more stable, since the convergence towards ᾱ of the sequence α_k is slower. Therefore, more iterations can be carried out before the method becomes unstable due to severe ill-conditioning of the preconditioner P. On the other hand, the number of iterations required to satisfy the discrepancy principle increases with q.

This increase may be rapid. We note that, if a large µ-value is required, then we have to choose a large value of q. This is due to the fact that when µ is large, convergence typically is slow and many iterations are required. The latter leads to instability if q is not large enough. Finally, the examples of the previous subsection illustrate that letting δ be strictly larger than one may increase the rate of convergence.

CHAPTER 5

Conclusion

For Chapter 2, it is well known that the use of a regularization matrix L ≠ I in (1.4) can yield better approximations of the desired solution x_exact than L = I. However, while there are many heuristic techniques available for determining a suitable value of the regularization parameter when

L = I, much less attention has been paid to the development of heuristic methods for L ≠ I.

Chapter 2 describes three algorithms that can be applied for general matrices L ∈ R^{p×n}. One of these algorithms is well suited for problems of small to moderate size and two algorithms are suitable for use with large-scale problems. The latter algorithms differ in how the null space of L is handled; one of them requires the null space to be explicitly known. Computed examples illustrate the performance of the algorithms described and show them to be competitive.

For Chapter 3, the average relative errors reported for the discrepancy principle are smaller than those achieved with the modified discrepancy principle for a variety of linear discrete ill-posed problems solved by Tikhonov regularization. This holds both when the solution is computed by a method that is well suited for small to moderately sized problems based on first evaluating the SVD of the matrix of the problem, and when the solution is determined by a method that is well suited for large-scale problems based on reducing the given problem to a smaller one by carrying out a few steps of Golub–Kahan bidiagonalization. We conclude that when a fairly accurate estimate of the noise level in the data b is available, the discrepancy principle performs better than the modified discrepancy principle. When no such estimate is available, it may be beneficial to use other methods for determining the regularization parameter such as methods described in [16] or so-called heuristic parameter choice rules; see, e.g., [4, 55, 71] for discussions of the latter.

For Chapter 4, a numerical investigation of the NMLB algorithm is presented. We have elucidated how the choice of some user-specified parameters affects the results obtained with this algorithm. In particular, we show that the choice of the regularization parameter µ is of

importance, and that an imprudent choice of this parameter may result in computed solutions of poor quality. The iterates are found to converge faster when the parameter δ in (4.9) is chosen larger than one. The exact choice is not critical, but a too large value of δ prevents convergence. Our examples illustrate that if the user-specified parameters in (1.33) are not chosen carefully, then the theoretical results for this method shown in [52] may not hold in finite precision arithmetic.

In our experiments we considered fairly small examples in one space-dimension, so that the preconditioner could be applied repeatedly by using a direct factorization in reasonable computational time. However, in many real-world applications the number of space-dimensions is larger than one and, if the matrix A does not have an exploitable structure, application of the preconditioner by a direct factorization method may not be feasible. To reduce this difficulty, Cai et al. [13] recently proposed a modification of the NMLB algorithm, in which preconditioners of the form (1.31) are approximated by nearby ones with a structure that allows their fairly inexpensive application also to large-scale problems. The numerical analysis presented in this chapter can be extended to the iterative scheme described by Cai et al. [13]. This is a topic of future research.
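
The point about direct factorizations can be made concrete with the following small sketch: the preconditioner is factored once and the factorization is reused in every iteration, which is affordable for the one-dimensional examples considered here but may not be for higher-dimensional problems. The matrix P and the helper name are placeholders for illustration, not the specific structure (1.31) nor the approximation of Cai et al. [13].

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def make_preconditioner_solver(P):
        lu_piv = lu_factor(P)                    # one O(n^3) factorization, computed once
        return lambda r: lu_solve(lu_piv, r)     # each later application costs only O(n^2)

    # Hypothetical usage inside an iterative scheme:
    # apply_Pinv = make_preconditioner_solver(P)
    # for k in range(max_iter):
    #     z = apply_Pinv(residual)               # reuse the factorization at every step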

BIBLIOGRAPHY

[1] M. L. Baart, The use of auto-correlation for pseudo-rank determination in noisy ill-conditioned linear least-squares problems, IMA J. Numer. Anal., 2 (1982), pp. 241–247.

[2] J. Baglama and L. Reichel, Decomposition methods for large linear discrete ill-posed problems, J. Comput. Appl. Math., 198 (2007), pp. 332–342.

[3] Z. Bai, The CSD, GSVD, their applications and computation, IMA preprint 958, Institute for Mathematics and its Applications, University of Minnesota, Minneapolis, MN, 1992.

[4] F. Bauer and M. A. Lukas, Comparing parameter choice methods for regularization of ill-posed problems, Math. Comput. Simulation, 81 (2011), pp. 1795–1841.

[5] A. F. Bentbib, A. Bouhamidi, and K. Kreit, A conditional gradient method for primal-dual total variation-based image denoising, Electron. Trans. Numer. Anal., in press.

[6] Å. Björck, Numerical Methods in Matrix Computations, Springer, New York, 2015.

[7] A. Bouhamidi and K. Jbilou, Sylvester Tikhonov-regularization methods in image restoration, J. Comput. Appl. Math., 206 (2007), pp. 86–98.

[8] C. Brezinski, G. Rodriguez, and S. Seatzu, Error estimates for the regularization of least squares problems, Numer. Algorithms, 51 (2009), pp. 61–76.

[9] A. Buades, B. Coll, and J. M. Morel, Image denoising methods. A new nonlocal principle, SIAM Rev., 52 (2010), pp. 113–147.

[10] A. Buccini, Y. Park, and L. Reichel, Comparison of a-posteriori parameter choice rules for linear discrete ill-posed problems, J. Comput. Appl. Math., available online 12 February 2019.

[11] A. Buccini, Y. Park, and L. Reichel, Numerical aspects of the nonstationary modified linearized Bregman algorithm, Appl. Math. Comput., 337 (2018), pp. 386–398.

[12] J.-F. Cai, R. H. Chan, and Z. Shen, A framelet-based image inpainting algorithm, Appl. Comput. Harmon. Anal., 24 (2008), pp. 131–149.

[13] Y. Cai, M. Donatelli, D. Bianchi, and T.-Z. Huang, Regularization preconditioners for frame-based image deblurring with reduced boundary artifacts, SIAM J. Sci. Comput., 38 (2016), pp. B164–B189.

[14] J.-F. Cai, S. Osher, and Z. Shen, Convergence of the linearized Bregman iteration for ℓ1-norm minimization, Math. Comp., 78 (2009), pp. 2127–2136.

[15] J.-F. Cai, S. Osher, and Z. Shen, Linearized Bregman iterations for compressed sensing, Math. Comp., 78 (2009), pp. 1515–1536.

[16] J.-F. Cai, S. Osher, and Z. Shen, Linearized Bregman iterations for frame-based image deblurring, SIAM J. Imaging Sci., 2 (2009), pp. 226–252.

[17] J.-F. Cai, S. Osher, and Z. Shen, Split Bregman methods and frame based image restoration, Multiscale Model. Simul., 8 (2009), pp. 337–369.

[18] D. Calvetti, P. C. Hansen, and L. Reichel, L-curve curvature bounds via Lanczos bidiagonalization, Electron. Trans. Numer. Anal., 14 (2002), pp. 20–35.

[19] D. Calvetti, B. Lewis, and L. Reichel, A hybrid GMRES and TV-norm based method for image restoration, in Advanced Signal Processing Algorithms, Architectures, and Implementations XII, ed. F. T. Luk, Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE), vol. 4791, The International Society for Optical Engineering, Bellingham, WA, 2002, pp. 192–200.

[20] D. Calvetti and L. Reichel, Tikhonov regularization of large linear problems, BIT, 43 (2003), pp. 263–283.

[21] A.S. Carasso, Determining surface temperatures from interior observations, SIAM J. Appl. Math. 42 (1982), pp. 558–574.

[22] J. L. Castellanos, S. Gómez, and V. Guerra, The triangle method for finding the corner of the L-curve, Appl. Numer. Math., 43 (2002), pp. 359–373.

[23] J. W. Daniel, W. B. Gragg, L. Kaufman, and G. W. Stewart, Reorthogonalization and stable algorithms for updating the Gram–Schmidt QR factorization, Math. Comp., 30 (1976), pp. 772–795.

[24] M. Donatelli, A. Neuman, and L. Reichel, Square regularization matrices for large linear discrete ill-posed problems, Numer. Linear Algebra Appl., 19 (2012), pp. 896–913.

[25] D. L. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 52 (2006), pp. 1289–1306.

[26] L. Dykes, S. Noschese, and L. Reichel, Circulant preconditioners for discrete ill-posed Toeplitz systems, Numer. Algorithms, 75 (2017), pp. 477–490.

[27] L. Dykes, S. Noschese, and L. Reichel, Rescaling the GSVD with application to ill-posed problems, Numer. Algorithms, 68 (2015), pp. 531–545.

[28] L. Dykes and L. Reichel, Simplified GSVD computations for the solution of linear discrete ill-posed problems, J. Comput. Appl. Math., 255 (2013), pp. 15–27.

[29] T. Elfving, T. Nikazad, and P. C. Hansen, Semi-convergence and relaxation parameters for a class of SIRT algorithms, Electron. Trans. Numer. Anal., 37 (2010), pp. 321–336.

[30] H. W. Engl and W. Grever, Using the L-curve for determining optimal regularization parameters, Numer. Math., 69 (1994), pp. 25–31.

[31] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Kluwer, Dordrecht, 1996.

[32] C. Fenu, D. Martin, L. Reichel, and G. Rodriguez, Block Gauss and anti-Gauss quadrature with application to networks, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 1655–1684.

[33] C. Fenu, L. Reichel, and G. Rodriguez, GCV for Tikhonov regularization via global Golub–Kahan decomposition, Numer. Linear Algebra Appl., 23 (2016), pp. 467–484.

[34] C. Fenu, L. Reichel, G. Rodriguez, and H. Sadok, GCV for Tikhonov regularization by partial SVD, BIT, 57 (2017), pp. 1019–1039.

[35] L. Fox and E. T. Goodwin, The numerical solution of non-singular linear integral equations, Philos. Trans. Royal Soc. London Ser. A: Math. Phys. Eng. Sci., 245:902 (1953), pp. 501–534.

[36] S. Gazzola, P. Novati, and M. R. Russo, On Krylov projection methods and Tikhonov regularization, Electron. Trans. Numer. Anal., 44 (2015), pp. 83–123.

[37] H. Gfrerer, An a posteriori parameter choice for ordinary and iterated Tikhonov regularization of ill-posed problems leading to optimal convergence rates, Math. Comp., 49 (1987), pp. 507–522.

[38] G. H. Golub and G. Meurant, Matrices, Moments and Quadrature with Applications, Princeton University Press, Princeton, 2010.

[39] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, 2013.

[40] C. W. Groetsch, The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, Pitman, Boston, 1984.

[41] U. Hämarik, R. Palm, and T. Raus, A family of rules for parameter choice in Tikhonov regularization of ill-posed problems with inexact noise level, J. Comput. Appl. Math., 236 (2012), pp. 2146–2157.

[42] M. Hanke and C. W. Groetsch, Nonstationary iterated Tikhonov regularization, J. Optim. Theory Appl., 98 (1998), pp. 37–53.

[43] M. Hanke and P. C. Hansen, Regularization methods for large-scale problems, Surveys Math. Indust., 3 (1993), pp. 253–315.

[44] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, SIAM, Philadelphia, 1998.

[45] P. C. Hansen, Regularization, GSVD and truncated GSVD, BIT, 29 (1989), pp. 491–504.

[46] P. C. Hansen, Regularization tools version 4.3 for Matlab 7.3, Numer. Algorithms, 46 (2007), pp. 189–194.

[47] P. C. Hansen, T. K. Jensen, and G. Rodriguez, An adaptive pruning algorithm for the discrete L-curve criterion, J. Comput. Appl. Math., 198 (2006), pp. 483–492.

[48] T. A. Hearn and L. Reichel, Application of denoising methods to regularization of ill-posed problems, Numer. Algorithms, 66 (2014), pp. 761–777.

[49] T. Hearn and L. Reichel, Image denoising via residual kurtosis minimization, Numer. Math. Theor. Meth. Appl., 8 (2015), pp. 403–422.

[50] M. E. Hochstenbach and L. Reichel, An iterative method for Tikhonov regularization with a general linear regularization operator, J. Integral Equations Appl., 22 (2010), pp. 463–480.

[51] M. E. Hochstenbach, L. Reichel, and G. Rodriguez, Regularization parameter determination for discrete ill-posed problems, J. Comput. Appl. Math., 273 (2015), pp. 132–149.

[52] J. Huang, M. Donatelli, and R. H. Chan, Nonstationary iterated thresholding algorithms for image deblurring, Inverse Probl. Imaging, 7 (2013), pp. 717–736.

[53] G. Huang, S. Noschese, and L. Reichel, Regularization matrices determined by matrix nearness problems, Linear Algebra Appl., 502 (2016), pp. 41–57.

[54] M. E. Kilmer, P. C. Hansen, and M. I. Español, A projection-based approach to general-form Tikhonov regularization, SIAM J. Sci. Comput., 29 (2007), pp. 315–330.

[55] S. Kindermann, Convergence analysis of minimization-based noise-level-free parameter choice rules for ill-posed problems, Electron. Trans. Numer. Anal., 38 (2011), pp. 233–257.

[56] S. Kindermann, Discretization independent convergence rates for noise level-free parameter choice rules for the regularization of ill-conditioned problems, Electron. Trans. Numer. Anal., 40 (2013), pp. 58–81.

[57] J. Lampe, L. Reichel, and H. Voss, Large-scale Tikhonov regularization via reduction by orthogonal projection, Linear Algebra Appl., 436 (2012), pp. 2845–2865.

[58] G. López Lagomasino, L. Reichel, and L. Wunderlich, Matrices, moments, and rational quadrature, Linear Algebra Appl., 429 (2008), pp. 2540–2554.

[59] S. Morigi, L. Reichel, and F. Sgallari, A truncated projected SVD method for linear discrete ill-posed problems, Numer. Algorithms, 43 (2006), pp. 197–213.

[60] V. A. Morozov, On the solution of functional equations by the method of regularization, Soviet Math. Dokl., 7 (1966), pp. 414–417.

[61] A. Neubauer, An a posteriori parameter choice for Tikhonov regularization in the presence of modeling error, Appl. Numer. Math., 4 (1986), pp. 203–222.

[62] M. K. Ng, Iterative Methods for Toeplitz Systems, Oxford University Press, Oxford, 2004.

[63] S. Noschese, L. Reichel, Inverse problems for regularization matrices, Numer. Algorithms, 60 (2012), pp. 531–544.

[64] S. Noschese and L. Reichel, Generalized circulant Strang-type preconditioners, Numer. Linear Algebra Appl., 19 (2012), pp. 3–17.

[65] S. Osher, Y. Mao, B. Dong, and W. Yin, Fast linearized Bregman iteration for compressed sensing and sparse denoising, Commun. Math. Sci., 8 (2010), pp. 93–111.

[66] Y. Park, L. Reichel, G. Rodriguez, and X. Yu, Parameter determination for Tikhonov regularization problems in general form, J. Comput. Appl. Math., 343 (2018), pp. 12–25.

[67] R. L. Parker, Understanding inverse theory, Ann. Rev. Earth Planet. Sci., 5 (1977), pp. 35–64.

[68] D. L. Phillips, A technique for the numerical solution of certain integral equations of the first kind, J. ACM, 9 (1962), pp. 84–97.

[69] T. Raus, Residue principle for ill-posed problems, Acta et comment. Univers. Tartuensis, 672 (1984), pp. 16–26. (in Russian)

[70] T. Regi´nska, A regularization parameter in discrete ill-posed problems, SIAM J. Sci. Comput., 17 (1996), pp. 740–749.

[71] L. Reichel and G. Rodriguez, Old and new parameter choice rules for discrete ill-posed problems, Numer. Algorithms, 63 (2013), pp. 65–87.

[72] L. Reichel, G. Rodriguez, and S. Seatzu, Error estimates for large-scale ill-posed problems, Numer. Algorithms, 51 (2009), pp. 341–361.

[73] L. Reichel and H. Sadok, A new L-curve for ill-posed problems, J. Comput. Appl. Math., 219 (2008), pp. 493–508.

[74] L. Reichel and Q. Ye, Simple square smoothing regularization operators, Electron. Trans. Numer. Anal., 33 (2009), pp. 63–83.

[75] L. Reichel and X. Yu, Matrix decompositions for Tikhonov regularization, Electron. Trans. Numer. Anal., 43 (2015), pp. 223–243.

[76] C. B. Shaw, Jr., Improvements of the resolution of an instrument by numerical solution of an integral equation, J. Math. Anal. Appl., 37 (1972), pp. 83–112.

[77] A. N. Tikhonov, Solution of incorrectly formulated problems and the regularization method, Soviet Math. Dokl., 4 (1963), pp. 1035–1038.
