<<

Chapter 2 Fundamentals

2.1 Backward error analysis

2.1.1 Forward and backward errors

Inevitably, numerical computation involves errors from several sources. The given problem may be derived from an imperfect model to begin with, and the problem data may be collected using imprecise measurements. The inherent sensitivity of the problem itself may magnify a seemingly negligible perturbation into a substantial deviation. During the computational process, numerical algorithms contribute unavoidable round-off errors into the mix and pass all those inaccuracies along into the computed results. Among the contributing factors toward the total error in the computed solution, the algorithm designer controls a significant portion. Not all numerical algorithms are created equal, and some are more accurate than others. Backward error analysis helps identify where the errors come from and, more importantly, helps answer the fundamental question: What is a numerical algorithm really computing? The basic tenet of backward error analysis can be summarized in one sentence:

A stable numerical algorithm calculates the exact solution of a nearby problem.

What does it mean to have a numerical solution 0.33333333 to the division 1/3? The answer: It is the exact solution to the nearby problem 0.99999999/3. The error in the solution is

    The forward error = | 0.33333333 − 1/3 | < 0.000000004,

and the nearby problem 0.99999999/3 does not appear to be too far from the original 1/3:

    The backward error = | 1 − 0.99999999 | = 0.00000001.
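The two measurements can be reproduced in a few lines; a minimal Python sketch, with double precision standing in for the 8-digit decimal arithmetic above:

```python
# Forward and backward error of the computed quotient 0.33333333 for 1/3.
computed = 0.33333333

# forward error: distance from the computed solution to the exact solution 1/3
forward_error = abs(computed - 1.0 / 3.0)

# the computed value solves the nearby problem 0.99999999/3 exactly, since
# 3 * 0.33333333 = 0.99999999; the backward error is the data perturbation
nearby_numerator = 3.0 * computed
backward_error = abs(1.0 - nearby_numerator)

print(forward_error, backward_error)
```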

Using the root-finding problem for x² − 2x + 1 as an example, Fig. 2.1 illustrates the forward and backward errors as well as the meaning of a numerical solution whose accuracy is substantially less than commonly expected.

[Fig. 2.1 An illustration of forward and backward error: the original problem x² − 2x + 1 = 0 has exact roots x = 1, 1; the numerical computation in single precision carries a backward error < 10^{−8} and a forward error > 10^{−4}; the perturbed problem (x − 0.9999)(x − 1.0001) = 0 yields, by exact computation, the computed roots x = 0.9999, 1.0001.]

The numerical solutions x = 0.9999, 1.0001 are the exact roots of the polynomial

    (x − 0.9999)(x − 1.0001) = x² − 2x + 1.00000001

with a backward error

    The backward error = ‖ [−2, 1] − [−2, 1.00000001] ‖_∞ = 0.00000001,

which is as tiny as we can expect. We can arguably conclude that the underlying numerical algorithm is as accurate as it can be. The resulting

    The forward error = ‖ [1, 1] − [0.9999, 1.0001] ‖_∞ = 0.0001 = 10,000 × [backward error]

reflects a high sensitivity, 10,000, that is inherent in the root-finding problem itself. Backward error analysis is important and often effective in identifying the real culprit behind an inaccurate solution. “If the answer is highly sensitive to perturbations, you have probably asked the wrong question [30].” Indeed, the problem

Problem 2.1. Find the two roots of x² − 2x + 1.

is highly sensitive. As shown in §1.4.1, however, the reformulated problem

Problem 2.2. Find the double root of the polynomial nearest to x² − 2x + 1.

is surprisingly not sensitive at all if we know the root is 2-fold. We shall further eliminate the required prior knowledge “double root” from Problem 2.2. Knowing an error of magnitude 10^{−8} is inevitable from round-off in single precision hardware, we can set a backward error tolerance of, say, 10^{−6}:

Problem 2.3. Find the numerical roots of the polynomial x² − 2x + 1 and multiplicities within ε = 10^{−6} as defined in Definition 2.4.

The algorithm UVFACTOR implemented in APALAB [37] for this problem gives an accurate answer:

>> f = [1 -2 1];
>> [F,err,cond] = uvFactor(f,1e-6,1);

THE CONDITION NUMBER: 0.214071
THE BACKWARD ERROR: 0.00e+000
THE ESTIMATED FORWARD ROOT ERROR: 0.00e+000

FACTORS

( x - 1.000000000000000 )^2

2.1.2 Backward error estimates and measurements

One of the common approaches for estimating backward errors is based on the model of floating point arithmetic

    fl(x) = x(1 + δ),   |δ| ≤ u,
    fl(x ∘ y) = (x ∘ y)(1 + η),   |η| ≤ u,        (2.1)

where fl(·) is the mapping from the exact value to its numerical value using floating point operations, the symbol ∘ represents any one of the binary operations +, −, × and ÷, and u is the machine epsilon, defined as

    u = min_{fl(1+ε) ≠ 1} |ε|.        (2.2)

The method in the following simple example is typical in estimating backward errors.
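The definition (2.2) can be evaluated directly by halving a candidate ε until fl(1 + ε) = 1; a minimal Python sketch in IEEE double precision:

```python
# compute the machine epsilon u = min{ |eps| : fl(1 + eps) != 1 }
# by repeated halving, following definition (2.2)
u = 1.0
while 1.0 + u / 2.0 != 1.0:
    u /= 2.0
print(u)   # 2**-52 for IEEE double precision
```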

Example 2.1 (Backward error of a sum and its improvement). There are at least two ways to calculate a sum ∑_k a_k. One way is obvious and the other way is clever and more accurate. Consider

a1 + a2 + a3 + a4

= ((a1 + a2)+ a3)+ a4 (2.3)

= (a1 + a2) + (a3 + a4) (2.4)

The sequential sum (2.3) adds numbers in the given order, and the pairwise sum (2.4) adds numbers pairwise. The two sums are theoretically equivalent but substantially different in practical numerical computation. The backward error of the sequential sum (2.3) can be estimated as follows.

    fl( ((a1 + a2) + a3) + a4 )
      = ( ( (a1 + a2)(1 + ε2) + a3 )(1 + ε3) + a4 )(1 + ε4)
      = a1(1 + ε2)(1 + ε3)(1 + ε4) + a2(1 + ε2)(1 + ε3)(1 + ε4) + a3(1 + ε3)(1 + ε4) + a4(1 + ε4)

      = ã1 + ã2 + ã3 + ã4,

where, by setting ε1 = 0,

    ãk = ak (1 + εk)(1 + ε_{k+1}) ⋯ (1 + ε4) = ak (1 + µk)   with   µk = εk + ε_{k+1} + ⋯ + ε4 + O(u²),   k = 1, 2, 3, 4.

As a result, the floating point sequential sum of a1, a2, a3, a4 is the exact sum of ã1, ã2, ã3, ã4 with backward error

    ‖ [a1, a2, a3, a4] − [ã1, ã2, ã3, ã4] ‖₂ ≤ 3u ‖ [a1, a2, a3, a4] ‖₂ + o(u).

Using the same technique, one can similarly conclude that the floating point pairwise sum of a1, a2, a3, a4 is the exact sum of â1, â2, â3, â4 with backward error

    ‖ [a1, a2, a3, a4] − [â1, â2, â3, â4] ‖₂ ≤ 2u ‖ [a1, a2, a3, a4] ‖₂ + o(u).

The difference between the two backward error bounds widens as the number n of terms increases. It is easy to see that, for a vector a ∈ ℝⁿ, the backward error bound for the floating point sequential sum ∑_{k=1}^{n} a_k is

    ‖a − ã‖₂ ≤ (n − 1) u ‖a‖₂ + o(u),

while the floating point pairwise sum enjoys a much lower bound

    ‖a − â‖₂ ≤ ⌈log₂ n⌉ u ‖a‖₂ + o(u),

where ⌈log₂ n⌉ is the smallest integer not less than log₂ n. For a very large n, say n = 1,000,000, such a simple modification in operation order reduces the backward error bound factor from about 0.01 to 0.0000002 in single precision with u = 10^{−8}, improving the backward error bound by five digits. Using the alternating harmonic series as an example, we know

    ∑_{k=1}^{10^6} (−1)^{k−1}/k = 0.69314668… ≈ ln 2.

The single precision sequential sum is 0.6931373 with 4 correct digits, while the pairwise sum is 0.6931468 with 3 more correct digits. ⊓⊔

Forward error cannot be measured directly unless the exact solution is known. This is an advantage of analyzing backward error: It can often be measured or verified using the computed results without knowing the exact solution.
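The contrast between the two sums in Example 2.1 can be reproduced by simulating single precision, rounding every intermediate result to an IEEE single via the struct module (a stand-in, assumed here, for single precision hardware):

```python
import struct

def f32(x):
    # round a double to the nearest IEEE single-precision value
    return struct.unpack('f', struct.pack('f', x))[0]

def sequential_sum(a):
    s = 0.0
    for t in a:
        s = f32(s + t)            # every partial sum rounded to single precision
    return s

def pairwise_sum(a):
    if len(a) == 1:
        return a[0]
    mid = len(a) // 2
    return f32(pairwise_sum(a[:mid]) + pairwise_sum(a[mid:]))

# 10^6 terms of the alternating harmonic series, stored in single precision
terms = [f32(1.0 / k if k % 2 == 1 else -1.0 / k) for k in range(1, 1000001)]
reference = sum(terms)            # double-precision sum of the same rounded terms
err_seq = abs(sequential_sum(terms) - reference)
err_pair = abs(pairwise_sum(terms) - reference)
print(err_seq, err_pair)          # the pairwise sum is markedly more accurate
```

The two printed errors differ by roughly two orders of magnitude, mirroring the 0.6931373 versus 0.6931468 comparison above.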

Example 2.2 (Backward error of root-finding). For a given polynomial

    p(x) = p₀ xⁿ + p₁ x^{n−1} + ⋯ + p_{n−1} x + p_n

with p₀ ≠ 0, the computed roots z₁, …, z_n are the exact roots of

    p̃ = p₀ (x − z₁) ⋯ (x − z_n) = p₀ xⁿ + p̃₁ x^{n−1} + ⋯ + p̃_{n−1} x + p̃_n.

The backward error is thus

    ‖ [p₁, p₂, …, p_n] − [p̃₁, p̃₂, …, p̃_n] ‖₂ = √( |p₁ − p̃₁|² + ⋯ + |p_n − p̃_n|² ).

If only a single root is computed as z* with residual p(z*) = δ, then z* is the exact root of

    p̂ = p₀ xⁿ + p₁ x^{n−1} + ⋯ + p_{n−1} x + p_n − δ

with backward error ‖p − p̂‖₂ = |δ|. ⊓⊔

Example 2.3 (Backward error of matrix eigenvalues). Let λ be a computed eigenvalue of an n × n matrix A. By an inverse iteration [7, §7.6.1, p. 362], we can compute an approximate eigenvector x of unit length and obtain a residual

    Ax − λx = e.

Then the identities x^H x = 1 and Ax − λx = e (x^H x) lead to

    (A − e x^H) x = λx.

As a result, λ is an exact eigenvalue of the perturbed matrix A − e x^H and we have a version of the backward error

    ‖ A − (A − e x^H) ‖₂ = ‖ e x^H ‖₂ ≤ ‖e‖₂.

If the eigenvalue λ is part of a computed Schur decomposition [7, p. 313]

    AQ = QT + E,

where T is the (upper-triangular) Schur form of A, Q is a unitary matrix and E is the residual AQ − QT, then we can similarly derive that λ is an exact eigenvalue of A − EQ^H and obtain a similar backward error ‖EQ^H‖₂ = ‖E‖₂. ⊓⊔
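The construction in Example 2.3 can be verified directly; a small sketch with a real 2 × 2 matrix and a deliberately rough eigenpair, both chosen here for illustration:

```python
# A rough eigenpair of A = [[2, 1], [1, 3]] (true eigenvalue (5 + 5**0.5)/2)
A = [[2.0, 1.0], [1.0, 3.0]]
lam = 3.6
x = [0.53, 0.85]
nrm = (x[0] ** 2 + x[1] ** 2) ** 0.5
x = [x[0] / nrm, x[1] / nrm]            # unit length, so x^T x = 1

# residual e = A x - lam x
e = [A[i][0] * x[0] + A[i][1] * x[1] - lam * x[i] for i in range(2)]

# the perturbed matrix A - e x^T has (lam, x) as an exact eigenpair
P = [[A[i][j] - e[i] * x[j] for j in range(2)] for i in range(2)]
r = [P[i][0] * x[0] + P[i][1] * x[1] - lam * x[i] for i in range(2)]
print(r)   # both components vanish up to round-off
```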

2.1.3 Condition numbers

Condition numbers are mostly defined case by case. As J. H. Wilkinson put it: “We have avoided framing a precise definition of condition numbers so that we may use the term freely [33, p. 29].” In general, we would like to have a condition number as an indicator of the sensitivity of the given problem with respect to the backward error. A large condition number indicates that the problem is highly sensitive to data perturbations. One of the common models for defining a condition number is to ensure the inequality

    forward error ≤ ρ [condition number] [backward error]        (2.5)

with a sufficiently small magnitude of the backward error, where ρ is a constant of moderate size. Needless to say, we would like to have the smallest condition number satisfying the inequality (2.5). Arguably the most well known condition number is the matrix condition number in the following example.

Example 2.4 (Matrix condition number). For a matrix A ∈ ℂ^{n×n}, its condition number

    κ(A) = ‖A‖₂ ‖A^{−1}‖₂ if A^{−1} exists, and κ(A) = ∞ otherwise,        (2.6)

is well known. This condition number can be justified by a rigorous error bound for the solution of a linear system Ax = b as follows. If x̃ is the (exact) solution of the nearby problem (A + ΔA) x̃ = b + Δb with ‖ΔA‖_p ≤ ε ‖A‖_p, ‖Δb‖_p ≤ ε ‖b‖_p and ε κ(A) < 1, then [3, p. 33]

    ‖x̃ − x‖₂ / ‖x‖₂ ≤ ( κ(A) / (1 − ε κ(A)) ) ( ‖ΔA‖₂/‖A‖₂ + ‖Δb‖₂/‖b‖₂ ).        (2.7)

The matrix condition number is consistent with the model (2.5). ⊓⊔

Remark. In the literature, it is common to define a distinct condition number associated with each type of matrix norm. Such a distinction is rarely necessary since the difference from κ(A) in (2.6) is a constant multiple of moderate magnitude. It is convenient and harmless to use the specific 2-norm in (2.6) for defining the matrix condition number. ⊓⊔

Another commonly adopted model for the condition number is

    condition number = limsup_{[backward error] → 0} ( forward error / backward error ).        (2.8)

The condition number of zero-finding in the following example is based on this model.

Example 2.5 (Condition number of zero-finding). Consider the zero-finding (also known as root-finding) problem

    Solve f(x) = 0 for x,

where f(x) is a differentiable function. Let x* be a simple zero of f(x) in the exact sense, namely f(x*) = 0. Suppose xε is a numerical zero of f(x) with a small residual f(xε) = ε. Then xε is an exact zero of the “nearby” function f(x) − ε, with backward error |ε|. Since x* is a simple zero of f(x), we have f′(x*) ≠ 0. By the Inverse Function Theorem, there is a neighborhood N of 0 and a differentiable function g(t) for t ∈ N such that g(0) = x* and

    f(g(t)) = t   for all t ∈ N.

Consequently,

    limsup_{[backward error] → 0} ( forward error / backward error )
      = lim_{ε→0} |xε − x*| / |ε|
      = lim_{ε→0} |g(ε) − g(0)| / |ε|
      = |g′(0)| = 1 / |f′(x*)|,

which is the commonly accepted condition number of the zero-finding problem for the function f(x) at the simple zero x*. ⊓⊔
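The limit defining the condition number can be observed numerically; a sketch using f(x) = x² − 2 at the simple zero x* = √2 (an illustrative choice, not from the text), where 1/|f′(x*)| = 1/(2√2):

```python
import math

# f(x) = x**2 - 2 with simple zero x* = sqrt(2); condition number 1/|f'(x*)|
xstar = math.sqrt(2.0)
cond = 1.0 / abs(2.0 * xstar)

eps = 1e-8                              # backward error: x_eps is an exact zero of f(x) - eps
x_eps = math.sqrt(2.0 + eps)
ratio = abs(x_eps - xstar) / abs(eps)   # forward error / backward error
print(ratio, cond)                      # the ratio approaches 1/|f'(x*)|
```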

Example 2.6 (Condition number of a simple eigenvalue). The condition number of a simple eigenvalue can be established in the following lemma.

Lemma 2.1 (Condition number of a simple eigenvalue). Let A ∈ ℂ^{n×n} be a matrix with a simple eigenvalue λ. Let v and w be the unit right and left eigenvectors of A, respectively, associated with λ, namely

    Av = λv,   w^H A = λ w^H,   ‖v‖₂ = ‖w‖₂ = 1.

For any matrix E ∈ ℂ^{n×n} with ‖E‖₂ ≪ 1, the perturbed matrix A + E has an eigenvalue λ_E satisfying

    |λ − λ_E| ≤ ‖E‖₂ / |v^H w| + O(‖E‖₂²),

and

    limsup_{‖E‖₂ → 0} |λ − λ_E| / ‖E‖₂ = 1 / |v^H w|.        (2.9)

Proof. See [31, pp. 95–96]. The condition number of a simple eigenvalue is thus defined by (2.9). ⊓⊔

2.1.4 Condition numbers correlate to distances from singularity

Example 2.7. For any matrix A ∈ ℂ^{n×n} with condition number κ(A), there exists a singular matrix Â such that

    ‖A − Â‖₂ / ‖A‖₂ = min_{rank(B) < n} ‖A − B‖₂ / ‖A‖₂ = 1 / κ(A).        (2.10)

That is, the condition number κ(A) is the reciprocal of the relative distance from A to the nearest singular matrix. ⊓⊔

Example 2.8. Let x* be a simple root of a polynomial p and let µ = p′(x*). Then x* is a double root of the nearby polynomial p̃ = p − µ(x − x*), so the root-finding problem for p̃ is ill-posed at x*. The distance between the two problems is

    ‖p − p̃‖₂ = ‖ µ(x − x*) ‖₂ = √( |µ|² + |x* µ|² ) = |µ| √(1 + |x*|²),

and thus the root-finding condition number

    1 / |p′(x*)| = 1 / |µ| = √(1 + |x*|²) / ‖p − p̃‖₂,

which is inversely proportional to the distance ‖p − p̃‖₂ between the given root-finding problem of p and an ill-posed root-finding problem of p̃. ⊓⊔
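For a concrete instance of this relationship, take p(x) = x² − 2x with the simple root x* = 2, so that µ = p′(2) = 2 and p̃ = p − µ(x − x*) = (x − 2)² (an illustrative choice, not from the text):

```python
import math

# p(x) = x**2 - 2x, simple root x* = 2, mu = p'(x*) = 2*x* - 2 = 2
xstar = 2.0
mu = 2.0

# p - p~ = mu*(x - x*) has coefficient vector [mu, -mu*x*]
dist = math.hypot(mu, mu * xstar)       # ||p - p~||_2 = |mu| * sqrt(1 + x*^2)

cond = 1.0 / abs(mu)                    # zero-finding condition number 1/|p'(x*)|
print(cond, math.sqrt(1.0 + xstar ** 2) / dist)   # the two quantities coincide
```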

2.2 Singular value decomposition

One of the leading authors in applied mathematics, Gilbert Strang, regards the singular value decomposition as “absolutely a high point of linear algebra”. The SVD, as it is commonly abbreviated, is one of the essential tools in numerical linear algebra, yet it is discussed sparingly in numerical analysis textbooks if it is mentioned at all.

2.2.1 Singularity in theory and practice

The importance of singular value decomposition and the fundamental difference between “pure” linear algebra and numerical linear algebra can be seen clearly from the following example.

Example 2.9. The following matrix may appear harmless:

    A_n = [ 1
            10  1
                ⋱  ⋱
                   10  1 ]  ∈ ℂ^{(n+1)×(n+1)},        (2.11)

the lower bidiagonal matrix derived from the polynomial division by x + 10 as a linear system. The matrix A_n is clearly nonsingular by definition, with determinant det(A_n) = 1 for all n. For moderately large n, however, A_n behaves the same way as a singular matrix in practical computation due to a huge condition number. In fact, let

    x_n = [ (−10)^{−n}, …, (−10)^{−1}, 1 ]^T   and   b_n = [ (−10)^{−n}, 0, …, 0 ]^T.

Then ‖x_n‖₂ ≈ 1 but its image A_n x_n = b_n is almost a zero vector:

    ‖A_n x_n‖₂ = ‖b_n‖₂ = 10^{−n}.        (2.12)

Namely,

    ( A_n − b_n x_n^T / ‖x_n‖₂² ) x_n = 0   with   ‖ b_n x_n^T / ‖x_n‖₂² ‖₂ ≤ ‖b_n‖₂ / ‖x_n‖₂ < 10^{−n}.

Consequently, the distance from A_n to a singular matrix is no larger than 10^{−n}, which is practically zero in hardware precision for n = 15. From (2.12), we have ‖x_n‖₂ = ‖A_n^{−1} b_n‖₂ ≤ ‖A_n^{−1}‖₂ ‖b_n‖₂. Consequently,

    ‖A_n^{−1}‖₂ ≥ ‖x_n‖₂ / ‖b_n‖₂ > 10^n.

The condition number ‖A_n‖₂ ‖A_n^{−1}‖₂ > 10^{n+1} is inversely proportional to the distance to singularity. ⊓⊔

Such a quantitative distance to singularity, rather than the qualitative singularity/nonsingularity, serves as a far more precise indicator of the difficulty level involving the underlying matrix in practical computation.
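A sketch reproducing Example 2.9 in plain Python for n = 15:

```python
n = 15
N = n + 1
# lower bidiagonal A_n: ones on the diagonal, 10 on the subdiagonal
A = [[0.0] * N for _ in range(N)]
for i in range(N):
    A[i][i] = 1.0
    if i > 0:
        A[i][i - 1] = 10.0

# x_n = ((-10)^-n, ..., (-10)^-1, 1)^T
x = [(-10.0) ** (i - n) for i in range(N)]
b = [sum(A[i][j] * x[j] for j in range(N)) for i in range(N)]   # b = A_n x_n

norm_x = sum(v * v for v in x) ** 0.5
norm_b = sum(v * v for v in b) ** 0.5
print(norm_x, norm_b)   # ||x_n|| is about 1 while ||A_n x_n|| is about 10^-15
# hence ||A_n^{-1}||_2 >= ||x_n|| / ||b_n|| > 10^n despite det(A_n) = 1
```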

2.2.2 Singular Value Decomposition Theorem

Theorem 2.1 (Singular Value Decomposition Theorem). For every rank-r matrix A ∈ ℂ^{m×n}, there exist unitary matrices U ∈ ℂ^{m×m} and V ∈ ℂ^{n×n} along with a diagonal matrix Σ ∈ ℂ^{m×n} such that

    A = U Σ V^H,        (2.13)

where the diagonal entries of Σ = diag(σ₁, …, σ_{min{m,n}}) satisfy

    σ₁ ≥ ⋯ ≥ σ_r > 0,   σ_{r+1} = ⋯ = σ_{min{m,n}} = 0.        (2.14)

For proofs, see e.g. [26, p. 62].

The diagonal entries σ₁, …, σ_{min{m,n}} of Σ in (2.13) are called the singular values of A. We will also use σ_j(A) to denote the j-th largest singular value of A for j = 1, …, n, while σ_min(A) denotes the smallest singular value of A. The column vectors u₁, …, u_m of U and v₁, …, v_n of V are called left and right singular vectors, respectively. Using these singular vectors, we can represent the matrix A in its singular value expansion

    A = σ₁ u₁ v₁^H + ⋯ + σ_r u_r v_r^H.        (2.15)

The Singular Value Decomposition Theorem leads to the following corollaries.

Corollary 2.1. Under the assumptions of Theorem 2.1, the singular vectors of A form orthonormal bases for the four fundamental subspaces of A. More specifically, write U = [u₁, …, u_m] and V = [v₁, …, v_n]. Then

    range:     Ran(A)   = Ran([u₁, …, u_r]),
    kernel:    Ker(A)   = Ran([v_{r+1}, …, v_n]),
    corange:   Ran(A^H) = Ran([v₁, …, v_r]),
    cokernel:  Ker(A^H) = Ran([u_{r+1}, …, u_m]).

Corollary 2.2. Under the assumptions of Theorem 2.1, the following equalities hold:

    ‖A‖₂ = σ₁,        (2.16)
    ‖A‖_F = √( σ₁² + ⋯ + σ_r² ),        (2.17)
    A⁺ = (1/σ₁) v₁ u₁^H + ⋯ + (1/σ_r) v_r u_r^H,        (2.18)
    ‖A⁺‖₂ = 1/σ_r,        (2.19)
    κ(A) = σ₁/σ_n   if m = n and A^{−1} exists.        (2.20)

The matrix A⁺ in (2.18) is the Moore-Penrose inverse, or pseudo-inverse, of A. More detailed discussions on the pseudo-inverse will be given in §3.5.1. An important special case of the Moore-Penrose inverse is

    A⁺ = (A^H A)^{−1} A^H   when rank(A) = n and m ≥ n.

2.2.3 Distance to singularity and low rank projections

In theoretical linear algebra, singularity is qualitative: A matrix is either singular or nonsingular, identifiable by its determinant being either zero or otherwise. Singular values, on the other hand, provide a precise measurement of the distances to singularities of all ranks, while partial singular value expansions pinpoint where those distances are attained.

Theorem 2.2 (Schmidt-Mirsky Theorem). For every matrix A ∈ ℂ^{m×n} with singular values σ₁ ≥ σ₂ ≥ ⋯ ≥ σ_{min{m,n}}, left singular vectors u₁, …, u_{min{m,n}} and right singular vectors v₁, …, v_{min{m,n}}, we have

    min_{rank(B)=k} ‖A − B‖₂ = σ_{k+1}   and        (2.21)
    min_{rank(B)=k} ‖A − B‖_F = √( σ_{k+1}² + ⋯ + σ_{min{m,n}}² )        (2.22)

for 1 ≤ k < min{m,n}. Moreover, both minima are attained at B = A_k, where

    A_k = σ₁ u₁ v₁^H + ⋯ + σ_k u_k v_k^H.        (2.23)

In particular, the rank-k matrix whose Frobenius norm distance from A equals the minimum (2.22) is unique if and only if σ_k > σ_{k+1}.

Proof. See [26, p. 70].
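The Schmidt-Mirsky theorem can be checked by hand for a symmetric 2 × 2 matrix, where singular values and vectors have closed forms (the matrix below is an illustrative assumption): the rank-1 projection A₁ = σ₁ u₁ u₁^T satisfies ‖A − A₁‖_F = σ₂, as in (2.22).

```python
import math

# symmetric positive definite A = [[3, 1], [1, 2]]: singular values = eigenvalues
a, b, d = 3.0, 1.0, 2.0
mean = (a + d) / 2.0
rad = math.hypot((a - d) / 2.0, b)
s1, s2 = mean + rad, mean - rad          # sigma_1 >= sigma_2 > 0

# unit eigenvector (= singular vector) for sigma_1: (b, s1 - a) normalized
v = [b, s1 - a]
nv = math.hypot(v[0], v[1])
u1 = [v[0] / nv, v[1] / nv]

# rank-1 projection A_1 = sigma_1 * u1 * u1^T
A = [[a, b], [b, d]]
A1 = [[s1 * u1[i] * u1[j] for j in range(2)] for i in range(2)]
fro = math.sqrt(sum((A[i][j] - A1[i][j]) ** 2 for i in range(2) for j in range(2)))
print(fro, s2)   # the Frobenius distance equals sigma_2
```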

For instance, σ_n(A_n) < 10^{−n} for the matrix A_n in (2.11), implying that the distance from A_n to a rank-deficient matrix is as tiny as 10^{−n}.

As will be established later in this monograph, the collection of rank-k matrices in ℂ^{m×n} forms a complex manifold ℂ_k^{m×n} of positive codimension (m − k)(n − k) if k < min{m,n}. A matrix in the manifold ℂ_k^{m×n} having the minimum distance to A is an orthogonal projection from A to this manifold. Thus, the matrix A_k in (2.23) is a rank-k projection of A if σ_k ≠ 0. The rank-k projection is also called a rank-k approximation or a low-rank approximation of a matrix.

2.2.4 Distance between subspaces

The distance between two subspaces S, T ⊂ ℂⁿ, denoted by dist(S, T), is defined by

    dist(S, T) = ‖P_S − P_T‖₂,        (2.24)

where P_S and P_T are the orthogonal projections onto S and T, respectively. This distance can be interpreted in several ways. For instance, consider the distance from a vector in one subspace to the nearest vector in the other subspace: Let x ∈ S. Then

    min_{y ∈ T} ‖x − y‖₂ = ‖x − P_T x‖₂ = ‖P_S x − P_T x‖₂ ≤ dist(S, T) ‖x‖₂.        (2.25)

From another viewpoint, we can consider the angle θ between x and P_T x as the angle between x and T. The inequality (2.25) also implies

    sin θ ≤ dist(S, T).

Therefore, the distance (2.24) is also known as the sine of the angle between subspaces. If the columns of the matrices S and T form orthonormal bases for S and T respectively, then

    P_S = S S^H   and   P_T = T T^H,

and the distance (2.24) between the subspaces S and T could be calculated as it is defined. There is, however, a more convenient way to calculate this distance. To begin, expand S and T to square unitary matrices [S, S⊥] and [T, T⊥], respectively. In other words, S⊥ and T⊥ are matrix blocks whose columns form orthonormal bases for the orthogonal complement spaces S^⊥ and T^⊥, respectively. This can be done by, for instance, QR decompositions of S and T. Then the distance between S and T can be calculated using the identities [7, §2.6.3]

    dist(S, T) = ‖S⊥^H T‖₂ = ‖T⊥^H S‖₂.        (2.26)

We can also define the distance between a vector and a vector space as

    dist(x, S) = dist( span{x}, S ).

The following lemma will be needed in §3.5.2.

Lemma 2.2. Let S and T be subspaces of ℂⁿ. Then there exist matrices S and T whose columns form orthonormal bases for S and T, respectively, such that ‖S − T‖₂ ≤ dist(S, T).

Proof. Let S and T̃ be matrices whose columns form orthonormal bases for S and T, respectively. Then the columns of T = T̃ T̃^H S form an orthonormal basis for T, and

    ‖S − T‖₂ = ‖ (S S^H − T̃ T̃^H) S ‖₂ ≤ ‖ S S^H − T̃ T̃^H ‖₂ = dist(S, T).

⊓⊔

2.2.5 Lipschitz continuity of singular value decomposition

The singular value decomposition is generally not attainable in exact form due to Abel's Impossibility Theorem. Numerical computation is practically necessary for the singular value decomposition. For this reason, Lipschitz continuity is essential for all elements of the singular value decomposition. The Lipschitz continuity of singular values and singular subspaces is essential to the well-posedness of the numerical rank and the numerical fundamental subspaces that will be established in §3.3.3.

Theorem 2.3 (Weyl's Theorem). Singular values are Lipschitz continuous. More specifically, let A, Ã ∈ ℂ^{m×n}. Then

    | σ_j(A) − σ_j(Ã) | ≤ ‖A − Ã‖₂        (2.27)

for j = 1, …, min{m,n}.

The proof of inequality (2.27) can be found in, say, [26, p. 69]. The stability of singular vectors is measured through the subspaces they span and the variations of those subspaces when the matrix is under perturbation. We now consider matrices A, Ã ∈ ℂ^{m×n} with m ≥ n and properly partitioned singular value decompositions

    A = [U₁, U₂] diag(Σ₁, Σ₂) [V₁, V₂]^H,        (2.28)
    Ã = [Ũ₁, Ũ₂] diag(Σ̃₁, Σ̃₂) [Ṽ₁, Ṽ₂]^H,        (2.29)

where U₁, Ũ₁, V₁, Ṽ₁ have the same column dimension k, and the blocks

    Σ₁ = diag(σ₁, …, σ_k) ∈ ℂ^{k×k},
    Σ₂ = diag(σ_{k+1}, …, σ_n) ∈ ℂ^{(m−k)×(n−k)}.

Following the “singular” terminology here, it is natural to call Ran(U₁), Ran(U₂), Ran(V₁), and Ran(V₂) singular subspaces of A.

Lemma 2.3 (Wedin's Theorem [32]). For A, Ã ∈ ℂ^{m×n} with partitioned singular value decompositions as in (2.28) and (2.29), assume σ_k > σ_{k+1} + ‖E‖₂ with E = A − Ã. Then

    √( ‖U₁^H Ũ₂‖_F² + ‖V₁^H Ṽ₂‖_F² ) ≤ √( ‖E V₁‖_F² + ‖E^H U₁‖_F² ) / ( σ_k − σ_{k+1} − ‖E‖_F ).        (2.30)

Wedin's Theorem leads to the Lipschitz continuity of singular subspaces.

Corollary 2.3 (Singular Subspace Continuity Theorem). Under the assumptions of Wedin's Theorem, the singular subspaces Ran(U₁), Ran(U₂), Ran(V₁) and Ran(V₂) are all Lipschitz continuous, satisfying

    max_{j=1,2} { dist( Ran(U_j), Ran(Ũ_j) ), dist( Ran(V_j), Ran(Ṽ_j) ) }
      ≤ ( σ₁/σ_k ) · ( √(2n) / ( 1 − (σ_{k+1} + ‖A − Ã‖₂)/σ_k ) ) · ‖A − Ã‖₂ / ‖A‖₂.        (2.31)

In the upper bound (2.31), the factor √(2n) / ( 1 − (σ_{k+1} + ‖A − Ã‖₂)/σ_k ) is a constant of moderate magnitude if the separation ratio σ_k/σ_{k+1} of the singular values is somewhat significant. The factor σ₁/σ_k can thus be considered a condition number for the singular subspaces of A with respect to the matrix perturbation E.

2.3 The QR decomposition and its pitfalls

The QR decomposition is one of the basic building blocks in numerical linear algebra. Proofs of the existence and uniqueness of the QR decomposition can be found in many textbooks, e.g. [7, §5.2].

Lemma 2.4 (QR Decomposition Theorem). For every A ∈ ℂ^{m×n} with m ≥ n,

there exist a unitary matrix Q ∈ ℂ^{m×m} and an upper-triangular matrix R ∈ ℂ^{m×n} such that

    A = QR ≡ [Q̂, Q̂⊥] [ R̂ ; O ] = Q̂ R̂,        (2.32)

where Q̂ and R̂ consist of the first n columns of Q and the first n rows of R, respectively. If A is of full rank n, then the columns of Q̂ form an orthonormal basis for the range Ran(A) of A. Furthermore, there exist unique Q̂ and R̂ where the diagonal entries of R̂ are positive.

The matrices Q and R form the full-span QR decomposition of A, while A = Q̂R̂ in (2.32) is called the thin QR decomposition of A. The standard algorithms for QR decompositions, such as the Householder transformation and Givens transformation, are backward accurate within a small multiple of the machine epsilon (cf. [12, Theorem 19.4, p. 360] and [12, Theorem 19.10, p. 368]), and the backward error can be verified by calculating the residual ‖A − QR‖₂. However, the QR Decomposition Theorem must be understood with caution. It is impeccably true in theory and in exact computation, but QR decompositions are almost always produced using floating point arithmetic. As a result, the uniqueness in Lemma 2.4 can be dangerously misleading since QR decompositions are far from unique in practical computations, as shown in the following example. Such forward inaccuracies are not easy to find in standard textbooks.

Example 2.10. Consider the upper-triangular matrix

    A = [ 1  1000
             1  1000
                1  1000
                   1    ]

in single precision (u ≈ 5.96 × 10^{−8}) arithmetic. There are drastically different QR decompositions of A:

    A = I · A,        (2.33)

    A ≈ Q̃ R̃.        (2.34)

The first decomposition (2.33) simply takes Q = I and R = A. The second decomposition (2.34), computed in single precision, has a unitary factor Q̃ that mixes the rows of A with entries such as 0.9999995 and 0.0010000, and an upper-triangular factor R̃ containing an entry as small as 0.00000001. Both QR decompositions in (2.33) and (2.34) are numerically legitimate for the matrix A since the backward errors are about the machine epsilon. The diagonal entries of both R-matrices are all positive. The two QR decompositions are fundamentally different in the sense that (2.34) reveals the numerical rank deficiency of A but (2.33) does not. ⊓⊔

As established by G. W. Stewart, the forward inaccuracy of the QR decomposition depends on the condition number of the matrix.

Theorem 2.4 (Stewart's QR Decomposition Sensitivity Theorem). Let A, Ã ∈ ℂ^{m×n} be matrices with QR decompositions A = QR and Ã = Q̃R̃. If ‖A − Ã‖₂ is sufficiently small, then

    ‖R − R̃‖_F ≤ α_n ‖A⁺‖_F ‖A‖_F ‖A − Ã‖_F,        (2.35)
    ‖Q − Q̃‖_F ≤ β_n ‖A⁺‖_F ‖A − Ã‖_F,        (2.36)

where α_n and β_n are constants depending on n.

Proof. See [25]. ⊓⊔

The error bounds in (2.35) and (2.36) can be quite generous when the condition number of A is large and, in fact, the forward errors in Q and R of the QR decomposition A = QR can be substantial. As an immediate consequence of Theorem 2.4, a straightforward QR decomposition is not reliable in rank revealing for numerically rank deficient matrices since those matrices have large condition numbers. Additional computation is needed beyond a QR decomposition to identify the numerical rank and subspaces.

Theorem 2.5 (QR Decomposition with Column Pivoting). For any m × n matrix A with rank r < min{m,n}, the algorithm of Householder QR decomposition with column pivoting [7, Algorithm 5.4.1, p. 249] generates the decomposition

    AΠ = Q [ R₁₁  R₁₂ ; O  O ],        (2.37)

where R₁₁ is an r × r upper-triangular matrix with diagonal entries in decreasing magnitudes and Π is a permutation matrix.

Proof. See [7, §5.4.1, p. 248]. ⊓⊔

The decomposition (2.37) is sometimes referred to with the questionable term “rank-revealing QR decomposition” in the literature, or mentioned as a remedy for the QR decomposition in rank-deficient cases. Without proper warning, readers of Theorem 2.5 may intuitively expect (2.37) to become

    AΠ = Q [ R₁₁  R₁₂ ; O  R₂₂ ]

with a tiny ‖R₂₂‖₂ ≈ 0. This is again not guaranteed. A counterexample is known as the Kahan matrix

    K = [ 1  c   c   ⋯  c
             s   cs  ⋯  cs
                 ⋱       ⋮
                     s^{n−2}  c s^{n−2}
                              s^{n−1}  ],        (2.38)

where s > 0, c < 0 and c² + s² = 1. The Kahan matrix is already upper-triangular, and the Householder QR algorithm with column pivoting does not alter any entry. However, the Kahan matrix can be numerically rank-deficient by virtue of having a tiny singular value, and this numerical rank deficiency cannot be detected by the pivoted QR decomposition for not having a tiny bottom row. Try n = 100, c = −0.3. A rigorous error analysis by Higham in [11] confirms the forward inaccuracy of the Householder QR decomposition algorithm with column pivoting when the condition number of the matrix is large.
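The Kahan matrix's behavior can be demonstrated without computing an SVD: back-substituting K y = e_n yields σ_min(K) ≤ ‖e_n‖₂/‖y‖₂ = 1/‖y‖₂; a sketch with n = 100 and c = −0.3:

```python
n = 100
c = -0.3
s = (1.0 - c * c) ** 0.5                 # c**2 + s**2 = 1

# Kahan matrix: row i has diagonal s**i and entries c*s**i to its right
K = [[0.0] * n for _ in range(n)]
for i in range(n):
    K[i][i] = s ** i
    for j in range(i + 1, n):
        K[i][j] = c * s ** i

# back substitution for K y = e_n (the last unit coordinate vector)
y = [0.0] * n
rhs = [0.0] * (n - 1) + [1.0]
for i in range(n - 1, -1, -1):
    y[i] = (rhs[i] - sum(K[i][j] * y[j] for j in range(i + 1, n))) / K[i][i]

norm_y = sum(v * v for v in y) ** 0.5
sigma_min_bound = 1.0 / norm_y           # sigma_min(K) <= ||e_n|| / ||y||
min_diag = min(K[i][i] for i in range(n))
print(min_diag, sigma_min_bound)         # no tiny diagonal, yet sigma_min is tiny
```

For these parameters the bound on σ_min(K) lands many orders of magnitude below the smallest diagonal entry s^{n−1} ≈ 0.0094, so no diagonal entry betrays the near rank deficiency.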

In summary, the numerical QR decomposition of a matrix A produced by either Householder transformations or Givens rotations is an exact QR decomposition of a nearby matrix A + E with a tiny backward error ‖E‖₂ = O(u), and is thus backward accurate. As a result of this high backward accuracy, the QR decomposition is a marvelous tool for solving many numerical computational problems such as linear systems and linear least squares. However, it is neither unique nor forward stable in numerical computation, and additional computation is needed for reliable identification of the numerical rank and subspaces.
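A minimal illustration of the computable residual: a thin QR decomposition by modified Gram-Schmidt (a stand-in here for the Householder and Givens algorithms named above, and the matrix is an illustrative assumption), followed by the residual check:

```python
import math

# thin QR of a 4x3 full-rank matrix by modified Gram-Schmidt
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0],
     [1.0, 0.0, 2.0],
     [1.0, 1.0, 1.0]]
m, n = 4, 3
Q = [row[:] for row in A]                 # columns orthogonalized in place
R = [[0.0] * n for _ in range(n)]
for k in range(n):
    R[k][k] = math.sqrt(sum(Q[i][k] ** 2 for i in range(m)))
    for i in range(m):
        Q[i][k] /= R[k][k]
    for j in range(k + 1, n):
        R[k][j] = sum(Q[i][k] * Q[i][j] for i in range(m))
        for i in range(m):
            Q[i][j] -= R[k][j] * Q[i][k]

# the residual ||A - QR||_F serves as a computable backward-error measure
res = math.sqrt(sum((A[i][j] - sum(Q[i][k] * R[k][j] for k in range(n))) ** 2
                    for i in range(m) for j in range(n)))
print(res)   # of the order of the machine epsilon
```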

2.3.1 A bona fide rank-revealing QR decomposition

The conventional QR decomposition, with or without column pivoting, is not reliable in detecting matrix ranks in practical computation, and neither version should be used for rank-revealing. However, a truly rank-revealing QR decomposition does exist, as in the following theorem.

Theorem 2.6 (Rank-Revealing QR Decomposition Theorem). For every matrix A ∈ ℂ^{m×n} with singular values σ₁ ≥ σ₂ ≥ ⋯ ≥ σ_{min{m,n}} and 0 < r < min{m,n}, there exists a QR decomposition

    A = [Q̂, Q̂⊥] [ R̂ ; S ]        (2.39)

with a row echelon matrix R̂ ∈ ℂ^{r×n} along with a unitary matrix Q = [Q̂, Q̂⊥] with proper partition such that σ₁ ≥ ⋯ ≥ σ_r and σ_{r+1} ≥ ⋯ ≥ σ_{min{m,n}} are the singular values of R̂ and S, respectively. Furthermore, the blocks Q̂ and R̂ are unique if σ_r > σ_{r+1} and the leading entry in every row of R̂ is positive. In particular,

    A = [Q̂, Q̂⊥] [ R̂ ; O ]        (2.40)

if A is of rank r.

Proof. Let the singular value decomposition of A be partitioned as

    A = [U₁, U₂] [ Σ₁  O ; O  Σ₂ ] [V₁, V₂]^H = [U₁, U₂] [ Σ₁ V₁^H ; Σ₂ V₂^H ]

with Σ₁ ∈ ℂ^{r×r}, leading to

    A = ( [U₁, U₂] [ G  O ; O  I ] ) [ R̂ ; S ],

where G R̂ is the row echelon QR decomposition of Σ₁ V₁^H and S = Σ₂ V₂^H, and the theorem follows. ⊓⊔

The simple proof above uses the singular value decomposition. Needless to say, it makes little sense to calculate the rank-revealing QR decomposition (2.39) after obtaining the singular value decomposition. Such a rank-revealing QR decomposition can be computed iteratively after a conventional QR decomposition is obtained. An algorithm will be given in §????.

Corollary 2.4. Let A ∈ ℂ^{m×n} be a matrix of rank r with a QR decomposition (2.40) and positive singular values σ₁(A) ≥ ⋯ ≥ σ_r(A). If

    A + E = [Q̃, Q̃⊥] [ R̃ ; S ]

is the rank-revealing QR decomposition of the perturbed matrix A + E with ‖E‖₂ < σ_r and R̃ ∈ ℂ^{r×n}, then

    dist( Ran(Q̂), Ran(Q̃) ) ≤ ( σ₁(A)/σ_r(A) ) · ( 2‖E‖₂/‖A‖₂ ) / ( 1 − ‖E‖₂/σ_r(A) ).        (2.41)

Proof. By Weyl's Theorem, we have ‖S‖₂ ≤ ‖E‖₂ and σ_r(A + E) ≥ σ_r(A) − ‖E‖₂. The identity

    Q̂⊥^H E = Q̂⊥^H (A + E) = Q̂⊥^H Q̃ R̃ + Q̂⊥^H Q̃⊥ S

implies

    σ_r(R̃) ‖Q̂⊥^H Q̃‖₂ ≤ ‖Q̂⊥^H Q̃ R̃‖₂ ≤ ‖Q̂⊥^H E‖₂ + ‖Q̂⊥^H Q̃⊥ S‖₂ ≤ 2‖E‖₂.

As a result,

    dist( Ran(Q̂), Ran(Q̃) ) = ‖Q̂⊥^H Q̃‖₂ ≤ 2‖E‖₂ / σ_r(R̃) ≤ ( 2‖E‖₂ / σ_r(A) ) · 1 / ( 1 − ‖E‖₂/σ_r(A) ),

and (2.41) holds. ⊓⊔

2.4 Linear least squares

2.4.1 Linear least squares

A linear system Ax = b is in overdetermined form if there are more equations than variables:

    A x = b,   A ∈ ℂ^{m×n},   m > n.

Such a “tall” system generically has no conventional solution x ∈ ℂⁿ that makes Ax − b = 0. As illustrated in Fig. 2.2, the range Ran(A) cannot fill the space ℂᵐ due to a positive dimension deficit (i.e. codimension) m − rank(A) ≥ m − n > 0. A generic vector b has zero probability of falling in Ran(A).

[Fig. 2.2 Illustration of linear least squares: the subspace Ran(A) = { Ax : x ∈ ℂⁿ }, the point Ax* in Ran(A), and the residual vector Ax* − b reaching out to a generic vector b outside Ran(A).]

A least squares solution to the linear system Ax = b is defined as a solution x* ∈ ℂⁿ to the minimization problem

    ‖Ax* − b‖₂ = min_{x ∈ ℂⁿ} ‖Ax − b‖₂.        (2.42)

A conventional solution, if it exists, is obviously a least squares solution. It is easy to see from Fig. 2.2 and to prove the following lemma.

Lemma 2.5. Let A ∈ ℂ^{m×n} and b ∈ ℂᵐ with m > n and rank(A) = n. The following are equivalent.
(i) x* is the unique least squares solution to Ax = b.
(ii) Ax* − b ∈ Ran(A)^⊥.
(iii) x* = A⁺ b.
(iv) x* is the unique solution to the normal equation (A^H A) x = A^H b.
(v) Ax* = P_{Ran(A)} b, where P_{Ran(A)} is the orthogonal projection onto Ran(A).

2.4.2 The QR method and sensitivity

A standard numerical method for solving the linear least squares problem is based on the QR decomposition of A.

Corollary 2.5. Let A ∈ ℂ^{m×n} with full rank n and the thin QR decomposition A = Q̂R̂. Then the least squares solution x* to the linear system Ax = b is the unique solution to the square linear system

    R̂ x = Q̂^H b.        (2.43)

Proof. By Lemma 2.5-(v), Ax* = P_{Ran(A)} b, namely Q̂R̂x* = Q̂Q̂^H b. Thus (2.43) follows from Q̂^H Q̂ = I. ⊓⊔

As a result, finding the least squares solution to Ax = b in numerical computation can be accomplished by a thin QR decomposition of A followed by a backward substitution for solving (2.43). In (2.6), the matrix condition number κ(·) is defined for square matrices. A straightforward generalization of κ(·) to m × n matrices can be given as

    κ(A) = ‖A‖₂ ‖A⁺‖₂ ≡ σ₁(A)/σ_n(A) if σ_n(A) > 0, and κ(A) = ∞ otherwise,        (2.44)

for A ∈ ℂ^{m×n} with m ≥ n. In practical computation, the data A and b are assumed to be imperfect, with perturbed representations Ã and b̃ respectively, and the least squares solution x* drifts to x̃. An a priori error estimate on ‖x* − x̃‖₂ is given in the following theorem. After solving the least squares problem for an approximate solution x̂ with a known residual, the following theorem also provides an a posteriori estimate on ‖x* − x̂‖₂.

Theorem 2.7 (Least Squares Sensitivity Theorem). Let A, Ã ∈ ℂ^{m×n} with µ = κ(A) ‖A − Ã‖₂ / ‖A‖₂ < 1. Assume b, b̃ ∈ ℂᵐ, x* = A⁺b and x̃ = Ã⁺b̃. Then

  ‖x_* − x̃_*‖₂ ≤ (κ(A)/(1−μ)) [ 2 ‖x_*‖₂ ‖A − Ã‖₂/‖A‖₂ + ‖b − b̃‖₂/‖A‖₂
                 + κ(A) (‖A − Ã‖₂/‖A‖₂)(‖A x_* − b‖₂/‖A‖₂) ].   (2.45)

Furthermore, any approximate least squares solution x̂ of Ax = b with residual A x̂ − b = r is the exact least squares solution of Ax = b + P_{Ran(A)} r, where P_{Ran(A)} is the orthogonal projection from C^m onto Ran(A), and the forward error satisfies

  ‖x_* − x̂‖₂ ≤ κ(A) ‖P_{Ran(A)} r‖₂ / ‖A‖₂.   (2.46)

Proof. See [2, Theorem 1.4.6] for (2.45). Let the columns of Q form an orthonormal basis for Ran(A). Then P_{Ran(A)} = Q Q^H and there is a matrix S such that A = Q S. Consequently

  A^H A x̂ = A^H b + A^H r = A^H b + S^H Q^H r = A^H b + S^H Q^H Q Q^H r
          = A^H (b + Q Q^H r) = A^H (b + P_{Ran(A)} r).

Therefore, x̂ is the exact least squares solution of Ax = b + P_{Ran(A)} r. Substituting b̃ = b + P_{Ran(A)} r and Ã = A into (2.45) yields (2.46). ⊓⊔

For the special case where the least squares solution is the conventional solution, the residual ‖A x_* − b‖₂ = 0 and the last term of the error bound (2.45) vanishes. After obtaining a numerical solution x̂ using the thin QR decomposition A = QR of A, the columns of Q form an orthonormal basis for the range Ran(A) of A, and the inequality (2.46) provides a convenient bound on the forward error using the residual ‖A x̂ − b‖₂ and ‖P_{Ran(A)} r‖₂ = ‖Q^H r‖₂.

The sensitivity inequality (2.45) is essentially the same as in [2, Theorem 1.4.6], where the equivalent inequality accentuates the mainstream condition number ‖A‖₂ ‖A⁺‖₂. From (2.45), the sensitivity of the least squares solution is well measured by ‖A⁺‖₂ in a much simpler form. We prefer to use ‖A⁺‖₂ as the condition number for the least squares problem Ax = b.
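The QR route of Corollary 2.5 and the orthogonality of the residual can be checked numerically. A minimal sketch in numpy (the random test problem is ours, not from the text): solve a small least squares problem via the thin QR decomposition R̂x = Q̂^H b and verify against a reference solver.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Thin QR decomposition A = Q_hat R_hat, then solve R_hat x = Q_hat^H b
Q, R = np.linalg.qr(A)                # 'reduced' mode: Q is m-by-n, R is n-by-n
x_qr = np.linalg.solve(R, Q.T @ b)    # back substitution in exact arithmetic

# Compare with the library least squares solver
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_qr, x_ref))       # True

# The residual is orthogonal to Ran(A):  A^H (A x - b) = 0
print(np.allclose(A.T @ (A @ x_qr - b), 0))   # True
```

The second check is Lemma 2.5-(ii) in computational form: the residual of the least squares solution lies in Ran(A)^⊥.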

2.4.3 Polynomial division done right

Polynomial long division has to be one of the oldest and most basic algebraic operations. It is an indispensable component of many classical computational algorithms, such as the Euclidean Algorithm for polynomial greatest common divisors, the Routh-Hurwitz Stability Criterion in control theory, and the Buchberger Algorithm for Gröbner bases. However, it is rarely mentioned in the numerical analysis literature that long division is almost hopelessly inaccurate in computation using floating point arithmetic. Finding the least squares division as a stable alternative is one of the crucial steps in [36] for removing the singularities in root-finding.

The highly ill-conditioned matrix Aₙ in (2.11) in Example 2.9 on page 21, for even modest n, arises in the long division p(x) ÷ (x + 10) for calculating the quotient q(x) and the remainder r:

  p₀ + p₁x + ⋯ + pₙxⁿ = (10 + x)(q₁ + q₂x + ⋯ + qₙx^{n−1}) + r,   (2.47)

which is equivalent to the linear system Aₙ x = b, namely

  [ 1  10              ] [ r  ]   [ p₀ ]
  [     1  10          ] [ q₁ ]   [ p₁ ]
  [        ⋱   ⋱       ] [ ⋮  ] = [ ⋮  ]   (2.48)
  [            1   10  ] [    ]   [    ]
  [                1   ] [ qₙ ]   [ pₙ ]

As elaborated in Example 2.9, the condition number of Aₙ explodes exponentially as n increases, making the resulting quotient and remainder practically useless for n as moderate as 10. A reliable remedy for general long divisions is apparently still unknown.

The huge condition number of Aₙ implies that the matrix Aₙ is numerically singular and thus a long division problem like (2.47) is underdetermined. Accurate numerical computation of q₁,…,qₙ, r may necessarily require further constraints. These further constraints would turn the eventual linear system into an overdetermined form by adding equations to (2.48) or by deleting variables.

There is a special long division case in which the remainder is known, say r = 0 in (2.47). With this additional constraint, all of a sudden the long division can be transformed into an extremely stable least squares problem of the overdetermined linear system

  [ 10          ] [ q₁ ]   [ p₀ ]
  [ 1   10      ] [ ⋮  ]   [ p₁ ]
  [     1   ⋱   ] [    ] = [ ⋮  ]   (2.49)
  [        ⋱ 10 ] [    ]   [    ]
  [           1 ] [ qₙ ]   [ pₙ ]

with a condition number below 11/9, which is nearly perfect! In general, the special case long division problem can be stated as follows.
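The contrast between the two systems can be observed directly. A sketch under our reading of the two matrices (upper bidiagonal with 1 and 10 for (2.48); lower bidiagonal with 10 and 1 for (2.49)); the helper names are ours:

```python
import numpy as np

def long_division_matrix(n, z=10.0):
    # (n+1)x(n+1) upper bidiagonal matrix of (2.48):
    # 1 on the diagonal, z on the superdiagonal (z = 10 in the text)
    return np.eye(n + 1) + z * np.diag(np.ones(n), 1)

def remainder_free_matrix(n, z=10.0):
    # (n+1)xn lower bidiagonal matrix of (2.49):
    # z on the diagonal, 1 on the subdiagonal
    B = np.zeros((n + 1, n))
    for j in range(n):
        B[j, j], B[j + 1, j] = z, 1.0
    return B

for n in (5, 10, 15):
    print(n,
          np.linalg.cond(long_division_matrix(n)),   # grows like 10**n
          np.linalg.cond(remainder_free_matrix(n)))  # stays below 11/9
```

The singular values of the matrix in (2.49) all lie between 9 and 11 (see Theorem 2.8 below), which is why its condition number never exceeds 11/9 regardless of n.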

Problem 2.4. Given polynomials f and g, find the quotient q such that

  g = f q.   (2.50)

Polynomials with complex coefficients and degrees up to n form a vector space

  Pₙ = { p(x) = p₀ + p₁x + ⋯ + pₙxⁿ | p₀,…,pₙ ∈ C }   (2.51)

over the field C. For any fixed polynomial

  f(x) = f₀ + f₁x + ⋯ + f_m x^m ∈ P_m,

we have a linear transformation L_f : Pₙ → P_{m+n} with

  L_f(p) = f p for all p ∈ Pₙ,   (2.52)

which corresponds to an (m+n+1) × (n+1) matrix associated with the specific bases for Pₙ and P_{m+n}. Using the standard monomial bases {1, x, …, xⁿ} and {1, x, …, x^{m+n}} for Pₙ and P_{m+n} respectively, the linear transformation L_f corresponds to the convolution matrix

            [ f₀            ]
            [ f₁   ⋱        ]
  Cₙ(f)  =  [ ⋮    ⋱   f₀   ]   (2.53)
            [ f_m      f₁   ]
            [      ⋱   ⋮    ]
            [          f_m  ]

with n+1 columns. As a result, the polynomial equation f q = g becomes an overdetermined linear system

  Cₙ(f) ⟦q⟧ = ⟦g⟧ for q ∈ Pₙ,   (2.54)

where, for every polynomial p, the notation ⟦p⟧ represents the coefficient vector of p of proper dimension.

In general, for given polynomials f and g where f divides g approximately, the following Least Squares Division Algorithm is far more stable than the conventional long division.

Algorithm 2 (LeastSquaresDivision)

INPUT: polynomials f and g
– extract m = deg(f) and n = deg(g)
– form the convolution matrix C_{n−m}(f)
– form the coefficient vector ⟦g⟧ of g
– solve the linear system C_{n−m}(f) ⟦q⟧ = ⟦g⟧ for the least squares solution ⟦q⟧
– form the polynomial q from its coefficient vector ⟦q⟧

OUTPUT: polynomial q such that f q = g.

Fig. 2.3 Algorithm LEASTSQUARESDIVISION
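The steps of LEASTSQUARESDIVISION can be sketched in a few lines of numpy (the helper names are ours); coefficients are ordered from the constant term up, matching the convolution matrix (2.53):

```python
import numpy as np

def convolution_matrix(f, n):
    """(m+n+1)-by-(n+1) matrix C_n(f): C_n(f) @ [q] equals the
    coefficient vector of f*q, coefficients ordered degree-increasing."""
    m = len(f) - 1
    C = np.zeros((m + n + 1, n + 1))
    for j in range(n + 1):
        C[j:j + m + 1, j] = f
    return C

def least_squares_division(f, g):
    """Quotient q minimizing || f*q - g ||_2 via the system C(f)[q] = [g]."""
    n = len(g) - len(f)                 # deg(g) - deg(f)
    C = convolution_matrix(f, n)
    q, *_ = np.linalg.lstsq(C, np.asarray(g, float), rcond=None)
    return q

# f = 10 + x,  g = (10 + x)(1 + 2x + 3x^2) = 10 + 21x + 32x^2 + 3x^3
f = [10.0, 1.0]
g = [10.0, 21.0, 32.0, 3.0]
print(least_squares_division(f, g))     # ~ [1. 2. 3.]
```

When f divides g exactly, the overdetermined system is consistent and the least squares solution is the exact quotient; when the division is only approximate, the same call returns the stable least squares quotient.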

From the Least Squares Sensitivity Theorem 2.7, the accuracy of the least squares solution to (2.54) depends on ‖Cₙ(f)⁺‖₂, which is generally moderate by the following theorem.

Theorem 2.8 (Polynomial Least Squares Division Theorem). Let f be a monic polynomial that divides a polynomial g with a quotient polynomial q of degree n. Then z_* = ⟦q⟧ is the least squares solution of the linear system

  Cₙ(f) z = ⟦g⟧

with condition number

  ‖Cₙ(f)⁺‖₂ ≤ ∏_{k=1}^{m} ( 1 + 2|z_k| cos((n+k)π/(n+k+1)) + |z_k|² )^{−1/2},   (2.55)

where z₁,…,z_m are the roots of f. If |z_j| ≠ 1 for j = 1,…,m, then

  ‖Cₙ(f)⁺‖₂ < ∏_{k=1}^{m} 1/|1 − |z_k||.   (2.56)

Proof. Since f(x) = (x − z₁) ⋯ (x − z_m), we have

  Cₙ(f) = C_{n+m−1}(x − z_m) ⋯ C_{n+1}(x − z₂) Cₙ(x − z₁).

For any integer k ≥ 0 and constant z ∈ C, the tridiagonal matrix

                            [ 1+|z|²   −z̄                    ]
                            [ −z      1+|z|²   −z̄            ]
  C_k(x−z)^H C_k(x−z)  =    [     ⋱       ⋱       ⋱          ]
                            [         −z      1+|z|²   −z̄    ]
                            [                 −z      1+|z|²  ]  (k+1)×(k+1)

has exact eigenvalues (cf. [9, Ex. 7.4, p. 137])

  λ_j = 1 + |z|² + 2|z| cos(jπ/(k+2)) for j = 1,…,k+1,

whose square roots are the singular values of C_k(x−z). Then all the inequalities can be obtained from these singular values. ⊓⊔

If the condition number ‖Cₙ(f)‖₂ ‖Cₙ(f)⁺‖₂ is needed, the inequality

  ‖Cₙ(f)‖₂ ≤ ∏_{k=1}^{m} ( 1 + 2|z_k| cos(π/(n+k+1)) + |z_k|² )^{1/2} < ∏_{k=1}^{m} (1 + |z_k|)   (2.57)

follows from the proof and can be applied. Theorem 2.8 and its proof lead to the following corollary, which is useful in later chapters.
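The eigenvalue formula quoted from [9, Ex. 7.4] is easy to confirm numerically. The following check (our own, with an arbitrary real root z and size k) compares the eigenvalues of C_k(x−z)^H C_k(x−z) against the closed form:

```python
import numpy as np

k, z = 6, 0.7
# Convolution matrix of the linear factor (x - z): size (k+2) x (k+1)
C = np.zeros((k + 2, k + 1))
for j in range(k + 1):
    C[j, j], C[j + 1, j] = -z, 1.0

# Eigenvalues of C^H C versus 1 + |z|^2 + 2|z| cos(j*pi/(k+2)), j = 1..k+1
eig = np.sort(np.linalg.eigvalsh(C.T @ C))
j = np.arange(1, k + 2)
formula = np.sort(1 + z**2 + 2 * abs(z) * np.cos(j * np.pi / (k + 2)))
print(np.allclose(eig, formula))   # True
```

The square roots of these eigenvalues are the singular values of C_k(x−z), from which the bounds (2.55)–(2.57) follow.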

Corollary 2.6. For any monic polynomials f ∈ P_m and g ∈ Pₙ,

  ‖⟦f g⟧‖₂ ≥ ∏_{k=1}^{m} sin(π/(n+k+1)) ‖⟦g⟧‖₂.   (2.58)

Proof. Let z₁,…,z_m ∈ C be the roots of f. By (2.55), we have

  ‖⟦f g⟧‖₂ ≥ ∏_{k=1}^{m} ( 1 + 2|z_k| cos((n+k)π/(n+k+1)) + |z_k|² )^{1/2} ‖⟦g⟧‖₂.

It is elementary to prove

  1 + 2|z| cos((n+k)π/(n+k+1)) + |z|² ≥ sin²(π/(n+k+1)),

and thus (2.58) follows. ⊓⊔

Except in the cases where the divisor f(x) has a root of unit magnitude, the condition number of the convolution matrix Cₙ(f) has an upper bound independent of n. Even if f(x) has a root of unit magnitude, the condition number of Cₙ(f) is bounded by a polynomial in n. In comparison, the condition number of long division grows exponentially. A simple transition from long division to least squares division makes a tremendous difference.

Exercise 2.1. A conjecture: For any polynomial f, there is a τ > 0 such that

  ‖Cₙ(f)⁺‖₂ ≤ τ n ‖⟦f⟧‖₂.

Exercise 2.2. An open ended research problem. After finding an approximate root x = z̃ of a polynomial f, there have been studies of the deflation strategy of solving the equation

  g(x) := f(x)/(x − z̃) = 0

for the other roots. From the analysis in §2.4.3, there is an alternative deflation strategy: apply the least squares division of f by x − z̃ and solve the quotient equation q(x) = 0 instead for the other roots. Study the advantages and disadvantages of both deflation strategies. Under what conditions is the alternative deflation strategy reliable?

2.5 Nonlinear least squares

2.5.1 Overdetermined nonlinear systems

We now consider an overdetermined nonlinear system in the form of

  f(x) = b, for x ∈ Cⁿ and b ∈ C^m,   (2.59)

  f(x) = ( f₁(x₁,…,xₙ), …, f_m(x₁,…,xₙ) )ᵀ and x = ( x₁, …, xₙ )ᵀ.   (2.60)

Overdetermined nonlinear systems arising in this monograph involve f : Cⁿ → C^m given as a holomorphic mapping. Here, a complex valued function f of a single

Chapter 3
Singularities of Matrix Rank Deficiency

Matrix rank deficiencies almost always lead to computational challenges, such as the ill-posed problem of rank-revealing and singular linear systems. Near rank deficiency seems to be even worse in implying ill-conditioning. The prospect of computation, however, may not be as bleak as it appears to be when a matrix is (or is near) singular. What we need may be a different perspective on the problem and changes in computing strategies.

As this chapter will elaborate, exact rank and subspaces are singular but numerical rank and numerical subspaces are not; singular linear systems are infinitely sensitive in general but perfectly well-posed on a manifold; ill-conditioned linear systems may not be so at all when proper solutions are considered.

For the convenience of exposition, we discuss "tall" matrices, namely matrices of size m × n with m ≥ n, throughout this chapter. There is no real difficulty in understanding or computing numerical ranks of "flat" matrices. For the same reason, we omit the discussion of numerical corange and numerical cokernel.

3.1 Food for thought: How to solve Ax = 0 numerically?

The answers are not as simple as the question may sound. It is a safe bet that the answers cannot be found in general numerical analysis textbooks, and they are well-hidden in textbooks on numerical linear algebra. Standard software such as linsolve in Matlab produces only the trivial solution x = 0, as does Maple's LinearSolve in the mode of floating point arithmetic.

This question has been a subject of study since the early days of the computer era, with an elegant answer in the singular value decomposition and quite a few mature numerical algorithms. Unfortunately, the theory and answers have not been disseminated beyond the circle of experts and specialists.

First of all, the standard method in linear algebra textbooks doesn't work in numerical computation, since reduction of a rank-deficient matrix to echelon form only works in symbolic computation for matrices with exact entries.

Example 3.1. The following matrix

       [ 1   −1/2       −1/200000    −100001/200000    0 ]
       [ 0   −1/100000  −1            0                0 ]
  A =  [ 1   −1/2        1/120000     499999/600000    0 ]
       [ 1   −1/2       −1/600000    −19999/120000     0 ]
       [ 1   −1/2       −1/600000    −100001/600000    1 ]

is numerically rank deficient by virtue of its smallest singular value

  σ_min(A) ≈ 8.73 × 10⁻¹⁶.

The system Ax = 0 has a set of numerical solutions

  x = α ( 0.447213595482069, 0.894427190964139, −0.000008944271910,
          0.000000000089443, 0.000000000000000 )ᵀ for any α ∈ C   (3.1)

that is accurate with backward error 9.5 × 10⁻¹⁶. However, row operations with floating point arithmetic cannot produce a row echelon form with small bottom rows even if the hardware precision is extended. An LU decomposition with partial pivoting implemented in Matlab produces

  A = L U with

       [ 1  0  0    0                 0 ]       [ 1  −.5      −.000005           −.500005           0 ]
       [ 0  1  0    0                 0 ]       [ 0  −.00001  −1                  0                 0 ]
  L =  [ 1  0  1    0                 0 ],  U = [ 0   0        .000013333333333   1.333336666666667 0 ]
       [ 1  0  .25  1                 0 ]       [ 0   0        0                  .0000125          0 ]
       [ 1  0  .25  .199999999998224  1 ]       [ 0   0        0                  0                 1 ]

with no approximately zero rows. Only the trivial solution x = 0 can be obtained. ⊓⊔

The elementary row operations taught in linear algebra courses cannot be relied on to solve Ax = 0 numerically, as shown in Example 3.1. The QR decomposition isn't foolproof either, as shown in the following example.
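What does work is the singular value decomposition: the right singular vector belonging to the smallest singular value is a numerical kernel vector whose backward error equals σ_min. A toy sketch (our own nearly rank-deficient matrix, not the matrix of this example):

```python
import numpy as np

rng = np.random.default_rng(1)
# A 5x5 matrix of numerical rank 4: a random rank-4 product plus tiny noise
A = rng.standard_normal((5, 4)) @ rng.standard_normal((4, 5)) \
    + 1e-14 * rng.standard_normal((5, 5))

U, s, Vh = np.linalg.svd(A)
x = Vh[-1]                       # right singular vector for sigma_min
print(s[-1])                     # ~1e-14: A is numerically singular
print(np.linalg.norm(A @ x))     # backward error of the kernel vector
```

Gaussian elimination on the same matrix sees pivots of ordinary size and yields only x = 0, while ‖Ax‖₂ for the singular vector above is of the order of σ_min.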

Example 3.2. Let

       [ 1   −1/2       −1/200000    −100001/200000   −1/2 ]
       [ 0    1/100000   1            0                0   ]
  A =  [ 1   −1/2        1/120000     499999/600000   −1/6 ]
       [ 1   −1/2       −1/600000    −19999/120000    −1/6 ]
       [ 1   −1/2       −1/600000    −100001/600000    5/6 ]

The smallest singular value

  σ_min(A) ≈ 8.95 × 10⁻¹⁶.

The numerical solution of Ax = 0 is the same as given in (3.1). Using Matlab, we obtain an approximate QR decomposition A = QR with

       [ −2   1        8.47×10⁻²²   0                        0                          ]
       [  0   0.00001  1            0                        0                          ]   [ R̂   r ]
  R ≈  [  0   0        0.00001      1                        2.117582368135751×10⁻¹⁷   ] = [        ]
       [  0   0        0            1.000000000003118×10⁻⁵  −2.117582368129149×10⁻¹²   ]   [ 0ᵀ   1 ]
       [  0   0        0            0                        1                          ]

The lower right corner entry of R is not close to zero, as many people might expect it to be. After solving the equation R̂ u = −r for u and normalizing [u; 1], we obtain the vector

  y ≈ ( 0.447213555589289, 0.894427111178578, −0.000008944271112,
        0.000000000089443, −0.000422381261080 )ᵀ,

which is far from the numerical solution (3.1), with a poor backward accuracy

  ‖A y‖₂ > 0.0004. ⊓⊔

3.2 Geometry of ill-posed rank-revealing

Rank, range and kernel are fundamental and elementary notions in linear algebra and its applications. Identifying ranks and fundamental subspaces is referred to as the rank-revealing problem. Similar to many other algebraic problems, rank-revealing is ill-posed: no matter how tiny in magnitude, almost all perturbations alter the theoretical rank, range and kernel completely. The root of this singularity is in its geometry. Following the intuitive elaboration on the geometry of 5 × 4 matrices in §2.6.3, it can be similarly proved that the subset

  C_k^{m×n} ≡ { A ∈ C^{m×n} | rank(A) = k }   (3.2)

is a complex manifold in C^{m×n} with dimension k(m+n−k) and codimension (m−k)(n−k), as illustrated in Fig. 3.1. We call C_k^{m×n} the k-th rank manifold in C^{m×n}.

Fig. 3.1 Dimension and codimension of the rank manifold C_k^{m×n}: the dimension count gives k(m+n−k) and the codimension count gives (m−k)(n−k).

We summarize the geometry of m × n matrices in the following theorem.

Theorem 3.1 (Rank Manifold Theorem). Let m ≥ n ≥ k ≥ 0 be integers. The following assertions hold.

1. The subset C_k^{m×n} is a complex manifold in C^{m×n}.
2. codim C_k^{m×n} = (m−k)(n−k).
3. The space C^{m×n} = ∪_{k=0}^{n} C_k^{m×n} is topologically stratified in the sense that every rank manifold is embedded in the closure of the rank manifolds of lower codimensions.
4. Full rank matrices form C_n^{m×n}, which is open and dense in C^{m×n}.
5. For every matrix A ∈ C^{m×n}, its rank equals k if and only if C_k^{m×n} is the highest codimension rank manifold that is of zero distance to A, namely

  rank(A) = max { l | dist₂(A, C_l^{m×n}) = 0 }.

6. Assume A ∈ C_k^{m×n}. There are ε, δ > 0 such that

  rank(A) = min { l | dist₂(Ã, C_l^{m×n}) < ε }   (3.3)

for every Ã ∈ C^{m×n} with ‖Ã − A‖₂ < δ.

Using the geometry of matrix ranks, the ill-posedness of rank-revealing can easily be explained: a matrix is of rank k < n if and only if it resides on the manifold C_k^{m×n} of positive codimension. Due to the dimension deficiency, the manifold C_k^{m×n} is of measure zero and, for any matrix A ∈ C_k^{m×n}, an arbitrary perturbation E generically pushes A to A + E, which is no longer on the manifold

C_k^{m×n}. Since the full-rank matrices form the manifold C_n^{m×n}, which is of codimension zero and is open and dense in C^{m×n}, almost all perturbations make A + E fully ranked. As a result, the (exact) rank of A seems to be hopelessly lost forever as the rank of the data à = A + E.

It is the equality (3.3) that enables the recovery of the rank of A from the imperfect data à when the perturbation ‖E‖₂ is sufficiently small. The rank manifold C_k^{m×n} where A resides, as well as C_{k+1}^{m×n},…,C_n^{m×n}, are in an ε-neighborhood of the data Ã. However, C_k^{m×n} distinguishes itself as the one of the highest codimension among C_k^{m×n},…,C_n^{m×n}. Finding the rank of the underlying matrix A is equivalent to identifying the highest codimension manifold near the data Ã, and is also equivalent to finding the minimum rank among all the matrices in the ε-neighborhood of the data Ã.

3.3 The numerical rank

Rank-revealing computations, particularly numerical computations, do not seem to be well explained in the textbooks or literature, even though almost all the theories are readily available due to the singular value decomposition. As we shall elaborate in this chapter, the rank, range and kernel can still be computed accurately using floating point arithmetic even if the data are imperfect, because the numerical rank, range and kernel are Lipschitz continuous with respect to the data.

As we shall also see in later chapters, numerical rank-revealing frequently appears in other ill-posed problems during the solution processes. Even though the singular value decomposition is the most reliable and accurate numerical rank-revealing method, it is far from efficient when either the numerical rank or the numerical nullity is low. Alternative algorithms will be constructed for practical computations.

3.3.1 Why numerical rank is indispensable: An example

In applications, a scenario arises frequently: we are given a matrix A as our data, knowing it came from a matrix  whose rank is needed, but  itself is unavailable. We must use the imperfect data A to calculate the rank of Â. The conventional exact rank of A is useless due to the ill-posedness of matrix ranks explained in §3.2. The perturbation from  to A already altered the theoretical rank completely. Floating point arithmetic doesn't help, compounding the perturbation with round-off errors. We need the notion of numerical rank, along with numerical range, numerical kernel, etc., that are well-posed, so that we can calculate the numerical rank of the data matrix A and possibly recover the desired (exact) rank of Â. As we shall elaborate, the numerical rank, range, kernel, etc., generalize the conventional notions of rank, range, kernel, etc., respectively.

Example 3.3. Two polynomials, say,

  f = f₀ + f₁x + f₂x² + f₃x³, g = g₀ + g₁x + g₂x² + g₃x³ + g₄x⁴,

have a greatest common divisor of degree 2 if and only if their Sylvester matrix

             [ f₀              g₀          ]
             [ f₁  f₀          g₁  g₀      ]
             [ f₂  f₁  f₀      g₂  g₁  g₀  ]
  S(f,g)  =  [ f₃  f₂  f₁  f₀  g₃  g₂  g₁  ]
             [     f₃  f₂  f₁  g₄  g₃  g₂  ]
             [         f₃  f₂      g₄  g₃  ]
             [             f₃          g₄  ]

is of nullity 2 exactly. Moreover, let u = u₀ + u₁x + u₂x² be a greatest common divisor of f and g; then we can write f = u v and g = u w, where v = v₀ + v₁x and w = w₀ + w₁x + w₂x², with

  S(f,g) ( w₀, w₁, w₂, 0, −v₀, −v₁, 0 )ᵀ = 0.   (3.4)

In fact, the equation (3.4) is the matrix-vector form of the identity

  f w − g v ≡ u v w − u w v = 0.

As a result, the polynomial GCD can be computed by identifying the rank and kernel of S(f,g), followed by extracting the cofactors v and w from the kernel and u = f/v.

In practice, however, the data f and g come from underlying (and hidden) polynomials f̂ and ĝ with perturbations presumably small. Then S(f,g) is a small perturbation of S(f̂,ĝ). Due to the ill-posedness, the conventional rank of S(f,g) is meaningless as far as gcd(f̂,ĝ) is concerned, and so is the rank of S(f̂,ĝ) since f̂ and ĝ are not available for computation. To identify the degree of gcd(f̂,ĝ) using the available data f and g, we need a properly formulated numerical rank of the matrix S(f,g) that, hopefully, sheds light on the desired degree of gcd(f̂,ĝ). Furthermore, we need to compute the numerical kernel of S(f,g) that approximates the (exact) kernel of S(f̂,ĝ). It is desirable for the accuracy of the numerical kernel of S(f,g), as an approximation to the exact kernel of S(f̂,ĝ), to be of the order

  O( ‖(f,g) − (f̂,ĝ)‖ ) = O( ‖S(f,g) − S(f̂,ĝ)‖₂ )

that the data deserve. ⊓⊔

3.3.2 The notions of numerical rank, range and kernel
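The Sylvester-matrix route to the GCD degree can be set up in a few lines. The sketch below (our own helper, following the column layout displayed above) builds S(f,g) for deg f = 3, deg g = 4, and confirms that a degree-2 GCD shows up as nullity 2:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def sylvester(f, g):
    """Sylvester matrix in the column convention of Example 3.3:
    n shifted copies of f followed by m shifted copies of g,
    coefficients ordered degree-increasing; m = deg f, n = deg g."""
    m, n = len(f) - 1, len(g) - 1
    S = np.zeros((m + n, m + n))
    for j in range(n):
        S[j:j + m + 1, j] = f
    for j in range(m):
        S[j:j + n + 1, n + j] = g
    return S

# A pair with gcd u = (x - 1)(x - 2):  f = u*(x + 3),  g = u*(x^2 + 1)
f = P.polyfromroots([1, 2, -3])               # degree 3
g = P.polymul(P.polyfromroots([1, 2]), [1, 0, 1])   # degree 4
S = sylvester(f, g)
print(S.shape)                    # (7, 7)
print(np.linalg.matrix_rank(S))   # 5: nullity 2 = deg gcd(f, g)
```

With exact data the nullity of the 7×7 Sylvester matrix equals the GCD degree; with perturbed data one would threshold the singular values instead, which is exactly the numerical rank of the next subsection.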

We can now give a precise problem statement for numerical rank-revealing.

Problem 3.1 (Numerical Rank-Revealing Problem). Let A be a given matrix that represents the available data of a matrix  with a small error. Find the numerical rank of A that is identical to the (exact) rank of  and, if needed, find the numerical range or numerical kernel of A that approximates the (exact) range or kernel of  respectively to an accuracy of O(‖A − Â‖₂).

The formulation of the numerical rank/range/kernel must conform to the common backward accuracy requirement that a numerical solution is the exact solution of a nearby problem. Namely, the numerical rank/range/kernel of A is the exact rank/range/kernel of a nearby matrix Ã. Since the data A is presumably very close to its underlying matrix Â, the "point" A is near the rank manifold C_k^{m×n} as well as those rank manifolds C_{k+1}^{m×n},…,C_n^{m×n} in the strata whose closures contain C_k^{m×n}. With a proper threshold θ > 0, the desired manifold C_k^{m×n} is of the highest codimension among all those rank manifolds. There is a unique matrix à ∈ C_k^{m×n} such that

  ‖A − Ã‖_F = min_{B ∈ C_k^{m×n}} ‖A − B‖_F.

The rank/range/kernel of à is the numerical rank/range/kernel of A, with a formal definition given as follows.

Definition 3.1 (Numerical Rank). For a matrix A ∈ C^{m×n} and a numerical rank threshold θ > 0, assume the singular values satisfy

  σ₁(A) ≥ ⋯ ≥ σ_k(A) > θ > σ_{k+1}(A) ≥ ⋯ ≥ σₙ(A),   (3.5)

and denote by A_θ the rank-k projection of A. Then the numerical rank of A ∈ C^{m×n} within θ, denoted by rank_θ(A), is k. Namely

  rank_θ(A) = rank(A_θ).   (3.6)

The numerical range and numerical kernel of A within θ are

  Ran_θ(A) = Ran(A_θ) and Ker_θ(A) = Ker(A_θ),   (3.7)

respectively.

In other words,

  Ran_θ(A) = span{u₁,…,u_k} and Ker_θ(A) = span{v_{k+1},…,vₙ}

if A = σ₁u₁v₁^H + ⋯ + σₙuₙvₙ^H is a singular value expansion of A. When rank_θ(A) = k, we also say the numerical nullity of A within θ is n − k, denoted by nullity_θ(A) = n − k, where n is the column dimension of A.

Choosing the numerical rank threshold θ in Definition 3.1 is application dependent. The purpose of the numerical rank is to identify the (exact) rank of the hidden matrix  that is known only through its perturbed data A. By Weyl's Theorem (Theorem 2.3, page 25), there is a "window of opportunity", namely the interval

  σ_k(Â) − ‖A − Â‖₂ > θ > ‖A − Â‖₂,   (3.8)

for choosing θ so that k = rank(Â) can be recovered, assuming the data error ‖A − Â‖₂ is sufficiently small.

The lower bound for θ, i.e. the data error ‖A − Â‖₂, should be known roughly or precisely from the underlying application. It is the upper bound σ_k(Â) − ‖A − Â‖₂ that is generally unknown. Furthermore, this unknown σ_k(Â) is also the reciprocal of the condition number of the numerical range/kernel by the Singular Subspace Continuity Theorem (Corollary 2.3, page 26). When σ_k(Â) is of moderate magnitude, say σ_k(Â) > 10⁻⁵, while the data error ‖A − Â‖₂ is near the hardware precision 10⁻¹⁶, there is quite a large "window" for setting the threshold, say anywhere between 10⁻¹⁶ and 10⁻⁶, and the numerical range and numerical kernel have an accuracy of around 10 correct digits. Therefore, we can only offer a heuristic guideline for choosing the threshold θ: slightly larger than the upper bound of the data error ‖A − Â‖₂.

Example 3.4. Consider the Sylvester matrix in Example 3.3 in §3.3.1. Let f and g be monic polynomials with

  S(f,g) =
  [ 1.0000  0       0       0       1.0000  0       0      ]
  [ 3.0000  1.0000  0       0       1.0000  1.0000  0      ]
  [ 1.4142  3.0000  1.0000  0       4.5558  1.0000  1.0000 ]
  [ 4.2426  1.4142  3.0000  1.0000  1.4142  4.5558  1.0000 ]
  [ 0       4.2426  1.4142  3.0000  4.4429  1.4142  4.5558 ]
  [ 0       0       4.2426  1.4142  0       4.4429  1.4142 ]
  [ 0       0       0       4.2426  0       0       4.4429 ]

with only the known correct digits being displayed. In other words, the coefficient-wise error bound of the polynomials f and g is 0.0001. The data error bound is thus (noticing there are at most 24 inexact entries)

  ‖S(f,g) − S(f̂,ĝ)‖₂ ≤ ‖S(f,g) − S(f̂,ĝ)‖_F ≤ 0.0001 √24 ≈ 0.0004899.

The numerical rank threshold can be set slightly larger than 0.0004899, such as θ = 0.0005. Using Matlab, the singular values of S(f,g) are as follows.

>> svd(S)

ans =

  12.299510598797836
   7.103226902802290
   6.085904415876807
   3.707506460721927
   0.982213088705758
   0.000013258795790
   0.000009005947065

The numerical nullity nullity_{0.0005}(S(f,g)) = 2 is identical to the degree of the greatest common divisor of the hidden polynomials f̂ and ĝ, under the assumption that the data error below 0.0001 is small enough for these two polynomials. On the other hand, there is a reality check here: an error tolerance exists for every specific numerical rank-revealing problem. The rank of the hidden matrix may be hopelessly lost when the data error exceeds that tolerance. ⊓⊔
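Definition 3.1 reduces to a one-line computation once the singular values are available. A sketch of recovering the hidden rank by thresholding (with our own toy data, not the Sylvester matrix above):

```python
import numpy as np

def numerical_rank(A, theta):
    """rank_theta(A): number of singular values strictly above the threshold."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > theta))

# Hidden matrix of exact rank 2, observed through noise of size ~1e-6
rng = np.random.default_rng(2)
A_hat = np.outer([1, 2, 3, 4.], [1, 1, 1]) + np.outer([0, 1, 0, 1.], [1, -1, 2])
A = A_hat + 1e-6 * rng.standard_normal((4, 3))

print(np.linalg.matrix_rank(A))   # 3: the exact rank is lost to the noise
print(numerical_rank(A, 1e-5))    # 2: the rank of the hidden matrix, recovered
```

The threshold 1e-5 sits inside the window (3.8): above the data error ~1e-6 and well below σ₂(Â), which is of order one here.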

3.3.3 Properties of numerical rank, range and kernel

The following theorem asserts that for every matrix Â, there is a neighborhood containing Â. As long as the data matrix A representing  is in that neighborhood, there is a window of opportunity, namely an interval ‖A − Â‖₂ < θ < ε_A, such that the numerical rank-revealing problem is accurately solvable.

Theorem 3.2 (Fundamental Theorem of Numerical Rank). Every m × n matrix  belongs to an open neighborhood Ω in C^{m×n} and, for every matrix A ∈ Ω, there is an ε_A > 0 such that rank_θ(A), Ran_θ(A) and Ker_θ(A) uniquely exist and are Lipschitz continuous. Furthermore,

  rank_θ(A) = rank(Â),   (3.9)

  dist( Ran_θ(A), Ran(Â) ) = O( ‖A − Â‖₂ ),   (3.10)
  dist( Ker_θ(A), Ker(Â) ) = O( ‖A − Â‖₂ ),   (3.11)

whenever ‖A − Â‖₂ < θ < ε_A.

Proof. Let  be of rank k, namely σ_k(Â) > 0 and σ_{k+1}(Â) = ⋯ = σₙ(Â) = 0. Set Ω = { B ∈ C^{m×n} | ‖B − Â‖₂ < σ_k(Â)/2 } and ε_A = σ_k(A)/2. By Weyl's Theorem (Theorem 2.3, page 25), for every A ∈ Ω, we have

  σ_k(A) ≥ σ_k(Â) − ‖A − Â‖₂ > σ_k(Â)/2 > θ,
  σ_{k+1}(A) ≤ ‖A − Â‖₂ < θ,

and (3.9) holds. The inequalities (3.10) and (3.11) follow from Corollary 2.3 (page 26). ⊓⊔

The Fundamental Theorem of Numerical Rank also implies that the numerical rank, range and kernel are consistent with their conventional exact counterparts as their generalizations.

Corollary 3.1 (Numerical Rank Consistency Theorem). Rank, range and kernel are special cases of numerical rank, numerical range and numerical kernel, respectively. More precisely, for every matrix A, there is a δ > 0 such that

  rank(A) = rank_θ(A), Ran(A) = Ran_θ(A), Ker(A) = Ker_θ(A)

for θ ∈ (0, δ).

The following corollary is a different version of the Singular Subspace Continuity Theorem (Corollary 2.3, page 26).

Corollary 3.2 (Numerical Range/Kernel Sensitivity Theorem). Let A be a matrix with rank_θ(A) = k. Then

  κ_θ(A) = σ₁(A)/σ_k(A)   (3.12)

is a condition number of Ran_θ(A) and Ker_θ(A) with respect to the relative perturbation ‖ΔA‖₂/‖A‖₂.

Needless to say, the condition number κ_θ(A) can be substantially smaller than the conventional condition number κ(A) in (2.6) and (2.44) for any matrix A with a deficient numerical rank. We shall refer to κ_θ(A) as the condition number of A with respect to θ.

3.4 A simple numerical rank-revealing algorithm

In general, the most reliable and accurate numerical rank-revealing algorithm is the singular value decomposition. In some particular applications, however, there are more efficient alternatives that are as reliable and accurate. For instance, when a large matrix is known to have a small rank-deficiency numerically, the numerical kernel can be computed directly. In this section we shall present one of those special case rank-revealing algorithms with a focus on numerical kernel computation.

3.4.1 The inverse power iteration

Consider a matrix A ∈ C^{m×n}. The matrix A^H A is a Hermitian matrix with nonnegative eigenvalues, and the singular values of A are the nonnegative square roots of the eigenvalues of A^H A. It is easy to verify that the numerical kernel Ker_θ(A) = Ker_{θ²}(A^H A). Let λ₁ ≤ λ₂ ≤ ⋯ ≤ λₙ be the eigenvalues of A^H A with associated eigenvectors z₁, z₂,…,zₙ, respectively, forming an orthonormal basis for Cⁿ. These eigenvectors are actually the right singular vectors of A. Randomly select a vector x₀ ∈ Cⁿ and write x₀ = α₁z₁ + ⋯ + αₙzₙ. The inverse power iteration

  x_{j+1} = (A^H A)⁻¹ x_j, j = 0, 1, 2, …   (3.13)

generates a sequence {x_j}_{j=0}^∞ where

  x_j = (α₁/λ₁ʲ) z₁ + (α₂/λ₂ʲ) z₂ + ⋯ + (αₙ/λₙʲ) zₙ = x̂_j + x̌_j,

with

  x̂_j = (α₁/λ₁ʲ) z₁ + ⋯ + (α_k/λ_kʲ) z_k ∈ Ker_θ(A),
  x̌_j = (α_{k+1}/λ_{k+1}ʲ) z_{k+1} + ⋯ + (αₙ/λₙʲ) zₙ ∈ Ker_θ(A)^⊥,

assuming nullity_θ(A) = k. As a result, with r = rank_θ(A) = n − k,

  dist(x_j, Ker_θ(A)) ≤ ‖x̌_j‖₂/‖x_j‖₂ ≤ ‖x̌_j‖₂/‖x̂_j‖₂
    ≤ (λ_k/λ_{k+1})ʲ √( (|α_{k+1}|² + ⋯ + |αₙ|²)/(|α₁|² + ⋯ + |α_k|²) )
    = ( σ_{r+1}(A)/σ_r(A) )^{2j} ‖x̌₀‖₂/‖x̂₀‖₂   (3.14)

converges to zero as j → ∞ since σ_r(A) > θ > σ_{r+1}(A). Namely, the vector sequence {x_j} converges into the numerical kernel Ker_θ(A). Heuristically, it is reasonable to expect that

  ‖x̌₀‖₂/‖x̂₀‖₂ ≈ 1 and σ_{r+1}(A)/σ_r(A) ≤ 10⁻³,

and thus

  dist(x₃, Ker_θ(A)) ≤ 10⁻¹⁸,

below the hardware precision. Consequently, it is usually unnecessary to continue the inverse power iteration (3.13) beyond three steps to obtain an accurate kernel vector of A. Notice that it is unnecessary for {x_j} to converge to any specific singular vector.

Let A = QR be the QR decomposition. It is easy to see that A and R have identical numerical rank and kernel. The inverse power iteration step x_{j+1} = (A^H A)⁻¹ x_j can be carried out by solving R^H y = x_j and then R x_{j+1} = y. The following pseudo-code describes the algorithm for computing a numerical kernel vector of A within θ.

3.4.2 The “exaggerated fears” of inverse iteration

The inverse (power) iteration in §3.4.1 requires solving a highly ill-conditioned linear system R^H R x = y, where the matrix R from the QR decomposition of the numerically singular matrix A is known to be inaccurate, as shown in §2.3. It is

Algorithm 4 (NUMERICALKERNELVECTOR)

INPUT: upper triangular matrix R ∈ C^{n×n}, rank threshold θ
– generate a random unit vector x₀ ∈ Cⁿ
– for j = 1, 2, 3 do
  · solve R^H y = x_{j−1} by forward substitution
  · solve R z = y by backward substitution
  · set μ = ‖y‖₂/‖z‖₂, z = z/‖z‖₂, and x_j = z
  end do

OUTPUT: kernel vector z if µ < θ

Fig. 3.2 Algorithm NUMERICALKERNELVECTOR

natural to question the wisdom of using inverse iteration. In [21], Peters and Wilkinson described what they called "exaggerated fears" in the early days when inverse iteration for solving (A − λI)x = 0 at a given approximate eigenvalue λ of A was proposed:

Inverse iteration is usually attributed to Wielandt (1944), though a number of people seem to have had the idea independently. Although (it is) basically a simple concept, its numerical properties have not been widely understood. If λ really is very close to an eigenvalue, the matrix (A − λI) is almost singular and hence a typical step in the iteration involves the solution of a very ill-conditioned set of equations. Indeed if λ is an exact eigenvalue, A − λI is exactly singular, though in numerical work involving rounding errors the distinction between "exact" singularity and "near" singularity is scarcely significant. The period when inverse iteration was first considered was notable for exaggerated fears concerning the instability of direct methods for solving linear systems, and ill-conditioned systems were a source of particular anxiety. [Few] numerical analysts discuss inverse iteration with any confidence.

It is interesting and counter-intuitive that the hypersensitivity of the linear system does not make inverse iteration unstable. This surprising fact appears never to be mentioned in numerical analysis textbooks and never to be explained in general numerical linear algebra textbooks. As Parlett described in his book [20] on eigenvalues,

This result is alarming if we had hoped for an accurate solution of $(A-\sigma)^{-1}y = x$ (the means) but is a delight in the search for $z_j$ (the end).

3.4.3 Computing the numerical kernel

Let $\{z_1,\dots,z_{n-r}\}$ be an orthonormal basis for $\mathrm{Ker}(A)$ of a rank-$r$ matrix $A$. If $\{z_1,\dots,z_k\}$ is known, clearly the remaining basis vectors $z_{k+1},\dots,z_{n-r}$ span the solution space of the equations

\[
z_1^Hx = \cdots = z_k^Hx = 0, \qquad Ax = 0.
\]

This idea can be extended to computing the numerical kernels after some numerical kernel vectors are computed. The Algorithm NUMERICALKERNELVECTOR calculates a single vector in the numerical kernel. Assuming an orthonormal set $\{z_1,\dots,z_k\}\subset\mathrm{Ker}_\theta(A)$ has been obtained, we need to calculate a kernel vector $y$, if it exists, that is orthogonal to all of $z_1,\dots,z_k$. The following lemma is the foundation for computing the remaining numerical kernel vectors.

Lemma 3.1. Let $A\in\mathbb{C}^{m\times n}$ with $m\ge n$. Assume $\mathrm{rank}_\theta(A) = r$ and $\{z_1,\dots,z_k\}$ is an orthonormal subset of $\mathrm{Ker}_\theta(A)$. Let $Z_k = [z_1,\dots,z_k]$ and $\mu > \theta$. Then the matrix

\[
A_k = \begin{bmatrix} \mu Z_k^H \\ A \end{bmatrix} \tag{3.15}
\]

has a numerical rank $r+k$ within $\theta$ with $\mathrm{Ker}_\theta(A) = \mathrm{Ker}_\theta(A_k) \oplus \mathrm{Ran}(Z_k)$.

Proof. It suffices to prove the lemma for $k = 1$ since we can apply the result recursively. Let $A = U\Sigma V^H$ be the singular value decomposition of $A$ with right singular vectors $v_1,\dots,v_n$. For any unit vector $z_1\in\mathrm{Ker}_\theta(A)$, we can write $z_1 = \rho_{r+1}v_{r+1} + \cdots + \rho_nv_n$. Then

\[
\begin{bmatrix} 1 & \\ & U \end{bmatrix}^H
\begin{bmatrix} \mu z_1^H \\ A \end{bmatrix} V
=
\begin{bmatrix}
0 & \cdots & 0 & \mu\rho_{r+1} & \cdots & \mu\rho_n \\
\sigma_1 & & & & & \\
 & \ddots & & & & \\
 & & \sigma_r & & & \\
 & & & \sigma_{r+1} & & \\
 & & & & \ddots & \\
 & & & & & \sigma_n
\end{bmatrix}
= P \begin{bmatrix} \mathrm{diag}(\sigma_1,\dots,\sigma_r) & \\ & D \end{bmatrix}
= P \begin{bmatrix} I_{r\times r} & \\ & \hat U \end{bmatrix}
\begin{bmatrix}
\sigma_1 & & & & & \\
 & \ddots & & & & \\
 & & \sigma_r & & & \\
 & & & \hat\sigma_{r+1} & & \\
 & & & & \ddots & \\
 & & & & & \hat\sigma_n
\end{bmatrix}
\begin{bmatrix} I_{r\times r} & \\ & \hat V \end{bmatrix}^H, \tag{3.16}
\]

where $P$ is a permutation matrix along with unitary matrices $\hat U$ and $\hat V$ forming a singular value decomposition

\[
D = \begin{bmatrix}
\mu\rho_{r+1} & \cdots & \mu\rho_n \\
\sigma_{r+1} & & \\
 & \ddots & \\
 & & \sigma_n
\end{bmatrix}
= \hat U \begin{bmatrix} \hat\sigma_{r+1} & & \\ & \ddots & \\ & & \hat\sigma_n \end{bmatrix} \hat V^H. \tag{3.17}
\]

Since $\hat\sigma_{r+1}$ is the largest singular value of $D$,

\[
\hat\sigma_{r+1} = \max_{x\in\mathbb{C}^{n-r},\ \|x\|_2=1} \|Dx\|_2
\;\ge\; \left\| D\begin{bmatrix} \bar\rho_{r+1} \\ \vdots \\ \bar\rho_n \end{bmatrix} \right\|_2
= \left\| \begin{bmatrix} \mu \\ \sigma_{r+1}\bar\rho_{r+1} \\ \vdots \\ \sigma_n\bar\rho_n \end{bmatrix} \right\|_2
\;\ge\; \mu \;>\; \theta.
\]

On the other hand, let $y = (0,\dots,0,y_{n-1},y_n)^\top\in\mathbb{C}^{n-r}$ such that $\|y\|_2 = 1$ and $y_{n-1}\rho_{n-1} + y_n\rho_n = 0$. Then

\[
\hat\sigma_n = \min_{\|x\|_2=1}\|Dx\|_2 \le \|Dy\|_2 = \sqrt{(\sigma_{n-1}y_{n-1})^2 + (\sigma_ny_n)^2} \le \sigma_{n-1} < \theta. \tag{3.18}
\]

Denote the columns of $\hat V$ by $\hat v_{r+1},\dots,\hat v_n$. For any fixed $j\in\{r+2,\dots,n-1\}$, let $z = (0,\dots,0,z_{j-1},\dots,z_n)^\top$ with $\|z\|_2 = 1$, where $\hat v_l^Hz = 0$ for $l = j+1,\dots,n$ and $\rho_{j-1}z_{j-1} + \cdots + \rho_nz_n = 0$. Then

\[
\hat\sigma_j = \min\left\{ \|Dx\|_2 \;:\; \|x\|_2 = 1,\ x^H\hat v_l = 0,\ l = j+1,\dots,n \right\}
\le \|Dz\|_2 = \sqrt{(z_{j-1}\sigma_{j-1})^2 + \cdots + (z_n\sigma_n)^2} \le \sigma_{j-1} < \theta. \tag{3.19}
\]

Thus $\mathrm{rank}_\theta(A_1) = r+1$. Write

\[
[\tilde u_1,\dots,\tilde u_{m+1}] = \begin{bmatrix} 1 & \\ & U \end{bmatrix} P \begin{bmatrix} I_{r\times r} & \\ & \hat U \end{bmatrix},
\qquad
[\tilde v_1,\dots,\tilde v_n] = V \begin{bmatrix} I_{r\times r} & \\ & \hat V \end{bmatrix}. \tag{3.20}
\]

It is clear from (3.16) and (3.17) that

\[
A_1 = \sigma_1\tilde u_1\tilde v_1^H + \cdots + \sigma_r\tilde u_r\tilde v_r^H
+ \hat\sigma_{r+1}\tilde u_{r+1}\tilde v_{r+1}^H + \hat\sigma_{r+2}\tilde u_{r+2}\tilde v_{r+2}^H + \cdots + \hat\sigma_n\tilde u_n\tilde v_n^H
\]

is a singular value expansion with order possibly shuffled. By (3.18) and (3.19), we have $\mathrm{Ker}_\theta(A_1) = \mathrm{span}\{\tilde v_{r+2},\dots,\tilde v_n\}$, which leads to $\mathrm{Ker}_\theta(A_1)\subset\mathrm{Ker}_\theta(A)$ from matching the last $n-(r+1)$ columns of both sides of (3.20).

We now have $\mathrm{span}\{z_1,\tilde v_{r+2},\dots,\tilde v_n\}\subset\mathrm{Ker}_\theta(A)$. Since $\|A_1\tilde v_j\|_2 = \hat\sigma_j < \theta$ for $j = r+2,\dots,n$, we have $\mu|z_1^H\tilde v_j| \le \hat\sigma_j < \theta$ and

\[
\mu\,\|z_1^H[\tilde v_{r+2},\dots,\tilde v_n]\|_2 \;<\; \theta.
\]

Thus, for any $y = [\zeta_1, p^\top]^\top$ with $p = [\zeta_{r+2},\dots,\zeta_n]^\top \ne 0$,

\[
\begin{aligned}
\|[z_1,\tilde v_{r+2},\dots,\tilde v_n]\,y\|_2^2
&= (\zeta_1z_1 + [\tilde v_{r+2},\dots,\tilde v_n]p)^H(\zeta_1z_1 + [\tilde v_{r+2},\dots,\tilde v_n]p) \\
&= |\zeta_1|^2 + \|p\|_2^2 + (\zeta_1z_1)^H[\tilde v_{r+2},\dots,\tilde v_n]p + p^H[\tilde v_{r+2},\dots,\tilde v_n]^Hz_1\zeta_1 \\
&> |\zeta_1|^2 + \|p\|_2^2 - 2\,\frac{\theta}{\mu}\,|\zeta_1|\,\|p\|_2
\;\ge\; \left(|\zeta_1| - \|p\|_2\right)^2 \;\ge\; 0
\end{aligned}
\]

since $\mu > \theta$. Therefore $\{z_1,\tilde v_{r+2},\dots,\tilde v_n\}$ forms a basis for $\mathrm{Ker}_\theta(A)$, namely $\mathrm{Ker}_\theta(A) = \mathrm{Ker}_\theta(A_1)\oplus\mathrm{Ran}(Z_1)$. ⊓⊔

The proof of Lemma 3.1 leads to the following corollary.

Corollary 3.3. Let $A\in\mathbb{C}^{m\times n}$ with $\mathrm{rank}(A) = r$. Assume $\{z_1,\dots,z_k\}$ is an orthonormal subset of $\mathrm{Ker}(A)$. Let $Z_k = [z_1,\dots,z_k]$, $\mu > 0$ and

\[
A_k = \begin{bmatrix} \mu Z_k^H \\ A \end{bmatrix}.
\]

Then the positive singular value set of $A_k$ is $\{\sigma_1(A),\dots,\sigma_r(A),\mu\}$.

Proof. Since $\mathrm{rank}(A) = r$, we have $\sigma_{r+1} = \cdots = \sigma_n = 0$ in the proof of Lemma 3.1. Thus the equalities in (3.17) become

\[
D = \begin{bmatrix}
\mu\rho_{r+1} & \cdots & \mu\rho_n \\
0 & & \\
 & \ddots & \\
 & & 0
\end{bmatrix}
= \hat U \begin{bmatrix} \mu & & & \\ & 0 & & \\ & & \ddots & \\ & & & 0 \end{bmatrix} \hat V^H.
\]

The assertions of the corollary follow from the rest of the proof of Lemma 3.1. ⊓⊔

To balance the magnitudes of $\|\mu Z_k\|_2$ and $\|A\|_2$ roughly, the constant $\mu$ in (3.15) can be set as $\|A\|_\infty$. After obtaining the first numerical kernel vector $z_1$ within $\theta$ from the Algorithm NUMERICALKERNELVECTOR, we can apply the same algorithm to obtain $z_2$ with $|z_1^Hz_2| < \theta/\mu \ll 1$. Thus the Gram-Schmidt orthogonalization of $z_1$ and $z_2$ is numerically stable. Assume an orthonormal set $\{z_1,\dots,z_k\}$ of numerical kernel vectors of $A$ within $\theta$ is obtained; the next numerical kernel vector $z_{k+1}$ can be computed by applying the Algorithm NUMERICALKERNELVECTOR on $A_k$. Again

\[
\left\|[z_1,\dots,z_k]^Hz_{k+1}\right\|_2 < \frac{\theta}{\mu} \ll 1
\]

implies that $z_{k+1}$ is almost orthogonal to $z_1,\dots,z_k$ and the Gram-Schmidt process orthogonalizing $z_1,\dots,z_{k+1}$ can be carried out with healthy numerical stability.
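Corollary 3.3 is easy to check numerically. In this hedged NumPy sketch, the rank-2 matrix and the weight $\mu = 2$ are made up for the demo: stacking $\mu z_1^H$ on top of $A$ inserts $\mu$ into the positive singular value set without disturbing $\sigma_1(A)$ and $\sigma_2(A)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# A 5x4 rank-2 matrix with known positive singular values {3, 1}.
U, _ = np.linalg.qr(rng.standard_normal((5, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U @ np.diag([3.0, 1.0, 0.0, 0.0]) @ V.T

# One orthonormal kernel vector z_1, and the stacked matrix A_1 of (3.15).
z1 = V[:, 2]
mu = 2.0
A1 = np.vstack([mu * z1[None, :], A])

# Positive singular values of A_1: {sigma_1(A), sigma_2(A)} together with mu.
s = np.linalg.svd(A1, compute_uv=False)
print(np.round(s, 8))
```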

Algorithm 6 (NUMERICALRANK)

INPUT: upper triangular matrix $R\in\mathbb{C}^{m\times n}$, rank threshold $\theta$

– set $R_0 = R$ and $\mu = \|R\|_\infty$.
– for $r = 0,\dots,n-1$ do
  · apply Algorithm NUMERICALKERNELVECTOR on $R_r$ and $\theta$
  · if Algorithm NUMERICALKERNELVECTOR outputs $z_{r+1}$, then
      reset $z_{r+1}$ so that $\{z_1,\dots,z_{r+1}\}$ is orthonormal
    else
      set the numerical rank $\mathrm{rank}_\theta(A) = n-r$
      set the numerical kernel $\mathrm{Ker}_\theta(A) = \mathrm{span}\{z_1,\dots,z_r\}$
      break the do-loop
    end if
  · update the QR decomposition of $\begin{bmatrix}\mu z_{r+1}^H \\ R_r\end{bmatrix}$ to obtain $R_{r+1}$ (c.f. [7, §12.5.3])
end do

OUTPUT: $n-r = \mathrm{rank}_\theta(A)$ and $[z_1,\dots,z_r]$ representing $\mathrm{Ker}_\theta(A)$.

Fig. 3.3 Algorithm NUMERICALRANK

Detailed discussion of this algorithm can be found in [17]. A Matlab implementation of Algorithm NUMERICALRANK is part of the package NACLAB. For matrices of large sizes, say $3200\times 1600$ with numerical nullity 10, Algorithm NUMERICALRANK can be more than 20 times faster than the SVD for computing the numerical rank and numerical kernel, as reported in [17].
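A compact Python sketch of the two algorithms combined is below. It is a hedged illustration, not the NACLAB implementation: it re-factors the stacked matrix with a fresh QR instead of the QR update of [7, §12.5.3], uses SciPy's `solve_triangular` for the two substitutions, and the 8×5 test matrix of numerical rank 3 is made up.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def numerical_kernel_vector(R, theta, rng):
    """Three inverse-iteration sweeps on R (Algorithm NumericalKernelVector)."""
    n = R.shape[1]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(3):
        y = solve_triangular(R, x, trans='C')   # forward substitution on R^H
        z = solve_triangular(R, y)              # backward substitution on R
        mu = np.linalg.norm(y) / np.linalg.norm(z)
        x = z / np.linalg.norm(z)
    return (x, mu) if mu < theta else (None, mu)

def numerical_rank(A, theta, seed=0):
    """Sketch of Algorithm NumericalRank: rank and kernel of A within theta."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    mu = np.linalg.norm(A, np.inf)
    R = qr(A, mode='economic')[1]
    Z = np.zeros((n, 0))
    for r in range(n):
        z, _ = numerical_kernel_vector(R, theta, rng)
        if z is None:
            return n - r, Z              # numerical rank and kernel basis
        z -= Z @ (Z.T @ z)               # Gram-Schmidt against found vectors
        z /= np.linalg.norm(z)
        Z = np.column_stack([Z, z])
        # re-factor the deflated stack (a QR update would be cheaper)
        R = qr(np.vstack([mu * z[None, :], R]), mode='economic')[1]
    return 0, Z

rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.standard_normal((8, 5)))
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = U @ np.diag([4.0, 2.0, 1.0, 1e-12, 1e-13]) @ V.T
rank, Z = numerical_rank(A, theta=1e-8)
print(rank, Z.shape)    # numerical rank 3, kernel basis of 2 vectors
```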

3.5 Singular linear systems

3.5.1 The pseudo-inverse on a rank manifold

The pseudo-inverse of an $m\times n$ matrix $A$ is defined formally as the unique $n\times m$ matrix $A^+$ that satisfies the four Moore-Penrose conditions:

\[
AA^+A = A,\quad A^+AA^+ = A^+,\quad (A^+A)^H = A^+A,\quad (AA^+)^H = AA^+. \tag{3.21}
\]

If $A$ is on the rank manifold $\mathbb{C}_r^{m\times n}$ for $0 < r \le n$, its singular value decomposition can be written as

\[
A = \begin{bmatrix} u_1 \ \cdots \ u_r \ u_{r+1} \ \cdots \ u_m \end{bmatrix}
\begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & O \end{bmatrix}
\begin{bmatrix} v_1 \ \cdots \ v_r \ v_{r+1} \ \cdots \ v_n \end{bmatrix}^H,
\]

and it is straightforward to verify that

\[
A^+ = \frac{1}{\sigma_r}v_ru_r^H + \cdots + \frac{1}{\sigma_1}v_1u_1^H
\equiv \begin{bmatrix} v_r,\dots,v_1 \end{bmatrix}
\begin{bmatrix} \frac{1}{\sigma_r} & & \\ & \ddots & \\ & & \frac{1}{\sigma_1} \end{bmatrix}
\begin{bmatrix} u_r,\dots,u_1 \end{bmatrix}^H
\]

satisfies the Moore-Penrose conditions (3.21). The most important theoretical application of pseudo-inverses is perhaps in solving the least squares problem as asserted in the following lemma, although it is almost never necessary to construct a pseudo-inverse in practice.

Lemma 3.2. Let $A\in\mathbb{C}^{m\times n}$. Then $x_* = A^+b$ is the least squares solution to $Ax = b$ with the minimum norm, namely $\|Ax_* - b\|_2 = \min_{x\in\mathbb{C}^n}\|Ax - b\|_2$ with

\[
\|x_*\|_2 = \min\left\{ \|y\|_2 \;:\; \|Ay - b\|_2 = \min_{z\in\mathbb{C}^n}\|Az - b\|_2 \right\}.
\]

Proof. The vector $x_* = A^+b$ is a least squares solution to $Ax = b$ because $A^H(b - AA^+b) = 0$. It is of minimum norm among all least squares solutions because $x_*\in\mathrm{Ker}(A)^\perp$. ⊓⊔

Using the least squares implication of the pseudo-inverse in Lemma 3.2, we have the following corollary.

Corollary 3.4. Let the $m\times n$ matrix $A\in\mathbb{C}_r^{m\times n}$ and $b\in\mathrm{Ran}(A)$. Then, for every $\mu > 0$,

\[
A^+b = \begin{bmatrix} \mu N^H \\ A \end{bmatrix}^+ \begin{bmatrix} 0 \\ b \end{bmatrix}, \tag{3.22}
\]

where $N$ is the matrix whose columns span $\mathrm{Ker}(A)$. Moreover, $A^+b$ is unique for every $b\in\mathrm{Ran}(A)$.

Proof. The equality (3.22) follows from $A^+b\in\mathrm{Ker}(A)^\perp$. It is unique because the matrix $\begin{bmatrix}\mu N^H \\ A\end{bmatrix}$ is of full rank. ⊓⊔
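Equality (3.22) can be verified numerically. In this hedged sketch, the rank-2 matrix, the kernel basis $N$, and the weight $\mu$ are all made up for the demo; the stacked matrix has full column rank, so ordinary least squares recovers $A^+b$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Rank-2 matrix A (5x4) and a consistent right-hand side b in Ran(A).
U, _ = np.linalg.qr(rng.standard_normal((5, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U @ np.diag([3.0, 1.0, 0.0, 0.0]) @ V.T
b = A @ rng.standard_normal(4)

N = V[:, 2:]                      # columns span Ker(A)
mu = np.linalg.norm(A, np.inf)

# (3.22): A^+ b equals the least squares solution of [mu N^H; A] x = [0; b].
M = np.vstack([mu * N.T, A])
rhs = np.concatenate([np.zeros(2), b])
x = np.linalg.lstsq(M, rhs, rcond=None)[0]

print(np.allclose(x, np.linalg.pinv(A) @ b))   # the two solutions agree
```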

3.5.2 Singular linear systems are not singular on rank manifolds

Consider a consistent $m\times n$ linear system in the form of

\[
Ax = b \tag{3.23}
\]

for a rank-$r$ matrix $A$ and $b\in\mathrm{Ran}(A)$. Assuming the columns of $N$ form a basis for $\mathrm{Ker}(A)$, its general solutions form an affine space $\left\{x_0 + Ny \;\middle|\; x_0 = A^+b,\ y\in\mathbb{C}^{n-r}\right\}$ containing infinitely many solution vectors. However, the solution affine space that can be represented by

\[
\left(x_0,\ \mathrm{Ker}(A)\right) := \left\{ x = x_0 + y \;\middle|\; x_0 = A^+b,\ y\in\mathrm{Ker}(A) \right\} \tag{3.24}
\]

is unique as a pair in $\mathbb{C}^n\times\{\text{vector spaces in }\mathbb{C}^n\}$, in which a distance can be naturally defined as

\[
\mathrm{dist}\left((x,P),(y,Q)\right) = \sqrt{\|x - y\|_2^2 + \mathrm{dist}(P,Q)^2}. \tag{3.25}
\]

Solving the system (3.23) is ill-posed since almost all perturbations on the data $A$ and/or $b$ will make the system inconsistent so that the solution doesn't exist. If the perturbation $(\Delta A,\Delta b)$ is restricted so that $A+\Delta A\in\mathbb{C}_r^{m\times n}$ and $b+\Delta b\in\mathrm{Ran}(A+\Delta A)$, the solution of $(A+\Delta A)x = b+\Delta b$ as an affine space not only exists uniquely but is also Lipschitz continuous. For any rank $r$, the set

\[
\mathcal{M}_r^{m\times n} := \left\{ (A,b) \;\middle|\; A\in\mathbb{C}_r^{m\times n},\ b\in\mathrm{Ran}(A) \right\}
\]

is a complex manifold in $\mathbb{C}^{m\times n}\times\mathbb{C}^m$ with codimension

\[
\mathrm{codim}\,\mathcal{M}_r^{m\times n} = (m-r)(n-r+1).
\]

The proof of this fact should be a good exercise. The ill-posedness of solving $Ax = b$ can thus be explained as usual: Almost all perturbations on $(A,b)$ push it away from its native manifold $\mathcal{M}_r^{m\times n}$ and alter solvability and the solution completely. A usual window is open as well: If the perturbation on the pair $(A,b)$ stays on the manifold $\mathcal{M}_r^{m\times n}$, the solution $(x_0,\mathrm{Ker}(A))$ as a pair (3.24) uniquely exists and is Lipschitz continuous.

Theorem 3.3 (Linear System Sensitivity Theorem on a Rank Manifold). Let $A\in\mathbb{C}_r^{m\times n}$ be a matrix on the rank manifold and $b\in\mathrm{Ran}(A)$. If the perturbations $\Delta A$ and $\Delta b$ are constrained by $A+\Delta A\in\mathbb{C}_r^{m\times n}$ and $b+\Delta b\in\mathrm{Ran}(A+\Delta A)$, then solving the linear system $Ax = b$ is a well-posed problem with the condition number $\kappa_\theta(A) < \infty$ for any $0 < \theta < \sigma_r(A)$.

Proof. The solution $(x_0,\mathrm{Ker}(A))$ as in (3.24) uniquely exists since $\mathrm{Ker}(A)$ is unique and so is $A^+b$. To prove the Lipschitz continuity, let $(A,b),(\tilde A,\tilde b)\in\mathcal{M}_r^{m\times n}$, $x_0 = A^+b$, $\tilde x_0 = \tilde A^+\tilde b$, and let the columns of $N$ and $\tilde N$ form orthonormal bases for $\mathrm{Ker}(A)$ and $\mathrm{Ker}(\tilde A)$ respectively. By Lemma 2.2 (page 25), we can further assume $\|N-\tilde N\|_2 \le \mathrm{dist}\left(\mathrm{Ker}(A),\mathrm{Ker}(\tilde A)\right)$. For $\eta = \sigma_r(A)$ with $\|A-\tilde A\|_2 < \eta$, denote

\[
A_\eta = \begin{bmatrix} \eta N^H \\ A \end{bmatrix} \quad\text{and}\quad \tilde A_\eta = \begin{bmatrix} \eta\tilde N^H \\ \tilde A \end{bmatrix}.
\]

Then, from the Singular Subspace Continuity Theorem (page 26),

\[
\|A_\eta - \tilde A_\eta\|_2^2 \le \|A-\tilde A\|_2^2 + \eta^2\|N-\tilde N\|_2^2 \le \zeta^2\|A-\tilde A\|_2^2
\]

with

\[
\zeta \le \sqrt{1 + \frac{2n}{\left(1 - \|A-\tilde A\|_2/\sigma_r(A)\right)^2}}
\]

of moderate size. By Lemma 3.2 and the Least Squares Sensitivity Theorem (page 32),

\[
\limsup_{\substack{(\tilde A,\tilde b)\to(A,b)\\ (\tilde A,\tilde b)\in\mathcal{M}_r^{m\times n}}}
\frac{\|A^+b - \tilde A^+\tilde b\|_2}{\|A-\tilde A\|_2 + \|b-\tilde b\|_2}
\;\le\; \frac{\kappa(A_\eta)}{\|A_\eta\|_2}\cdot
\frac{\|x_0\|_2\,\|A_\eta-\tilde A_\eta\|_2 + \|b-\tilde b\|_2}{\|A-\tilde A\|_2 + \|b-\tilde b\|_2}
\;\le\; 2\sqrt{1+2n}\,\frac{\sigma_1(A)}{\sigma_r(A)}\left(\frac{\|x_0\|_2}{\|A\|_2} + 1\right).
\]

Combined with the inequality (2.31) (page 26), we have

\[
\limsup_{\substack{(\tilde A,\tilde b)\to(A,b)\\ (\tilde A,\tilde b)\in\mathcal{M}_r^{m\times n}}}
\frac{\mathrm{dist}\left((A^+b,\mathrm{Ker}(A)),\,(\tilde A^+\tilde b,\mathrm{Ker}(\tilde A))\right)}{\|A-\tilde A\|_2 + \|b-\tilde b\|_2}
\;\le\; 2\sqrt{1+2n}\,\frac{\sigma_1(A)}{\sigma_r(A)}\left(\frac{\|x_0\|_2}{\|A\|_2} + 1\right) + 2n\,\frac{\sigma_1(A)}{\sigma_r(A)}. \qquad ⊓⊔
\]

The conventional condition number $\kappa(A) = \infty$ when the matrix $A$ is rank deficient, and solving the linear system $Ax = b$ is thus ill-posed/singular in general. On its rank manifold, however, the sensitivity of the general solution $(A^+b,\mathrm{Ker}(A))$ is bounded by $\kappa_\theta(A) = \frac{\sigma_1(A)}{\sigma_r(A)} < \infty$ for any $0 < \theta < \sigma_r(A)$.
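The contrast between on-manifold and off-manifold perturbations can be observed numerically. In this hedged NumPy sketch (the rank-2 matrix, right-hand side, and truncation threshold are all made up), a generic tiny perturbation leaves the rank manifold and the exact pseudo-inverse solution changes completely, while projecting the perturbed matrix back to rank 2 via the `rcond` cutoff keeps the solution stable.

```python
import numpy as np

rng = np.random.default_rng(4)

# Rank-2 A (5x4) and consistent b: a point (A, b) on the rank manifold.
U, _ = np.linalg.qr(rng.standard_normal((5, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U @ np.diag([3.0, 1.0, 0.0, 0.0]) @ V.T
b = A @ rng.standard_normal(4)
x0 = np.linalg.pinv(A) @ b

# A generic tiny perturbation leaves the manifold: A + E has full rank,
# and the exact pseudo-inverse solution jumps ...
E = 1e-9 * rng.standard_normal((5, 4))
x_exact = np.linalg.pinv(A + E) @ b
# ... while truncating back to numerical rank 2 keeps the solution stable.
x_proj = np.linalg.pinv(A + E, rcond=1e-6) @ b

print(np.linalg.norm(x_exact - x0))  # large jump despite a 1e-9 perturbation
print(np.linalg.norm(x_proj - x0))   # tiny: comparable to the perturbation
```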

3.6 Stable solutions of highly ill-conditioned linear systems

Consider the linear system $Ax = b$ with high condition number $1 \ll \kappa(A) < \infty$. Its solution is subject to the error bound (2.6) (page 18) under the perturbation $(\Delta A,\Delta b)$. Namely, the conventional solution $x = A^{-1}b$ is highly sensitive in such a way that the relative data error can be magnified by the factor $\kappa(A)$ into the relative error of the solution.

We may ask: Can an ill-conditioned linear system have a well-conditioned numerical solution? The question may appear to be ridiculous. The answer, however, lies in the very notion of the numerical solution, which doesn't have to be an approximation to the meaningless $A^{-1}b$ when $\kappa(A)$ is huge.

3.6.1 A possibly well-conditioned numerical solution

The first and foremost criterion for a numerical solution is backward accuracy. The real question becomes the following: Among all the possible backward accurate numerical solutions, is there a stable solution that is insensitive to data perturbations? The answer is positive when the magnitude of $\theta$ is an acceptable backward accuracy and $\kappa_\theta(A)$ is an acceptable sensitivity measure.

When the matrix $A$ is highly ill-conditioned, it is near a rank-deficient matrix, or equivalently, it is deficient in its numerical rank. Let $r = \mathrm{rank}_\theta(A)$ be the numerical rank of $A$ within a certain $\theta$ and $A_\theta$ be the rank-$r$ projection of $A$. Then the highly ill-conditioned linear system $Ax = b$ is near a consistent linear system $A_\theta x = b_\theta$ if $b_\theta$ is the projection of $b$ onto $\mathrm{Ran}(A_\theta) = \mathrm{Ran}_\theta(A)$ and $\|b - b_\theta\|_2 \ll 1$. In such cases, the general solution $\left(A_\theta^+b,\ \mathrm{Ker}_\theta(A)\right)$ of $A_\theta x = b_\theta$ is not only a backward accurate solution of $Ax = b$ but also possibly well-conditioned, since $\kappa_\theta(A)$ can be moderate even though $\kappa(A)$ is not.
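This construction — project $A$ to its numerical rank and solve the nearby consistent system — can be sketched via the SVD. The helper name `stable_solution` and the test matrix are made up for this illustration.

```python
import numpy as np

def stable_solution(A, b, theta):
    """Hypothetical helper: the general numerical solution
    (A_theta^+ b, Ker_theta(A)) of Ax = b via the truncated SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > theta))                  # numerical rank within theta
    x0 = Vh[:r].T @ ((U[:, :r].T @ b) / s[:r])  # minimum-norm A_theta^+ b
    kernel = Vh[r:].T                           # orthonormal basis of Ker_theta(A)
    return x0, kernel

# A 4x4 system with kappa(A) ~ 1e12 but kappa_theta(A) = 3 for theta = 1e-8.
rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U @ np.diag([3.0, 1.0, 1e-12, 1e-12]) @ V.T
b = A @ rng.standard_normal(4)

x0, kernel = stable_solution(A, b, theta=1e-8)
print(np.linalg.norm(A @ x0 - b))   # backward error far below theta
print(kernel.shape)                 # two numerical kernel vectors
```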

Corollary 3.5. Let $\theta > 0$ and $A\in\mathbb{C}^{m\times n}$ with $\mathrm{rank}_\theta(A) = r$. Denote the rank-$r$ projection of $A$ as $A_\theta$ and the projection of $b$ onto $\mathrm{Ran}_\theta(A)$ as $b_\theta$. Assume $\|b - b_\theta\|_2 < \theta$. Then $\left(A_\theta^+b,\ \mathrm{Ker}_\theta(A)\right)$ is a numerical solution of the linear system $Ax = b$ with a backward accuracy

\[
\max\left\{\|A - A_\theta\|_2,\ \|b - b_\theta\|_2\right\} < \theta
\]

and a condition number $\kappa_\theta(A) = \frac{\sigma_1(A)}{\sigma_r(A)}$.

The following is an example of the stable numerical solution in the application of polynomial division.

Example 3.5. The polynomial division problem of $f(x) \div (x+20)$ for

\[
f(x) = -\tfrac{2}{3}x^{14} - \tfrac{40}{3}x^{13} + x^9 + 20x^8 - x^4 - 20x^3 + x + 26
= (x+20)\left(-\tfrac{2}{3}x^{13} + x^8 - x^3 + 1\right) + 6
\]

with quotient and remainder

\[
q(x) = -\tfrac{2}{3}x^{13} + x^8 - x^3 + 1, \qquad r(x) = 6
\]

is equivalent to solving the linear system

\[
\begin{bmatrix}
1 & & & & \\
20 & 1 & & & \\
 & 20 & \ddots & & \\
 & & \ddots & 1 & \\
 & & & 20 & 1
\end{bmatrix}
\begin{bmatrix} \mathbf{q} \\ r \end{bmatrix} = \mathbf{f},
\]

where $\mathbf{f}$ and $\mathbf{q}$ are the coefficient vectors of $f$ and $q$ respectively. Denoting the matrix as $A$, this linear system is highly ill-conditioned with the condition number

\[
\kappa(A) > 10^{18}.
\]

With such a high sensitivity, conventional polynomial long division is highly unreliable and predictably yields the wrong quotient/remainder

\[
\tilde q(x) = -0.666666666666667\,x^{13} + \cdots + 1.091393642127514\,x + 22.827872842550278,
\qquad \tilde r = -430.5574568510056
\]

as obtained by the function deconv in Matlab. However, for any $\theta\in(1.0\times 10^{-13},\ 9.0)$, the numerical nullity $\mathrm{nullity}_\theta(A) = 1$ and the condition number w.r.t. $\theta$

\[
\kappa_\theta(A) \approx 1.1028,
\]

which is as well conditioned as one can hope for. As a result, the numerical solution

\[
\left(A_\theta^+\mathbf{f},\ \mathrm{Ker}_\theta(A)\right) =
\left\{
\begin{bmatrix}
-0.666666666666667 \\
-0.000000000000000 \\
-0.000000000000001 \\
-0.000000000000029 \\
-0.000000000000580 \\
\phantom{-}1.000000000011593 \\
-0.000000000231840 \\
\phantom{-}0.000000004636811 \\
-0.000000092736231 \\
\phantom{-}0.000001854724615 \\
-1.000037094492285 \\
\phantom{-}0.000741889845703 \\
-0.014837796914057 \\
\phantom{-}1.296755938281153 \\
\phantom{-}0.064881234376948
\end{bmatrix}
+ t
\begin{bmatrix}
\phantom{-}0.000000000000000 \\
\phantom{-}0.000000000000000 \\
-0.000000000000000 \\
\phantom{-}0.000000000000005 \\
\phantom{-}0.000000000000098 \\
-0.000000000001951 \\
\phantom{-}0.000000000039014 \\
-0.000000000780273 \\
\phantom{-}0.000000015605457 \\
-0.000000312109131 \\
\phantom{-}0.000006242182611 \\
-0.000124843652221 \\
\phantom{-}0.002496873044430 \\
-0.049937460888595 \\
\phantom{-}0.998749217771909
\end{bmatrix}
\;:\; t\in\mathbb{C}
\right\} \tag{3.26}
\]

is highly accurate with backward error

\[
\max\left\{\|A - A_\theta\|_2,\ \|\mathbf{f} - \mathbf{f}_\theta\|_2\right\} \le 9.98\times 10^{-14} < \theta,
\]

and extremely stable with condition number 1.22. One counter-argument may be that the polynomial division problem is still not solved even though a stable numerical solution $\left(A_\theta^+\mathbf{f},\ \mathrm{Ker}_\theta(A)\right)$ is found: the quotient and remainder are not yet obtained. The real reason is that the problem is in fact under-determined. The true quotient and remainder can be accurately approximated as a member of the numerical solution set $\left(A_\theta^+\mathbf{f},\ \mathrm{Ker}_\theta(A)\right)$ by setting $t = 5.942551603558297$ in (3.26):

\[
\hat q(x) = -0.666666666666667\,x^{13} + 1.000000000000001\,x^8 - 1.000000000000000\,x^3 + 1.000000000000000,
\qquad \hat r(x) = 6.000000000000000
\]

with a tiny coefficient-wise error of about $10^{-15}$. Obtaining this particular solution requires an additional equation translated from an additional piece of information. ⊓⊔

3.6.2 Computing the stable numerical solution

For a highly ill-conditioned linear system $Ax = b$, the numerical solution set $\left(A_\theta^+b,\ \mathrm{Ker}_\theta(A)\right)$ can be computed in conjunction with the Algorithm NUMERICALRANK, which produces the unitary matrix $N$ whose columns span $\mathrm{Ker}_\theta(A)$ as well as the QR decomposition of the stacked matrix $\begin{bmatrix}\mu N^H \\ A\end{bmatrix}$. The following lemma states that the minimum norm solution $A_\theta^+b$ can be solved for using those by-products of Algorithm NUMERICALRANK.

Lemma 3.3. Let $A\in\mathbb{C}^{m\times n}$ with its singular value decomposition partitioned as

\[
A = \begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} \Sigma_1 & \\ & \Sigma_2 \end{bmatrix}
\begin{bmatrix} V_1 & V_2 \end{bmatrix}^H, \tag{3.27}
\]

where $\Sigma_1$ is $r\times r$. Assume the columns of $N\in\mathbb{C}^{n\times(n-r)}$ form an orthonormal basis for $\mathrm{Ran}(V_2)$ and $x_*$ is the least squares solution of

\[
\begin{bmatrix} \mu N^H \\ A \end{bmatrix} x = \begin{bmatrix} 0 \\ b \end{bmatrix} \tag{3.28}
\]

with $\mu > 0$. Then

\[
x_* - NN^Hx_* = (U_1\Sigma_1V_1^H)^+b. \tag{3.29}
\]

Proof. The least squares solution $x_*$ satisfies the normal equation

\[
\mu^2NN^Hx_* + A^HAx_* - A^Hb = 0
\]

of (3.28). Namely, using the partition (3.27), we have an orthogonal decomposition

\[
\left(V_1\Sigma_1^H\Sigma_1V_1^Hx_* - V_1\Sigma_1^HU_1^Hb\right) +
\left(V_2\Sigma_2^H\Sigma_2V_2^Hx_* - V_2\Sigma_2^HU_2^Hb + \mu^2NN^Hx_*\right) = 0,
\]

implying $V_1\Sigma_1^H\Sigma_1V_1^Hx_* - V_1\Sigma_1^HU_1^Hb = 0$, and thus $x_*$ is a least squares solution of the linear system $(U_1\Sigma_1V_1^H)x = b$. Write $x_* = V_1V_1^Hx_* + V_2V_2^Hx_*$. It is easy to verify that $x_0 = V_1V_1^Hx_* = (U_1\Sigma_1V_1^H)^+b$ and $NN^Hx_* = V_2V_2^Hx_*$. ⊓⊔

Choosing $r = \mathrm{rank}_\theta(A)$ in Lemma 3.3, the equation (3.29) becomes

\[
x_* - NN^Hx_* = A_\theta^+b
\]

as part of the numerical solution of $Ax = b$. Using the available QR decomposition

\[
\begin{bmatrix} \mu N^H \\ A \end{bmatrix} = \begin{bmatrix} Q_1, Q_2 \end{bmatrix} \begin{bmatrix} R \\ O \end{bmatrix},
\]

the vector $x_*$ is the solution of

\[
Rx = Q_1^Hb,
\]

requiring only a backward substitution.
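Lemma 3.3 and the backward-substitution shortcut can be illustrated as follows. The matrix, kernel basis $N$, and scaling $\mu$ are made up, and a fresh QR factorization stands in for the one Algorithm NUMERICALRANK would already have on hand.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(6)

# Numerically rank-2 A (5x4) with kernel basis N within theta = 1e-8.
U, _ = np.linalg.qr(rng.standard_normal((5, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U @ np.diag([3.0, 1.0, 1e-12, 1e-12]) @ V.T
b = A @ rng.standard_normal(4)
N = V[:, 2:]
mu = np.linalg.norm(A, np.inf)

# QR of the stacked matrix, then one backward substitution: R x = Q1^H [0; b].
Q, R = qr(np.vstack([mu * N.T, A]), mode='economic')
rhs = Q.T @ np.concatenate([np.zeros(2), b])
x = solve_triangular(R, rhs)

# (3.29): x - N N^H x reproduces the minimum-norm solution A_theta^+ b.
x0 = x - N @ (N.T @ x)
print(np.linalg.norm(A @ x0 - b))   # small: a backward accurate solution
```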

3.6.3 On Tikhonov’s regularization method

For the problem of solving a highly ill-conditioned linear system $Ax = b$, Tikhonov's strategy is to regularize the problem through solving the minimization problem

\[
\text{minimize}\quad \|Ax - b\|_2^2 + \lambda^2\|Lx\|_2^2 \tag{3.30}
\]

with a proper choice of the matrix $L$ and the regularization parameter $\lambda$. It is easy to see that the minimization problem (3.30) is equivalent to solving the least squares problem of the linear system

\[
\begin{bmatrix} \lambda L \\ A \end{bmatrix} x = \begin{bmatrix} 0 \\ b \end{bmatrix}. \tag{3.31}
\]

If we choose $L$ in such a way that the columns of $L^H$ span the numerical kernel $\mathrm{Ker}_\theta(A)$, the Tikhonov solution $x_*$ of (3.30) can be written as

\[
x_* = A_\theta^+b + Nu
\]

for a certain vector $u$, based on the analysis in §3.6.2. In that case, the Tikhonov solution $x_*$ is one of the vectors in the solution set $\left(A_\theta^+b,\ \mathrm{Ker}_\theta(A)\right)$.

Tikhonov's regularization method is instrumental in solving many application problems (see, e.g., [10]) and it is the standard method for solving ill-posed linear problems in infinite dimensional spaces.

From our analysis in this chapter, there can be infinitely many numerical solutions of a highly ill-conditioned linear system $Ax = b$ if a numerical solution exists at all. Solving Tikhonov's regularized problem (3.30) for each pair $\lambda$ and $L$ yields only one particular solution. In this sense, the Tikhonov regularization may be considered incomplete in solving the ill-conditioned linear system $Ax = b$.
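The containment of the Tikhonov solution in the solution set can be observed numerically. In this made-up example $L^H$ spans the numerical kernel, and the Tikhonov solution of (3.31), after its kernel component is removed, reproduces $A_\theta^+b$.

```python
import numpy as np

rng = np.random.default_rng(7)

# Numerically rank-2 A (5x4), consistent b, kernel basis N within theta = 1e-8.
U, _ = np.linalg.qr(rng.standard_normal((5, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U @ np.diag([3.0, 1.0, 1e-12, 1e-12]) @ V.T
b = A @ rng.standard_normal(4)
N = V[:, 2:]

# Tikhonov regularization (3.30) with L^H spanning the numerical kernel:
# solve the stacked least squares system (3.31) [lambda L; A] x = [0; b].
lam, L = 0.5, N.T
M = np.vstack([lam * L, A])
x = np.linalg.lstsq(M, np.concatenate([np.zeros(2), b]), rcond=None)[0]

# The Tikhonov solution lies in the set (A_theta^+ b, Ker_theta(A)):
x0 = np.linalg.pinv(A, rcond=1e-6) @ b        # A_theta^+ b by truncation
print(np.linalg.norm((x - N @ (N.T @ x)) - x0))   # essentially zero
```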