L. Vandenberghe ECE133B (Spring 2020) 4. Singular value decomposition
• singular value decomposition
• related eigendecompositions
• matrix properties from singular value decomposition
• min–max and max–min characterizations
• low-rank approximation
• sensitivity of linear equations
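4.1 Singular value decomposition (SVD)

every m × n matrix A can be factored as

\[ A = U \Sigma V^T \]

• U is m × m and orthogonal
• V is n × n and orthogonal
• Σ is m × n and “diagonal”: diagonal with diagonal elements $\sigma_1, \ldots, \sigma_n$ if m = n,

\[
\Sigma = \begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 \\
0 & \sigma_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix} \ \text{if } m > n,
\qquad
\Sigma = \begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \sigma_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sigma_m & 0 & \cdots & 0
\end{bmatrix} \ \text{if } m < n
\]

• the diagonal entries of Σ are nonnegative and sorted:

\[ \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min\{m,n\}} \geq 0 \]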
Singular value decomposition 4.2

Singular values and singular vectors

\[ A = U \Sigma V^T \]

• the numbers $\sigma_1, \ldots, \sigma_{\min\{m,n\}}$ are the singular values of A
• the m columns $u_i$ of U are the left singular vectors
• the n columns $v_i$ of V are the right singular vectors

if we write the factorization $A = U \Sigma V^T$ as

\[ AV = U\Sigma, \qquad A^T U = V \Sigma^T \]

and compare the ith columns on the left- and right-hand sides, we see that

\[ A v_i = \sigma_i u_i \quad \text{and} \quad A^T u_i = \sigma_i v_i \quad \text{for } i = 1, \ldots, \min\{m,n\} \]

• if m > n, the additional m − n vectors $u_i$ satisfy $A^T u_i = 0$ for i = n + 1, ..., m
• if n > m, the additional n − m vectors $v_i$ satisfy $A v_i = 0$ for i = m + 1, ..., n
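a minimal NumPy sketch of the full SVD and the singular vector relations above; the random 5 × 3 test matrix and all variable names are illustrative assumptions

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))

# Full SVD: U is m x m, Vt holds V^T (n x n), s holds min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

# Build the m x n "diagonal" matrix Sigma from the singular values.
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)

reconstruction_ok = np.allclose(A, U @ Sigma @ Vt)
# A v_i = sigma_i u_i and A^T u_i = sigma_i v_i for i = 1, ..., min(m, n)
right_ok = np.allclose(A @ V, U[:, :n] * s)
left_ok = np.allclose(A.T @ U[:, :n], V * s)
# Since m > n, the extra m - n left singular vectors satisfy A^T u_i = 0.
extra_ok = np.allclose(A.T @ U[:, n:], 0)
```

note that `np.linalg.svd` returns $V^T$ rather than V, and the singular values as a vector sorted in non-increasing order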
Singular value decomposition 4.3

Reduced SVD

often m ≫ n or n ≫ m, which makes one of the orthogonal matrices very large

Tall matrix: if m > n, the last m − n columns of U can be omitted to define

\[ A = U \Sigma V^T \]

• U is m × n with orthonormal columns
• V is n × n and orthogonal
• Σ is n × n and diagonal with diagonal entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0$

Wide matrix: if m < n, the last n − m columns of V can be omitted to define

\[ A = U \Sigma V^T \]

• U is m × m and orthogonal
• V is n × m with orthonormal columns
• Σ is m × m and diagonal with diagonal entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_m \geq 0$

we refer to these as reduced SVDs (and to the factorization on p. 4.2 as a full SVD)
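a short NumPy sketch of the reduced SVD of a tall matrix; the 1000 × 4 random matrix is an illustrative assumption, and `full_matrices=False` is NumPy's switch for the reduced factorization

```python
import numpy as np

rng = np.random.default_rng(1)
A_tall = rng.standard_normal((1000, 4))

# Reduced SVD: U_r is m x n instead of m x m, so no 1000 x 1000 matrix is formed.
U_r, s_r, Vt_r = np.linalg.svd(A_tall, full_matrices=False)

reduced_shapes = (U_r.shape, s_r.shape, Vt_r.shape)
columns_orthonormal = np.allclose(U_r.T @ U_r, np.eye(4))
reconstruction_ok = np.allclose(A_tall, (U_r * s_r) @ Vt_r)
```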
Singular value decomposition 4.4

Outline
• singular value decomposition
• related eigendecompositions
• matrix properties from singular value decomposition
• min–max and max–min characterizations
• low-rank approximation
• sensitivity of linear equations

Eigendecomposition of Gram matrix

suppose A is an m × n matrix with full SVD

\[ A = U \Sigma V^T \]

the SVD is related to the eigendecomposition of the Gram matrix $A^T A$:

\[ A^T A = V \Sigma^T \Sigma V^T \]

• V is an orthogonal n × n matrix
• $\Sigma^T \Sigma$ is a diagonal n × n matrix with (non-increasing) diagonal elements

\[ \sigma_1^2, \ \sigma_2^2, \ \ldots, \ \sigma_{\min\{m,n\}}^2, \ \underbrace{0, \ 0, \ \ldots, \ 0}_{n - \min\{m,n\} \text{ times}} \]

• the n diagonal elements of $\Sigma^T \Sigma$ are the eigenvalues of $A^T A$
• the right singular vectors (columns of V) are corresponding eigenvectors
Singular value decomposition 4.5

Gram matrix of transpose

the SVD also gives the eigendecomposition of $A A^T$:

\[ A A^T = U \Sigma \Sigma^T U^T \]

• U is an orthogonal m × m matrix
• $\Sigma \Sigma^T$ is a diagonal m × m matrix with (non-increasing) diagonal elements

\[ \sigma_1^2, \ \sigma_2^2, \ \ldots, \ \sigma_{\min\{m,n\}}^2, \ \underbrace{0, \ 0, \ \ldots, \ 0}_{m - \min\{m,n\} \text{ times}} \]

• the m diagonal elements of $\Sigma \Sigma^T$ are the eigenvalues of $A A^T$
• the left singular vectors (columns of U) are corresponding eigenvectors

in particular, the first min{m, n} eigenvalues of $A^T A$ and $A A^T$ are the same:

\[ \sigma_1^2, \ \sigma_2^2, \ \ldots, \ \sigma_{\min\{m,n\}}^2 \]
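the relation between the singular values and the eigenvalues of the two Gram matrices can be checked numerically; a sketch with an assumed random 3 × 5 matrix

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5
A = rng.standard_normal((m, n))
s = np.linalg.svd(A, compute_uv=False)

# eigvalsh returns eigenvalues in ascending order; reverse to non-increasing.
eig_gram = np.linalg.eigvalsh(A.T @ A)[::-1]      # n eigenvalues of A^T A
eig_gram_t = np.linalg.eigvalsh(A @ A.T)[::-1]    # m eigenvalues of A A^T

# A^T A has the squared singular values followed by n - min(m, n) zeros.
expected_gram = np.concatenate([s**2, np.zeros(n - min(m, n))])
gram_ok = np.allclose(eig_gram, expected_gram)
gram_t_ok = np.allclose(eig_gram_t, s**2)
```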
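Singular value decomposition 4.6

Example

scatter plot shows m = 500 points from the normal distribution on page 3.34

\[ \mu = \begin{bmatrix} 5 \\ 4 \end{bmatrix}, \qquad \Sigma_{\mathrm{ex}} = \frac{1}{4} \begin{bmatrix} 7 & \sqrt{3} \\ \sqrt{3} & 5 \end{bmatrix} \]

• we define an m × 2 data matrix X with the m vectors as its rows
• the centered data matrix is $X_c = X - (1/m) \mathbf{1} \mathbf{1}^T X$

sample estimate of mean is

\[ \hat\mu = \frac{1}{m} X^T \mathbf{1} = \begin{bmatrix} 5.01 \\ 3.93 \end{bmatrix} \]

sample estimate of covariance is

\[ \hat\Sigma = \frac{1}{m} X_c^T X_c = \begin{bmatrix} 1.67 & 0.48 \\ 0.48 & 1.35 \end{bmatrix} \]

[figure: scatter plot of the data points, axes x1 and x2]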
Singular value decomposition 4.7

Example

\[ A = \frac{1}{\sqrt{m}} X_c \]

• eigenvectors of $\hat\Sigma$ are right singular vectors $v_1$, $v_2$ of A (and of $X_c$)
• eigenvalues of $\hat\Sigma$ are squares of the singular values of A

[figure: scatter plot of the data with the directions v1 and v2, axes x1 and x2]
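a sketch of this example in NumPy, assuming the distribution parameters are µ = (5, 4) and Σ = (1/4)[[7, √3], [√3, 5]]; the random draw differs from the one behind the slide, so the sample estimates will not match the printed numbers exactly

```python
import numpy as np

rng = np.random.default_rng(3)
m = 500
mu = np.array([5.0, 4.0])                       # assumed mean
Sigma_ex = np.array([[7.0, np.sqrt(3.0)],
                     [np.sqrt(3.0), 5.0]]) / 4  # assumed covariance
X = rng.multivariate_normal(mu, Sigma_ex, size=m)   # m x 2 data matrix

Xc = X - X.mean(axis=0)          # centered data matrix X - (1/m) 1 1^T X
Sigma_hat = Xc.T @ Xc / m        # sample covariance
A = Xc / np.sqrt(m)

_, s, Vt = np.linalg.svd(A, full_matrices=False)
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)    # ascending eigenvalues

# Eigenvalues of the sample covariance are the squared singular values of A.
eig_ok = np.allclose(eigvals[::-1], s**2)
# Eigenvectors match the right singular vectors up to sign: |v_i^T q_i| = 1.
alignment = np.abs(np.sum(Vt.T * eigvecs[:, ::-1], axis=0))
vec_ok = np.allclose(alignment, 1.0)
```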
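Singular value decomposition 4.8

Existence of singular value decomposition

the Gram matrix connection gives a proof that every matrix has an SVD

• assume A is m × n with m ≥ n and rank r
• the n × n matrix $A^T A$ has rank r (page 2.5) and an eigendecomposition

\[ A^T A = V \Lambda V^T \tag{1} \]

Λ is diagonal with diagonal elements $\lambda_1 \geq \cdots \geq \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_n$

• define $\sigma_i = \sqrt{\lambda_i}$ for i = 1, ..., n, and an m × n matrix

\[ U = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} = \begin{bmatrix} \frac{1}{\sigma_1} A v_1 & \frac{1}{\sigma_2} A v_2 & \cdots & \frac{1}{\sigma_r} A v_r & u_{r+1} & \cdots & u_n \end{bmatrix} \]

where $u_{r+1}, \ldots, u_n$ are any n − r orthonormal vectors in $\mathrm{null}(A^T)$ (an orthonormal basis for $\mathrm{null}(A^T)$ if m = n)

• (1) and the choice of $u_{r+1}, \ldots, u_n$ imply that U has orthonormal columns (U is orthogonal if m = n): for i, j ≤ r, $u_i^T u_j = v_i^T A^T A v_j / (\sigma_i \sigma_j) = \lambda_j v_i^T v_j / (\sigma_i \sigma_j)$
• (1) also implies that $A v_i = 0$ for i = r + 1, ..., n, because $\|A v_i\|^2 = v_i^T A^T A v_i = \lambda_i = 0$
• together with the definition of $u_1, \ldots, u_r$ this shows that $AV = U\Sigma$ with $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$, which is a reduced SVD; if m > n, adding m − n columns that complete U to an orthogonal matrix gives a full SVD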
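the construction in this proof can be followed step by step in NumPy; a sketch that assumes A has full column rank (r = n), so that no additional vectors $u_{r+1}, \ldots, u_n$ are needed (forming $A^T A$ explicitly is fine for illustration but is not how SVDs are computed in practice)

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))      # full column rank with probability one

# Eigendecomposition A^T A = V Lambda V^T, reordered so lambda_1 >= ... >= lambda_n.
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam)                 # sigma_i = sqrt(lambda_i)
U = (A @ V) / sigma                  # column i is (1 / sigma_i) A v_i

columns_orthonormal = np.allclose(U.T @ U, np.eye(3))
factorization_ok = np.allclose(A, (U * sigma) @ V.T)
matches_library = np.allclose(sigma, np.linalg.svd(A, compute_uv=False))
```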
Singular value decomposition 4.9

Non-uniqueness of singular value decomposition

the derivation from the eigendecomposition

\[ A^T A = V \Lambda V^T \]

shows that the singular value decomposition of A is almost unique

Singular values
• the singular values of A are uniquely defined
• we have also shown that A and $A^T$ have the same singular values

Singular vectors (assuming m ≥ n): see the discussion on page 3.14
• right singular vectors $v_i$ with the same positive singular value span a subspace
• in this subspace, any other orthonormal basis can be chosen
• the first r = rank(A) left singular vectors then follow from $\sigma_i u_i = A v_i$
• the remaining vectors $v_{r+1}, \ldots, v_n$ can be any orthonormal basis for $\mathrm{null}(A)$
• the remaining vectors $u_{r+1}, \ldots, u_m$ can be any orthonormal basis for $\mathrm{null}(A^T)$
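Singular value decomposition 4.10

Exercise

suppose A is an m × n matrix with m ≥ n, and define

\[ B = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix} \]

1. suppose $A = U \Sigma V^T$ is a full SVD of A; verify that

\[ B = \begin{bmatrix} U & 0 \\ 0 & V \end{bmatrix} \begin{bmatrix} 0 & \Sigma \\ \Sigma^T & 0 \end{bmatrix} \begin{bmatrix} U & 0 \\ 0 & V \end{bmatrix}^T \]

2. derive from this an eigendecomposition of B

Hint: if $\Sigma_1$ is square, then

\[ \begin{bmatrix} 0 & \Sigma_1 \\ \Sigma_1 & 0 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} I & I \\ I & -I \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & -\Sigma_1 \end{bmatrix} \begin{bmatrix} I & I \\ I & -I \end{bmatrix} \]

3. what are the m + n eigenvalues of B?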
Singular value decomposition 4.11

Outline
• singular value decomposition
• related eigendecompositions
• matrix properties from singular value decomposition
• min–max and max–min characterizations
• low-rank approximation
• sensitivity of linear equations

Rank

the number of positive singular values is the rank of a matrix

• suppose there are r positive singular values:

\[ \sigma_1 \geq \cdots \geq \sigma_r > 0 = \sigma_{r+1} = \cdots = \sigma_{\min\{m,n\}} \]

• partition the matrices in a full SVD of A as

\[ A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T = U_1 \Sigma_1 V_1^T \tag{2} \]

$\Sigma_1$ is r × r with the positive singular values $\sigma_1, \ldots, \sigma_r$ on the diagonal

• since $U_1$ and $V_1$ have orthonormal columns, the factorization (2) proves that

\[ \mathrm{rank}(A) = r \]

(see page 1.12)
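in floating-point arithmetic the zero singular values are only computed as very small numbers, so a numerical rank counts the singular values above a threshold; a sketch in which the rank-2 test matrix and the relative threshold 1e-10 are assumptions

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical 6 x 5 test matrix of rank 2, built as a product of thin factors.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

s = np.linalg.svd(A, compute_uv=False)

# Count singular values that are not negligible relative to the largest one.
rel_tol = 1e-10                      # assumed threshold, problem dependent
numerical_rank = int(np.sum(s > rel_tol * s[0]))
```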
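Singular value decomposition 4.12

Inverse and pseudo-inverse

we use the same notation as on the previous page

\[ A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T = U_1 \Sigma_1 V_1^T \]

diagonal entries of $\Sigma_1$ are the positive singular values of A

• pseudo-inverse follows from page 1.36:

\[ A^\dagger = V_1 \Sigma_1^{-1} U_1^T = \begin{bmatrix} V_1 & V_2 \end{bmatrix} \begin{bmatrix} \Sigma_1^{-1} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} U_1^T \\ U_2^T \end{bmatrix} = V \Sigma^\dagger U^T \]

• if A is square and nonsingular, this reduces to the inverse

\[ A^{-1} = (U \Sigma V^T)^{-1} = V \Sigma^{-1} U^T \]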
Singular value decomposition 4.13

Four subspaces

we continue with the same notation for the SVD of an m × n matrix A with rank r:

\[ A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T \]

the diagonal entries of $\Sigma_1$ are the positive singular values of A

the SVD provides orthonormal bases for the four subspaces associated with A

• the columns of the m × r matrix $U_1$ are a basis of $\mathrm{range}(A)$
• the columns of the m × (m − r) matrix $U_2$ are a basis of $\mathrm{range}(A)^\perp = \mathrm{null}(A^T)$
• the columns of the n × r matrix $V_1$ are a basis of $\mathrm{range}(A^T)$
• the columns of the n × (n − r) matrix $V_2$ are a basis of $\mathrm{null}(A)$
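the partitioned SVD gives both the pseudo-inverse and the four subspace bases; a NumPy sketch with an assumed rank-2 test matrix and an assumed rank threshold, where the pseudo-inverse is checked against the four Penrose conditions

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5 x 4, rank 2

U, s, Vt = np.linalg.svd(A)                  # full SVD
r = int(np.sum(s > 1e-10 * s[0]))            # numerical rank (assumed threshold)
U1, U2 = U[:, :r], U[:, r:]
V1, V2 = Vt[:r, :].T, Vt[r:, :].T

# Pseudo-inverse A^+ = V1 Sigma1^{-1} U1^T
A_pinv = V1 @ np.diag(1.0 / s[:r]) @ U1.T
penrose_ok = (
    np.allclose(A @ A_pinv @ A, A)
    and np.allclose(A_pinv @ A @ A_pinv, A_pinv)
    and np.allclose(A @ A_pinv, (A @ A_pinv).T)
    and np.allclose(A_pinv @ A, (A_pinv @ A).T)
)

# Four subspaces
range_ok = np.allclose(U1 @ (U1.T @ A), A)       # columns of A lie in span(U1)
left_null_ok = np.allclose(A.T @ U2, 0)          # U2 spans null(A^T)
row_space_ok = np.allclose((A @ V1) @ V1.T, A)   # rows of A lie in span(V1)
null_ok = np.allclose(A @ V2, 0)                 # V2 spans null(A)
```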
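Singular value decomposition 4.14

Image of unit ball

define $E_y$ as the image of the unit ball under the linear mapping y = Ax:

\[ E_y = \{ Ax \mid \|x\| \leq 1 \} = \{ U \Sigma V^T x \mid \|x\| \leq 1 \} \]

[figure: the mapping y = Ax in three steps, $\tilde x = V^T x$, $\tilde y = \Sigma \tilde x$, $y = U \tilde y$; the unit ball with directions $v_1$, $v_2$ is mapped to the ellipse $E_y$ with semi-axes $\sigma_1 u_1$, $\sigma_2 u_2$]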
Singular value decomposition 4.15

Control interpretation

system A maps input x to output y = Ax

[block diagram: input x, system A, output y = Ax]

• if $\|x\|^2$ represents input energy, the set of outputs realizable with unit energy is

\[ E_y = \{ Ax \mid \|x\| \leq 1 \} \]

• assume rank(A) = m: every desired y can be realized by at least one input
• the most energy-efficient input that generates output y is

\[ x_{\mathrm{eff}} = A^\dagger y = A^T (A A^T)^{-1} y, \qquad \|x_{\mathrm{eff}}\|^2 = y^T (A A^T)^{-1} y = \sum_{i=1}^m \frac{(u_i^T y)^2}{\sigma_i^2} \]

• (if rank(A) = m) the set $E_y$ is an ellipsoid $E_y = \{ y \mid y^T (A A^T)^{-1} y \leq 1 \}$
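a numerical check of the minimum-energy input and the three expressions for its energy; the 2 × 4 system matrix and the target output are random placeholders

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 2, 4
A = rng.standard_normal((m, n))      # wide, rank m with probability one
y = rng.standard_normal(m)           # desired output

G = A @ A.T
x_eff = A.T @ np.linalg.solve(G, y)  # A^T (A A^T)^{-1} y

U, s, Vt = np.linalg.svd(A)
energy_norm = x_eff @ x_eff                       # ||x_eff||^2
energy_quad = y @ np.linalg.solve(G, y)           # y^T (A A^T)^{-1} y
energy_svd = np.sum((U.T @ y) ** 2 / s ** 2)      # sum of (u_i^T y)^2 / sigma_i^2

# Any other input that produces y differs by a nullspace vector and costs more.
x_other = x_eff + Vt[m:, :].T @ rng.standard_normal(n - m)
```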
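Singular value decomposition 4.16

Inverse image of unit ball

define $E_x$ as the inverse image of the unit ball under the linear mapping y = Ax:

\[ E_x = \{ x \mid \|Ax\| \leq 1 \} = \{ x \mid \|U \Sigma V^T x\| \leq 1 \} \]

[figure: the mapping y = Ax in three steps, $\tilde x = V^T x$, $\tilde y = \Sigma \tilde x$, $y = U \tilde y$; the ellipse $E_x$ with semi-axes $\sigma_1^{-1} v_1$, $\sigma_2^{-1} v_2$ is mapped to the unit ball with directions $u_1$, $u_2$]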
Singular value decomposition 4.17

Estimation interpretation

measurement A maps unknown quantity $x_{\mathrm{true}}$ to observation

\[ y_{\mathrm{obs}} = A (x_{\mathrm{true}} + v) \]

where v is unknown but bounded by $\|Av\| \leq 1$

• if rank(A) = n, there is a unique estimate $\hat x$ that satisfies $A \hat x = y_{\mathrm{obs}}$
• uncertainty in y causes uncertainty in estimate: true value $x_{\mathrm{true}}$ must satisfy

\[ \|A (x_{\mathrm{true}} - \hat x)\| \leq 1 \]

• the set $E_x = \{ x \mid \|A(x - \hat x)\| \leq 1 \}$ is the uncertainty region around estimate $\hat x$
Singular value decomposition 4.18

Frobenius norm and 2-norm

for an m × n matrix A with singular values $\sigma_i$:

\[ \|A\|_F = \Big( \sum_{i=1}^{\min\{m,n\}} \sigma_i^2 \Big)^{1/2}, \qquad \|A\|_2 = \sigma_1 \]

this readily follows from the unitary invariance of the two norms:

\[ \|A\|_F = \|U \Sigma V^T\|_F = \|\Sigma\|_F = \Big( \sum_{i=1}^{\min\{m,n\}} \sigma_i^2 \Big)^{1/2} \]

and

\[ \|A\|_2 = \|U \Sigma V^T\|_2 = \|\Sigma\|_2 = \sigma_1 \]
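both norm formulas are easy to confirm with NumPy's matrix norms; the random 4 × 6 matrix is an assumption

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 6))
s = np.linalg.svd(A, compute_uv=False)

frobenius_from_svd = np.sqrt(np.sum(s**2))
two_norm_from_svd = s[0]

frobenius_ok = np.isclose(frobenius_from_svd, np.linalg.norm(A, 'fro'))
two_norm_ok = np.isclose(two_norm_from_svd, np.linalg.norm(A, 2))
```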
Singular value decomposition 4.19

Exercise

Exercise 1: express $\|A^\dagger\|_2$ and $\|A^\dagger\|_F$ in terms of the singular values of A

Exercise 2: the condition number of a square nonsingular matrix A is defined as

\[ \kappa(A) = \|A\|_2 \|A^{-1}\|_2 \]

express κ(A) in terms of the singular values of A

Exercise 3: give an SVD and the 2-norm of the matrix

\[ A = a b^T \]

where a is an n-vector and b is an m-vector
Singular value decomposition 4.20

Outline
• singular value decomposition
• related eigendecompositions
• matrix properties from singular value decomposition
• min–max and max–min characterizations
• low-rank approximation
• sensitivity of linear equations

First singular value

the first singular value is the maximal value of several functions:

\[ \sigma_1 = \max_{\|x\|=1} \|Ax\| = \max_{\|x\|=\|y\|=1} y^T A x = \max_{\|y\|=1} \|A^T y\| \tag{3} \]

• the first and last expressions follow from page 3.24 and

\[ \sigma_1^2 = \lambda_{\max}(A^T A) = \max_{\|x\|=1} x^T A^T A x, \qquad \sigma_1^2 = \lambda_{\max}(A A^T) = \max_{\|y\|=1} y^T A A^T y \]

• second expression in (3) follows from the Cauchy–Schwarz inequality:

\[ \|Ax\| = \max_{\|y\|=1} y^T (Ax), \qquad \|A^T y\| = \max_{\|x\|=1} x^T (A^T y) \]
Singular value decomposition 4.21

First singular value

alternatively, we can use an SVD of A to solve the maximization problems in

\[ \sigma_1 = \max_{\|x\|=1} \|Ax\| = \max_{\|x\|=\|y\|=1} y^T A x = \max_{\|y\|=1} \|A^T y\| \tag{4} \]

• suppose $A = U \Sigma V^T$ is a full SVD of A
• if we define $\tilde x = V^T x$, $\tilde y = U^T y$, then (4) can be written as

\[ \sigma_1 = \max_{\|\tilde x\|=1} \|\Sigma \tilde x\| = \max_{\|\tilde x\|=\|\tilde y\|=1} \tilde y^T \Sigma \tilde x = \max_{\|\tilde y\|=1} \|\Sigma^T \tilde y\| \]

• an optimal choice for $\tilde x$ and $\tilde y$ is $\tilde x = (1, 0, \ldots, 0)$ and $\tilde y = (1, 0, \ldots, 0)$
• therefore $x = v_1$ and $y = u_1$ are optimal for each of the maximizations in (4)
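Singular value decomposition 4.22

Last singular value

two of the three expressions in (3) have a counterpart for the last singular value

• for an m × n matrix A, the last singular value $\sigma_{\min\{m,n\}}$ can be written as follows:

\[ \text{if } m \geq n: \ \sigma_n = \min_{\|x\|=1} \|Ax\|, \qquad \text{if } n \geq m: \ \sigma_m = \min_{\|y\|=1} \|A^T y\| \tag{5} \]

• if m ≠ n, we need to distinguish the two cases because

\[ \min_{\|x\|=1} \|Ax\| = 0 \ \text{ if } n > m, \qquad \min_{\|y\|=1} \|A^T y\| = 0 \ \text{ if } m > n \]

to prove (5), we substitute the full SVD $A = U \Sigma V^T$, and define $\tilde x = V^T x$, $\tilde y = U^T y$:

\[ \text{if } m \geq n: \quad \min_{\|\tilde x\|=1} \|\Sigma \tilde x\| = \min_{\|\tilde x\|=1} \big( \sigma_1^2 \tilde x_1^2 + \cdots + \sigma_n^2 \tilde x_n^2 \big)^{1/2} = \sigma_n \]

\[ \text{if } n \geq m: \quad \min_{\|\tilde y\|=1} \|\Sigma^T \tilde y\| = \min_{\|\tilde y\|=1} \big( \sigma_1^2 \tilde y_1^2 + \cdots + \sigma_m^2 \tilde y_m^2 \big)^{1/2} = \sigma_m \]

optimal choices for x and y in (5) are $x = v_n$, $y = u_m$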
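the extremal properties in (3) and (5) can be illustrated by comparing the singular vectors with random unit vectors; a sketch for an assumed tall 5 × 3 matrix (sampling only illustrates the bounds, it does not prove them)

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 5, 3
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A)
u1, v1, vn = U[:, 0], Vt[0], Vt[-1]

# The three expressions in (3), evaluated at x = v_1, y = u_1.
first_values = np.array([
    np.linalg.norm(A @ v1),
    u1 @ A @ v1,
    np.linalg.norm(A.T @ u1),
])
# The first expression in (5), evaluated at x = v_n (m >= n here).
last_value = np.linalg.norm(A @ vn)

# Gains ||Ax|| for 1000 random unit vectors x lie between sigma_n and sigma_1.
X = rng.standard_normal((n, 1000))
X /= np.linalg.norm(X, axis=0)
gains = np.linalg.norm(A @ X, axis=0)
```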
Singular value decomposition 4.23

Max–min characterization

we extend (3) to a max–min characterization of the other singular values:

\[
\begin{aligned}
\sigma_k &= \max_{X^T X = I_k} \sigma_{\min}(AX) && \text{(6a)} \\
&= \max_{X^T X = Y^T Y = I_k} \sigma_{\min}(Y^T A X) && \text{(6b)} \\
&= \max_{Y^T Y = I_k} \sigma_{\min}(A^T Y) && \text{(6c)}
\end{aligned}
\]

• $\sigma_k$ for k = 1, ..., min{m, n} are the singular values of the m × n matrix A
• X is n × k with orthonormal columns, Y is m × k with orthonormal columns
• $\sigma_{\min}(B)$ denotes the smallest singular value of the matrix B
• in the three expressions in (6) $\sigma_{\min}(\cdot)$ denotes the kth singular value
• for k = 1, we obtain the three expressions for $\sigma_1$ in (3)
• these can be derived from the min–max theorems for eigenvalues (p. 3.29)
• or we can find an optimal choice for X, Y from an SVD of A (we skip the details)
Singular value decomposition 4.24

Min–max characterization

we extend (5) to a min–max characterization of all singular values

Tall or square matrix: if A is m × n with m ≥ n

\[ \sigma_{n-k+1} = \min_{X^T X = I_k} \|AX\|_2, \quad k = 1, \ldots, n \tag{7} \]

• we minimize over n × k matrices X with orthonormal columns
• $\|AX\|_2$ is the maximum singular value of an m × k matrix
• for k = 1, this is the first expression in (5)

Wide or square matrix (A is m × n with m ≤ n)

\[ \sigma_{m-k+1} = \min_{Y^T Y = I_k} \|A^T Y\|_2, \quad k = 1, \ldots, m \]

• we minimize over m × k matrices Y with orthonormal columns
• $\|A^T Y\|_2$ is the maximum singular value of an n × k matrix
• for k = 1, this is the second expression in (5)
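a sketch that evaluates (6a) and (7) at the optimal choices built from the right singular vectors, and compares them with random matrices X with orthonormal columns; the matrix sizes, k, and the helper names `spectral_norm` and `sigma_min` are assumptions

```python
import numpy as np

rng = np.random.default_rng(10)
m, n, k = 6, 4, 2
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A)
V = Vt.T

def spectral_norm(M):
    """Largest singular value of M."""
    return np.linalg.norm(M, 2)

def sigma_min(M):
    """Smallest singular value of M."""
    return np.linalg.svd(M, compute_uv=False)[-1]

# (7): the last k right singular vectors attain sigma_{n-k+1} (s[n - k], 0-based).
minmax_value = spectral_norm(A @ V[:, n - k:])
# (6a): the first k right singular vectors attain sigma_k (s[k - 1], 0-based).
maxmin_value = sigma_min(A @ V[:, :k])

# Random n x k matrices with orthonormal columns (Q factor of a QR factorization).
trials_ok = True
for _ in range(200):
    X, R = np.linalg.qr(rng.standard_normal((n, k)))
    trials_ok = trials_ok and bool(spectral_norm(A @ X) >= s[n - k] - 1e-12)
    trials_ok = trials_ok and bool(sigma_min(A @ X) <= s[k - 1] + 1e-12)
```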
Singular value decomposition 4.25

Proof of min–max characterization (for m ≥ n)

we use a full SVD $A = U \Sigma V^T$ to solve the optimization problem

\[ \text{minimize } \|AX\|_2 \quad \text{subject to } X^T X = I \]

changing variables to $\tilde X = V^T X$ gives the equivalent problem

\[ \text{minimize } \|\Sigma \tilde X\|_2 \quad \text{subject to } \tilde X^T \tilde X = I \]

• we show that the optimal value is $\sigma_{n-k+1}$
• an optimal solution for $\tilde X$ is formed from the last k columns of the n × n identity:

\[ \tilde X_{\mathrm{opt}} = \begin{bmatrix} e_{n-k+1} & \cdots & e_{n-1} & e_n \end{bmatrix} \]

• an optimal solution for X is formed from the last k columns of V:

\[ X_{\mathrm{opt}} = \begin{bmatrix} v_{n-k+1} & \cdots & v_{n-1} & v_n \end{bmatrix} \]
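Singular value decomposition 4.26

Proof of min–max characterization (for m ≥ n)

• we first note that $\Sigma \tilde X_{\mathrm{opt}}$ are the last k columns of Σ, so $\|\Sigma \tilde X_{\mathrm{opt}}\|_2 = \sigma_{n-k+1}$
• to show that this is optimal, consider any other n × k matrix $\tilde X$ with $\tilde X^T \tilde X = I$
• find a nonzero k-vector u for which $y = \tilde X u$ has last k − 1 components zero:

\[ \tilde X u = (y_1, \ldots, y_{n-k+1}, 0, \ldots, 0) \]

this is possible because a (k − 1) × k matrix has linearly dependent columns

• normalize u so that $\|u\| = \|y\| = 1$, and use it to lower bound $\|\Sigma \tilde X\|_2$:

\[
\begin{aligned}
\|\Sigma \tilde X\|_2 &\geq \|\Sigma \tilde X u\|_2 \\
&= \|\Sigma y\|_2 \\
&= \big( \sigma_1^2 y_1^2 + \cdots + \sigma_{n-k+1}^2 y_{n-k+1}^2 \big)^{1/2} \\
&\geq \sigma_{n-k+1} \big( y_1^2 + \cdots + y_{n-k+1}^2 \big)^{1/2} \\
&= \sigma_{n-k+1}
\end{aligned}
\]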
Singular value decomposition 4.27

Outline
• singular value decomposition
• related eigendecompositions
• matrix properties from singular value decomposition
• min–max and max–min characterizations
• low-rank approximation
• sensitivity of linear equations

Rank-r approximation

let A be an m × n matrix with rank(A) > r and full SVD

\[ A = U \Sigma V^T = \sum_{i=1}^{\min\{m,n\}} \sigma_i u_i v_i^T, \qquad \sigma_1 \geq \cdots \geq \sigma_{\min\{m,n\}} \geq 0, \quad \sigma_{r+1} > 0 \]

the best rank-r approximation of A is the sum of the first r terms in the SVD:

\[ B = \sum_{i=1}^{r} \sigma_i u_i v_i^T \]

• B is the best approximation for the Frobenius norm: for every C with rank r,

\[ \|A - C\|_F \geq \|A - B\|_F = \Big( \sum_{i=r+1}^{\min\{m,n\}} \sigma_i^2 \Big)^{1/2} \]

• B is also the best approximation for the 2-norm: for every C with rank r,

\[ \|A - C\|_2 \geq \|A - B\|_2 = \sigma_{r+1} \]
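a sketch of the truncated SVD and its approximation errors; the 8 × 6 random matrix, the choice r = 2, and the random rank-r competitor C are assumptions

```python
import numpy as np

rng = np.random.default_rng(11)
m, n, r = 8, 6, 2
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sum of the first r terms sigma_i u_i v_i^T.
B = (U[:, :r] * s[:r]) @ Vt[:r, :]

err_2 = np.linalg.norm(A - B, 2)        # should equal sigma_{r+1} = s[r]
err_F = np.linalg.norm(A - B, 'fro')    # should equal sqrt(sum_{i>r} sigma_i^2)

# An arbitrary rank-r matrix cannot do better in either norm.
C = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
competitor_err_2 = np.linalg.norm(A - C, 2)
competitor_err_F = np.linalg.norm(A - C, 'fro')
```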
Singular value decomposition 4.28

Rank-r approximation in Frobenius norm

the approximation problem in $\|\cdot\|_F$ is a nonlinear least squares problem:

\[ \text{minimize } \|A - Y X^T\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \Big( A_{ij} - \sum_{k=1}^{r} Y_{ik} X_{jk} \Big)^2 \tag{8} \]

• matrix B is written as $B = Y X^T$ with Y of size m × r and X of size n × r
• we optimize over X and Y

Outline of the solution

• the first order (necessary but not sufficient) optimality conditions are

\[ A X = Y (X^T X), \qquad A^T Y = X (Y^T Y) \]

• can assume $X^T X = Y^T Y = D$ with D diagonal (e.g., get X, Y from SVD of B)
• then columns of X, Y are formed from r pairs of right/left singular vectors of A
• optimal choice for (8) is to take the first r singular vector pairs
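Singular value decomposition 4.29

Rank-r approximation in 2-norm

to show that B is the best approximation in 2-norm, we prove that

\[ \|A - C\|_2 \geq \|A - B\|_2 \quad \text{for all } C \text{ with } \mathrm{rank}(C) = r \]

on the right-hand side

\[
\begin{aligned}
A - B &= \sum_{i=1}^{\min\{m,n\}} \sigma_i u_i v_i^T - \sum_{i=1}^{r} \sigma_i u_i v_i^T \\
&= \sum_{i=r+1}^{\min\{m,n\}} \sigma_i u_i v_i^T \\
&= \begin{bmatrix} u_{r+1} & \cdots & u_{\min\{m,n\}} \end{bmatrix}
\begin{bmatrix} \sigma_{r+1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{\min\{m,n\}} \end{bmatrix}
\begin{bmatrix} v_{r+1} & \cdots & v_{\min\{m,n\}} \end{bmatrix}^T
\end{aligned}
\]

and the 2-norm of the difference is $\|A - B\|_2 = \sigma_{r+1}$

on the next page we show that $\|A - C\|_2 \geq \sigma_{r+1}$ if C has rank r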
Singular value decomposition 4.30

Proof: we prove that $\|A - C\|_2 \geq \sigma_{r+1}$ for all m × n matrices C of rank r

• we assume that m ≥ n (otherwise, first take the transpose of A and C)
• if rank(C) = r, the nullspace of C has dimension n − r
• define an n × (n − r) matrix $\hat X$ with orthonormal columns that span $\mathrm{null}(C)$
• use the min–max characterization on page 4.25 to bound $\|A - C\|_2$:

\[
\begin{aligned}
\|A - C\|_2 &= \max_{\|x\|=1} \|(A - C) x\| \\
&\geq \max_{\|w\|=1} \|(A - C) \hat X w\| && (\|\hat X w\| = 1 \text{ if } \|w\| = 1) \\
&= \|(A - C) \hat X\|_2 \\
&= \|A \hat X\|_2 && (C \hat X = 0) \\
&\geq \min_{X^T X = I_{n-r}} \|A X\|_2 \\
&= \sigma_{r+1} && (\text{apply (7) with } k = n - r)
\end{aligned}
\]
Singular value decomposition 4.31

Outline
• singular value decomposition
• related eigendecompositions
• matrix properties from singular value decomposition
• min–max and max–min characterizations
• low-rank approximation
• sensitivity of linear equations

SVD of square matrix

for the rest of the lecture we assume that A is n × n and nonsingular with SVD

\[ A = U \Sigma V^T = \sum_{i=1}^{n} \sigma_i u_i v_i^T \]

• 2-norm of A is $\|A\|_2 = \sigma_1$
• A is nonsingular if and only if $\sigma_n > 0$
• inverse of A and 2-norm of $A^{-1}$ are

\[ A^{-1} = V \Sigma^{-1} U^T = \sum_{i=1}^{n} \frac{1}{\sigma_i} v_i u_i^T, \qquad \|A^{-1}\|_2 = \frac{1}{\sigma_n} \]

• condition number of A is

\[ \kappa(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_1}{\sigma_n} \geq 1 \]

A is called ill-conditioned if the condition number is very high
Singular value decomposition 4.32

Sensitivity to right-hand side perturbations

linear equation with right-hand side b ≠ 0 and perturbed right-hand side b + e:

\[ Ax = b, \qquad Ay = b + e \]

• bound on distance between the solutions:

\[ \|y - x\| = \|A^{-1} e\| \leq \|A^{-1}\|_2 \|e\| \]

recall that $\|Bx\| \leq \|B\|_2 \|x\|$ for matrix 2-norm and Euclidean vector norm

• bound on relative change in the solution, in terms of $\delta_b = \|e\| / \|b\|$:

\[ \frac{\|y - x\|}{\|x\|} \leq \|A\|_2 \|A^{-1}\|_2 \frac{\|e\|}{\|b\|} = \kappa(A) \, \delta_b \]

in the first step we use $\|b\| = \|Ax\| \leq \|A\|_2 \|x\|$

large κ(A) indicates that the solution can be very sensitive to changes in b
Singular value decomposition 4.33

Worst-case perturbation of right-hand side

\[ \frac{\|y - x\|}{\|x\|} \leq \kappa(A) \, \delta_b \quad \text{where } \delta_b = \frac{\|e\|}{\|b\|} \]

• the upper bound is often very conservative
• however, for every A one can find b, e for which the bound holds with equality
• choose $b = u_1$ (first left singular vector of A): solution of Ax = b is

\[ x = A^{-1} b = V \Sigma^{-1} U^T u_1 = \frac{1}{\sigma_1} v_1 \]

• choose $e = \delta_b u_n$ ($\delta_b$ times left singular vector $u_n$): solution of Ay = b + e is

\[ y = A^{-1} (b + e) = x + \frac{\delta_b}{\sigma_n} v_n \]

• relative change is

\[ \frac{\|y - x\|}{\|x\|} = \frac{\sigma_1 \delta_b}{\sigma_n} = \kappa(A) \, \delta_b \]
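the worst-case choice $b = u_1$, $e = \delta_b u_n$ can be reproduced numerically; a sketch with an assumed random 4 × 4 matrix and an assumed perturbation level $\delta_b = 10^{-3}$

```python
import numpy as np

rng = np.random.default_rng(12)
n = 4
A = rng.standard_normal((n, n))      # nonsingular with probability one
U, s, Vt = np.linalg.svd(A)

kappa = s[0] / s[-1]                 # condition number sigma_1 / sigma_n
delta_b = 1e-3                       # assumed relative perturbation ||e|| / ||b||

b = U[:, 0]                          # first left singular vector, ||b|| = 1
e = delta_b * U[:, -1]               # perturbation along u_n

x = np.linalg.solve(A, b)
y = np.linalg.solve(A, b + e)

relative_change = np.linalg.norm(y - x) / np.linalg.norm(x)
bound = kappa * delta_b
```

for a generic b and e the relative change is usually much smaller than `bound`, which is why the bound is called conservative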
Singular value decomposition 4.34

Nearest singular matrix

the singular matrix closest to A is

\[ \sum_{i=1}^{n-1} \sigma_i u_i v_i^T = A + E \quad \text{where } E = -\sigma_n u_n v_n^T \]

• this gives another interpretation of the condition number:

\[ \|E\|_2 = \sigma_n = \frac{1}{\|A^{-1}\|_2}, \qquad \frac{\|E\|_2}{\|A\|_2} = \frac{\sigma_n}{\sigma_1} = \frac{1}{\kappa(A)} \]

1/κ(A) is the relative distance of A to the nearest singular matrix

• this also implies that a perturbation A + E of A is certainly nonsingular if

\[ \|E\|_2 < \frac{1}{\|A^{-1}\|_2} = \sigma_n \]
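a sketch that builds the perturbation $E = -\sigma_n u_n v_n^T$ for an assumed random 4 × 4 matrix and checks that A + E is singular to working precision

```python
import numpy as np

rng = np.random.default_rng(13)
n = 4
A = rng.standard_normal((n, n))
U, s, Vt = np.linalg.svd(A)

# Remove the last term sigma_n u_n v_n^T of the SVD.
E = -s[-1] * np.outer(U[:, -1], Vt[-1, :])
s_perturbed = np.linalg.svd(A + E, compute_uv=False)

norm_E = np.linalg.norm(E, 2)                 # equals sigma_n
relative_distance = norm_E / s[0]             # equals 1 / kappa(A)
```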
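Singular value decomposition 4.35

Bound on inverse

on the next page we prove the following inequality:

\[ \|(A + E)^{-1}\|_2 \leq \frac{\|A^{-1}\|_2}{1 - \|A^{-1}\|_2 \|E\|_2} \quad \text{if } \|E\|_2 < \frac{1}{\|A^{-1}\|_2} \tag{9} \]

using $\|A^{-1}\|_2 = 1/\sigma_n$:

\[ \|(A + E)^{-1}\|_2 \leq \frac{1}{\sigma_n - \|E\|_2} \quad \text{if } \|E\|_2 < \sigma_n \]

[figure: plot of the bound $1/(\sigma_n - \|E\|_2)$ versus $\|E\|_2$, with $1/\sigma_n$ marked on the vertical axis and $\sigma_n$ on the horizontal axis]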
Singular value decomposition 4.36

Proof:

• the matrix $Y = (A + E)^{-1}$ satisfies

\[ (I + A^{-1} E) Y = A^{-1} (A + E) Y = A^{-1} \]

• therefore

\[
\begin{aligned}
\|Y\|_2 &= \|A^{-1} - A^{-1} E Y\|_2 \\
&\leq \|A^{-1}\|_2 + \|A^{-1} E Y\|_2 && \text{(triangle inequality)} \\
&\leq \|A^{-1}\|_2 + \|A^{-1} E\|_2 \|Y\|_2
\end{aligned}
\]

in the last step we use the property $\|CD\|_2 \leq \|C\|_2 \|D\|_2$ of the matrix 2-norm

• rearranging the last inequality for $\|Y\|_2$ gives

\[ \|Y\|_2 \leq \frac{\|A^{-1}\|_2}{1 - \|A^{-1} E\|_2} \leq \frac{\|A^{-1}\|_2}{1 - \|A^{-1}\|_2 \|E\|_2} \]

in the second step we again use the property $\|A^{-1} E\|_2 \leq \|A^{-1}\|_2 \|E\|_2$
Singular value decomposition 4.37

Sensitivity to perturbations of coefficient matrix

linear equation with matrix A and perturbed matrix A + E:

\[ Ax = b, \qquad (A + E) y = b \]

• we assume $\|E\|_2 < 1 / \|A^{-1}\|_2$, which guarantees that A + E is nonsingular
• bound on distance between the solutions:

\[
\begin{aligned}
\|y - x\| &= \|(A + E)^{-1} (b - (A + E) x)\| \\
&= \|(A + E)^{-1} E x\| \\
&\leq \|(A + E)^{-1}\|_2 \, \|E\|_2 \, \|x\| \\
&\leq \frac{\|A^{-1}\|_2 \|E\|_2}{1 - \|A^{-1}\|_2 \|E\|_2} \, \|x\| && \text{(applying (9))}
\end{aligned}
\]

• bound on relative change in solution in terms of $\delta_A = \|E\|_2 / \|A\|_2$:

\[ \frac{\|y - x\|}{\|x\|} \leq \frac{\kappa(A) \, \delta_A}{1 - \kappa(A) \, \delta_A} \tag{10} \]
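Singular value decomposition 4.38

Worst-case perturbation of coefficient matrix

an example where the upper bound (10) is sharp (from SVD $A = \sum_{i=1}^{n} \sigma_i u_i v_i^T$)

• choose $b = u_n$: the solution of Ax = b is

\[ x = A^{-1} b = (1/\sigma_n) v_n \]

• choose $E = -\delta_A \sigma_1 u_n v_n^T$ with $\delta_A < \sigma_n / \sigma_1 = 1/\kappa(A)$:

\[ A + E = \sum_{i=1}^{n-1} \sigma_i u_i v_i^T + (\sigma_n - \delta_A \sigma_1) u_n v_n^T \]

• solution of (A + E)y = b is

\[ y = (A + E)^{-1} b = \frac{1}{\sigma_n - \delta_A \sigma_1} v_n \]

• relative change in solution is

\[ \frac{\|y - x\|}{\|x\|} = \sigma_n \Big( \frac{1}{\sigma_n - \delta_A \sigma_1} - \frac{1}{\sigma_n} \Big) = \frac{\delta_A \kappa(A)}{1 - \delta_A \kappa(A)} \]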
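the same worst-case construction in NumPy; the random 4 × 4 matrix and the choice $\delta_A = 0.5/\kappa(A)$ (for which the bound (10) equals 1) are assumptions

```python
import numpy as np

rng = np.random.default_rng(14)
n = 4
A = rng.standard_normal((n, n))
U, s, Vt = np.linalg.svd(A)
kappa = s[0] / s[-1]

delta_A = 0.5 / kappa                               # must be below 1 / kappa
E = -delta_A * s[0] * np.outer(U[:, -1], Vt[-1, :]) # ||E||_2 = delta_A * sigma_1
b = U[:, -1]                                        # b = u_n

x = np.linalg.solve(A, b)
y = np.linalg.solve(A + E, b)

relative_change = np.linalg.norm(y - x) / np.linalg.norm(x)
bound = delta_A * kappa / (1.0 - delta_A * kappa)
```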
Singular value decomposition 4.39

Exercises

Exercise 1: to evaluate the sensitivity to changes in A, we can also look at the residual

\[ \|(A + E) x - b\| \]

where $x = A^{-1} b$ is the solution of Ax = b

1. show that

\[ \frac{\|(A + E) x - b\|}{\|b\|} \leq \kappa(A) \, \delta_A \quad \text{where } \delta_A = \frac{\|E\|_2}{\|A\|_2} \]

2. show that for every A there exist b, E for which the inequality is sharp
Singular value decomposition 4.40

Exercises

Exercise 2: consider perturbations in A and b

\[ Ax = b, \qquad (A + E) y = b + e \]

assuming $\|E\|_2 < 1 / \|A^{-1}\|_2$, show that

\[ \frac{\|y - x\|}{\|x\|} \leq \frac{(\delta_A + \delta_b) \, \kappa(A)}{1 - \delta_A \kappa(A)} \]

where $\delta_b = \|e\| / \|b\|$ and $\delta_A = \|E\|_2 / \|A\|_2$
Singular value decomposition 4.41