Notes on orthogonal projections, operators, QR factorizations, etc. These notes are largely based on parts of Trefethen and Bau's Numerical Linear Algebra. Please notify me of any typos as soon as possible.

1. Basic lemmas and definitions

Throughout these notes, V is an n-dimensional vector space over C with a fixed Hermitian product. Unless otherwise stated, it is assumed that bases are chosen so that the Hermitian product is given by the conjugate transpose, i.e. that (x, y) = y∗x where y∗ is the conjugate transpose of y.

Recall that a projection is a linear operator P : V → V such that P^2 = P. Its complementary projection is I − P. In class we proved the elementary result that:

Lemma 1. Range(P) = Ker(I − P), and Range(I − P) = Ker(P).

Another fact worth noting is that Range(P) ∩ Ker(P) = {0}, so the vector space decomposes as V = Range(P) ⊕ Ker(P).

Definition 1. A projection is orthogonal if Range(P) is orthogonal to Ker(P).

Since it is convenient to use orthonormal bases, but adapt them to different situations, we often need to change basis in an orthogonality-preserving way. That is, we need the following type of operator:

Definition 2. A nonsingular matrix Q is called a unitary matrix if (x, y) = (Qx, Qy) for every x, y ∈ V. If Q is real, it is called an orthogonal matrix.

A unitary matrix Q can be thought of as an orthogonality-preserving change of basis matrix, in which each column is a new basis vector.

Lemma 2. A unitary matrix Q has the property Q∗Q = I.

Proof. If we consider the basis vectors ei and ej, then (ej, ei) = δij = (Qej, Qei). But (Qej, Qei) = ei∗Q∗Qej is the (i, j) entry of Q∗Q, so we are done.

We can use this fact to prove a criterion for orthogonal projections:

Lemma 3. A projection P is orthogonal if and only if A = A∗, where A is the matrix representing P in an orthonormal basis.

Proof. Suppose that A = A∗. The Hermitian product of an element of the range of P, Ax, and an element of the kernel of P, (I − A)y, is then y∗(I − A∗)Ax = y∗(I − A)Ax = y∗(A − A^2)x = y∗0x = 0.

For the other direction, first consider the representation B of P in an orthonormal basis {q1, . . . , qn} chosen so that q1, . . . , qm are in the range of P and qm+1, . . . , qn are in the kernel. Then B has m entries equal to 1 along the diagonal, and every other entry is 0. Clearly B∗ = B. We can change basis with a unitary matrix Q to get A = Q^{-1}BQ. Then A∗ = Q∗B∗(Q^{-1})∗ = Q^{-1}BQ = A, since Q^{-1} = Q∗ by the previous lemma.

Recall that in class we proved that for a unit vector q, Pq = qq∗ is an orthogonal projection onto the 1-dimensional subspace spanned by q. Note that if we wish to construct such a projection for the span of a non-unit vector a, we can use Pa = aa∗/(a∗a). Likewise, the orthogonal projection onto the subspace perpendicular to a vector a is given by P⊥a = I − aa∗/(a∗a).

To project orthogonally onto a subspace W spanned by the orthonormal set {q1, . . . , qm}, form the matrix Q̂ = (q1 | · · · | qm) and then PW = Q̂Q̂∗.

2. Gram-Schmidt and the QR factorization

Given m vectors gi in C^n, the Gram-Schmidt algorithm produces m orthonormal vectors qi in C^n. In its crudest form, this algorithm is:

For i = 1 to m:
    vi = gi
    For j = 1 to i − 1:
        rji = qj∗gi
        vi = vi − rji qj
    rii = |vi|
    qi = vi/rii

However, in this form it is numerically unstable. A better version (modified Gram-Schmidt) is:

For i = 1 to m:
    vi = gi
For i = 1 to m:
    rii = |vi|
    qi = vi/rii
    For j = i + 1 to m:
        rij = qi∗vj
        vj = vj − rij qi

In the above algorithms, we have kept track of the operations performed with the quantities rji for j ≤ i. If we set rji = 0 for j > i, we have the entries of an upper-triangular, m by m matrix R such that G = QR, where the columns of G and Q are the gi and qi, respectively. This is the QR-decomposition of G.
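As a concrete illustration, here is a minimal NumPy sketch of the modified Gram-Schmidt loop above that returns both Q and R. The function name mgs_qr and the convention of storing the vectors gi as columns of an array are ad hoc choices for this example, not notation from the notes.

import numpy as np

def mgs_qr(G):
    """Illustrative sketch: modified Gram-Schmidt on the columns of G (n x m,
    assumed to have linearly independent columns). Returns Q (n x m, orthonormal
    columns) and R (m x m, upper triangular) with G = Q R."""
    G = np.asarray(G, dtype=complex)
    n, m = G.shape
    V = G.copy()                                 # working vectors v_i = g_i
    Q = np.zeros((n, m), dtype=complex)
    R = np.zeros((m, m), dtype=complex)
    for i in range(m):
        R[i, i] = np.linalg.norm(V[:, i])        # r_ii = |v_i|
        Q[:, i] = V[:, i] / R[i, i]              # q_i = v_i / r_ii
        for j in range(i + 1, m):
            R[i, j] = np.vdot(Q[:, i], V[:, j])  # r_ij = q_i* v_j
            V[:, j] -= R[i, j] * Q[:, i]         # remove the q_i component from v_j
    return Q, R

# Quick check on random complex data.
G = np.random.randn(5, 3) + 1j * np.random.randn(5, 3)
Q, R = mgs_qr(G)
print(np.allclose(Q @ R, G), np.allclose(Q.conj().T @ Q, np.eye(3)))

Updating the later vectors vj in place, rather than re-projecting the original gj, is what distinguishes the modified version from the crude one and is the source of its better numerical behavior.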
3. Least squares problems

Overdetermined problems are common in applications, sometimes even desirable because of the numerical stability of the least-squares solutions to such problems.

We will consider the system Ax ≈ b where A is in C^{m×n}, x ∈ C^n, and b ∈ C^m with m > n. For fixed given A and b, we wish to find an x that minimizes the norm of the residual r = Ax − b. We will also assume that A has rank n.

Intuitively, to get Ax as close as possible to b, we would like to find the orthogonal projection y = PRange(A)b of b onto the range of A and then find an x such that Ax = y. It is not hard to prove this is the optimal solution:

Lemma 4. A vector x minimizes the residual norm |r| = |Ax − b| if and only if r ⊥ Range(A), or equivalently if and only if Ax = PRange(A)b.

Proof. Ax must lie in Range(A) by definition. Suppose that z = Ax̂ and z ≠ y, where y = PRange(A)b as above. Note that b − y is in the kernel of PRange(A), which means it is orthogonal to vectors in Range(A). In particular, it is orthogonal to z − y. Then

|b − z|^2 = |(b − y) + (y − z)|^2 = |b − y|^2 + |y − z|^2 > |b − y|^2,

which means z cannot minimize |b − z|, and thus x̂ is not minimizing. The equivalence of the two conditions can be seen as follows: since PRange(A)Ax = Ax, we have PRange(A)r = Ax − PRange(A)b, so r ⊥ Range(A) (i.e. PRange(A)r = 0) if and only if Ax = PRange(A)b.

To actually compute the solution, first compute the QR decomposition of A: A = Q̂R. Then PRange(A) = Q̂Q̂∗ and y = Q̂Q̂∗b. So Ax = y becomes Q̂Rx = Q̂Q̂∗b. Left-multiplying by Q̂∗ simplifies this to Rx = Q̂∗b. Since R is upper-triangular, this is easy to solve by back-substitution.
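Here is a sketch of this procedure in NumPy/SciPy, using the library's built-in reduced QR factorization and triangular solver rather than the Gram-Schmidt code above; the names least_squares_qr, A, and b are only for this example.

import numpy as np
from scipy.linalg import solve_triangular

def least_squares_qr(A, b):
    """Illustrative sketch: minimize |Ax - b| for A of size m x n (m > n, rank n)
    via the reduced QR factorization A = Q R and back-substitution on R x = Q* b."""
    Q, R = np.linalg.qr(A, mode='reduced')   # reduced QR: Q is m x n, R is n x n
    rhs = Q.conj().T @ b                     # Q* b
    return solve_triangular(R, rhs)          # back-substitution with upper-triangular R

# Example: an overdetermined 6 x 3 system.
A = np.random.randn(6, 3)
b = np.random.randn(6)
x = least_squares_qr(A, b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # agrees with NumPy's least squares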
4. The Singular Value Decomposition (SVD)

If we are interested in the iterates T^m of a linear operator T, then it is best to study T in a basis adapted to the invariant subspaces of T. In terms of matrices, if T is represented in a given basis by A, we wish to perform a similarity decomposition A = PBP^{-1} such that B has an especially simple form. For example, if T has n linearly independent eigenvectors then it is possible to find a diagonal B by choosing P to have eigenvector columns. In general this is not possible; one alternative is to take B in Jordan normal form, which we will discuss later in the course.

However, in many applications we are interested only in the structure of T itself, and the domain and range of T might be different (in which case the iterates T^m are not defined). Often, in this case the best decomposition of a rank n matrix A ∈ C^{m×n}, with m > n, is the (reduced) singular value decomposition, or SVD, in which we express A as A = ÛΣ̂V∗. Here Û ∈ C^{m×n} and V ∈ C^{n×n} have orthonormal columns and Σ̂ ∈ R^{n×n} is diagonal with real, non-negative diagonal entries σi satisfying σi ≥ σi+1 (the singular values).

Geometrically, the SVD has a very intuitive explanation. The existence of the SVD (which will be proved below) implies that any matrix maps the unit ball in its domain to a hyper-ellipsoid in the range. The singular values are the lengths of the principal semiaxes of the image hyper-ellipsoid. The columns of V, vi, are called the right singular vectors and the columns of Û, ui, are the left singular vectors. If the SVD is rearranged as AV = ÛΣ̂, then we see that Avi = σiui.

Theorem 1. A rank n matrix A ∈ C^{m×n}, with m > n, always has a reduced SVD.

Proof. We begin by finding the maximum value σ1 of |Av| over unit vectors v on the unit sphere in C^n, together with unit vectors v1, u1 such that Av1 = σ1u1. These exist because of the compactness of the unit sphere and the continuity of A (any linear transformation is continuous). (The value σ1 is the norm of A in the induced 2-norm, i.e. the norm induced from the norm on the vector space.) Now complete v1 to {vi}, an orthonormal basis of C^n, and complete u1 to {ui}, an orthonormal basis for the range of A. Putting these basis vectors into the columns of matrices V1 and Û1, we have

Û1∗ A V1 = S = ( σ1  ω∗ )
               ( 0    B )

Here Û1 is m by n, V1 is n by n, S is n by n, ω is (n − 1) by 1, and B is (n − 1) by (n − 1). If n is 1 then we are done; otherwise we can proceed by induction on n, i.e. assume the reduced SVD exists for ranks n − 1 and smaller.

We will now see that ω = 0. Consider the vector z = (σ1, ω)^T ∈ C^n, i.e. the column vector with first entry σ1 followed by the entries of ω. Then

|Sz| ≥ σ1^2 + ω∗ω ≥ σ1|z|,

where the middle quantity is the first entry of Sz, and equality holds in the last step if and only if ω = 0. Since |S| ≥ |Sz|/|z|, dividing by |z| = (σ1^2 + ω∗ω)^{1/2} shows that |S| ≥ (σ1^2 + ω∗ω)^{1/2} ≥ σ1. But |S| = |A| = σ1, so ω = 0.
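A short numerical illustration of the reduced SVD and of the relation Avi = σiui, using NumPy's built-in SVD routine; the variable names here are only for this example.

import numpy as np

# Illustrative sketch: a random m x n matrix with m > n.
A = np.random.randn(7, 3)

# Reduced SVD: U_hat is m x n, s holds the singular values in decreasing order, Vh = V*.
U_hat, s, Vh = np.linalg.svd(A, full_matrices=False)
V = Vh.conj().T

print(np.allclose(A, U_hat @ np.diag(s) @ Vh))            # A = U_hat Sigma_hat V*
for i in range(3):
    print(np.allclose(A @ V[:, i], s[i] * U_hat[:, i]))   # A v_i = sigma_i u_i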