Notes on orthogonal projections, operators, QR factorizations, etc. These notes are largely based on parts of Trefethen and Bau's Numerical Linear Algebra. Please notify me of any typos as soon as possible.

1. Basic lemmas and definitions

Throughout these notes, V is an n-dimensional vector space over C with a fixed Hermitian product. Unless otherwise stated, it is assumed that bases are chosen so that the Hermitian product is given by the conjugate transpose, i.e. that (x, y) = y∗x where y∗ is the conjugate transpose of y.

Recall that a projection is a linear operator P : V → V such that P^2 = P. Its complementary projection is I − P. In class we proved the elementary result that:

Lemma 1. Range(P) = Ker(I − P), and Range(I − P) = Ker(P).

Another fact worth noting is that Range(P) ∩ Ker(P) = {0}, so the vector space decomposes as V = Range(P) ⊕ Ker(P).

Definition 1. A projection is orthogonal if Range(P) is orthogonal to Ker(P).

Since it is convenient to use orthonormal bases, but adapt them to different situations, we often need to change basis in an inner-product-preserving way. That is, we need the following type of operator:

Definition 2. A nonsingular matrix Q is called a unitary matrix if (x, y) = (Qx, Qy) for every x, y ∈ V. If Q is real, it is called an orthogonal matrix.

A unitary matrix Q can be thought of as an orthonormal change of basis matrix, in which each column is a new basis vector.

Lemma 2. A unitary matrix Q has the property Q∗Q = I.

Proof. If we consider the basis vectors ei and ej, then (ej, ei) = δij = (Qej, Qei). But (Qej, Qei) = ei∗Q∗Qej is the (i, j) entry of Q∗Q, so we are done. □

We can use this fact to prove a criterion for orthogonal projections:

Lemma 3. A projection P is orthogonal if and only if A = A∗, where A is the matrix representing P in an orthonormal basis.

Proof. Suppose that A = A∗. The Hermitian product of an element of the range of P, Ax, and an element of the kernel of P, (I − A)y, is then y∗(I − A∗)Ax = y∗(I − A)Ax = y∗(A − A^2)x = 0. For the other direction, first consider the representation B of P in an orthonormal basis {q1, . . . , qn} such that {q1, . . . , qm} are in the range of P and {qm+1, . . . , qn} are in the kernel. Then B has m entries equal to 1 along the diagonal, and every other entry is 0. Clearly B∗ = B. We can change basis with a unitary matrix Q to get A = Q^{-1}BQ. Then A∗ = Q∗B∗(Q^{-1})∗ = Q^{-1}BQ = A, since Q^{-1} = Q∗ by the previous lemma. □
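As a quick numerical illustration of these lemmas, here is a minimal NumPy sketch; the particular matrices are my own examples, not from the notes:

import numpy as np

# A (non-orthogonal) projection onto the x-axis along the line y = x:
P = np.array([[1.0, -1.0],
              [0.0,  0.0]])
print(np.allclose(P @ P, P))                    # P^2 = P, so P is a projection
print(np.allclose(P @ (np.eye(2) - P), 0))      # Range(I - P) lies in Ker(P) (Lemma 1)

# A unitary matrix, obtained here from the QR factorization of a random complex matrix:
Q, _ = np.linalg.qr(np.random.randn(4, 4) + 1j * np.random.randn(4, 4))
print(np.allclose(Q.conj().T @ Q, np.eye(4)))   # Q*Q = I (Lemma 2)

# The change-of-basis step in the proof of Lemma 3:
B = np.diag([1.0, 1.0, 0.0, 0.0])               # projection in a basis adapted to range/kernel
A = np.linalg.inv(Q) @ B @ Q                    # A = Q^{-1} B Q
print(np.allclose(A @ A, A), np.allclose(A, A.conj().T))   # A is a Hermitian projection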

Recall that in class we proved that for a unit vector q, Pq = qq∗ is an orthogonal projection onto the 1-dimensional subspace spanned by q. Note that if we wish to construct such a projection for the span of a non-unit vector a we can use Pa = aa∗/(a∗a). Likewise, the orthogonal projection onto the subspace perpendicular to a vector a is given by P⊥a = I − aa∗/(a∗a). To project orthogonally onto a subspace W spanned by the orthonormal set {q1, . . . , qm}, form the matrix Q̂ = (q1 | · · · | qm) and then PW = Q̂Q̂∗.
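These formulas are easy to check numerically; a short sketch with example vectors of my own choosing:

import numpy as np

a = np.array([3.0, 4.0])
Pa = np.outer(a, a.conj()) / np.vdot(a, a)       # Pa = a a* / (a* a)
P_perp = np.eye(2) - Pa                          # projection onto the complement of span(a)
print(np.allclose(Pa @ Pa, Pa), np.allclose(Pa @ a, a), np.allclose(P_perp @ a, 0))

# Projection onto the span of an orthonormal set {q1, ..., qm}: P_W = Qhat Qhat*
Z = np.random.randn(5, 2)
Qhat, _ = np.linalg.qr(Z)                        # columns form an orthonormal basis of W = span(Z)
PW = Qhat @ Qhat.conj().T
print(np.allclose(PW @ PW, PW), np.allclose(PW, PW.conj().T), np.allclose(PW @ Z, Z))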

2. Gram-Schmidt and the QR factorization

Given m vectors gi in C^n, the Gram-Schmidt algorithm produces m orthonormal vectors qi in C^n. In its crudest form, this algorithm is:

For i = 1 to m:
    vi = gi
    For j = 1 to i − 1:
        rji = qj∗ gi
        vi = vi − rji qj
    rii = |vi|
    qi = vi / rii

However, in this form it is numerically unstable. A better version is:

For i = 1 to m:
    vi = gi
For i = 1 to m:
    rii = |vi|
    qi = vi / rii
    For j = i + 1 to m:
        rij = qi∗ vj
        vj = vj − rij qi

In the above algorithms, we have kept track of the operations performed with the quantities rji for j ≤ i. If we set rji = 0 for j > i, we have the entries of an upper-triangular, m by m matrix R such that G = QR, where the columns of G and Q are the gi and qi, respectively. This is the QR-decomposition of G.
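As a concrete illustration of how the rij assemble into R, here is a minimal NumPy sketch of the modified algorithm above; the function name mgs and the NumPy details are my own, not from the notes:

import numpy as np

def mgs(G):
    # G is n-by-m with linearly independent columns g_1, ..., g_m.
    n, m = G.shape
    V = G.astype(complex)                  # working copies of the columns
    Q = np.zeros((n, m), dtype=complex)
    R = np.zeros((m, m), dtype=complex)
    for i in range(m):
        R[i, i] = np.linalg.norm(V[:, i])
        Q[:, i] = V[:, i] / R[i, i]
        for j in range(i + 1, m):
            R[i, j] = np.vdot(Q[:, i], V[:, j])    # r_ij = q_i* v_j
            V[:, j] = V[:, j] - R[i, j] * Q[:, i]
    return Q, R

# Quick check on a random matrix: Q has orthonormal columns and G = QR.
G = np.random.randn(5, 3) + 1j * np.random.randn(5, 3)
Q, R = mgs(G)
print(np.allclose(Q.conj().T @ Q, np.eye(3)), np.allclose(Q @ R, G))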

3. Least squares problems

Overdetermined problems are common in applications, and are sometimes even desirable because of the favorable properties of the least-squares solutions to such problems.

We will consider the system Ax ≈ b where A is in C^{m×n}, x ∈ C^n, and b ∈ C^m with m > n. For fixed given A and b, we wish to find an x that minimizes the norm of the residual r = Ax − b. We will also assume that A has full rank n.

Intuitively, to get Ax as close as possible to b, we would like to find the orthogonal projection y = PRange(A)b of b onto the range of A and then find an x such that Ax = y. It is not hard to prove this is the optimal solution:

Lemma 4. A vector x minimizes the residual norm |r| = |Ax − b| if and only if r ⊥ Range(A), or equivalently if and only if Ax = PRange(A)b.

Proof. Ax must lie in Range(A) by definition. Suppose that z = Ax̂ and z ≠ y, where y = PRange(A)b as above. Note that b − y is in the kernel of PRange(A), which means it is orthogonal to vectors in Range(A). In particular, it is orthogonal to z − y. Then

|b − z|^2 = |(b − y) + (y − z)|^2 = |b − y|^2 + |y − z|^2 > |b − y|^2,

which means z cannot minimize |b − z|, and thus x̂ is not minimizing. The equivalence of the two conditions can be seen as follows: if Ax = PRange(A)b, then r = Ax − b = −(I − PRange(A))b = −P⊥Range(A)b, which is orthogonal to Range(A); conversely, if r ⊥ Range(A), then applying P⊥Range(A) to r = Ax − b gives r = −P⊥Range(A)b, so Ax = b − P⊥Range(A)b = PRange(A)b. □

To actually compute the solution, first compute the QR decomposition of A: A = Q̂R. Then PRange(A) = Q̂Q̂∗ and y = Q̂Q̂∗b. So Ax = y becomes Q̂Rx = Q̂Q̂∗b. Left-multiplying by Q̂∗ simplifies this to Rx = Q̂∗b. Since R is upper-triangular this is easy to solve by back-substitution.
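A minimal NumPy sketch of this procedure, with a hand-written back-substitution (the helper names are my own; np.linalg.qr supplies the reduced QR factorization):

import numpy as np

def back_substitution(R, c):
    # Solve the upper-triangular system R x = c by back-substitution.
    n = R.shape[0]
    x = np.zeros(n, dtype=complex)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

def least_squares_qr(A, b):
    # Minimize |Ax - b| via the reduced QR factorization A = Q R, i.e. solve R x = Q* b.
    Q, R = np.linalg.qr(A)            # reduced QR: Q is m-by-n, R is n-by-n
    return back_substitution(R, Q.conj().T @ b)

# Compare against NumPy's built-in least-squares solver.
A = np.random.randn(8, 3)
b = np.random.randn(8)
print(np.allclose(least_squares_qr(A, b), np.linalg.lstsq(A, b, rcond=None)[0]))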

4. The Singular Value Decomposition (SVD)

If we are interested in iterates T^m of a linear operator T, then it is best to study T in a basis adapted to the invariant subspaces of T. In terms of matrices, if T is represented in a given basis by A, we wish to perform a similarity decomposition A = PBP^{-1} such that B has an especially simple form. For example, if T has n linearly independent eigenvectors then it is possible to find a diagonal B by choosing P to have eigenvector columns. In general, this is not possible; one possibility is to have B in Jordan normal form, which we will discuss later in the course.

However, in many applications we are interested in only the structure of T itself, and the domain and range of T might be different (in which case the iterates T^m are not defined). Often, in this case the best decomposition of a rank n matrix A ∈ C^{m×n}, with m > n, is the (reduced) singular value decomposition, or SVD, in which we express A as A = ÛΣ̂V∗. Here Û ∈ C^{m×n} and V ∈ C^{n×n} have orthonormal columns, and Σ̂ ∈ R^{n×n} is diagonal with real, non-negative diagonal entries σi satisfying σi ≥ σi+1 (the singular values).

Geometrically, the SVD has a very intuitive explanation. The existence of the SVD (which will be proved below) implies that any matrix maps the unit ball in its domain to a hyper-ellipsoid in the range. The singular values are the lengths of the principal semiaxes of the image hyper-ellipsoid. The columns of V, vi, are called the right singular vectors, and the columns of Û, ui, are the left singular vectors. If the SVD is rearranged as AV = ÛΣ̂, then we see that Avi = σiui.
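As a quick check of these relations, np.linalg.svd with full_matrices=False returns this reduced form (as Û, the singular values, and V∗):

import numpy as np

A = np.random.randn(6, 3)                          # a rank-3 example with m > n
U, s, Vh = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) Vh, with Vh = V*
V = Vh.conj().T

print(np.allclose(A, U @ np.diag(s) @ Vh))   # A = U Sigma V*
print(np.allclose(A @ V, U @ np.diag(s)))    # AV = U Sigma, i.e. A v_i = sigma_i u_i
print(s)                                     # singular values, in decreasing order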

Theorem 1. A rank n matrix A ∈ C^{m×n}, with m > n, always has a reduced SVD.

Proof. We begin by finding the maximum value σ1 of |Av| over the unit sphere in C^n, and unit vectors v1, u1 such that Av1 = σ1u1. These exist because of the compactness of the unit sphere and the continuity of A (any linear transformation is continuous). (The value σ1 is the norm of A in the induced 2-norm, induced from the norm on the vector space.) Now complete v1 to {vi}, an orthonormal basis of C^n, and complete u1 to {ui}, an orthonormal basis for the range of A. Putting these basis vectors into the columns of matrices V1 and Û1, we have

Û1∗AV1 = S = [ σ1  ω∗ ; 0  B ].

Here Û1 is m by n, V1 is n by n, S is n by n, ω is (n − 1) by 1, and B is (n − 1) by (n − 1). If n is 1 then we are done; otherwise we can proceed by induction on n, i.e. assume the reduced SVD exists for ranks n − 1 and smaller.

We will now see that ω = 0. Applying S to the column vector (σ1; ω) (σ1 stacked above ω), we get

|S(σ1; ω)| ≥ σ1^2 + ω∗ω ≥ σ1 |(σ1; ω)|,

where the middle quantity is the first entry of S(σ1; ω), and equality holds in the last step if and only if ω = 0. Dividing by |(σ1; ω)|, this means that |S| ≥ σ1, with equality only if ω = 0. But |S| = |A| = σ1, so ω = 0.

Now by induction B has a reduced SVD B = Û2Σ2V2∗, so

A = Û1 [ 1  0 ; 0  Û2 ] [ σ1  0 ; 0  Σ2 ] [ 1  0 ; 0  V2 ]∗ V1∗,

and this is a reduced SVD of A with Û = Û1 [ 1  0 ; 0  Û2 ], Σ̂ = [ σ1  0 ; 0  Σ2 ], and V∗ = [ 1  0 ; 0  V2 ]∗ V1∗. □

This section will be extended soon.
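The first step of the proof above, finding σ1 as the maximum of |Av| over the unit sphere, can also be illustrated numerically; a rough sketch using random sampling of the sphere (an illustrative choice of mine, not part of the proof):

import numpy as np

A = np.random.randn(5, 3)

# Sample many unit vectors v and record the largest value of |Av|.
vs = np.random.randn(3, 100000)
vs /= np.linalg.norm(vs, axis=0)                 # normalize each column to the unit sphere
sigma1_estimate = np.linalg.norm(A @ vs, axis=0).max()

# The true maximum is the induced 2-norm of A, i.e. its largest singular value.
print(sigma1_estimate, np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])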

5. Householder reflections and triangularization

Hopefully coming soon.

6. Schur factorization

Coming eventually.