
Lab 8 QR Decomposition

Lab Objective: Use the Gram-Schmidt algorithm and orthonormal transformations to perform the QR decomposition.

The QR decomposition of a matrix $A$ is a factorization $A = QR$, where $Q$ has orthonormal columns and $R$ is upper triangular. This decomposition is useful for computing least squares and finding eigenvalues. As stated in the following theorem, the QR decomposition of $A$ always exists when the rank of $A$ equals the number of columns of $A$.

Theorem 8.1. Let $A$ be an $m \times n$ matrix of rank $n$. Then $A$ can be factored into a product $QR$, where $Q$ is an $m \times m$ matrix with orthonormal columns and $R$ is an $m \times n$ upper triangular matrix of rank $n$.

In this lab we will only discuss real matrices. All of these results can be extended to the complex numbers by replacing "orthogonal" with "unitary," "transpose" with "Hermitian conjugate," and "symmetric" with "Hermitian."

Modified Gram-Schmidt

Let $A$ be an $m \times n$ matrix of rank $n$. There are many methods for computing the QR decomposition. First we will discuss the algorithm that computes $Q$ by applying Gram-Schmidt to the columns of $A$.

Let $\{x_i\}_{i=1}^n$ be the columns of $A$ (the rank hypothesis implies that these are linearly independent vectors). Then the Gram-Schmidt algorithm computes an orthonormal basis $\{q_i\}_{i=1}^n$ for the span of the $x_i$. The Gram-Schmidt algorithm defines
\[ q_1 = \frac{x_1}{\|x_1\|}. \]
It recursively defines $q_k$ for $k > 1$ by

\[ q_k = \frac{x_k - p_{k-1}}{\|x_k - p_{k-1}\|}, \qquad k = 2, \dots, n, \]

where
\[ p_{k-1} = \sum_{i=1}^{k-1} \langle q_i, x_k \rangle \, q_i, \qquad k = 2, \dots, n, \]
and $p_0 = 0$. Let $Q$ be the matrix with columns $\{q_i\}_{i=1}^n$, and let $R$ be the upper triangular matrix with entries $r_{jk}$, where $r_{kk} = \|x_k - p_{k-1}\|$ and $r_{jk} = \langle q_j, x_k \rangle$ when $j < k$. Then $A = QR$. In floating point arithmetic this classical Gram-Schmidt algorithm is numerically unstable, so in practice we use a variant called the modified Gram-Schmidt algorithm.

In the modified Gram-Schmidt algorithm, $q_1$ is the normalization of $x_1$ as before. We then make each of the vectors $x_2, \dots, x_n$ orthogonal to $q_1$ by defining
\[ x_k := x_k - \langle q_1, x_k \rangle \, q_1, \qquad k = 2, \dots, n. \]

(Compare this to the usual Gram-Schmidt algorithm, where we only made $x_2$ orthogonal to $q_1$.) Next we define $q_2 = x_2/\|x_2\|$. Once again we make $x_3, \dots, x_n$ orthogonal to $q_2$ by defining
\[ x_k := x_k - \langle q_2, x_k \rangle \, q_2, \qquad k = 3, \dots, n. \]

(Each of these new vectors is a linear combination of vectors orthogonal to $q_1$, and hence is orthogonal to $q_1$ as well.) We continue this process until we have an orthonormal set $q_1, \dots, q_n$. The entire modified Gram-Schmidt algorithm is described in Algorithm 8.1.

Algorithm 8.1 The modified Gram-Schmidt algorithm. This algorithm returns orthogonal $Q$ and upper triangular $R$ such that $A = QR$.
1: procedure ModifiedGramSchmidt(A)
2:   m, n ← shape(A)
3:   Q ← copy(A)
4:   R ← zeros(n, n)
5:   for i = 0 … n−1 do
6:     R[i, i] ← ‖Q[:, i]‖
7:     Q[:, i] ← Q[:, i]/R[i, i]
8:     for j = i+1 … n−1 do
9:       R[i, j] ← Q[:, j]ᵀ Q[:, i]
10:      Q[:, j] ← Q[:, j] − R[i, j] Q[:, i]
11: return Q, R
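Algorithm 8.1 translates almost line for line into NumPy. The following sketch (the function name is ours, and it does no input validation) may help make the pseudocode concrete:

```python
import numpy as np

def modified_gram_schmidt(A):
    """QR decomposition via modified Gram-Schmidt (a sketch of Algorithm 8.1)."""
    m, n = A.shape
    Q = A.astype(float).copy()      # columns will be orthonormalized in place
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(Q[:, i])          # length of the ith column
        Q[:, i] = Q[:, i] / R[i, i]                # normalize it
        for j in range(i + 1, n):
            R[i, j] = Q[:, j] @ Q[:, i]            # component along q_i
            Q[:, j] = Q[:, j] - R[i, j] * Q[:, i]  # subtract it off
    return Q, R
```

For a random full-rank matrix, `Q @ R` reproduces `A` to machine precision and `Q.T @ Q` is the identity.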

QR Decomposition in SciPy

The linalg library in SciPy calculates the QR decomposition using a software package called LAPACK (Linear Algebra PACKage), which is incredibly efficient. Here is an example of using SciPy to compute the QR decomposition of a matrix.

>>> import numpy as np
>>> from scipy import linalg as la
>>> A = np.random.rand(4,3)
>>> Q, R = la.qr(A)
>>> Q.dot(R) == A
array([[ True, False, False],
       [ True, False, False],
       [ True,  True, False],
       [ True,  True, False]], dtype=bool)

Note that Q.dot(R) does not equal A exactly because of rounding errors. However, we can check that the entries of Q.dot(R) are "close" to the entries of A with the NumPy method np.allclose(). This method checks that the elements of two arrays differ by less than a given tolerance, specified by two keyword arguments rtol and atol that default to $10^{-5}$ and $10^{-8}$ respectively. You can read the documentation to learn how the tolerance is computed from these numbers.

>>> np.allclose(Q.dot(R), A) True

We can use the same method to check that Q is “very close” to an orthogonal matrix.

>>> np.allclose(Q.T.dot(Q), np.eye(4)) True

Problem 1. Write a function that accepts as input an $m \times n$ matrix of rank $n$ and computes its QR decomposition, returning the matrices $Q$ and $R$. Your function should use Algorithm 8.1. Hint: Read about the function np.inner(). Another hint: Check that your function works by using np.allclose().

Problem 2. Write a function that accepts a square matrix $A$ of full rank and returns $|\det(A)|$. Use the QR decomposition of $A$ to perform this calculation. Hint: What is the determinant of an orthonormal matrix?

Householder Triangularization

Another way to compute the QR decomposition of a matrix is with a series of orthonormal transformations. Like the modified Gram-Schmidt algorithm, orthonormal transformations are numerically stable, meaning that they are less susceptible to rounding errors.

Figure 8.1: A Householder transformation. The normal vector $v$ defines the hyperplane $H$. The Householder transformation $H_v$ of $x$ is just the reflection of $x$ across $H$.

Householder Transformations

The hyperplane in $\mathbb{R}^n$ with normal vector $v$ is the set $\{x \in \mathbb{R}^n \mid \langle x, v \rangle = 0\}$. Equivalently, the hyperplane defined by $v$ is just the orthogonal complement $v^\perp$. See Figure 8.1.

A Householder transformation of $\mathbb{R}^n$ is a linear transformation that reflects about a hyperplane. If a hyperplane $H$ has normal vector $v$, let $u = v/\|v\|$. Then the Householder transformation that reflects about $H$ corresponds to the matrix $H_u = I - 2uu^T$. You can check that $(I - 2uu^T)^T(I - 2uu^T) = I$, so Householder transformations are orthogonal.
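These properties are easy to verify numerically. In this sketch (the helper name is ours), we build $I - 2uu^T$ and check that it is orthogonal, sends $v$ to $-v$, and fixes vectors lying in the hyperplane:

```python
import numpy as np

def householder_matrix(v):
    """Return I - 2uu^T, the reflection about the hyperplane with normal v."""
    u = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2 * np.outer(u, u)

v = np.array([1.0, 2.0, 2.0])
H = householder_matrix(v)
# H is symmetric and orthogonal, H @ v = -v, and H fixes anything orthogonal to v.
```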

Householder Triangularization

The QR decomposition of an $m \times n$ matrix $A$ can also be computed with Householder transformations via the Householder triangularization. Whereas Gram-Schmidt makes $A$ orthonormal using a series of transformations stored in an upper triangular matrix, Householder triangularization makes $A$ triangular by a series of orthonormal transformations. More precisely, the Householder triangularization finds an $m \times m$ orthogonal matrix $Q$ and an $m \times n$ upper triangular matrix $R$ such that $QA = R$. Since $Q$ is orthogonal, $Q^{-1} = Q^T$, so $A = Q^TR$, and we have discovered the QR decomposition.

Let's demonstrate the idea behind Householder triangularization on a $4 \times 3$ matrix $A$. Let $e_1, \dots, e_4$ be the standard basis of $\mathbb{R}^4$. First we find an orthonormal transformation $Q_1$ that maps the first column of $A$ into the span of $e_1$. This is diagrammed below, where $*$ represents an arbitrary entry.

\[
\begin{pmatrix} * & * & * \\ * & * & * \\ * & * & * \\ * & * & * \end{pmatrix}
\xrightarrow{\;Q_1\;}
\begin{pmatrix} * & * & * \\ 0 & * & * \\ 0 & * & * \\ 0 & * & * \end{pmatrix}
\]
Let $A_2 = Q_1A$ be the matrix on the right above. Now find an orthonormal transformation $Q_2$ that fixes $e_1$ and maps the second column of $A_2$ into the span of $e_1$

and $e_2$. Notice that since $Q_2$ fixes $e_1$, the top row of $A_2$ will be fixed by $Q_2$, and only entries in the lower right submatrix will change.

Let $A_3 = Q_2Q_1A$ be the resulting matrix. Finally, find an orthonormal transformation $Q_3$ that fixes $e_1$ and $e_2$ and maps the third column of $A_3$ into the span of $e_1$, $e_2$, and $e_3$. The diagram below summarizes this process.

\[
Q_3Q_2Q_1\begin{pmatrix} * & * & * \\ * & * & * \\ * & * & * \\ * & * & * \end{pmatrix}
= Q_3Q_2\begin{pmatrix} * & * & * \\ 0 & * & * \\ 0 & * & * \\ 0 & * & * \end{pmatrix}
= Q_3\begin{pmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & * \end{pmatrix}
= \begin{pmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & 0 \end{pmatrix}.
\]
We've accomplished our goal, which was to triangularize $A$ using orthonormal transformations. But how do we construct the matrices $Q_k$?

It turns out that we can choose each $Q_k$ to be a Householder transformation. Suppose we have found Householder transformations $Q_1, \dots, Q_{k-1}$ such that $Q_{k-1} \cdots Q_2 Q_1 A = A_k$, where
\[ A_k = \begin{pmatrix} T & X' \\ 0 & X'' \end{pmatrix}. \]
Here, $T$ is a $(k-1) \times (k-1)$ upper triangular matrix. Let $x$ be the $k$th column of $A_k$. Write $x = x' + x''$, where $x'$ is in the span of $e_1, \dots, e_{k-1}$. So $x''$ looks like $k-1$ zeros followed by the first column of $X''$. The idea is to reflect $x''$ about a hyperplane into the span of $e_k$. It turns out that there are two choices that will work (see Figure 8.2). These hyperplanes have normal vectors $x'' + \|x''\|e_k$ and $x'' - \|x''\|e_k$. In fact, the more numerically stable transformation is to reflect about the hyperplane with normal vector $v_k = x'' + \operatorname{sign}(x_k)\|x''\|e_k$, where $x_k$ is the $k$th entry of $x''$ (or the top left entry of $X''$). (You can check that $v_k$ is orthogonal to $e_1, \dots, e_{k-1}$, so the hyperplane orthogonal to $v_k$ contains $e_1, \dots, e_{k-1}$, and reflecting about it fixes $e_1, \dots, e_{k-1}$.) Thus, $Q_k$ is the Householder transformation $H_{v_k}$.

The final question is to find an efficient algorithm for computing $Q = Q_nQ_{n-1} \cdots Q_1$ and $R = Q_nQ_{n-1} \cdots Q_1A$. The idea is to start with $R = A$ and $Q = I$. Then we compute $Q_1$ and modify $R$ to be $Q_1A$ and $Q$ to be $Q_1$. Next we compute $Q_2$ and modify $R$ to be $Q_2Q_1A$ and $Q$ to be $Q_2Q_1$, and so forth. As we have already discussed, $Q_k$ fixes the first $k-1$ rows and columns of any matrix it acts on. In fact, if $x'' = (0, \dots, 0, x_k, x_{k+1}, \dots, x_m)$ as above, then $v_k = (0, \dots, 0, v_k', x_{k+1}, \dots, x_m)$ where $v_k' = x_k + \operatorname{sign}(x_k)\|x''\|$. If $u_k$ is the normalization of $(v_k', x_{k+1}, \dots, x_m) \in \mathbb{R}^{m-(k-1)}$, then
\[ Q_k = I - 2\frac{v_kv_k^T}{\|v_k\|^2} = \begin{pmatrix} I & 0 \\ 0 & I - 2u_ku_k^T \end{pmatrix}. \]
This means that, using block multiplication,

\[ Q_kA_k = \begin{pmatrix} I & 0 \\ 0 & I - 2u_ku_k^T \end{pmatrix} \begin{pmatrix} T & X' \\ 0 & X'' \end{pmatrix} = \begin{pmatrix} T & X' \\ 0 & (I - 2u_ku_k^T)X'' \end{pmatrix}. \]
So at each stage of the algorithm, we only need to update the entries in the bottom right submatrix of $A_k$, and these change via multiplication by $I - 2u_ku_k^T$.

Figure 8.2: If we want to reflect $x$ about a hyperplane into the span of $e_1$, there are two hyperplanes that will work. The two choices are defined by the normal vectors $v_1$ and $v_2$. Reflecting about the hyperplane defined by $v_i$ produces $H_{v_i}x$.

Similarly,

\[ Q_kQ_{k-1} \cdots Q_1 = Q_k\begin{pmatrix} B_1 \\ B_2 \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I - 2u_ku_k^T \end{pmatrix} \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} = \begin{pmatrix} B_1 \\ (I - 2u_ku_k^T)B_2 \end{pmatrix}, \]
where $B_1$ is the first $k-1$ rows of $Q_{k-1} \cdots Q_1$ and $B_2$ is the remaining rows,

so to update $Q$, we need only modify its bottom rows. These also change via matrix multiplication by $I - 2u_ku_k^T$. These arguments produce Algorithm 8.2.

Algorithm 8.2 The Householder triangularization algorithm. This algorithm returns orthonormal $Q$ and upper triangular $R$ satisfying $A = QR$.
1: procedure Householder(A)
2:   m, n ← shape(A)
3:   R ← copy(A)
4:   Q ← Id(m)
5:   for k = 0 … n−1 do
6:     u ← copy(R[k:, k])
7:     u[0] ← u[0] + sign(u[0]) ‖u‖
8:     u ← u/‖u‖
9:     R[k:, k:] ← R[k:, k:] − 2u(uᵀR[k:, k:])
10:    Q[k:] ← Q[k:] − 2u(uᵀQ[k:])
11: return Qᵀ, R
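As with Algorithm 8.1, a direct NumPy transcription may clarify the indexing. This sketch is ours (the function name is not from the lab), and like the pseudocode it assumes the pivot entry is not exactly zero:

```python
import numpy as np

def householder_qr(A):
    """QR decomposition via Householder reflections (a sketch of Algorithm 8.2)."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)                                    # accumulates Q_k ... Q_1
    for k in range(n):
        u = R[k:, k].copy()
        u[0] += np.sign(u[0]) * np.linalg.norm(u)    # stable sign choice
        u /= np.linalg.norm(u)
        R[k:, k:] -= 2 * np.outer(u, u @ R[k:, k:])  # reflect trailing block of R
        Q[k:] -= 2 * np.outer(u, u @ Q[k:])          # update bottom rows of Q
    return Q.T, R                                    # so that A = Q @ R
```

Note that only the bottom right block of R and the bottom rows of Q are touched at each step, exactly as the block multiplication above predicts.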

Problem 3. Write a function that accepts as input an $m \times n$ matrix of rank $n$ and computes its QR decomposition, returning the matrices $Q$ and $R$. Your function should use Algorithm 8.2. Hint: Read about the function np.outer(). Another hint: When defining u, reshape the array to an $(m-k) \times 1$ array.

Upper Hessenberg Form

An upper Hessenberg matrix is a square matrix with zeros below the first subdiagonal. Every $n \times n$ matrix $A$ can be written $A = Q^THQ$ where $Q$ is orthonormal and $H$ is an upper Hessenberg matrix, called the Hessenberg form of $A$.

A fast algorithm for computing the QR decomposition of a Hessenberg matrix will be taught in Lab 9. This algorithm in turn leads to a fast algorithm for finding eigenvalues of a matrix. For now, we will outline an algorithm for computing the upper Hessenberg form of any matrix. Like Householder triangularization, this algorithm uses Householder transformations.

To find orthogonal $Q$ and upper Hessenberg $H$ such that $A = Q^THQ$, it suffices to find such matrices that satisfy $QAQ^T = H$. Thus, our strategy is to multiply $A$ on the right and left by a series of orthonormal matrices until it is in Hessenberg form. If we use the same $Q_1$ as in the first step of the Householder algorithm, then with $Q_1A$ we introduce zeros in the first column of $A$. However, since we now have to multiply $Q_1A$ on the right by $Q_1^T$, all those zeros are destroyed. Instead, let's try choosing a $Q_1$ that fixes $e_1$ and reflects the first column of $A$ into the span of $e_1$ and $e_2$. Because $Q_1$ fixes $e_1$, the product $Q_1A$ leaves the first row of $A$ alone, and $(Q_1A)Q_1^T$ leaves the first column of $Q_1A$ alone. If $A$ is a $5 \times 5$ matrix, this looks like

\[
\underset{A}{\begin{pmatrix} * & * & * & * & * \\ * & * & * & * & * \\ * & * & * & * & * \\ * & * & * & * & * \\ * & * & * & * & * \end{pmatrix}}
\xrightarrow{\;Q_1\cdot\;}
\underset{Q_1A}{\begin{pmatrix} * & * & * & * & * \\ * & * & * & * & * \\ 0 & * & * & * & * \\ 0 & * & * & * & * \\ 0 & * & * & * & * \end{pmatrix}}
\xrightarrow{\;\cdot Q_1^T\;}
\underset{(Q_1A)Q_1^T}{\begin{pmatrix} * & * & * & * & * \\ * & * & * & * & * \\ 0 & * & * & * & * \\ 0 & * & * & * & * \\ 0 & * & * & * & * \end{pmatrix}}
\]
We now iterate through the matrix until we obtain

\[ Q_3Q_2Q_1AQ_1^TQ_2^TQ_3^T = \begin{pmatrix} * & * & * & * & * \\ * & * & * & * & * \\ 0 & * & * & * & * \\ 0 & 0 & * & * & * \\ 0 & 0 & 0 & * & * \end{pmatrix}. \]
The pseudocode for computing the Hessenberg form of a matrix with Householder transformations is shown in Algorithm 8.3. Although the Hessenberg form exists for any square matrix, this algorithm only works for full-rank square matrices. Notice that this algorithm is very similar to Algorithm 8.2.

When $A$ is symmetric, its upper Hessenberg form is a tridiagonal matrix. This is because the $Q_i$'s zero out everything below the first subdiagonal of $A$ and the $Q_i^T$'s zero out everything above the first superdiagonal. Thus, the Hessenberg form of a symmetric matrix is especially useful, since as we saw in Lab 6, tridiagonal matrices make computations fast.

Algorithm 8.3 Algorithm for reducing a nonsingular matrix $A$ to Hessenberg form. This algorithm returns orthogonal $Q$ and upper Hessenberg $H$ such that $A = Q^THQ$.
1: procedure Hessenberg(A)
2:   m, n ← shape(A)
3:   H ← copy(A)
4:   Q ← Id(m)
5:   for k = 0 … n−3 do
6:     u ← copy(H[k+1:, k])
7:     u[0] ← u[0] + sign(u[0]) ‖u‖
8:     u ← u/‖u‖
9:     H[k+1:, k:] ← H[k+1:, k:] − 2u(uᵀH[k+1:, k:])
10:    H[:, k+1:] ← H[:, k+1:] − 2(H[:, k+1:]u)uᵀ
11:    Q[k+1:] ← Q[k+1:] − 2u(uᵀQ[k+1:])
12: return Q, H
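A NumPy sketch of Algorithm 8.3 (the function name is ours). The right multiplication on line 10 is what distinguishes this from Householder triangularization; it preserves the zeros just introduced because the reflection fixes $e_1, \dots, e_{k+1}$:

```python
import numpy as np

def hessenberg_form(A):
    """Return Q, H with A = Q.T @ H @ Q and H upper Hessenberg (Algorithm 8.3)."""
    m, n = A.shape
    H = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n - 2):
        u = H[k+1:, k].copy()
        u[0] += np.sign(u[0]) * np.linalg.norm(u)
        u /= np.linalg.norm(u)
        H[k+1:, k:] -= 2 * np.outer(u, u @ H[k+1:, k:])   # left multiplication
        H[:, k+1:] -= 2 * np.outer(H[:, k+1:] @ u, u)     # right multiplication
        Q[k+1:] -= 2 * np.outer(u, u @ Q[k+1:])
    return Q, H
```

Feeding this a symmetric matrix produces a (numerically) tridiagonal H, as the text predicts.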

Problem 4. Write a function that accepts as input a nonsingular square matrix $A$ and computes its Hessenberg form, returning orthogonal $Q$ and upper Hessenberg $H$ satisfying $A = Q^THQ$. Your function should use Algorithm 8.3. Hint: When defining u, reshape the array to an $(m-k-1) \times 1$ array. What happens when you compute the Hessenberg factorization of a symmetric matrix?

*Givens Rotations

In the previous section we found the QR decomposition of a matrix using Householder transformations, applying a series of these transformations to a matrix until it was in upper triangular form. We can use the same strategy to compute the QR decomposition with rotations instead of reflections.

Let us begin with Givens rotations in $\mathbb{R}^2$. An arbitrary vector $x = (a, b)^T$ can be rotated into the span of $e_1$ via an orthogonal transformation. In fact, the matrix
\[ T_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]
rotates a vector counterclockwise by $\theta$. Thus, if $\theta$ is the clockwise angle between $x$ and $e_1$, the vector $T_{-\theta}x$ will be in the span of $e_1$. We can find $\sin\theta$ and $\cos\theta$ with the formulas $\sin = \frac{\text{opp}}{\text{hyp}}$ and $\cos = \frac{\text{adj}}{\text{hyp}}$, so $\sin\theta = \frac{b}{\sqrt{a^2+b^2}}$ and $\cos\theta = \frac{a}{\sqrt{a^2+b^2}}$ (see Figure 8.3). Then

\[ T_{-\theta}x = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \frac{a}{\sqrt{a^2+b^2}} & \frac{b}{\sqrt{a^2+b^2}} \\ -\frac{b}{\sqrt{a^2+b^2}} & \frac{a}{\sqrt{a^2+b^2}} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sqrt{a^2+b^2} \\ 0 \end{pmatrix}. \]
The matrix $T_{-\theta}$ above is an example of a $2 \times 2$ Givens rotation matrix. In general, the Givens matrix $G(i, j, \theta)$ represents the orthonormal transformation that rotates the 2-dimensional span of $e_i$ and $e_j$ by $\theta$ radians. The matrix for this transformation

Figure 8.3: Rotating clockwise by $\theta$ will send the vector $(a, b)^T$ to the span of $e_1$.

is
\[ G(i, j, \theta) = \begin{pmatrix} I & 0 & 0 & 0 & 0 \\ 0 & c & 0 & -s & 0 \\ 0 & 0 & I & 0 & 0 \\ 0 & s & 0 & c & 0 \\ 0 & 0 & 0 & 0 & I \end{pmatrix}. \]
This matrix is in block form with $I$ representing the identity matrix, $c = \cos\theta$, and $s = \sin\theta$. The $c$'s appear in the $i$th and $j$th diagonal entries.

As before, we can choose $\theta$ so that $G(i, j, \theta)$ rotates a given vector so that its $e_j$-component is 0. Such a transformation will only affect the $i$th and $j$th entries of any vector it acts on (and thus the $i$th and $j$th rows of any matrix it acts on). This flexibility makes Givens rotations ideal for some problems. For example, Givens rotations can be used to solve linear systems defined by sparse matrices by modifying only small parts of the array. Also, Givens rotations can be used to solve systems of equations in parallel.

The advantages of Givens rotations are that they are orthonormal and hence numerically stable (like Householder reflections), and that they affect only a small part of the array. The disadvantage is that they require a greater number of floating point operations than Householder reflections. In practice, the Givens algorithm is slower than the Householder algorithm, even when it is modified to decrease the number of floating point operations. However, since Givens rotations can be parallelized, they can be much faster than the Householder algorithm when multiple processors are used.
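To make the block structure concrete, the following sketch (the helper name is ours) embeds the rotation into the $(e_i, e_j)$ plane of $\mathbb{R}^n$, and one can check numerically that the result is orthogonal and leaves every other coordinate untouched:

```python
import numpy as np

def givens_matrix(i, j, theta, n):
    """Return G(i, j, theta), rotating the (e_i, e_j) plane of R^n by theta."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = G[j, j] = c           # c's on the ith and jth diagonal entries
    G[i, j], G[j, i] = -s, s        # s's off the diagonal
    return G
```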

Givens Triangularization

We can apply Givens rotations to a matrix until it is in upper triangular form, producing a factorization $A = QR$, where $Q$ is a composition of Givens rotations and $R$ is upper triangular. This is exactly the QR decomposition of $A$. The idea is to iterate through the subdiagonal entries of $A$ in the order depicted by Figure 8.4. We zero out the $ij$th entry with a rotation in the plane spanned by $e_{i-1}$ and $e_i$. This rotation is just multiplication by the Givens matrix $G(i-1, i, \theta)$, which can be computed as in the example at the start of the previous section. We just set $a = a_{i-1,j}$ and $b = a_{i,j}$, so $c = \cos\theta = a/\sqrt{a^2+b^2}$ and $s = \sin\theta = b/\sqrt{a^2+b^2}$. For example, on a $3 \times 2$ matrix we may perform the following operations:


Figure 8.4: This figure illustrates the order in which to zero out subdiagonal entries in the Givens triangularization algorithm. The heavy black line is the main diagonal of the matrix. Entries should be zeroed out from bottom to top in each column, beginning with the leftmost column.

\[
\begin{pmatrix} * & * \\ * & * \\ * & * \end{pmatrix}
\xrightarrow{G(2,3,\theta_1)}
\begin{pmatrix} * & * \\ * & * \\ 0 & * \end{pmatrix}
\xrightarrow{G(1,2,\theta_2)}
\begin{pmatrix} * & * \\ 0 & * \\ 0 & * \end{pmatrix}
\xrightarrow{G(2,3,\theta_3)}
\begin{pmatrix} * & * \\ 0 & * \\ 0 & 0 \end{pmatrix}
\]
At each stage, the entries modified by a transformation are those in the two rows it acts on. The final transformation $G(2, 3, \theta_3)$ operates on the bottom two rows, but since their first entries are zero, those entries are unaffected. Assuming that at the $ij$th stage of the algorithm $a_{ij}$ is nonzero, Algorithm 8.4 computes the Givens triangularization of a matrix.

Algorithm 8.4 Givens triangularization. Return an orthogonal matrix $Q$ and an upper triangular matrix $R$ satisfying $A = QR$.
1: procedure GivensTriangularization(A)
2:   m, n ← shape(A)
3:   R ← copy(A)
4:   Q ← Id(m)
5:   G ← empty(2, 2)
6:   for j = 0 … n−1 do
7:     for i = m−1 … j+1 do
8:       a, b ← R[i−1, j], R[i, j]
9:       G ← [[a, b], [−b, a]]/√(a² + b²)
10:      R[i−1:i+1, j:] ← G R[i−1:i+1, j:]
11:      Q[i−1:i+1, :] ← G Q[i−1:i+1, :]
12: return Qᵀ, R

Notice that in Algorithm 8.4, we do not actually create the matrices $G(i, j, \theta)$ and multiply them by the original matrix. Instead we modify only those entries of the matrix that are affected by the transformation. As an additional way to save memory, it is possible to modify this algorithm so that $Q$ and $R$ are stored in the original matrix $A$.

Problem 5. (Optional) Write a function that computes the Givens triangularization of a matrix, using Algorithm 8.4. Assume that at the $ij$th stage of the algorithm, $a_{ij}$ will be nonzero.

Problem 6. (Optional) Modify your solution to Problem 5 to compute the Givens triangularization of an upper Hessenberg matrix, making the following changes:

1. Iterate through the first subdiagonal from left to right. (These are the only entries that need to be zeroed out.)

2. Line 11 of Algorithm 8.4 updates $Q$ with the current $G$. Decrease the number of entries of $Q$ that are modified in this line. Do this by replacing Q[i−1:i+1, :] with Q[i−1:i+1, :k_i], where $k_i$ is some appropriately chosen number (dependent on $i$) such that Q[i−1:i+1, k_i:] = 0.

Hint: Here is how to generate a random upper Hessenberg matrix on which to test your function. The idea is to generate a random matrix and then zero out all entries below the first subdiagonal.

import numpy as np
import scipy.linalg as la

A = np.random.rand(500, 500)

# We do not need to modify the first row of A.
# la.triu(A[1:]) zeros out all entries below the diagonal of A[1:].
A[1:] = la.triu(A[1:])

# A is now upper Hessenberg