APPLIED MATRIX THEORY

Lecture Notes for Math 464/514
Presented by DR. MONIKA NITSCHE

Typeset and Edited by ERIC M. BENNER

STUDENTS PRESS
December 3, 2013
Copyright © 2013

Contents

1 Introduction to Linear Algebra
  1.1 Lecture 1: August 19, 2013 (About the class; Linear Systems; Example: Application to boundary value problem; Analysis of error; Solution of the discretized equation)
2 Matrix Inversion
  2.1 Lecture 2: August 21, 2013 (Gaussian Elimination; Inner-product based implementation; Office hours and other class notes; Example: Gauss Elimination)
  2.2 Lecture 3: August 23, 2013 (Example: Gauss Elimination, cont.; Operation Cost of Forward Elimination; Cost of the Order of an Algorithm; Validation of Lower/Upper Triangular Form; Theoretical derivation of Lower/Upper Form)
  2.3 HW 1: Due August 30, 2013
3 Factorization
  3.1 Lecture 4: August 26, 2013 (Elementary Matrices; Solution of Matrix using the Lower/Upper factorization; Sparse and Banded Matrices; Motivation for Gauss Elimination with Pivoting)
  3.2 Lecture 5: August 28, 2013 (Motivation for Gauss Elimination with Pivoting, cont.; Discussion of well-posedness; Gaussian elimination with pivoting)
  3.3 Lecture 6: August 30, 2013 (Discussion of HW problem 2; PLU factorization)
  3.4 Lecture 7: September 4, 2013 (PLU Factorization; Triangular Matrices; Multiplication of lower triangular matrices; Inverse of a lower triangular matrix; Uniqueness of LU factorization; Existence of the LU factorization)
  3.5 Lecture 8: September 6, 2013 (About Homeworks; Discussion of ill-conditioned systems; Inversion of lower triangular matrices; Example of LU decomposition of a lower triangular matrix; Banded matrix example)
  3.6 Lecture 9: September 9, 2013 (Existence of the LU factorization (cont.); Rectangular matrices)
  3.7 HW 2: Due September 13, 2013
4 Rectangular Matrices
  4.1 Lecture 10: September 11, 2013 (Rectangular matrices (cont.); Example of RREF of a Rectangular Matrix)
  4.2 Lecture 11: September 13, 2013 (Solving Ax = b; Example; Linear functions; Example: Transpose operator; Example: trace operator; Matrix multiplication; Proof of transposition property)
  4.3 Lecture 12: September 16, 2013 (Inverses; Low rank perturbations of I; The Sherman–Morrison Formula; Finite difference example with periodic boundary conditions; Examples of perturbation; Small perturbations of I)
  4.4 Lecture 13: September 18, 2013 (Small perturbations of I (cont.); Matrix Norms; Condition Number)
  4.5 HW 3: Due September 27, 2013
5 Vector Spaces
  5.1 Lecture 14: September 20, 2013 (Topics in Vector Spaces; Examples of spaces)
  5.2 Lecture 15: September 23, 2013 (The four subspaces of Am×n)
  5.3 Lecture 16: September 25, 2013 (The Four Subspaces of A; Linear Independence)
  5.4 Lecture 17: September 27, 2013 (Linear functions (rev); Review for exam; Previous lecture continued)
  5.5 Lecture 18: October 2, 2013 (Exams and Points; Continuation of last lecture)
6 Least Squares
  6.1 Lecture 19: October 4, 2013 (Least Squares)
  6.2 Lecture 20: October 7, 2013 (Properties of Transpose Multiplication; The Normal Equations; Exam 1)
  6.3 Lecture 21: October 9, 2013 (Exam Review; Least squares and minimization)
  6.4 HW 4: Due October 21, 2013
7 Linear Transformations
  7.1 Lecture 22: October 14, 2013 (Linear Transformations; Examples of Linear Functions; Matrix representation of linear transformations)
  7.2 Lecture 23: October 16, 2013 (...of a linear transformation; Action of linear transform)
  7.3 Lecture 24: October 21, 2013 (Change of Basis (cont.))
  7.4 Lecture 25: October 23, 2013 (Properties of Special Bases; Invariant Subspaces)
  7.5 HW 5: Due November 4, 2013
8 Norms
  8.1 Lecture 26: October 25, 2013 (Definition of norms; Vector Norms; The two norm; Matrix Norms; Induced Norms)
  8.2 Lecture 27: October 28, 2013 (Matrix norms (review); Frobenius Norm; Induced Matrix Norms)
  8.3 Lecture 28: October 30, 2013 (The 2-norm)
9 Orthogonalization with Projection and Rotation
  9.1 Lecture 28 (cont.) (Inner Product Spaces)
  9.2 Lecture 29: November 1, 2013 (Inner Product Spaces; Fourier Expansion; Orthogonalization Process (Gram–Schmidt))
  9.3 Lecture 30: November 4, 2013 (Gram–Schmidt Orthogonalization)
  9.4 Lecture 31: November 6, 2013 (Unitary (orthogonal) matrices; Rotation; Reflection)
  9.5 HW 6: Due November 11, 2013
  9.6 Lecture 32: November 8, 2013 (Elementary orthogonal projectors; Elementary reflection; Complementary Subspaces of V; Projectors)
  9.7 Lecture 33: November 11, 2013 (Projectors; Representation of a projector)
  9.8 Lecture 34: November 13, 2013 (Projectors; Decompositions of R^n; Range Nullspace decomposition of An×n)
  9.9 HW 7: Due November 22, 2013
  9.10 Lecture 35: November 15, 2013 (Range Nullspace decomposition of An×n; Corresponding factorization of A)
10 Singular Value Decomposition
  10.1 Lecture 35 (cont.) (Singular Value Decomposition)
  10.2 Lecture 36: November 18, 2013 (Singular Value Decomposition; Existence of the Singular Value Decomposition)
  10.3 Lecture 37: November 20, 2013 (Review and correction from last time; Singular Value Decomposition; Geometric interpretation)
  10.4 Lecture 38: November 22, 2013 (Review for Exam 2; Norms; More major topics)
  10.5 HW 8: Due December 10, 2013
  10.6 Lecture 39: November 27, 2013 (Singular Value Decomposition; SVD in Matlab)
11 Additional Topics
  11.1 Lecture 39 (cont.)
  11.2 Lecture 40: December 2, 2013 (Further details for class; Diagonalizable Matrices; Eigenvalues and eigenvectors)

Index

Other Contents

UNIT 1

Introduction to Linear Algebra

1.1 Lecture 1: August 19, 2013

About the class The textbook for the class will be Matrix Analysis and Applied Linear Algebra by Meyer. Another highly recommended text is Laub’s Matrix Analysis for Scientists and Engineers.

Linear Systems A linear system may be of the general form

Ax = b. (1.1.1)

This may be represented in several equivalent ways.

2x1 + x2 − 3x3 = 18, (1.1.2a)

−4x1 + 5x3 = −28, (1.1.2b)

6x1 + 13x2 = 37. (1.1.2c)

This also may be put in matrix form
$$\begin{pmatrix} 2 & 1 & -3 \\ -4 & 0 & 5 \\ 6 & 13 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 18 \\ -28 \\ 37 \end{pmatrix}. \tag{1.1.3}$$

Finally, the third common form is the vector form:

 2  1 −3  18 −4 x1 +  0 x2 +  5 x3 = −28. (1.1.4) 6 13 0 37



Figure 1.1. Finite difference approximation of a 1D boundary value problem.

Example: Application to boundary value problem We will use finite difference approximations on a uniform grid to solve the boundary value problem

$$-y''(t) = f(t), \quad \text{for } t \in [0, 1], \tag{1.1.5}$$
with the boundary conditions

y(0) = 0, (1.1.6a) y(1) = 0. (1.1.6b)

This is a 1D version of the general Laplace equation represented by,

$$-\Delta u = f \tag{1.1.7}$$

or in more engineering/science form

$$-\nabla^2 u = f. \tag{1.1.8}$$

The Laplace operator in Cartesian coordinates is

$$\nabla^2 u = \nabla \cdot (\nabla u) \tag{1.1.9a}$$
$$= u_{xx} + u_{yy} + u_{zz}. \tag{1.1.9b}$$

Finite Difference Approximation

Let $t_j = j\Delta t$, with $j = 0, \ldots, N$. We denote the approximate solution values by $y_j \approx y(t_j)$. Now we need to approximate the derivatives with discrete values of the variables. The forward difference approximation is

$$y'(t_j) \approx \frac{y_{j+1} - y_j}{t_{j+1} - t_j}, \tag{1.1.10}$$
or
$$y'(t_j) \approx \frac{y_{j+1} - y_j}{\Delta t}. \tag{1.1.11}$$

The backward difference approximation is
$$y'(t_j) \approx \frac{y_j - y_{j-1}}{\Delta t}. \tag{1.1.12}$$
The centered difference approximation is
$$y'(t_j) \approx \frac{y_{j+1} - y_{j-1}}{2\Delta t}. \tag{1.1.13}$$
Each of these is a useful approximation to the first derivative, with varying properties when applied to specific differential equations. The second derivative may be approximated by combining the approximations of the first derivative:
$$(y')'(t_j) \approx \frac{y'_{j+\frac12} - y'_{j-\frac12}}{\Delta t} \tag{1.1.14a}$$
$$= \frac{\frac{y_{j+1} - y_j}{\Delta t} - \frac{y_j - y_{j-1}}{\Delta t}}{\Delta t} \tag{1.1.14b}$$
$$= \frac{y_{j+1} - 2y_j + y_{j-1}}{\Delta t^2}. \tag{1.1.14c}$$

Analysis of error To understand the error of this approximation we may use the Taylor series. A general Taylor series is
$$f(x) = f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2 + \frac{1}{3!} f'''(a)(x - a)^3 + \cdots \tag{1.1.15}$$
By the Taylor remainder theorem, we may express the error with a special truncation of the series,
$$f(x) = f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2 + \frac{1}{3!} f'''(\xi)(x - a)^3, \tag{1.1.16}$$
or simply
$$f(x) = f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2 + O\!\left((x - a)^3\right). \tag{1.1.17}$$
The quantity we are interested in, the truncation error, is
$$E = y''(t_j) - \frac{y(t_{j+1}) - 2y(t_j) + y(t_{j-1})}{\Delta t^2}. \tag{1.1.18}$$
The Taylor series

$$y(t_{j+1}) = y(t_j + \Delta t) = y(t_j) + y'(t_j)\Delta t + O(\Delta t^2), \tag{1.1.19a}$$
$$y(t_{j-1}) = y(t_j - \Delta t) = y(t_j) - y'(t_j)\Delta t + O(\Delta t^2) \tag{1.1.19b}$$
will need to be substituted. A function $g$ is said to be order 2, or $g = O(h^2)$, if
$$|g| \leq C h^2. \tag{1.1.20}$$
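As a small numerical illustration of this error estimate (a sketch, not from the lecture; the test function y(t) = sin(t) and the point t = 1 are arbitrary choices), the centered second difference can be compared against the exact second derivative for a sequence of step sizes:

% error of the centered second difference for y(t) = sin(t) at t = 1;
% halving h should reduce the error by roughly a factor of 4
for k = 1:6
    h = 2^(-k);
    approx = (sin(1+h) - 2*sin(1) + sin(1-h))/h^2;
    err = abs(approx - (-sin(1)));
    disp(sprintf('%10.3e  %12.4e', h, err))
end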


Solution of the discretized equation We now substitute the discrete difference,
$$-\frac{y_{j+1} - 2y_j + y_{j-1}}{\Delta t^2} = f(t_j), \quad \text{for } j = 1, \ldots, n-1, \tag{1.1.21}$$
and the boundary conditions become
$$y_0 = 0, \tag{1.1.22a}$$
$$y_n = 0. \tag{1.1.22b}$$

This gives the linear system which will need to be solved for the unknowns $y_i$:

$$\begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \ddots & \vdots \\ 0 & -1 & 2 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-2} \\ y_{n-1} \end{pmatrix} = \Delta t^2 \begin{pmatrix} f(t_1) \\ f(t_2) \\ \vdots \\ f(t_{n-2}) \\ f(t_{n-1}) \end{pmatrix}. \tag{1.1.23}$$

UNIT 2

Matrix Inversion

2.1 Lecture 2: August 21, 2013

Last time we arrived at a tridiagonal system for the finite difference solution of a boundary value problem.

Gaussian Elimination

We want to solve Ax = b. Claim: Gaussian elimination produces a factorization A = LU. Notation:

A = [aij] (2.1.1)

In class we use underlines to indicate vectors. In general these vectors are column vectors, and we will use $x^\top$ (the transpose) to indicate the row vector.

Lower triangular system Lx = b

      `11 0 0 0 x1 b1 ` ` 0 ··· 0       21 22      ` ` ` 0   .   .   31 32 21   .  =  .  (2.1.2)  . .. .       . . .      `n1 `n2 `n3 ··· `nn xn bn or

$$\ell_{11} x_1 = b_1 \tag{2.1.3a}$$
$$\ell_{21} x_1 + \ell_{22} x_2 = b_2 \tag{2.1.3b}$$
$$\vdots \tag{2.1.3c}$$
$$\ell_{n1} x_1 + \ell_{n2} x_2 + \cdots + \ell_{nn} x_n = b_n \tag{2.1.3d}$$


Rearranging to solve the equations,

$$x_1 = \frac{b_1}{\ell_{11}}, \tag{2.1.4a}$$
$$x_2 = \frac{b_2 - \ell_{21} x_1}{\ell_{22}}, \tag{2.1.4b}$$
$$\vdots \tag{2.1.4c}$$
$$x_i = \frac{b_i - \left(\ell_{i,i-1} x_{i-1} + \cdots + \ell_{i1} x_1\right)}{\ell_{ii}}. \tag{2.1.4d}$$

The basic algorithm for solution of the above system in pseudo code follows:

1: x_1 ← b_1 / ℓ_11
2: for i ← 2, n do
3:     x_i ← [ b_i − Σ_{k=1}^{i−1} ℓ_{ik} x_k ] / ℓ_{ii}
4: end for

The operation count, $N_{\text{ops}}$, becomes

$$N_{\text{ops}} = 1 + \sum_{i=2}^{n} \Big[ \underbrace{1}_{\text{division}} + \underbrace{1}_{\text{subtraction}} + \underbrace{(i-1)}_{\text{multiplications}} + \underbrace{(i-2)}_{\text{additions}} \Big]. \tag{2.1.5}$$

Each of the terms arises directly from the steps of the algorithm shown above.

ASIDE: Finite sums

We need the following sums for our derivations of the operation counts,

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \tag{2.1.6}$$

$$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}. \tag{2.1.7}$$

Evaluating the operation count,

$$N_{\text{ops}} = 1 + \sum_{i=2}^{n} (2i - 1) \tag{2.1.8a}$$
$$= \sum_{i=1}^{n} (2i - 1) \tag{2.1.8b}$$
$$= 2\left(\sum_{i=1}^{n} i\right) - n \tag{2.1.8c}$$
$$= n(n+1) - n \tag{2.1.8d}$$
$$= n^2. \tag{2.1.8e}$$


Implementation of lower triangular solution in Matlab We give a Matlab code for this solution,

function x = Ltrisol(L,b)
% solve Lx = b, assuming L(i,i) ~= 0
n = length(b);
x = zeros(n,1);          % initialize x as a column vector of the right size
x(1) = b(1)/L(1,1);
for i = 2:n
    x(i) = b(i);
    for k = 1:i-1
        x(i) = x(i) - L(i,k)*x(k);
    end
    x(i) = x(i)/L(i,i);  % divide by the diagonal entry
end
end

This would be saved as the code Ltrisol.m and would be run as

>> L = ...; b = ...; >> x = Ltrisol(L, b)

Warning: Matlab loops are very slow!

Inner-product based implementation How do we re-write the code as inner products? We can replace the inner for-loop by a single inner product,

function x = Ltrisol(L,b)
% solve Lx = b, assuming L(i,i) ~= 0
n = length(b);
x = zeros(n,1);          % initialize x as a column vector
x(1) = b(1)/L(1,1);
for i = 2:n
    x(i) = (b(i) - L(i,1:i-1)*x(1:i-1))/L(i,i);
end
end

Note that the L(i,1:i-1) term is a row vector and x(1:i-1) is a column vector, so their product is a scalar and this code works fine. Recall that this requires that x be initialized as a column vector. The inner part can also be rewritten more cleanly as,

function x = Ltrisol(L,b)
% solve Lx = b, assuming L(i,i) ~= 0
n = length(b);
x = zeros(n,1);          % initialize x as a column vector
x(1) = b(1)/L(1,1);
for i = 2:n
    k = 1:i-1;
    x(i) = (b(i) - L(i,k)*x(k))/L(i,i);
end
end

Office hours and other class notes

Office hours will be from 12–1 on MWF. The course web address is www.math.unm.edu/~nitsche/math464.html.

Example: Gauss Elimination Example:

2x1 − x2 + 3x3 = 13 (2.1.9a)

−4x1 + 6x2 − 5x3 = −28 (2.1.9b)

6x1 + 13x2 + 16x3 = 37 (2.1.9c)

Let’s perform each step in full equation form. So we execute the steps R2 → R2 − (−2)R1 and R3 → R3 − (3)R1.

2x1 − x2 + 3x3 = 13 (2.1.10a)

4x2 + x3 = −2 (2.1.10b)

16x2 + 7x3 = −2 (2.1.10c)

Next step will be R3 → R3 − (4)R2.

2.2 Lecture 3: August 23, 2013

Example: Gauss Elimination, cont. Example:

2x1 − x2 + 3x3 = 13 (2.2.1a)

−4x1 + 6x2 − 5x3 = −28 (2.2.1b)

6x1 + 13x2 + 16x3 = 37 (2.2.1c)

Let’s perform each step in full equation form. So we execute the steps R2 → R2 − (−2)R1 and R3 → R3 − (3)R1.

2x1 − x2 + 3x3 = 13 (2.2.2a)

4x2 + x3 = −2 (2.2.2b)

16x2 + 7x3 = −2 (2.2.2c)


Next step will be R3 → R3 − (4)R2.

2x1 − x2 + 3x3 = 13 (2.2.3a)

4x2 + x3 = −2 (2.2.3b)

3x3 = 6 (2.2.3c)

Now we begin the backward substitution.

x3 = 2; (2.2.4a)

x2 = (−2 − x3)/4, (2.2.4b) = −1; (2.2.4c)

x1 = (13 + x2 − 3x3)/2, (2.2.4d) = 3. (2.2.4e)

Gauss Elimination is forward elimination and backward substitution. Now we will do the same problem in matrix form,

 2 −1 3 13   2 −1 3 13   −4 6 −5 −28  →  0 4 1 −2 , (2.2.5a) 6 13 16 37 0 16 7 −2  2 −1 3 13  →  0 4 1 −2 . (2.2.5b) 0 0 3 6

Operation Cost of Forward Elimination Now we want to know the operation count for the forward elimination step when we take A → U without pivoting for a general n × n matrix, A = [aij]. As an example of each step:

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix} \to \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ 0 & a'_{22} & a'_{23} & a'_{24} & a'_{25} \\ 0 & a'_{32} & a'_{33} & a'_{34} & a'_{35} \\ 0 & a'_{42} & a'_{43} & a'_{44} & a'_{45} \\ 0 & a'_{52} & a'_{53} & a'_{54} & a'_{55} \end{pmatrix} \tag{2.2.6a}$$

These operations are given by $\text{row}_j \to \text{row}_j - \ell_{ij}\,\text{row}_i$, where $\ell_{ij} = a_{ij}/a_{ii}$ if $a_{ii} \neq 0$ ($a_{ii}$ should not be close to zero or we will need to use pivoting). For example, $a_{i1} \to a_{i1} - \frac{a_{i1}}{a_{11}}\, a_{11} = 0$. The next step,
$$\to \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ 0 & a'_{22} & a'_{23} & a'_{24} & a'_{25} \\ 0 & 0 & a''_{33} & a''_{34} & a''_{35} \\ 0 & 0 & a''_{43} & a''_{44} & a''_{45} \\ 0 & 0 & a''_{53} & a''_{54} & a''_{55} \end{pmatrix}. \tag{2.2.6b}$$


Figure 2.1. One-dimensional discrete grids: (a) n grid; (b) 4n grid.

At the $i$th step ($i = 1, \ldots, n-1$),
$$B_{(n-i)\times(n-i)} \to \tilde{B}_{(n-i)\times(n-i)}, \tag{2.2.7}$$
the cost of the individual step is $\underbrace{(n-i)}_{\text{computing } \ell_{ij}} + \underbrace{2(n-i)^2}_{\text{computing } a_{ij}}$. The total cost is thus

$$N_{\text{ops}} = \sum_{i=1}^{n-1} \left[(n-i) + 2(n-i)^2\right]. \tag{2.2.8a}$$

Let $k = n - i$; then $i = 1 \to k = n - 1$ and $i = n - 1 \to k = n - (n-1) = 1$, so

$$N_{\text{ops}} = \sum_{k=1}^{n-1} \left(k + 2k^2\right) \tag{2.2.8b}$$
$$= \underbrace{\frac{(n-1)n}{2}}_{O(n^2)} + \underbrace{2\,\frac{(n-1)n\,(2(n-1)+1)}{6}}_{O(n^3)} \tag{2.2.8c}$$
$$\approx O(n^3). \tag{2.2.8d}$$

This means that the cost of the algorithm scales as the cube of the problem size, $O(n^3)$.

Cost of the Order of an Algorithm For an order 3 algorithm, if you increase the size of your matrix by a factor of 2, the expense in computer time will increase by a factor of 8. Similarly, if it took one day to solve a boundary value problem in 1D with n = 1000, then it will take 64 days to do n = 4000 (see figure 2.1). Alternatively, if you are doing a 2D simulation, refining each direction by a factor of 4, as shown in figure 2.2, increases the number of unknowns by a factor of 16, and the work therefore grows by roughly a factor of $16^3$. This gets very expensive! This is one of the reasons that modeling phenomena such as the weather is very difficult.
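A rough way to see this cubic scaling in practice is to time Matlab's dense solver for a doubling sequence of sizes (a sketch for illustration only; the sizes are arbitrary and the timings depend on the machine):

% rough timing experiment: dense solve cost should grow roughly like n^3
for n = [500 1000 2000]
    A = rand(n); b = rand(n,1);
    tic; x = A\b; t = toc;
    disp(sprintf('n = %5d   time = %8.3f s', n, t))
end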


Figure 2.2. Two-dimensional discrete grids: (a) n × n grid; (b) 4n × 4n grid.

Validation of Lower/Upper Triangular Form

Consider that we have the Gaussian Elimination with A = LU, where

$$L = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ \ell_{ij} & & 1 \end{pmatrix}. \tag{2.2.9}$$

Check our previous system:

 2 −1 3  1 0 0 2 −1 3 −4 6 −5 = −2 1 0 0 4 1. (2.2.10) 6 13 16 3 4 1 0 0 3

This works!
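The factorization is also easy to verify numerically; here is a minimal Matlab check (a sketch, not part of the original notes) that reuses the factors to solve the example system:

A = [2 -1 3; -4 6 -5; 6 13 16];
L = [1 0 0; -2 1 0; 3 4 1];
U = [2 -1 3; 0 4 1; 0 0 3];
norm(A - L*U)          % should be exactly 0 for this example
b = [13; -28; 37];
x = U\(L\b)            % forward then backward substitution gives x = [3; -1; 2]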

Theoretical derivation of Lower/Upper Form

We want to show that Gauss elimination naturally leads to the LU form using elementary row operations. The three elementary operations are:

1. Multiply row by α;

2. Switch rowi and rowj;

3. Add multiple of rowi to rowj.

All are equivalent to pre-multiplying A by an elementary matrix. Let’s illustrate these:


1. Multiply a row by α:
$$\underbrace{\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & \alpha & & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}}_{E_i} \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & & & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \alpha a_{31} & \alpha a_{32} & \alpha a_{33} & \cdots & \alpha a_{3n} \\ \vdots & & & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{pmatrix} \tag{2.2.11a}$$

2.3 Homework Assignment 1: Due Friday, August 30, 2013

1. Use Taylor series expansions of $f(x \pm h)$ about $x$ to show that
$$f''(x) = \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} - \frac{h^2}{12} f^{(4)}(x) + O(h^4). \tag{2.3.1}$$

2. Consider the two-point boundary value problem
$$y''(x) = e^x, \qquad y(-1) = \frac{1}{e}, \qquad y(1) = e, \tag{2.3.2}$$
where $x \in [-1, 1]$. Divide the interval $[-1, 1]$ into $N$ equal subintervals and apply the finite difference method presented in class to approximate the solution $y_j \approx y(x_j)$ at the $N-1$ interior points $j = 1, \ldots, N-1$, where $x_j = a + jh$, $h = (b-a)/N$, and $[a, b] = [-1, 1]$. Compare the approximate values at the grid points with the exact solution at the grid points. Use $N = 2, 4, 8, \ldots, 2^9$ and report the maximal absolute error for each $N$ in a table. Your writeup should contain:

• the Matlab code;
• a table with two columns. The first contains h, the second contains the corresponding maximal errors. By how much is the error reduced every time N is doubled? Can you conclude whether the error is $O(h)$, $O(h^2)$ or $O(h^p)$ for some other integer p?

Regarding Matlab: If needed, go over the Matlab tutorial on the course website, items 1–6. This covers more than you need for this problem. In Matlab, type

help diag or help ones

to find what these commands do. The (N −1)×(N −1) matrix with 2s on the diagonal and –1 on the off-diagonals can be constructed by

v=ones(1,n-1); A=2*diag(v)-diag(v(1:n-2),1)-diag(v(1:n-2),-1);


The system Ax = b can be solved in Matlab by x = A\b. The maximal difference between two vectors x and y is error=max(abs(x-y)). Your code should have the following structure

Listing 2.1. code stub for tridiagonal solver

disp(sprintf('       h              error'))
a=...; b=...;     % Set values of endpoints
ya=...; yb=...;   % Set values of y at the endpoints
for n = ...
    h=2/n;
    x=a:h:b;
    % Set matrix A of the linear system to be solved.
    v=ones(1,n-1);
    A=2*diag(v)-diag(v(1:n-2),1)-diag(v(1:n-2),-1);
    % Set right hand side of linear system.
    rhs=...;
    % Solve linear system to find approximate solution.
    y(2:n)=A\rhs; y(1)=ya; y(n+1)=yb;
    % Compute exact solution and approximation error
    yex=...;                    % set exact solution
    plot(x,y,'b-',x,yex,'r-')   % to compare visually
    error=max(abs(y-yex));
    disp(sprintf('%15.10f %20.15f',h,error))
end

Note that in Matlab the index of all vectors starts with 1. Thus, x=-1:h:1, is a vector of length n + 1 and the interior points are x(2:n).

3. Let U be an upper triangular n × n matrix with nonzero entries uij, j ≥ i.
   (a) Write an algorithm that solves Ux = b for a given right hand side b for the unknown x.
   (b) Find the number of operations that it takes to solve for x, using your algorithm above.
   (c) Write a Matlab function function x=utrisol(u,b) that implements your algorithm and returns the solution x.

4. Given A, b below,

(a) find the LU factorization of A (using the Gauss Elimination algorithm);
(b) use it to solve Ax = b.
$$A = \begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 5 \end{pmatrix}. \tag{2.3.3}$$

5. Sparsity of L and U, given sparsity of A = LU. If A, B, C, D have non-zeros in the positions marked by x, which zeros (marked by 0) are still guaranteed to be zero in


their factors L and U?(B, C, D are all band matrices with p = 3 bands, but differing sparsity within the bands. The question is how much of this sparsity is preserved.) In each case, highlight the new nonzero entries in L and U.

x 0 x 0 0 0   x x x x 0 x 0 x 0 0   x x x 0 x 0 x 0 x 0 A =   , B =   , 0 x x x 0 x 0 x 0 0   0 0 x x 0 0 x 0 x 0 0 0 0 x 0 x

x x x 0 0 0 x 0 0 x 0 0 0 x 0 x 0 0 0 x 0 0 x 0     x 0 x 0 x 0 x 0 x 0 0 x C =   , D =   , 0 x 0 x 0 0 0 x 0 x 0 0     0 0 x 0 x 0 0 0 x 0 x 0 0 0 0 x 0 x 0 0 0 x 0 x

6. Consider solving a differential equation in a unit cube, using N points to discretize each dimension. That is, you have a total of $N^3$ points at which you want to approximate the solution. Suppose that at each time step, you need to solve a linear system Ax = b, where A is an $N^3 \times N^3$ matrix, which you solve using Gauss Elimination, and suppose there are no other computations involved. Assume your personal computer runs at 1 GigaFLOPS, that is, it executes $10^9$ floating point operations per second.

(a) How much time does it take to solve your problem for N = 500 for 1000 timesteps? (b) When you double the number of points N, you typically also have to halve the timestep, that is, double the total number of timesteps taken. By what factor does the runtime increase each time you double N? (c) How much time will it take to solve the problem if you use N = 2000?

UNIT 3

Factorization

3.1 Lecture 4: August 26, 2013

For the h in the homework, use n = 2.^(1:1:10). We want to deduce the order of the method from the table of h and the error.

Elementary Matrices

1. Multiply row$_i$ by $\alpha$:
$$E_1 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & \ddots & 0 & 0 & 0 \\ 0 & 0 & \alpha & 0 & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}. \tag{3.1.1}$$
The inverse is
$$E_1^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & \ddots & 0 & 0 & 0 \\ 0 & 0 & \frac{1}{\alpha} & 0 & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \tag{3.1.2}$$
$$E_1 E_1^{-1} = I. \tag{3.1.3}$$

2. Exchange row$_i$ and row$_j$:
$$E_2 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}. \tag{3.1.4}$$


$$E_2^2 = I \tag{3.1.5}$$

3. Replace rowj by rowj + αrowi.

1 0 0 0 0 0 0 1 0 0 0 0   0 0 1 0 0 0 E3 =   . (3.1.6) 0 0 α 1 0 0   0 0 0 0 1 0 0 0 0 0 0 1

1 0 0 0 0 0 0 1 0 0 0 0   −1 0 0 1 0 0 0 E =   . (3.1.7) 3 0 0 −α 1 0 0   0 0 0 0 1 0 0 0 0 0 0 1 What happens if we post-multiply by the elementary matrices? The matrices will act on the columns instead of the rows.       a11 a12 a13 a1n 1 0 0 0 0 a11 a12 αa13 a1n . a21 a22 a23 ··· a2n 0 .. 0 0 0 a21 a22 αa23 ··· a2n       a a a a    a a αa a  AE1 =  31 32 33 3n 0 0 α 0 0 =  31 32 33 3n  . .. .   .   . .. .   . . .  0 0 0 .. 0  . . .  an1 an2 an3 ··· an 0 0 0 0 1 an1 an2α an3 ··· an (3.1.8)   1 0 0 0 0 0 a11 a12 a13 a1n a a a ··· a  0 1 0 0 0 0  21 22 23 2n   a a a a 0 0 0 1 0 0 AE2 =  31 32 33 3n   (3.1.9)  . . .  0 0 1 0 0 0  . .. .      0 0 0 0 1 0 an1 an2 an3 ··· an 0 0 0 0 0 1

Gaussian Elimination without pivoting Premultiply by elementary matrices type 3 repeatedly.

$$\ell_{ji} = \frac{a_{ji}}{a_{ii}}, \quad \text{for } j > i. \tag{3.1.10}$$

$$E_{-21} A = \begin{pmatrix} x & x & x & x & x \\ 0 & x & x & x & x \\ x & x & x & x & x \\ x & x & x & x & x \\ x & x & x & x & x \end{pmatrix} \tag{3.1.11}$$


x x x x x 0 x x x x   E−31E−21A = 0 x x x x (3.1.12)   x x x x x x x x x x This sequence continues until we have introduced zeros to get the lower diagonal: x x x x x 0 x x x x   E−n,n−1 ··· E−n1 ··· E−31E−21A = 0 0 x x x = U (3.1.13)   0 0 0 x x 0 0 0 0 x Thus, A = E21E31 ··· En−1,n−2En,n−2En,n−1 U (3.1.14) | {z } L  1 0 0 0 0 0  1 0 0 0 0 0  1 0 0 0 0 0 `21 1 0 0 0 0  0 1 0 0 0 0 `21 1 0 0 0 0        0 0 1 0 0 0 `31 0 1 0 0 0 `31 0 1 0 0 0 E21E31 =     =   . (3.1.15)  0 0 0 1 0 0  0 0 0 1 0 0  0 0 0 1 0 0        0 0 0 0 1 0  0 0 0 0 1 0  0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 Which extends to  1 0 0 0 0 0  `21 1 0 0 0 0    ` 0 1 0 0 0 ˜  31  E1 = En1 ··· E21E31 =  . ..  . (3.1.16)  . 0 0 . 0 0   `n−1,1 0 0 0 1 0 `n1 0 0 0 0 1 This further extends to,  1 0 0 0 0 0  `21 1 0 0 0 0    ` ` 1 0 0 0 ˜ ˜  31 32  E1E2 =  . . ..  . (3.1.17)  . . 0 . 0 0   `n−1,1 `n−1,2 0 0 1 0 `n1 `n2 0 0 0 1 Finally we get that  1 0 0 0 0 0  `21 1 0 0 0 0    ` ` 1 0 0 0 ˜ ˜ ˜  31 32  E1E2 ··· En−1 =  ......  . (3.1.18)  . . . . 0 0   `n−1,1 `n−1,2 ··· `n−1,n−2 1 0 `n1 `n2 ··· `n,n−2 `n,n−1 1


Solution of Matrix using the Lower/Upper factorization To use A = LU to solve Ax = b.

1. Find L, U (number of operations: $\tfrac{2}{3} n^3$).
2. Solve L(Ux) = b: first solve Ly = b (number of operations: $n^2$), then solve Ux = y (number of operations: $n^2$).

Example:

To solve A x = b_k for k = 1, ..., 10^4:
  find L, U once: O((2/3) n^3),
  then solve L y = b_k and U x = y, 10,000 times: O(10,000 * 2 n^2).

Sparse and Banded Matrices Given x 0 0 0 0 0 ... 0 0 0     A = 0 0 x 0 0 (3.1.19)  .  0 0 0 .. 0 0 0 0 0 x the bandwidth is 1. Below, x x 0 0 0 0 x x x 0 0 0   0 x x x 0 0 A =   , (3.1.20) 0 0 x x x 0   0 0 0 x x x 0 0 0 0 x x the bandwidth is 3—this is a tridiagonal matrix. This type of matrix maintains it’s sparsity when it undergoes LU decomposition.

x x 0 0 0 0 1 0 0 0 0 0 x x 0 0 0 0 x x x 0 0 0 x 1 0 0 0 0 0 x x 0 0 0       0 x x x 0 0 0 x 1 0 0 0 0 0 x x 0 0   =     . (3.1.21) 0 0 x x x 0 0 0 x 1 0 0 0 0 0 x x 0       0 0 0 x x x 0 0 0 x 1 0 0 0 0 0 x x 0 0 0 0 x x 0 0 0 0 x 1 0 0 0 0 0 x


Motivation for Gauss Elimination with Pivoting When does Gauss elimination give us a problem? For example

1. $A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$: the first pivot is zero, so elimination cannot proceed without a row exchange.

2. $A = \begin{pmatrix} \delta & 1 \\ 1 & 1 \end{pmatrix}$: solve $Ax = \begin{pmatrix} 1 + \delta \\ 2 \end{pmatrix}$; the exact solution is $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$. However, we run into numerical problems.

3.2 Lecture 5: August 28, 2013

Motivation for Gauss Elimination with Pivoting, cont. When does Gauss elimination give us a problem? Returning to the example problem, $A = \begin{pmatrix} \delta & 1 \\ 1 & 1 \end{pmatrix}$. Solve $Ax = \begin{pmatrix} 1 + \delta \\ 2 \end{pmatrix}$; the exact solution is $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, but we run into numerical problems. There are a couple approaches to this problem. First, solve for x by finding L, U and using them numerically,

$$A = \begin{pmatrix} \delta & 1 \\ 1 & 1 \end{pmatrix} \to \begin{pmatrix} \delta & 1 \\ 0 & 1 - \frac{1}{\delta} \end{pmatrix} = U \tag{3.2.1}$$
and
$$L = \begin{pmatrix} 1 & 0 \\ \frac{1}{\delta} & 1 \end{pmatrix}. \tag{3.2.2}$$
Now we want to solve L(Ux) = b:

for j = 1:16
    delta = 10^(-j);
    b = [1+delta, 2];
    L = [1, 0; 1/delta, 1];
    U = [delta, 1; 0, 1-1/delta];
    % Solve Ly = b for y
    y(1) = b(1); y(2) = b(2) - L(2,1)*y(1);
    % Solve Ux = y for x
    x(2) = y(2)/U(2,2); x(1) = (y(1) - U(1,2)*x(2))/U(1,1);
    disp(sprintf('%5.0e %20.15f %20.15f %10.8e',delta,x(1),x(2),norm(x-[1,1])))
end

Note that the norm is the Euclidean norm, $\|x - [1, 1]\| = \sqrt{(x(1) - 1)^2 + (x(2) - 1)^2}$. This gives the table of results shown below. Conclusion: Ax = b is a good problem (well-posed): introducing small perturbations (e.g., by roundoff) does not change the solution by much. Matlab’s algorithm A\b is a good algorithm (stable); this LU decomposition without pivoting does not give a good algorithm (unstable).


Table 3.1. Variation of error with the perturbation

δ        x(1)          x(2)     ||x − [1, 1]||_2
1e-01    1.000         1.000    8e-16
1e-02    1.000         1.000    1e-13
1e-03    0.999         1.000    6e-12
1e-04    1.000...28    1.000    ...e-11
1e-05    ...           1.000    ...e-10
...      ...           ...      ...
1e-16    0.888         1.000    ...e-0

Discussion of well-posedness

Geometrically, Ax = b,

δx1 + x2 = 1 + δ, (3.2.3a)

x1 + x2 = 2. (3.2.3b)

This is a well-posed system. Rearranging

x2 ≈ 1 − δx1, x2 = 2 − x1. (3.2.4a)

Our other system Ly = b,

$$y_1 = 1, \tag{3.2.5a}$$
$$\frac{1}{\delta}\, y_1 + y_2 = 2. \tag{3.2.5b}$$

This makes a very ill-posed system because small wiggles in δ give much larger errors because the slopes are so near each other. Now we consider Ux = y,

$$\delta x_1 + x_2 = 1, \tag{3.2.6a}$$
$$\left(1 - \frac{1}{\delta}\right) x_2 = y_2. \tag{3.2.6b}$$

This is also ill-posed. All of these linear problems are illustrated in figure 3.1.


Figure 3.1. Plot of the linear problems (a) Ax = b, (b) Ly = b, and (c) Ux = y, and their solutions near (1, 1).

Gaussian elimination with pivoting

Pivoting means we exchange rows such that the current pivot satisfies $|a_{ii}| = \max_{j \geq i} |a_{ji}|$. Consequently, $|\ell_{ji}| = |a_{ji}/a_{ii}| \leq 1$ for all $j > i$. Now,
$$\left(\begin{array}{cc|c} \delta & 1 & 1 + \delta \\ 1 & 1 & 2 \end{array}\right) \to \left(\begin{array}{cc|c} 1 & 1 & 2 \\ \delta & 1 & 1 + \delta \end{array}\right) \tag{3.2.7a}$$
$$\xrightarrow{R_2 \leftarrow R_2 - \delta R_1} \left(\begin{array}{cc|c} 1 & 1 & 2 \\ 0 & 1 - \delta & 1 + \delta - 2\delta \end{array}\right) \tag{3.2.7b}$$
$$\to \left(\begin{array}{cc|c} 1 & 1 & 2 \\ 0 & 1 - \delta & 1 - \delta \end{array}\right) \tag{3.2.7c}$$
PLU always works. Theorem: Gaussian elimination with pivoting yields PA = LU. The permutation matrix is P. Every matrix has a PLU factorization. To do the pivoting, at each step, first premultiply A by a permutation matrix
$$P_k = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 0 & 1 & & \\ & & 1 & 0 & & \\ & & & & \ddots & \\ & & & & & 1 \end{pmatrix} \tag{3.2.8}$$
then premultiply by
$$L_k = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & \ell_{k+1,k} & 1 & & \\ & & \vdots & & \ddots & \\ & & \ell_{n,k} & & & 1 \end{pmatrix} \tag{3.2.9}$$


We do this in succession,

Ln−1Pn−1 ··· L2P2L1P1A = U (3.2.10)

How do these commute into a useful P and L matrix?

3.3 Lecture 6: August 30, 2013

Discussion of HW problem 2

$$-y_{j-1} + 2y_j - y_{j+1} = h^2 f(x_j), \quad \text{for } j = 1, \ldots, n-1. \tag{3.3.1}$$

 2 −1 0 ··· 0     y1 f(t1) + y0  .. . −1 2 −1 . .  y2   f(t2)   .   .  2  .   0 −1 2 .. 0  .  = h  . . (3.3.2)        ......  y   f(t )   . . . . −1  n−2  n−2  0 ··· 0 −1 2 yn−1 f(tn−1) + yn

So we’ve set up our system. In Matlab:

x = a:h:b;               % grid; equivalently linspace(a,b,n+1)
% A is the (n-1) x (n-1) tridiagonal matrix above
rhs = h^2*f(x(2:n));     % right hand side, a vector of length n-1
rhs(1) = rhs(1) + ya;
rhs(n-1) = rhs(n-1) + yb;

Recall that our $f(x) = -e^x$:
$$-y'' = -e^x. \tag{3.3.3}$$

PLU factorization

For PLU factorization, we are doing Gauss elimination with pivoting. At each $k$th step of Gaussian elimination, switch rows so that the pivot, $a_{kk}^{(k)}$, is the largest number by magnitude in the $k$th column. For example,
$$\begin{pmatrix} 1 & -1 & 3 \\ -1 & 0 & -2 \\ 2 & 2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -3 \\ 1 \\ 0 \end{pmatrix}. \tag{3.3.4}$$


or

 1 −1 3 −3   2 2 4 0   −1 0 −2 1  →  −1 0 −2 1  , row1 ↔ row3 (3.3.5a) 2 2 4 0 1 −1 3 −3  2 2 4 0  1 1 → 0 1 −0 1 , row ← row − row , and row ← row − row   2 2 3 1 3 3 2 1 0 −2 1 −3 (3.3.5b)  2 2 4 0  →  0 −2 1 −3  , row2 ↔ row3 (3.3.5c) 0 1 −0 1  2 2 4 0   1 → 0 −2 1 −3 , row ← row − − row   3 3 2 2 0 0 1/2 −1/2 (3.3.5d)

We need to do the back substitution to solve this system. But more importantly, we want to know what the factorization of this system would be. Recall,

1 0 0 0 0 0 .  ..  0 0 0 0 0   0 0 1 0 0 0 Lk =   , (3.3.6) 0 0 `k−1,k 1 0 0  . .  0 0 . 0 .. 0 0 0 `n,k 0 0 1 and

L−(n−1)Pn−1 ··· L−2P2L−1P1A = U. (3.3.7)

Reordering,

Pn−1 ··· L−2P2L−1P1A = L(n−1)U. (3.3.8)

We want to move each P to be right next to A and all the Ls such that we can form a true L. Claim, ˜ PjL−k = L−kPj, j > k. (3.3.9)

th Pj permutation moves columns below the k row. This allows us to move L’s out.

˜ PjL−kPj = L−k (3.3.10a)

˜ ˜ L−n ··· L−1Pn−1 ··· P1A = U (3.3.11)


Now we can return to our example but with keeping track of the  1 −1 3 −3   2 2 4 0  0 0 1  −1 0 −2 1  →  −1 0 −2 1  , row1 ↔ row3, P1 = 0 1 0 2 2 4 0 1 −1 3 −3 1 0 0 (3.3.12a)  2 2 4 0     − 1 1 −0 1  1 1 →  2  , row2 ← row2 − − row1, row3 ← row3 − row1   2 2 1 2 −2 1 −3 (3.3.12b)  2 2 4 0  0 0 1  1 −2 1 −3  →  2  , row2 ↔ row3, P2 = 1 0 0  1  0 1 0 − 2 1 −0 1 (3.3.12c)  2 2 4 0     1 −2 1 −3  1 →  2  , row3 ← row3 − − row2  1 1  2 − 2 − 2 1/2 −1/2 (3.3.12d) Because P = P−1, we should remember that, PA = LU (3.3.13a) A = PLU. (3.3.13b)

3.4 Lecture 7: September 4, 2013

PLU Factorization Recall PA = LU (3.4.1) always exists by construction. This is because we can make anything non-zero by the per- mutation. This is also equivalent to, A = PLU (3.4.2) because P = P−1. To use this in an actual solution, PAx = Pb, (3.4.3) or LUx = Pb, (3.4.4) So this system is determined by:


1. Solving Ly = Pb,

2. Solving Ux = y.

In Matlab, we would use the commands [L,U,P] = lu(A), to find these three matrices. This factorization is not unique. We want to show the uniqueness of the LU factorization, and are also interested in when it exists.
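A quick sketch of using these factors to solve a system (not part of the original notes; the matrix is borrowed from HW 2, problem 1, and the right hand side is an arbitrary choice):

A = [1 4 5; 4 18 26; 3 16 30];
b = [1; 2; 3];
[L,U,P] = lu(A);
norm(P*A - L*U)         % should be 0 up to roundoff
x = U\(L\(P*b));        % solve PAx = Pb via two triangular solves
norm(A*x - b)           % residual check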

Triangular Matrices We are interested in the determinants of lower or upper triangular matrices. Let’s discuss det(L). For
$$L = \begin{pmatrix} \ell_{11} & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ \ell_{i1} & \cdots & \ell_{jj} & 0 \\ \ell_{n1} & \cdots & \ell_{nj} & \ell_{nn} \end{pmatrix}, \tag{3.4.5}$$
the determinant is $\det(L) = \prod_{i=1}^{n} \ell_{ii}$. Thus L is invertible only if $\ell_{ii} \neq 0$ for all $i$. We conjecture that the product of two lower triangular matrices gives us a lower triangular matrix, e.g.
$$L_1 L_2 = L_{12}. \tag{3.4.6}$$
We want to prove this!

Multiplication of lower triangular matrices

Prove that $L_1 L_2$ is lower triangular. Assume A and B are lower triangular; show that C = AB is lower triangular. We know that $a_{ij} = b_{ij} = 0$ for $j > i$. In our proof, we first consider matrix multiplication,
$$c_{ij} = \sum_k a_{ik} b_{kj}. \tag{3.4.7}$$

We know that $a_{ik} = 0$ for $k > i$, and $b_{kj} = 0$ for $j > k$. If $j > i$, then when $k \leq i$ we have $k < j$, so $b_{kj} = 0$. Alternatively, if $k > i$, then $a_{ik} = 0$. Thus, in either case one of the two factors is zero, so $c_{ij} = 0$ for $j > i$, and we have proved our claim.

Inverse of a lower triangular matrix A lower triangular matrix’s inverse is also a lower triangular matrix;   `11 ··· 0 −1  . .. .  L =  . . .  = Lower triangular (3.4.8) `n1 ··· `nn

So, this helps with inversion of the form,

L−n ··· L−2L−1A = U. (3.4.9)


For matrixes of the form 1 0 0 0 0 0 .  ..  0 0 0 0 0   0 0 1 0 0 0 L−k =   ; (3.4.10) 0 0 −`ij 1 0 0  . .  0 0 . 0 .. 0 0 0 −`nj 0 0 1 the inverse matrix is 1 0 0 0 0 0 .  ..  0 0 0 0 0   0 0 1 0 0 0 Lk =   . (3.4.11) 0 0 `ij 1 0 0  . .  0 0 . 0 .. 0 0 0 `nj 0 0 1 For any     1 0 0 0 0 `11 0 0 0 0 . . . . . ..   . ..  . 0 0 0  . 0 0 0      Lk = 0 ··· 1 0 0  0 ··· `ii 0 0  (3.4.12) . . .   . . .  . 0 . .. 0  . 0 . .. 0  0 ··· `in ... 1 0 ··· 0 . . . `nn To find L−1,[LI] −−→GE [IL−1]. Use Gaussian elimination on L, and we go through each column.

Uniqueness of LU factorization

Theorem: If A is such that no zero pivots are encountered, then A = LU with $\ell_{ii} = 1$ and $u_{ii} \neq 0$, which are the pivots. Here $\ell_{ij} = a_{ij}/a_{jj}$ for $j < i$ by construction.
Proof: Assume $A = L_1 U_1 = L_2 U_2$. Then
$$L_2^{-1} L_1 U_1 = U_2, \tag{3.4.13a}$$
$$L_2^{-1} L_1 = U_2 U_1^{-1} \tag{3.4.13b}$$
$$= \text{diagonal matrix} \tag{3.4.13c}$$
$$= I. \tag{3.4.13d}$$

If this is the case, then $L_2^{-1} L_1 = I$, or $L_2 = L_1$, and similarly $U_2 = U_1$. Thus these matrices are the same and the factorization must be unique.

Existence of the LU factorization

Theorem: If A = LU with no zero pivots, then all leading principal submatrices $A_k$ are nonsingular. We define the leading principal submatrix $A_k$ of $A_{n\times n}$ as $A_k = A_{(1:k),(1:k)}$. These are the upper-left square blocks of the full matrix.


Part 2. We want to prove both directions: if A = LU (with nonzero pivots), show that each $A_k$ is invertible; then, conversely, if every $A_k$ is invertible, show that A = LU.

3.5 Lecture 8: September 6, 2013

About Homeworks

The median score was 50 out of 60. A histogram was shown with the general grade distribution: 1 around 10, 3 around 25, 1 around 40, 4 from 45–50, 4 from 50–55, 6 from 55–60. Comments: hand in working Matlab code. Also, L must have ones on the diagonal, while U has the pivots on the diagonal. “Computing efficiently” means using the LU decomposition, not inverting the matrix A. For homework 2, we will have applications of finding the inverse of A, i.e. solving

AX = I (3.5.1)

or   A x1 x2 ··· xn = e1 e2 ··· en (3.5.2)

To find A−1, solve

Axj = ej, (3.5.3)

for all j = 1, 2, . . . , n. Use the LU decomposition.
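A minimal sketch of this column-by-column computation in Matlab (not part of the original notes; the test matrix is an arbitrary choice):

A = [2 -1 0; -1 2 -1; 0 -1 2];     % a small invertible test matrix
n = size(A,1);
[L,U,P] = lu(A);
E = eye(n);
Ainv = zeros(n);
for j = 1:n
    Ainv(:,j) = U\(L\(P*E(:,j)));  % solve A x_j = e_j using the LU factors
end
norm(Ainv - inv(A))                % should be near machine precision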

Discussion of ill-conditioned systems

We define Ax = b as an ill-conditioned system if small changes in A or b introduce large changes in the solution. Geometrically, we showed this interpretation previously on a 2 × 2 system, and we noted that the slopes were very similar to each other. Numerically, we have trouble because of roundoff when we solve $A\tilde{x} = b$. We also may compute a condition number, which tells us the amplification factor of errors in the system. In Matlab, the command cond(A) gives you the condition number. This should hopefully be under a thousand. The condition number essentially tells you how much accuracy you can expect in the final solution. In other words, if your condition number is $1 \times 10^5$ then you can only expect to have about 11 significant digits in your solution in double precision floating point arithmetic.
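For example (a sketch, not from the lecture; the Hilbert matrix is a standard ill-conditioned test case):

cond(hilb(5))      % roughly 5e5: expect to lose about 5 to 6 digits
cond(hilb(10))     % roughly 1e13: only a few accurate digits remain in double precision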


Inversion of lower triangular matrices

Show that if A is a lower triangular matrix then so is A−1. So let’s solve AX = I with A lower triangular.

 x 0 0 0 0 1 0 0 0 0   x 0 0 0 0 1 0 0 0 0   x x 0 0 0 0 1 0 0 0   x x 0 0 0 y 1 0 0 0       x x x 0 0 0 0 1 0 0  →  x x x 0 0 y 0 1 0 0  , (3.5.4a)      x x x x 0 0 0 0 1 0   x x x x 0 y 0 0 1 0  x x x x x 0 0 0 0 1 x x x x x y 0 0 0 1  x 0 0 0 0 1 0 0 0 0   x x 0 0 0 y 1 0 0 0    →  x x x 0 0 y y 1 0 0  , (3.5.4b)    x x x x 0 y y 0 1 0  x x x x x y y 0 0 1  x 0 0 0 0 1 0 0 0 0   x x 0 0 0 y 1 0 0 0    →  x x x 0 0 y y 1 0 0  , (3.5.4c)    x x x x 0 y y y 1 0  x x x x x y y y 0 1  x 0 0 0 0 1 0 0 0 0   x x 0 0 0 y 1 0 0 0    →  x x x 0 0 y y 1 0 0  . (3.5.4d)    x x x x 0 y y y 1 0  x x x x x y y y y 1

We now have shown that we can get the lower triangular matrix A into the form LD. Now we do backward substitution to get our X. In this case this is simply deviding each row by the value of the pivot of that row. In this way with D = U, we have X = D−1L−1.

Example of LU decomposition of a lower triangular matrix

Given the matrix,

2 0 0 1 0 0 2 0 0 1 1 3 0 =  2 1 0 0 3 0 , (3.5.5a) 1 2 1 4 1 3 1 0 0 4 = LU. (3.5.5b)


Banded matrix example

Exercise 3.10.7: Band matrix A with bandwidth w is a matrix with aij = 0 if |i − j| > w. If w = 0, we have a diagonal matrix.   a11 0 0 0 0  0 a22 0 0 0    Aw=0 =  0 0 a33 0 0  . (3.5.6)    0 0 0 a44 0  0 0 0 0 a55

For bandwidth, w = 1,   a11 a12 0 0 0 a21 a22 a23 0 0    Aw=1 =  0 a32 a33 a34 0  . (3.5.7)    0 0 a43 a44 a45 0 0 0 a54 a55 For bandwidth, w = 2,   a11 a12 a13 0 0 a21 a22 a23 a24 0    Aw=2 = a31 a32 a33 a34 a35 . (3.5.8)    0 a42 a43 a44 a45 0 0 a53 a54 a55 In the LU decomposition these zeros are preserved. However there are other cases (as shown in the homework) where the zeros may not be preserved. We will return to our theorem on Monday. For the homework, a matrix has an LU decomposition if and only if all principle submatrices are invertible.

3.6 Lecture 9: September 9, 2013

Existence of the LU factorization (cont.) When does LU factorization exist? Theorem: If no zero pivots that appears in Gaussian th elimination (including the n one) then A = LU, `ii = 1 and uii 6= 0 are pivots. Then L, U are unique. Theorem: A = LU if and only if the leading principle submatrices Ak is invertible. Proof: Assume (for block matrices of length k × k, n − k × n − k and the difference)

A = LU, (3.6.1) L 0  U U  = 11 11 12 , (3.6.2) L21 L22 0 U22 L U L U  = 11 11 11 12 (3.6.3) L21U11 L22U22


Now our question: is $A_k = L_{11} U_{11}$ invertible? We know that $\det L_{11} = \prod_{j=1}^{k} \ell_{jj} \neq 0$, so $L_{11}$ is invertible. Similarly, $\det U_{11} = \prod_{j=1}^{k} u_{jj} \neq 0$, so it is also invertible. Since the product of two invertible matrices is also invertible, $A_k$ must also be invertible. We will now do a proof by induction for the converse: assume that all $A_k$ are invertible, and show that A = LU.

ASIDE: Example of proof by induction. We want to show
$$\sum_{j=1}^{n} j^2 = \frac{n(n+1)(2n+1)}{6}. \tag{3.6.4}$$
The steps of proof by induction are

1. First we show that this holds for n = 1,
2. next we assume it holds for n,
3. finally we show that it holds for n + 1.

Let’s show the third step,

$$\sum_{j=1}^{n+1} j^2 = \sum_{j=1}^{n} j^2 + (n+1)^2 \tag{3.6.5a}$$
$$= \frac{n(n+1)(2n+1)}{6} + (n+1)^2 \tag{3.6.5b}$$
$$= \frac{n(n+1)(2n+1) + 6(n+1)^2}{6} \tag{3.6.5c}$$
$$= \frac{(n+1)\left[n(2n+1) + 6(n+1)\right]}{6} \tag{3.6.5d}$$
$$= \frac{(n+1)\left(2n^2 + 7n + 6\right)}{6} \tag{3.6.5e}$$
$$= \frac{(n+1)(n+2)(2n+3)}{6}. \tag{3.6.5f}$$
Which is what would be expected, and we have proved this relation by induction.

So for our system,
1. First we show that this holds for n = 1: $A = [a_{11}] = [1][a_{11}]$, where $a_{11} \neq 0$.
2. Assume true for n: if $A_k$, $k = 1, \ldots, n$, are invertible, then $A_{n\times n} = L_{n\times n} U_{n\times n}$.
3. Show it holds for n + 1.

So let’s move on to the third step: assume $A_{(n+1)\times(n+1)}$ with $A_k$, $k = 1, \ldots, n+1$, invertible. By the induction assumption $A_n = L_n U_n$, since $A_1, \ldots, A_n$ are invertible. Now we need to show that $A_{n+1} = L_{n+1} U_{n+1}$:
$$A_{n+1} = \begin{pmatrix} A_n & b \\ c^\top & \alpha \end{pmatrix} \tag{3.6.6a}$$
$$= \begin{pmatrix} L_n U_n & b \\ c^\top & \alpha \end{pmatrix} \tag{3.6.6b}$$
$$= \begin{pmatrix} L_n & 0 \\ y^\top & 1 \end{pmatrix} \begin{pmatrix} U_n & x \\ 0^\top & \beta \end{pmatrix}. \tag{3.6.6c}$$


We want $L_n x = b$, so we let $x = L_n^{-1} b$, which supposes that $L_n^{-1}$ exists. We also want $y^\top U_n = c^\top$, so we let $y^\top = c^\top U_n^{-1}$. Finally, we want $y^\top x + \beta = \alpha$, so we let $\beta = \alpha - y^\top x$. We know

$$A_{n+1} = \begin{pmatrix} L_n U_n & b \\ c^\top & \alpha \end{pmatrix} \tag{3.6.7a}$$
$$= \begin{pmatrix} L_n & 0 \\ c^\top U_n^{-1} & 1 \end{pmatrix} \begin{pmatrix} U_n & L_n^{-1} b \\ 0^\top & \alpha - c^\top U_n^{-1} L_n^{-1} b \end{pmatrix}. \tag{3.6.7b}$$

Since $A = A_{n+1}$ is invertible, we must have $\beta \neq 0$, because if $\beta = 0$ then $\det(L_{n+1})\det(U_{n+1}) = 0$, in which case $A_{n+1}$ would not be invertible. So $A_{n+1}$ has an LU decomposition, and by the principle of induction we have proven our theorem.

Rectangular matrices

For a rectangular matrix $A_{m\times n} \in \mathbb{R}^{m\times n}$, our question is: is Ax = b solvable? Is the solution unique? We are presented with three options: no solution, a unique solution, or infinitely many solutions. We are going to do Gaussian elimination to reduce the form of the matrix to see how many solutions we will have. So we will do a reduction to row echelon form (REF).

Example of row echelon form

1 2 1 3 3 2 4 0 4 4 A =  , (3.6.8a) 1 2 3 5 5 2 4 0 4 7 1 2 1 3 3 0 0 −2 −2 −2 →  , (3.6.8b) 0 0 2 2 2 0 0 −2 −1 1 1 2 1 3 3 0 0 1 1 1 →  , (3.6.8c) 0 0 0 0 0 0 0 0 0 2 1 2 1 3 3 0 0 1 1 1 →  . (3.6.8d) 0 0 0 0 1 0 0 0 0 0

Where we made interchanges to have leading ones for the columns. What do we know about our matrix A from this information? First, we know what columns are linearly independent. We are trying to find the column space of our matrix.
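Matlab can carry out this reduction directly; a small sketch (not part of the original notes) using the built-in rref and rank:

A = [1 2 1 3 3; 2 4 0 4 4; 1 2 3 5 5; 2 4 0 4 7];
rank(A)      % returns 3: three linearly independent (basic) columns
rref(A)      % reduced row echelon form; pivots appear in columns 1, 3, and 5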


3.7 Homework Assignment 2: Due Friday, September 13, 2013

1. Textbook 3.10.1 (a, c): LU and PLU factorizations
Let
$$A = \begin{pmatrix} 1 & 4 & 5 \\ 4 & 18 & 26 \\ 3 & 16 & 30 \end{pmatrix}.$$

(a) Determine the LU factors of A

(c) Use the LU factors to determine A−1

2. Textbook 3.10.2 Let A and b be the matrices,

1 2 4 17 17 3 6 −12 3  3 A =   and b =  . 2 3 −3 2  3 0 2 −2 6 4

(a) Explain why A does not have an LU factorization. (b) Use partial pivoting and find the permutation matrix P as well as the LU factors such that PA = LU. (c) Use the information in P, L, and U to solve Ax = b.

3. Textbook 3.10.3
Determine all values of $\xi$ for which
$$A = \begin{pmatrix} \xi & 2 & 0 \\ 1 & \xi & 1 \\ 0 & 1 & \xi \end{pmatrix}$$
fails to have an LU factorization.

4. Textbook 3.10.5
If A is a matrix that contains only integer entries and all of its pivots are 1, explain why A−1 must also be an integer matrix. Note: This fact can be used to construct random integer matrices that possess integer inverses by randomly generating integer matrices L and U with unit diagonals and then constructing the product A = LU.

5. Lower triangular matrices Let A be a 3 × 3 matrix with real entries. We showed that GE is equivalent to finding lower triangular matrices L−1 and L−2 such that L−2L−1A = U where U is upper triangular and,

 1 0 0 1 0 0 L−1 = −`21 1 0 , L−2 = 0 1 0 , (3.7.1) −`31 0 1 0 −`32 1


with  1 0 0 1 0 0 −1 −1 (L−1) = `21 1 0 = L1, (L−2) = 0 1 0 = L2. (3.7.2) `31 0 1 0 `32 1

It follows that A = L2L1U. Show that

 1 0 0 L2L1 = `21 1 0 . (3.7.3) `31 `32 1

Show by example that generally,

L2L1 6= L1L2 (3.7.4)

That is, the order in which these lower triangular matrices are multiplied matters.

6. Textbook 1.6.4: Conditioning Using geometric considerations, rank the following three systems according to their condition.

(a)

1.001x − y = 0.235, x + 0.0001y = 0.765.

(b)

1.001x − y = 0.235, x + 0.9999y = 0.765.

(c)

1.001x + y = 0.235, x + 0.9999y = 0.765.

7. Textbook 1.6.5 Determine the exact solution of the following system:

8x + 5y + 2z = 15, 21x + 19y + 16z = 56, 39x + 48y + 53z = 140.

Now change 15 to 14 in the first equation and again solve the system with exact arithmetic. Is the system ill-conditioned?


8. Textbook 1.6.6 Show that the system

v − w − x − y − z = 0, w − x − y − z = 0, x − y − z = 0, y − z = 0, z = 1,

is ill-conditioned by considering the following perturbed system:

v − w − x − y − z = 0,
−(1/15) v + w − x − y − z = 0,
−(1/15) v + x − y − z = 0,
−(1/15) v + y − z = 0,
−(1/15) v + z = 1.

UNIT 4

Rectangular Matrices

4.1 Lecture 10: September 11, 2013

Rectangular matrices (cont.)

We are interested in a rectangular matrix, Am×n. We may apply REF, or RREF to find the column dependence, what the basic columns are, and what the rank of the matrix is. This way we can find for any system Ax = b, whether the system is consistent and find all the solutions; whether it is homogeneous, or what the free variables are; and what the particular solutions are. Last time’s example, we went from

1 2 1 3 3 2 4 0 4 4 A =  , (4.1.1a) 1 2 3 5 5 2 4 0 4 7 1 2 1 3 3 0 0 2 2 2 →  . (4.1.1b) 0 0 0 0 3 0 0 0 0 0

The first, third, and fifth columns have pivots and are the basic columns. They correspond to the linearly independent columns in A. How do we write the other two columns (c2, c4) as functions of the other three columns? We can notice that, c2 = 2c1, and similarly c4 = 2c1 + c3. The reduced row echelon form (RREF) has pivots on 1, and zeros below and above


Figure 4.1. Geometric illustration of linear systems and their solutions: (a) intersecting system (one solution); (b) parallel system (no solution); (c) equivalent system (infinitely many solutions).

all pivots. So,

1 2 1 3 3 1 2 1 3 3 0 0 2 2 2 0 0 1 1 1   →  , (4.1.2a) 0 0 0 0 3 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 2 1 3 0 0 0 1 1 0 →  , (4.1.2b) 0 0 0 0 1 0 0 0 0 0 1 2 0 2 0 0 0 1 1 0 →  . (4.1.2c) 0 0 0 0 1 0 0 0 0 0

In this form, the basic columns are very clear, and the relations between the dependent columns and the basic columns is also obvious. So again we can see that, c2 = 2c1 and c4 = 2c1 + 1c3. The rank of the matrix is the number of linearly independent columns, which is also the number of linearly independent rows, and also the number of pivots in row-echelon form of the matrix. A consistent system, Ax = b is a system that has at least one solution. It is inconsistent if it has no solutions. To determine if Ax = b is consistent, in a 2 × 2 system, Ax = b,

a11x1 + a12x2 = b1, (4.1.3a)

a21x1 + a22x2 = b2. (4.1.3b)

Since this system is a linear system we can see three cases: one intersection, parallel and separated, and parallel and the same. Each of these cases are illustrated in Figure 4.1. In general, for any size matrix, we find the row echelon form of the augmented system


$$[A \mid b] \to [E \mid \tilde{b}], \qquad \text{e.g.} \quad \begin{pmatrix} x & x & x & x & x \\ 0 & x & x & x & x \\ 0 & 0 & 0 & 0 & \alpha \end{pmatrix}. \tag{4.1.4}$$
If $\alpha \neq 0$, then the system is inconsistent. So Ax = b is consistent if rank([A b]) = rank(A). If $\alpha = 0$, then $\tilde{b}$ is not a basic column of [A b]. Then we can write $\tilde{b}$ as a linear combination of the basic columns of E, and we can write b as a linear combination of the basic columns of A. In our example, we had $c_1$, $c_3$, and $c_5$ as the basic columns and Ax = b was consistent. Here then, if we were to perform a reduction, $b = x_1 c_1 + x_3 c_3 + x_5 c_5$, or in other words,
$$A \begin{pmatrix} x_1 \\ 0 \\ x_3 \\ 0 \\ x_5 \end{pmatrix} = b. \tag{4.1.5}$$

Example of RREF of a Rectangular Matrix Given the augmented matrix,
$$\begin{pmatrix} 1 & 1 & 2 & 2 & 1 & 1 \\ 2 & 2 & 4 & 4 & 3 & 1 \\ 2 & 2 & 4 & 4 & 2 & 2 \\ 3 & 5 & 8 & 6 & 5 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 1 & 2 & 2 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 2 & 0 & 2 & 0 \end{pmatrix} \tag{4.1.6a}$$
$$\to \begin{pmatrix} 1 & 1 & 2 & 2 & 1 & 1 \\ 0 & 2 & 2 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}. \tag{4.1.6b}$$
Thus, our system is consistent: rank([A b]) = rank(A). Similarly, we observe that we have $r = 3$ basic columns and $n - r = 2$ linearly dependent columns. (If $n > m$, then $n > r$, so $n - r \neq 0$.) Let's continue on to the reduced row echelon form:
$$\begin{pmatrix} 1 & 1 & 2 & 2 & 1 & 1 \\ 0 & 2 & 2 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \to \begin{pmatrix} 1 & 1 & 2 & 2 & 1 & 1 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{4.1.7a}$$
$$\to \begin{pmatrix} 1 & 1 & 2 & 2 & 0 & 2 \\ 0 & 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{4.1.7b}$$
$$\to \begin{pmatrix} 1 & 0 & 1 & 2 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}. \tag{4.1.7c}$$


Thus our $\tilde{b} = 1\,\tilde{c}_1 + 1\,\tilde{c}_2 - 1\,\tilde{c}_5$. Therefore, $b = 1\,c_1 + 1\,c_2 - 1\,c_5$, and
$$x = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}. \tag{4.1.8}$$

So in review,
$$\begin{pmatrix} 1 & 1 & 2 & 2 & 1 & 1 \\ 2 & 2 & 4 & 4 & 3 & 1 \\ 2 & 2 & 4 & 4 & 2 & 2 \\ 3 & 5 & 8 & 6 & 5 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 1 & 2 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}. \tag{4.1.9}$$

We found a particular solution, $x_p = (1, 1, 0, 0, -1)^\top$, of Ax = b. For any solution $x_H$ of Ax = 0, we have that $A(x_p + x_H) = b + 0$. So $x_p + x_H$ also solves Ax = b.

4.2 Lecture 11: September 13, 2013

Solving Ax = b Ax = b is consistent if rank([A | b]) = rank(A), i.e. b is a nonbasic column of [A | b]. We can express b in terms of the basic columns of A to get a particular solution, $Ax_p = b$. The set of all solutions is $x_p + x_H$, where $x_p$ is a particular solution of Ax = b and $x_H$ ranges over all homogeneous solutions of $Ax_H = 0$. Since we can add these two solutions, we have $A(x_p + x_H) = b$. Now, to actually find the particular solution $x_p$, we write b in terms of the basic columns. To find the homogeneous solutions $x_H$, we solve Ax = 0 by solving for the basic variables $x_i$ in terms of the $n - r$ free variables. Basic variables correspond to basic columns, while free variables correspond to nonbasic columns. Note that if $n > r$ then the set of columns is linearly dependent and we can find $x \neq 0$ such that Ax = 0.

Example From our example
$$\begin{pmatrix} 1 & 1 & 2 & 2 & 1 & 1 \\ 2 & 2 & 4 & 4 & 3 & 1 \\ 2 & 2 & 4 & 4 & 2 & 2 \\ 3 & 5 & 8 & 6 & 5 & 3 \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 1 & 2 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \tag{4.2.1}$$

we have that

b = a:1 + a:2 − a:5, (4.2.2a)

$$= x_1 a_{:1} + x_2 a_{:2} + x_5 a_{:5} \tag{4.2.2b}$$
$$= A x_p, \quad \text{where } x_p = \begin{pmatrix} 1 & 1 & 0 & 0 & -1 \end{pmatrix}^\top. \tag{4.2.2c}$$


Solve,

 1 0 1 2 0 0   0 1 1 0 0 0  [A | 0] =  . (4.2.3a)  0 0 0 0 1 0  0 0 0 0 0 0

This gives us the three equations for the homogeneous solutions,

x1 = −x3 − 2x4, (4.2.4a)

x2 = −x3, (4.2.4b)

x5 = 0. (4.2.4c)

This gives us the homogeneous solutions of the form,
$$x_H = \begin{pmatrix} -x_3 - 2x_4 \\ -x_3 \\ x_3 \\ x_4 \\ 0 \end{pmatrix} \tag{4.2.5a}$$
$$= x_3 \begin{pmatrix} -1 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} -2 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}. \tag{4.2.5b}$$

Thus the set of all solutions are,

$$x = x_p + x_H \tag{4.2.6a}$$
$$= \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ -1 \end{pmatrix} + x_3 \begin{pmatrix} -1 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} -2 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}. \tag{4.2.6b}$$

This solves Ax = b for any x3 and x4. Therefore we have infinitely many solutions. Note that we can only have a unique solution if n = r.
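A quick Matlab check of this general solution (a sketch, not part of the original notes; the values chosen for the free variables are arbitrary):

A = [1 1 2 2 1; 2 2 4 4 3; 2 2 4 4 2; 3 5 8 6 5];
b = [1; 1; 2; 3];
xp = [1; 1; 0; 0; -1];            % particular solution
v3 = [-1; -1; 1; 0; 0];           % homogeneous solution vector for x3
v4 = [-2; 0; 0; 1; 0];            % homogeneous solution vector for x4
x = xp + 7*v3 - 2*v4;             % arbitrary choice of the free variables
norm(A*x - b)                     % should be 0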

Linear functions

A function $f : D \to R$ is a linear function if

1. f(x + y) = f(x) + f(y), 2. f(αx) = αf(x).


For example, consider $f(x) = ax + b$ with $b \neq 0$. Then
$$f(x) + f(y) = (ax + b) + (ay + b) \tag{4.2.7a}$$
$$= a(x + y) + 2b \tag{4.2.7b}$$
$$\neq a(x + y) + b = f(x + y). \tag{4.2.7c}$$
Thus this is not a linear function. However, when b = 0, the function f(x) = ax can be verified to be linear.

Example: Transpose operator

The transpose operator is $f(A) = A^\top$. Define that if $A = [a_{ij}]$, then $A^\top = [a_{ji}]$ and $A^* = \bar{A}^\top = [\bar{a}_{ji}]$. Is this linear?

= [aji + bji] , (4.2.8c) = A| + B|. (4.2.8d) To check the second criterion, f(αA) = [αA]| , (4.2.9a) = α [A]| , (4.2.9b) = αf(A). (4.2.9c) So this operator is linear.

Example: trace operator P The trace operator is f(A) = tr(A) = i aii. X f(A + B) = (aii + bii) , (4.2.10a) i X X = aii + bii, (4.2.10b) i i = tr(A) + tr(B). (4.2.10c) The second cirterion, f(αA) = tr(αA), (4.2.11a) X = αaii, (4.2.11b) i X = α aii, (4.2.11c) i = α tr(A), (4.2.11d) = αf(A). (4.2.11e) We have therefore shown that this is a linear operator.


Matrix multiplication Given
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad B = \begin{pmatrix} \tilde{a} & \tilde{b} \\ \tilde{c} & \tilde{d} \end{pmatrix}. \tag{4.2.12}$$
Then consider
$$f(x) = Ax = \begin{pmatrix} a x_1 + b x_2 \\ c x_1 + d x_2 \end{pmatrix}, \qquad g(x) = Bx = \begin{pmatrix} \tilde{a} x_1 + \tilde{b} x_2 \\ \tilde{c} x_1 + \tilde{d} x_2 \end{pmatrix}. \tag{4.2.13}$$

Take f(g(x)) = A (Bx) ≡ ABx. (4.2.14) But,

$$f(g(x)) = \begin{pmatrix} a(\tilde{a} x_1 + \tilde{b} x_2) + b(\tilde{c} x_1 + \tilde{d} x_2) \\ c(\tilde{a} x_1 + \tilde{b} x_2) + d(\tilde{c} x_1 + \tilde{d} x_2) \end{pmatrix} \tag{4.2.15a}$$
$$= \begin{pmatrix} (a\tilde{a} + b\tilde{c}) x_1 + (a\tilde{b} + b\tilde{d}) x_2 \\ (c\tilde{a} + d\tilde{c}) x_1 + (c\tilde{b} + d\tilde{d}) x_2 \end{pmatrix} \tag{4.2.15b}$$
$$= \begin{pmatrix} a\tilde{a} + b\tilde{c} & a\tilde{b} + b\tilde{d} \\ c\tilde{a} + d\tilde{c} & c\tilde{b} + d\tilde{d} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \tag{4.2.15c}$$
$$\equiv (AB)\,x. \tag{4.2.15d}$$

Now define $AB = [A_{i:} B_{:j}]$, i.e. $(AB)_{ij} = A_{i:} B_{:j} = \sum_{k=1}^{n} A_{ik} B_{kj}$. Matrix multiplication is not generally commutative: $AB \neq BA$. Note also that $AB = 0$ does not imply that $A = 0$ or $B = 0$, unless A or B is invertible. Further, we know that we have the distributive properties,

A (B + C) = AB + AC, (4.2.16) or (A + B) D = AD + BD, (4.2.17) and the associative property (AB) C = A (BC) . (4.2.18) A property of the transpose operator is,

(AB)| = B|A|, (4.2.19) which also helps to understand that,

tr(AB) = tr(BA). (4.2.20)

Note, however, that tr(ABC) ≠ tr(ACB) in general, as we will demonstrate on the homework.


Proof of transposition property We want to prove the useful property,

(AB)| = B|A|. (4.2.21)

Dealing with our left hand side of the equation,

|  |  LHS : (AB) = (AB)ij , (4.2.22a)

= [(AB)ji], (4.2.22b)

= [Aj:B:i]. (4.2.22c)

Manipulating the right hand side of the property,

| | h | | i RHS : B A = B A ij , (4.2.23a)  | |  = Bi:A:j , (4.2.23b)

= [B:iAj:], (4.2.23c)

= [Aj:B:i], (4.2.23d) = LHS. (4.2.23e)

Thus, we have proved the identity.

4.3 Lecture 12: September 16, 2013

We will be having an exam on September 30th.

Inverses We define: A has an inverse if each A−1 exists such that,

AA−1 = A−1A = I. (4.3.1)

We also have the properties:

• $(AB)^{-1} = B^{-1} A^{-1}$,
• $(A^\top)^{-1} = (A^{-1})^\top$,
• $(A^{-1})^{-1} = A$.

What about the inverse of sums, $(A + B)^{-1}$? There are the special cases:

• low rank perturbations of $I_{n\times n}$: $(I + CD^\top)^{-1}$, where $C, D \in \mathbb{R}^{n\times k}$, i.e. the perturbation $CD^\top$ has rank at most $k$;

• small perturbations of $I$: $(I + A)^{-1}$, where $\|A\|$ is small.


We have a rank-1 matrix $uv^\top$, with $u, v \in \mathbb{R}^n = \mathbb{R}^{n\times 1}$:
$$uv^\top = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} = \begin{pmatrix} u_1 v_1 & u_1 v_2 & \cdots & u_1 v_n \\ u_2 v_1 & u_2 v_2 & \cdots & u_2 v_n \\ \vdots & \vdots & \ddots & \vdots \\ u_n v_1 & u_n v_2 & \cdots & u_n v_n \end{pmatrix} = \begin{pmatrix} u_1 v^\top \\ u_2 v^\top \\ \vdots \\ u_n v^\top \end{pmatrix}. \tag{4.3.2}$$

Now let’s say we have an example where all matrix entries are zero except for αij at some point (i, j).

0 ··· 0 0 .    .     .  . .    . α . = α 0 ··· 1 ··· 0 , (4.3.3a)    .     .  0 ··· 0 0 | = αeiej . (4.3.3b)

Low rank perturbations of I We make the claim the if u, v are such that v|u + 1 6= 0 then

uv| I + uv|−1 = I − (4.3.4) 1 + v|u Proof:  uv|  uv| u (v|u) v| I + uv| I − = I − + uv| − , (4.3.5a) 1 + v|u 1 + v|u 1 + v|u uv| (v|u) = I − + uv| − uv|, (4.3.5b) 1 + v|u 1 + v|u  1 (v|u)  = I − uv| + 1 − , (4.3.5c) 1 + v|u 1 + v|u  |  | 1 +vu = I − uv 1 − , (4.3.5d)  1 + v|u = I. (4.3.5e)


So if c, d ∈ R^n are such that 1 + d^T A^{-1} c ≠ 0, and we are interested in (A + cd^T)^{-1}:

(A + cd^T)^{-1} = (A (I + A^{-1}c d^T))^{-1}, (4.3.6a)
= (I + (A^{-1}c) d^T)^{-1} A^{-1}, (4.3.6b)
= (I − A^{-1}c d^T / (1 + d^T A^{-1} c)) A^{-1}, (4.3.6c)
= A^{-1} − A^{-1}c d^T A^{-1} / (1 + d^T A^{-1} c). (4.3.6d)

The Sherman–Morrison Formula

More generally, if A is invertible and C, D ∈ R^{n×k} are such that I + D^T A^{-1} C is invertible, then

(A + CD^T)^{-1} = A^{-1} − A^{-1}C (I + D^T A^{-1}C)^{-1} D^T A^{-1}. (4.3.7)

Finite difference example with periodic boundary conditions Previously, we had,

−y'' = f on [a, b], (4.3.8a)
y(a) = y_a, (4.3.8b)
y(b) = y_b. (4.3.8c)

The finite difference approximation gives the tridiagonal system

[2 −1 0 ··· 0; −1 2 −1 ··· 0; 0 −1 2 ⋱ 0; ⋮ ⋱ ⋱ ⋱ −1; 0 0 ··· −1 2] [y_1; y_2; y_3; ⋮; y_{n−1}] = h^2 [f_1; f_2; ⋮; f_{n−1}] + [y_0; 0; ⋮; 0; y_n]. (4.3.9)

If we instead use periodic boundary conditions,

−y'' = f on [a, b], (4.3.10a)
y(a) = y(b), (4.3.10b)
y'(a) = y'(b), (4.3.10c)

the matrix picks up two corner entries:

[2 −1 0 ··· −1; −1 2 −1 ··· 0; 0 −1 2 ⋱ 0; ⋮ ⋱ ⋱ ⋱ −1; −1 0 ··· −1 2] [y_1; y_2; y_3; ⋮; y_{n−1}] = h^2 [f_1; f_2; ⋮; f_{n−1}]. (4.3.11)

The perturbed matrix differs from the tridiagonal one by a low rank correction, so in this case the Sherman–Morrison formula helps greatly with the inversion.


Examples of perturbation

Given the matrix

A = [1 2; 1 3], with A^{-1} = [3 −2; −1 1], (4.3.12a,b)

consider the perturbed matrix

B = [1 2; 2 3] = A + [0 0; 1 0] = A + e_2 e_1^T. (4.3.12c–e)

Applying the Sherman–Morrison formula with c = e_2 and d = e_1,

B^{-1} = A^{-1} − A^{-1} e_2 e_1^T A^{-1} / (1 + e_1^T A^{-1} e_2), (4.3.12f)
= A^{-1} − [−2; 1][3 −2] / (1 + (−2)), (4.3.12g)
= A^{-1} + [−6 4; 3 −2], (4.3.12h)
= [−3 2; 2 −1]. (4.3.12i)

(The denominator is 1 + (A^{-1})_{12} = 1 − 2 = −1.) This agrees with inverting B directly, since det B = −1.

Small perturbations of I

We want to show what happens for small perturbations of the identity:

(I − A)^{-1} =? I + A + A^2 + ···, (4.3.13)

when ||A|| < 1. We first consider the scalar geometric series,

(1 − x)^{-1} = 1/(1 − x) = Σ_{n=0}^∞ x^n = 1 + x + x^2 + x^3 + ···, (4.3.14)

when |x| < 1. To be continued...


4.4 Lecture 13: September 18, 2013

Small perturbations of I (cont.)

We want to show what happens for small perturbations of the identity matrix I:

(I − A)^{-1} =? I + A + A^2 + ···, (4.4.1)

when ||A|| < 1. We first consider the scalar geometric series,

(1 − x)^{-1} = 1/(1 − x) = Σ_{n=0}^∞ x^n = 1 + x + x^2 + x^3 + ···, (4.4.2)

when |x| < 1. This is proved as follows. Let S = Σ_{k=0}^n x^k. Then

S − xS = (1 + x + x^2 + ··· + x^n) − (x + x^2 + ··· + x^{n+1}) = 1 − x^{n+1}, (4.4.3a)

so

S = (1 − x^{n+1}) / (1 − x) → 1/(1 − x) as n → ∞, provided |x| < 1. (4.4.3b)

Returning to the full series for a matrix,

(I − A)(I + A + ··· + A^n) = (I + A + ··· + A^n) − (A + A^2 + ··· + A^{n+1}) = I − A^{n+1}. (4.4.4a–c)

If A is small, so that A^n → 0 as n → ∞, then

(I − A) Σ_{k=0}^∞ A^k = I, (4.4.4d)
(I − A)^{-1} = Σ_{k=0}^∞ A^k. (4.4.4e)
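A small Matlab experiment illustrating the matrix geometric (Neumann) series; the matrix A below is an arbitrary example with ||A|| < 1, not one from the notes:

  A = [0 0.2; -0.3 0.1];              % an arbitrary matrix with norm less than 1
  S = eye(2);  P = eye(2);
  for k = 1:50
      P = P*A;                        % P is now A^k
      S = S + P;                      % S accumulates I + A + A^2 + ... + A^k
  end
  norm(S - inv(eye(2) - A))           % tiny: the partial sums converge to (I - A)^{-1}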


Let’s consider the convergence of this series now.

Consider

L = Σ_{k=1}^∞ a_k, (4.4.5)

where a_k → 0 as k → ∞. We say the series converges (L is finite) if lim_{n→∞} Σ_{k=1}^n a_k exists and is finite. Note that a_k → 0 alone is not enough: Σ_{n=1}^∞ 1/n diverges, since Σ_{k=1}^n 1/k → ∞. When the series converges, the partial sums satisfy

L − Σ_{k=1}^n a_k → 0 as n → ∞, (4.4.6)

so we may approximate

L ≈ Σ_{k=1}^n a_k, with error → 0 as n → ∞. (4.4.7)

In particular, if A is small then

(I − A)−1 ≈ I + A. (4.4.8)

For example,

(A + B)^{-1} = (A(I + A^{-1}B))^{-1}, where A^{-1} exists, (4.4.9a)
= (I + A^{-1}B)^{-1} A^{-1}, (4.4.9b)
≈ (I − A^{-1}B) A^{-1}, (4.4.9c)
= A^{-1} − A^{-1}BA^{-1}. (4.4.9d)

Matrix Norms

A function || · || on matrices A ∈ R^{m×n} is a matrix norm if it satisfies
1. ||A|| ≥ 0, and ||A|| = 0 implies A = 0,
2. ||A + B|| ≤ ||A|| + ||B||,
3. ||αA|| = |α| ||A||,
and, in addition to the vector norm properties, the fourth property
4. ||AB|| ≤ ||A|| ||B||.

As an example of a matrix norm,

||A|| = max_j Σ_i |a_ij|, (4.4.10)


which is the maximum absolute column sum (the 1-norm). If ||A|| < 1, then 0 ≤ ||A^n|| ≤ ||A||^n → 0 as n → ∞, so A^n → 0 as n → ∞. When is A^{-1}B small? We have

||A^{-1}B|| ≤ ||A^{-1}|| ||B|| = ||A^{-1}|| ||A|| (||B|| / ||A||). (4.4.11)

Note that ||A^{-1}|| is bounded below, not above, by 1/||A||: taking ||I|| = 1,

1 = ||AA^{-1}|| ≤ ||A|| ||A^{-1}||, (4.4.13a)

so

1/||A|| ≤ ||A^{-1}||. (4.4.13b)

Writing ||A^{-1}|| = (||A^{-1}|| ||A||)/||A|| = κ(A)/||A||, where κ(A) = ||A|| ||A^{-1}|| is the condition number, the bound above becomes

||A^{-1}B|| ≤ κ(A) ||B|| / ||A||. (4.4.13c)

Condition Number

As an example pertaining to the condition number, suppose we have Ax = b together with the perturbed system (A + B)x̃ = b, where ||A^{-1}B|| < 1, i.e. B is sufficiently small. The relative change in x introduced by the change in A is

||x − x̃|| / ||x|| = ||A^{-1}b − (A + B)^{-1}b|| / ||x||, (4.4.14a)
= ||(A^{-1} − (A + B)^{-1}) b|| / ||x||. (4.4.14b)

Using (A + B)^{-1} ≈ A^{-1} − A^{-1}BA^{-1} and A^{-1}b = x,

≈ ||A^{-1}BA^{-1}b|| / ||x||, (4.4.14c)
≤ ||A^{-1}B|| ||x|| / ||x||, (4.4.14d)
≤ ||A^{-1}|| ||B|| ||A|| / ||A||, (4.4.14e)
= κ(A) ||B|| / ||A||. (4.4.14f)

Thus κ(A) measures how much the relative error in the data can be amplified in the solution.
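The amplification factor κ(A) is easy to observe numerically. The following Matlab sketch (with an arbitrary, nearly singular A and a small perturbation B, both made up for illustration) compares the actual relative change in the solution with κ(A)·||B||/||A|| in the 1-norm:

  A = [1 2; 1 2.0001];                      % nearly singular, so kappa(A) is large
  b = [3; 3.0001];
  B = 1e-6*[1 0; 0 -1];                     % a small perturbation of A
  x  = A\b;
  xt = (A + B)\b;
  rel_change = norm(x - xt, 1)/norm(x, 1)
  bound = cond(A, 1)*norm(B, 1)/norm(A, 1)  % kappa(A)*||B||/||A||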

4.5 Homework Assignment 3: Due Friday, September 27, 2013

For the first four problems, you may use the Matlab commands rref(a) and a\b to check your work.

1. Textbook 2.2.1: Row Echelon Form, Rank, Consistency, General solution of Ax = b. Determine the reduced row echelon form for each of the following matrices and then express each nonbasic column in terms of the basic columns:

(a) [1 2 3 3; 2 4 6 9; 2 6 7 6]

(b) [2 1 1 3 0 4 1; 4 2 4 4 1 5 5; 2 1 3 1 0 4 3; 6 3 4 8 1 9 5; 0 0 3 −3 0 0 3; 8 4 2 14 1 13 3]

2. Textbook 2.3.3 If A is an m × n matrix with rank(A) = m, explain why the system [A|b] must be consistent for every right-hand side b.

3. Textbook 2.5.1 Determine the general solution for each of the following nonhomogeneous systems.

(a)

x1 + 2x2 + x3 + 2x4 = 3, (4.5.1a)
2x1 + 4x2 + x3 + 3x4 = 4, (4.5.1b)
3x1 + 6x2 + x3 + 4x4 = 5. (4.5.1c)


(b)

2x + y + z = 4, (4.5.2a) 4x + 2y + z = 6, (4.5.2b) 6x + 3y + z = 8, (4.5.2c) 8x + 4y + z = 10. (4.5.2d)

(c)

x1 + x2 + 2x3 = 3, (4.5.3a)

3x1 + 3x3 + 3x4 = 6, (4.5.3b)

2x1 + x2 + 3x3 + x4 = 3, (4.5.3c)

x1 + 2x2 + 3x3 − x4 = 0. (4.5.3d)

(d)

2x + y + z = 2, (4.5.4a) 4x + 2y + z = 5, (4.5.4b) 6x + 3y + z = 8, (4.5.4c) 8x + 5y + z = 8. (4.5.4d)

4. Textbook 2.5.4 Consider the following system:

2x + 2y + 3z = 0, (4.5.5a) 4x + 8y + 12z = −4, (4.5.5b) 6x + 2y + αz = 4. (4.5.5c)

(a) Determine all values of α for which the system is consistent. (b) Determine all values of α for which there is a unique solution, and compute the solution for these cases. (c) Determine all values of α for which there are infinitely many different solutions, and give the general solution for these cases.

5. Textbook 3.3.1: Linear Functions Each of the following is a function from R2 into R2. Determine which are linear functions. x  x  (a) f = . y 1 + y x y (b) f = . y x


Figure 4.2. Figures for Textbook problem 3.3.4.

x  0  (c) f = . y xy x x2 (d) f = . y y2 x  x  (e) f = . y sin y x x + y (f) f = . y x − y 6. Textbook 3.3.4 Determine which of the following three transformations in R2 are linear. 7. Textbook 3.5.4: Matrix Multiplication th th Let ej denote the j unit column that contains a 1 in the j position and zeros everywhere else. For a general matrix An×n, describe the following products. (a) | | Aej (b) ej A (c) ej Aej 8. Textbook 3.5.6 (please use induction) 1/2 α For A = , determine lim An. Hint: Compute a few powers of A and try 0 1/2 n→∞ to deduce the general form of An. 9. Textbook 3.5.9

If A = [aij(t)] is a matrix whose entries are functions of a variable t, the derivative of A with respect to t is defined to be the matrix of derivatives. That is, dA da  = ij . dt dt


Derive the product rule for differentiation d(AB) dA dB = B + A . dt dt dt 10. Textbook 3.6.2

For all matrices A_{n×k} and B_{k×n}, show that the block matrix

L = [I − BA, B; 2A − ABA, AB − I]

has the property L^2 = I. Matrices with this property are said to be involutory, and they occur in the science of cryptography.
11. Textbook 3.6.3 For the matrix

A = [1 0 0 1/3 1/3 1/3; 0 1 0 1/3 1/3 1/3; 0 0 1 1/3 1/3 1/3; 0 0 0 1/3 1/3 1/3; 0 0 0 1/3 1/3 1/3; 0 0 0 1/3 1/3 1/3],

determine A^300. Hint: A square matrix C is said to be idempotent when it has the property that C^2 = C. Make use of the idempotent submatrices in A.
12. Textbook 3.6.5 If A and B are symmetric matrices that commute, prove that the product AB is also symmetric. If AB ≠ BA, is AB necessarily symmetric?
13. Textbook 3.6.7 For each matrix A_{n×n}, explain why it is impossible to find a solution for X_{n×n} in the matrix equation

AX − XA = I. (4.5.6)

Hint: Consider the trace function.
14. Textbook 3.6.11 Prove that each of the following statements is true for conformable matrices:
(a) tr(ABC) = tr(BCA) = tr(CAB).
(b) tr(ABC) can be different from tr(BAC).
(c) tr(A^T B) = tr(AB^T).
15. Textbook 3.7.2: Inverses Find the matrix X such that X = AX + B, where

A = [0 −1 0; 0 0 −1; 0 0 0] and B = [1 2; 2 1; 3 3].


16. Textbook 3.7.6 If A is a square matrix such that I − A is nonsingular, prove that

A (I − A)−1 = (I − A)−1 A.

17. Textbook 3.7.8 If A, B, and A + B are each nonsingular, prove that

A(A + B)^{-1}B = B(A + B)^{-1}A = (A^{-1} + B^{-1})^{-1}.

18. Textbook 3.7.9 Let S be a skew- with real entries.

(a) Prove that I − S is nonsingular. Hint: x|x = 0 means x = 0. (b) If A = (I + S)(I − S)−1, show that A−1 = A|.

19. Textbook 3.9.9: Sherman–Morrison formula, rank 1 matrices

Prove that rank(A_{m×n}) = 1 if and only if there are nonzero columns u_{m×1} and v_{n×1} such that A = uv^T.

20. Textbook 3.9.10 Prove that if rank(A_{n×n}) = 1, then A^2 = τA, where τ = tr(A).


UNIT 5

Vector Spaces

5.1 Lecture 14: September 20, 2013

Topics in Vector Spaces We will be discussing the following topics in this lecture (and possibly the next couple). • Field • Vector Space • Subspace • Spanning Set • Basis • Dimension

• The four subspaces of Am×n

Field

We define a field as a set F with two operations, addition (+) and multiplication (·), such that:
• F is closed under addition and multiplication: if α, β ∈ F, then α + β ∈ F and α · β ∈ F.
• Addition and multiplication are commutative.
• Addition and multiplication are associative: (α + β) + γ = α + (β + γ) and (αβ)γ = α(βγ).
• Multiplication distributes over addition: α(β + γ) = αβ + αγ.
• There exist an additive and a multiplicative identity: α + 0 = α, α · 1 = α.
• There exist additive and multiplicative inverses: α + (−α) = 0 and, for α ≠ 0, α(α^{-1}) = 1.

For example, the reals and the complex numbers are fields, and so are the rational numbers; the natural numbers are not. The two-element set {0, 1} (often written Z_2) is a field with addition 0 + 0 = 0, 0 + 1 = 1, 1 + 1 = 0.


Vector Space We may define a vector space V over a field F is a set V with operations + and · such that,

• v + w ∈ V for any v, w ∈ V.
• αv ∈ V for any v ∈ V, α ∈ F.
• v + w = w + v for any v, w ∈ V (commutativity of addition).
• (u + v) + w = u + (v + w) for any u, v, w ∈ V (associativity of addition).
• There exists 0 ∈ V such that u + 0 = u for any u ∈ V.
• For each u ∈ V there exists −u ∈ V such that u + (−u) = 0.
• (αβ)u = α(βu) for any α, β ∈ F, u ∈ V.
• (α + β)u = αu + βu for any α, β ∈ F, u ∈ V (first form of the distributive property).
• 1 · u = u, where 1 is the multiplicative identity in F.
• α(u + v) = αu + αv for any α ∈ F and u, v ∈ V.

Examples of vector spaces over R are R^n = R^{n×1}, R^{n×m}, C^{m×n}, all functions [0, 1] → R, and all polynomials mapping R → R.

Theorem 5.1. A subset S of a vector space V over F is a vector space over F if

• v + w ∈ S, for any v, w ∈ S.

• αv ∈ S for any α ∈ F, v ∈ S.

Several examples: all continuous functions [0, 1] → R, written C[0, 1]; all polynomials of degree at most n; and S = {0} contained in V.

Definition 5.2. Let {v1,..., vn} ∈ V, then span{v1,..., vn} = {α1v1 + α2v2 + ··· + αnvn, αk ∈ F}.

Theorem 5.3. This gives the theorem: The span of {v1,..., vn} is a subspace.

Definition 5.4. The set {v1,..., vn} is a spanning set of span{v1,..., vn}.

Note that 0 ∈ span{v1,..., vn}, and 0 belongs to every subspace. For example, span{[1; 2]} ⊂ R^2 equals span{[1; 2], [−2; −4]}. This space has the single basis vector [1; 2], so it is one-dimensional. The basis vector is illustrated along with the solution in Figure 5.1.

Definition 5.5. A basis for a vector space is a minimal spanning set.

Theorem 5.6. Any two bases for a vector space have the same number of elements.

Definition 5.7. The number of elements in the basis is equal to the dimension of the space.


Figure 5.1. Basis vector of the example solution, plotted in the (x1, x2) plane.

For example, P^2 = {a1 + a2 x + a3 x^2}: the basis of this set is {1, x, x^2}, so it is three-dimensional. In general, for polynomials of degree at most n, dim(P^n) = n + 1. As another example, for S = {0} the basis is the empty set ∅, and the space is zero-dimensional. In particular, zero can never be an element of a basis.

Definition 5.8. A set {v1,..., vn} is linearly independent if α1 v1 + α2 v2 + ··· + αn vn = 0 implies α1 = α2 = ··· = αn = 0.

It follows that {0} is not a linearly independent set, since

α 0 = 0 for any α ≠ 0. (5.1.1)

Similarly, any set containing 0 is not linearly independent.

Examples of function spaces

One example is the set of solutions to y'' = 0. This is the set {y = αx + β | α, β ∈ R}; it is a two-dimensional vector space with basis {1, x}. Another example is the set of solutions of y'' = y, namely {y = c1 e^x + c2 e^{−x}}, with two-dimensional basis {e^x, e^{−x}}. A third example is the set of solutions of y'' = −y, namely {y = c1 sin(x) + c2 cos(x)}, with two-dimensional basis {sin(x), cos(x)}. A final example of interest is y'' = 2, with solution set {y = x^2 + αx + β}. This, however, is not a vector space, because the coefficient of x^2 is fixed at one! This comes from the fact that the equation is nonhomogeneous, unlike the other examples, which can be rearranged into homogeneous form.

In the general example of R^{2×2} = {[a b; c d]}, a basis is

{[1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1]}.


5.2 Lecture 15: September 23, 2013

The four subspaces of Am×n

n m We now define the four fundamental subspaces of Am×n : R → R . These are:

1. R(A) = {y : y = Ax, x ∈ R^n} ⊂ R^m. This is the column space (range) of A.

2. N(A) = {x ∈ R^n : Ax = 0} ⊂ R^n. This is the null space of A.

3. R(A^T) = {y : y = A^T x, x ∈ R^m} ⊂ R^n. Equivalently, R(A^T) = {y : y^T = x^T A, x ∈ R^m}, which is why this is called the row space of A.

4. N(A^T) = {x ∈ R^m : A^T x = 0, i.e. x^T A = 0^T} ⊂ R^m. This is called the left null space of A.

We want to show that R(A) is a vector space. So we let y1, y2 ∈ R(A) Then, y1 = Ax1 and y2 = Ax2 for some x1, x2. This tells us that

y1 + y2 = Ax1 + Ax2, (5.2.1a)

= A (x1 + x2) ∈ R(A). (5.2.1b)

Also

αy1 = αAx1, (5.2.2a)

= Aαx1 ∈ R(A). (5.2.2b)

Thus R(A) is a subspace of Rm. An example: Find the spanning set for all 4 subspaces of,

1 2 1 3 3 1 2 0 2 0 2 4 0 4 4 0 0 1 1 0 A =   →   (5.2.3) 1 2 3 5 5 0 0 0 0 1 2 4 0 4 2 0 0 0 0 0

So the row space, 1 1 3   2 0 4 4 R(A) = span  ,  ,   ⊂ R . (5.2.4) 1 3 5  2 0 2  To find the column space, we need the solution of the homogeneous equation Ax = 0.

x1 = −2x2 − 2x4, (5.2.5a)

x3 = −x4, (5.2.5b)

x5 = 0, (5.2.5c)


or

−2 −2  1  0     x = x2  0 + x4 −1 . (5.2.6)      0  1 0 0

Thus,

−2 −2    1  0     5 N(A) = span  0 , −1 ⊂ R . (5.2.7)      0  1  0 0 

Now say,

A → E_A : P_{m×m} A_{m×n} = E_{A, m×n}. (5.2.8)

Here P_{m×m} is square and invertible (it is a product of elementary matrices). Since PA = E_A, the rows of E_A are linear combinations of the rows of A; similarly, A = P^{-1}E_A shows the rows of A are linear combinations of the rows of E_A. Hence the row space of A equals the row space of E_A, and

R(A^T) = span{[1; 2; 0; 2; 0], [0; 0; 1; 1; 0], [0; 0; 0; 0; 1]} ⊂ R^5, (5.2.9)

the span of the (transposed) nonzero rows of E_A.


To find the fourth space, N(A|),

1 2 1 2 1 2 1 2 2 4 2 4 0 0 0 0     1 0 3 0 → 0 −2 2 −2, (5.2.10a)     3 4 5 4 0 −2 2 −2 3 4 5 2 0 −2 2 4 1 2 1 2 0 1 −1 1   → 0 0 0 −2, (5.2.10b)   0 0 0 0 0 0 0 0 1 2 1 2 0 1 −1 1   → 0 0 0 −2, (5.2.10c)   0 0 0 0 0 0 0 0 1 2 1 0 0 1 −1 0   → 0 0 0 1, (5.2.10d)   0 0 0 0 0 0 0 0 1 0 3 0 0 1 −1 0   → 0 0 0 1. (5.2.10e)   0 0 0 0 0 0 0 0

So the solution for A|x = 0,

x1 = −3x3, (5.2.11a)

x2 = x3, (5.2.11b)

x3 = x3. (5.2.11c) or −3  1 x = x3   . (5.2.12)  1 0 This finally gives us that, −3    1 4 N(A|) = span   ⊂ R . (5.2.13)  1  0 


So the dimension of the column space of A is

dim(R(A)) = r, (5.2.14)

where r is the rank of A. The dimensions of the other spaces are

dim(N(A)) = n − r, (5.2.15)
dim(R(A^T)) = r, (5.2.16)
dim(N(A^T)) = m − r. (5.2.17)

Alternatively, to find the left null space of A, that is

N(A^T) = {x : x^T A = 0^T}, (5.2.18)

we use the reduction

PA = [— b1 —; ⋮; — br —; — 0 —; ⋮; — 0 —], (5.2.19)

with r nonzero rows and m − r zero rows. Partitioning P into blocks,

P = [P1; P2], (5.2.20)

so that

PA = [P1; P2] A = [P1 A; P2 A]. (5.2.21)

We know that P2 A = 0. So we claim that the rows of P2 span the left null space of A, i.e.

R(P2^T) = N(A^T). (5.2.22)

5.3 Lecture 16: September 25, 2013

Dr. Nitsche is not in town October 18 or Wednesday before thanksgiving. May have to have alternate times for class.


The Four Subspaces of A To recall what we discussed last class,

• R(A) is the range of A, i.e. the column space. It has dimension r.

• N(A) = {x : Ax = 0} is the null space of A. It has dimension n − r.

• R(A^T) = {A^T y} is the column space of A^T, i.e. the row space of A. It has dimension r.

• N(A^T) = {x : A^T x = 0} = {x : x^T A = 0} is the left null space of A. It has dimension m − r.

Returning to the manipulation A → E_A, with PA = E_A and P_{m×m} invertible, partition

PA = [P1; P2] A = [P1 A; P2 A] = [B1; 0], (5.3.1)

where P2 A = 0.

Theorem 5.9. | | N(A ) = R(P2) (5.3.2) where the right hand side is the rowspace of P2.

Proof. (⊇) Assume y ∈ R(P2^T). Then y = P2^T x for some x, i.e. y^T = x^T P2. So y^T A = x^T P2 A = x^T 0 = 0, which gives y ∈ N(A^T).

(⊆) Assume y ∈ N(A^T), so y^T A = 0. Write A = P^{-1} E_A = Q E_A with Q = P^{-1} = [Q1 | Q2] and E_A = [U; 0], where U_{r×n} has full row rank. Then 0 = y^T A = y^T [Q1 | Q2][U; 0] = (y^T Q1) U, and since U has full row rank, y^T Q1 = 0. From QP = I we have Q1 P1 + Q2 P2 = I, so y^T = y^T(Q1 P1 + Q2 P2) = (y^T Q2) P2, hence

y = P2^T (Q2^T y) ∈ R(P2^T). (5.3.3)

 As an example,

   1  1 2 1 3 3 1 0 0 0 1 2 0 2 0 0 − 2 0 1 2 1 1  2 4 0 4 4 0 1 0 0   0 0 1 1 0 0 − 3 3 2    →  1 1  (5.3.4a)  1 2 3 5 5 0 0 1 0   0 0 0 0 1 0 2 0 − 2  1 1 2 4 0 4 2 0 0 0 1 0 0 0 0 0 1 − 3 − 3 0


Note that N(A^T) is orthogonal to R(A). From this manipulation we also find that

R(A) = span{[1; 2; 1; 2], [1; 0; 3; 0], [3; 4; 5; 2]} (5.3.5)

and

N(A^T) = R(P2^T) = span{[3; −1; −1; 0]}. (5.3.6)

Linear Independence

Definition 5.10. A set {v1,..., vn} is linearly independent if α1 v1 + ··· + αn vn = 0 implies α1 = ··· = αn = 0.

From this we get the equivalent statements:

• {v1,..., vn} is linearly independent;
• A = [v1 ··· vn] has full column rank n;
• N(A) = {α : Aα = 0} = {0}.

For example, the polynomial set {1, x, x^2,..., x^n} is linearly independent, because c0 + c1 x + c2 x^2 + ··· + cn x^n = 0 (as a function) implies c0 = ··· = cn = 0. On the other hand, the set {0} is linearly dependent, since α0 = 0 for any α ≠ 0, and likewise any set containing 0, e.g. {v1,..., vn, 0}, is linearly dependent.

Another example is any set of distinct unit vectors {e_{i1}, e_{i2},..., e_{in}} with e_i ∈ R^m and n ≤ m; these are linearly independent since the matrix with these columns, for instance

A = [0 0 1; 0 0 0; 1 0 0; 0 1 0; 0 0 0], (5.3.7)

has full column rank.

We take as another example the Vandermonde matrix, which has applications in polynomial interpolation. Let x1,..., xm be distinct real numbers and

A = [1 x1 x1^2 ··· x1^{n−1}; 1 x2 x2^2 ··· x2^{n−1}; ⋮; 1 xm xm^2 ··· xm^{n−1}], (5.3.8)

where n ≤ m. The system Ac = y, with c = [c0 ··· c_{n−1}]^T, encodes the interpolation conditions p(x_k) = y_k for the polynomial p(x) = c0 + c1 x + ··· + c_{n−1}x^{n−1}, so a solution of Ac = y gives a polynomial that interpolates the points (x_k, y_k). The columns of A are linearly independent: if Ac = 0, then p has the m distinct roots x1,..., xm, but a nonzero polynomial of degree n − 1 can have at most n − 1 distinct roots, and m ≥ n > n − 1. So p ≡ 0 and therefore c ≡ 0.
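In Matlab the interpolation system Ac = y can be formed and solved directly; the data points below are arbitrary (any distinct abscissas will do):

  x = [0; 1; 2; 3];  y = [1; 2; 0; 5];     % four data points with distinct x values
  A = [x.^0, x.^1, x.^2, x.^3];            % Vandermonde matrix, increasing powers
  c = A\y;                                 % coefficients c0,...,c3 of the interpolant
  polyval(flipud(c), x) - y                % zero: p(x_k) = y_k at every node

Because the x_k are distinct, A is square and nonsingular here, so the interpolant is unique.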


Figure 5.2. Interpolating system: data points (x1, y1),..., (xk, yk),..., (xn, yn) in the (x, y) plane.

5.4 Lecture 17: September 27, 2013

Linear functions (rev) Is f linear? Here it was good to find the formula. Some could be done by inspection. Here

we also should check f(p1 + p2) = f(p1) + f(p2) and f(αp) = αf(p). So let's find the formulas; say the flipping (reflection) function:

f(x, y) = (x, −y) = [1 0; 0 −1][x; y]. (5.4.1)

For the projection onto the line y = x,

f(x, y) = ((x + y)/2, (x + y)/2) = [1/2 1/2; 1/2 1/2][x; y]. (5.4.2)

For the rotation, write x = r cos(ψ) and y = r sin(ψ), and denote the rotated point with primes: x' = r cos(ψ + θ) and y' = r sin(ψ + θ). Using the angle-sum identities, x' = r(cos ψ cos θ − sin ψ sin θ) = x cos θ − y sin θ and y' = r(sin ψ cos θ + cos ψ sin θ) = y cos θ + x sin θ. This gives the function

f(x, y) = [x'; y'] = [cos θ, −sin θ; sin θ, cos θ][x; y]. (5.4.3)

Note this is an orthogonal matrix with determinant equal to 1.

Review for exam

Anything on the first three homeworks is fair game. We have been doing computations of the LU, PLU, REF, and RREF factorizations. We have solved Ax = b and written systems of linear equations in matrix form. We have talked about the elementary matrices, premultiplication by them, and their invertibility.


We have also discussed some proofs, especially this last one. We showed the major identities tr(AB) = tr(BA), (AB)^T = B^T A^T, (AB)^{-1} = B^{-1}A^{-1}, and (A^{-1})^T = (A^T)^{-1} = A^{−T}. Similarly, we showed that the LU decomposition exists if all principal submatrices are invertible, and that (I − A)^{-1} = Σ_{k=0}^∞ A^k if A^k → 0. We also discussed (A + B)^{-1} with perturbation matrices. Finally, we discussed rank-one matrices, so we need to know the Sherman–Morrison formula for (I + uv^T)^{-1}.

Previous lecture continued Comment on previous lecture:

 2 n−1 1 x1 x1 ··· x1 1 x x2 ··· xn−1  2 2 2  A =  .  (5.4.4)  .  1 x x2 ··· xn−1 m m m m×n

When we consider Ac = y is equivalent to p(xi) = yi, where i = 1, ··· , m. Thus we have n−1 the equation c0 + c1xi + c2x2 + ··· + cn−1xi = 0, where i = 1, ··· , m, and we have a linear system in the coefficients, ck. m ≥ n. In terms of vectors, these are linearly independent because the set   1  x   x2  xn−1  1 1 1  .  .   .   .  . ,  .  ,  .  , ··· ,  .  , (5.4.5)  2 n−1   1 xm xm xm  has rank(A) = n. To show that this is linearly independent, we set up the system

     2   n−1   1 x1 x1 x1 0 .  .   .   .  . c0 . + c1  .  + c2  .  + ··· + cn−1  .  = . . (5.4.6) 2 n−1 1 xm xm xm 0

Here we must show that we have at least m distinct roots, but p ∈ Pn−1 has at most n − 1 roots. We know this by the fundamental theorem of algebra. So, m > n − 1 and the polynomial must be identically equal to the zero polynomial, p ≡ 0, and ck = 0 for all k. n−1 So we want to interpolate the polynomial p(x) ∈ P . We set up p(xi) = yi for i = 1, . . . , m. If n − 1 = m then we will have a unique solution to the interpolation. If instead m > n then we have either no solution or infinitely many solutions. We defined the span of a set as the set of all linear combinations that are a vector set over the field of reals: P span {v1,..., vn} = { cnvn, cn ∈ R}. The basis for a vector space V is the set {v1,..., vk}, that spans V and is linearly independent. We also know that the basis for {0} is the empty set ∅. Thus, for convenience, we define span {∅} = {0}.

Theorem 5.11. If {v1,..., vn} is a basis of V, then any set {u1,..., um} with m > n is linearly dependent.


5.5 Lecture 18: October 2, 2013

Exams and Points We decided that we will have three exams total, but only the best two will each count for 20% of our semester grade. Homework will be worth 60%. Lecture notes will be posted online.

Continuation of last lecture

Theorem 5.12. If {u1,..., un} spans V and S = {v1,..., vm} ⊂ V with m > n, then S is linearly dependent.

Proof. Consider Σ_{i=1}^m α_i v_i = 0. Using v_i = Σ_{j=1}^n c_{ij} u_j,

Σ_{i=1}^m α_i Σ_{j=1}^n c_{ij} u_j = Σ_{j=1}^n (Σ_{i=1}^m α_i c_{ij}) u_j = Σ_{j=1}^n (C^T α)_j u_j = 0. (5.5.1)

Since C^T is n × m with m > n, the system C^T α = 0 has free variables, so there exists α ≠ 0 with C^T α = 0. For this α, Σ_i α_i v_i = 0, so S is linearly dependent. □

Definition 5.13. A basis of V is a linearly independent spanning set of V.

Theorem 5.14. Any two bases have the same number of elements.

Equivalent characterizations of basis,

• linearly independent spanning set • minimal spanning set • max linearly independent subset of V. Definition 5.15. dim(V) is equal to the number of elements in the basis.

Recalling the four subspaces for a matrix,

 | | |  Am×n = a1 a2 ··· an ; (5.5.2) | | | m×n

• R(A) ⊂ Rm, dim = r; • N(A) ⊂ Rn, dim = n − r; • R(A|) ⊂ Rn, dim = r;


• N(A|) ⊂ Rm, dim = m − r. Definition 5.16. If X and Y are two subspaces of V then X + Y = {x + y, x ∈ X , y ∈ Y} . (5.5.3) Is X + Y a subspace? We shall illustrate this in two parts 1. Given z ∈ X + Y, is αz ∈ X + Y? If this is the case, z = x + y and αz = αx + αy ∈ X + Y, where we recalled that the vectors x and y are within their respective sets.

2. Given z1, z2 ∈ X + Y, is z1 + z2 ∈ X + Y?

Here we substitute for the summed vectors of each of the z vectors, (x1+y1)+(x2+y2) = (x1 + x2) + (y1 + y2) ∈ X + Y. Theorem 5.17. dim(X + Y) = dim(X ) + dim(Y) − dim(X ∩ Y).

Proof. Let B_{X∩Y} = {z1,..., zk} be a basis for X ∩ Y. Then we can extend the set to

BX = {z1,..., zk, x1,..., xn} , (5.5.4a)

BY = {z1,..., zk, y1,..., ym} . (5.5.4b)

We now claim that S = {z1,..., zk, x1,..., xn, y1,..., ym} is a basis B_{X+Y}. First, does S span X + Y? Let z ∈ X + Y. Then z = x + y for some x ∈ X and y ∈ Y, so

z = (Σ_i α_i z_i + Σ_i β_i x_i) + (Σ_i α'_i z_i + Σ_i γ_i y_i) = Σ_i (α_i + α'_i) z_i + Σ_i β_i x_i + Σ_i γ_i y_i ∈ span(S). (5.5.5)

Second, is S linearly independent? Consider

Σ α_i z_i + Σ β_i x_i + Σ γ_i y_i = 0. (5.5.6a)

Then

Σ γ_i y_i = −(Σ α_i z_i + Σ β_i x_i), (5.5.6b)

where the left side lies in Y and the right side in X, so both lie in X ∩ Y. Hence Σ γ_i y_i = Σ δ_i z_i for some δ_i, i.e. Σ γ_i y_i − Σ δ_i z_i = 0, and since B_Y is linearly independent, γ_i = δ_i = 0. Then (5.5.6a) reduces to

Σ α_i z_i + Σ β_i x_i = 0, (5.5.6e)

and since B_X is linearly independent, α_i = β_i = 0. □


From our example the range was spanned by the vectors,

1 1 3   2 0 4 4 R(A) = span   ,   ,   ⊂ R , (5.5.7a) 1 3 5  2 0 2  −2 −2    1  0     5 N(A) = span  0 , −1 ⊂ R , (5.5.7b)      0  0  0 0  1 0 0   2 0 0       5 R(A|) = span 0 , 1 , 0 ⊂ R , (5.5.7c)       2 1 0  0 0 1   3   −1 4 N(A|) = span   ⊂ R . (5.5.7d) −1  0 

Theorem 5.18. (a) R(A) is orthogonal to N(A^T), and (b) R(A) ∩ N(A^T) = {0}. Consequently R(A) + N(A^T) = R^m and R(A^T) + N(A) = R^n: any A_{m×n} gives an orthogonal decomposition of R^m and of R^n.

Proof. (a) Let y ∈ R(A), so y = Az for some z, and let x ∈ N(A^T), so A^T x = 0, i.e. x^T A = 0. Then x^T y = x^T Az = 0, so x is orthogonal to y; therefore R(A) ⊥ N(A^T).

(b) If x ∈ R(A) and x ∈ N(A^T), then x^T x = 0 by part (a), which implies x = 0. □

UNIT 6

Least Squares

6.1 Lecture 19: October 4, 2013

Least Squares We will now be covering the concept of least squares. If we are given an equation Ax = b, we may multiply by the transpose of the matrix to find the least squares solution; A|Ax = A|b. We will show that this is consistent even if Ax = b is inconsistent. Previously we showed,

Theorem 6.1. dim(X + Y) = dim(X ) + dim(Y) − dim(X ∩ Y), where X , Y are subspaces of V.

We now consider,

Theorem 6.2. Given conformable matrices A and B,

rank(A + B) ≤ rank(A) + rank(B), (6.1.1)

where rank(A + B) = dim(R(A + B)), rank(A) = dim(R(A)), and rank(B) = dim(R(B)).

Proof. R(A + B) ⊂ R(A) + R(B) since, if y ∈ R(A + B) then

y = (A + B)x = Ax + Bx ∈ R(A) + R(B). (6.1.2)

Further,

dim(R(A + B)) ≤ dim(R(A) + R(B)), (6.1.3a) = dim(R(A)) + dim(R(B)) − dim(R(A) ∩ R(B)), (6.1.3b) ≤ dim(R(A)) + dim(R(B)), (6.1.3c) = rank(A) + rank(B). (6.1.3d)




Theorem 6.3. rank(AB) = rank(B) − dim(N(A) ∩ R(B))

Proof. Let S = {x1,..., xs} be a basis of N(A) ∩ R(B). Since N(A) ∩ R(B) ⊂ R(B), we can extend S to a basis for R(B),

B_{R(B)} = {x1,..., xs, z1,..., zt}. (6.1.4)

To prove dim(R(AB)) = t, we claim S1 = {Az1,..., Azt} is a basis for R(AB). First we show that it spans. Let b ∈ R(AB), so b = ABy for some y, where By ∈ R(B). Then

b = A(Σ_i α_i x_i + Σ_i β_i z_i) = Σ_i α_i Ax_i + Σ_i β_i Az_i = Σ_i β_i Az_i ∈ span(S1), (6.1.5)

since each x_i ∈ N(A). Next we show that S1 is linearly independent. Suppose Σ_i α_i Az_i = 0. Then A(Σ_i α_i z_i) = 0, so Σ_i α_i z_i ∈ N(A) ∩ R(B) (it lies in R(B) since each z_i does). Thus Σ_i α_i z_i = Σ_i β_i x_i, i.e. Σ_i α_i z_i − Σ_i β_i x_i = 0, and therefore α_i = β_i = 0 since {z_i, x_i} are linearly independent. □

Theorem 6.4. Given matrices Am×n and Bn×p, then

rank(A) + rank(B) − n ≤ rank(AB) ≤ min(rank(A), rank(B)) (6.1.6)

Proof. We will consider the right inequality first and the left inequality second. First, rank(AB) ≤ rank(B). We know that rank((AB)|) = rank(B|A|) ≤ rank(A|) = rank(A) and finally rank((AB)|) = rank(AB). For the left inequality, N(A) ∩ R(B) ⊂ N(A). Thus, dim(N(A) ∩ R(B)) ≤ dim(N(A)) = n−rank(A). So, rank(AB) = rank(B)−dim(N(A)∩R(B)) ≥ rank(B)−(n−rank(A)).  Theorem 6.5. (1) rank(A|A) = rank(A) and rank(AA|) = rank(A|) = rank(A).

(2) R(A|A) = R(A|) and R(AA|) = R(A).

(3) N(A|A) = N(A) and N(AA|) = N(A|).

Proof. For part (1), rank(A^T A) = rank(A) − dim(N(A^T) ∩ R(A)), but N(A^T) ⊥ R(A), so N(A^T) ∩ R(A) = {0}: if x ∈ N(A^T) and x ∈ R(A), then A^T x = 0 and x = Ay, which gives x^T x = y^T A^T x = 0, so x = 0. Hence dim(N(A^T) ∩ R(A)) = 0. To be continued... □

6.2 Lecture 20: October 7, 2013

We will have two weeks for the next homework.


Properties of Transpose Multiplication In review we covered the following theorems last time:

Theorem 6.6. dim(X + Y) = dim(X ) + dim(Y) − dim(X ∩ Y), where X , Y are subspaces of V.

We also had the theorem,

Theorem 6.7. rank(A + B) ≤ rank(A) + rank(B).

And finally we showed the relation

Theorem 6.8. rank(AB) = rank(B) − dim(N(A) ∩ R(B))

We left off at the theorem covering multiplication relations and the rank and dimensions of the matrix,

Theorem 6.9. (1) rank(A|A) = rank(A) and rank(AA|) = rank(A|) = rank(A).

(2) R(A|A) = R(A|) and R(AA|) = R(A).

(3) N(A|A) = N(A) and N(AA|) = N(A|).

We proved the first one using the third of the theorems above. We now prove the second and third parts of this theorem.

Proof. For part (2), let y ∈ R(A^T A); then y = A^T Ax for some x, so y = A^T z with z = Ax, and y ∈ R(A^T). Thus R(A^T A) ⊂ R(A^T); since dim(R(A^T A)) = dim(R(A^T)) by part (1), R(A^T A) = R(A^T). (A basis of R(A^T A) is contained in one of R(A^T), and the two have the same number of elements.)

For part (3) we again show one set is contained in the other and then compare dimensions. Let x ∈ N(A); then Ax = 0, so A^T Ax = 0 and x ∈ N(A^T A); hence N(A) ⊂ N(A^T A). But dim(N(A)) = n − r and dim(N(A^T A)) = n − r, so the two sets are equal: N(A) = N(A^T A). □

The Normal Equations Definition 6.10. The normal equations for a system Ax = b is

A|Ax = A|b. (6.2.1)

Theorem 6.11. For any A, A|Ax = A|b is consistent.

Proof. The right-hand side A^T b lies in R(A^T) = R(A^T A) by the previous theorem, so there exists x with A^T Ax = A^T b. □


Example 6.12. Fit the data (x_i, y_i), i = 1,..., m, by a polynomial of degree 2, p(x) = c0 + c1 x + c2 x^2, where m > 3. The problem to solve is p(x_i) = y_i, i.e. c0 + c1 x_i + c2 x_i^2 = y_i for i = 1,..., m. The system is linear in the unknowns c0, c1, c2, and in matrix form reads

[1 x1 x1^2; 1 x2 x2^2; ⋮; 1 xm xm^2] [c0; c1; c2] = [y1; y2; ⋮; ym], (6.2.2)

or Ac = y. What is the rank of A? The m × 3 matrix has full column rank,

rank(A) = 3, (6.2.4)

since Ac = 0 implies that the quadratic p vanishes at the m ≥ 3 distinct points x_i, hence c = 0 (note that Ac = [p(x1); p(x2); ⋮; p(xm)]). Therefore A^T Ax = A^T b has a unique solution. Written out, the normal equations are

[m, Σ x_i, Σ x_i^2; Σ x_i, Σ x_i^2, Σ x_i^3; Σ x_i^2, Σ x_i^3, Σ x_i^4] [c0; c1; c2] = [Σ y_i; Σ x_i y_i; Σ x_i^2 y_i]. (6.2.5)

Suggestion: have an outline of the major proofs we have shown in class in your mind. Go back and give them a study over.
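As a sketch, the quadratic fit of Example 6.12 can be computed in Matlab either from the normal equations or with the backslash operator; the data below are synthetic, generated only to illustrate:

  xi = (0:0.5:3)';
  yi = 1 + 2*xi - 0.5*xi.^2 + 0.1*randn(size(xi));   % noisy samples of a quadratic
  A  = [ones(size(xi)), xi, xi.^2];
  c_normal    = (A'*A) \ (A'*yi)     % solve the normal equations for c0, c1, c2
  c_backslash = A\yi                 % least squares via backslash; same answer since rank(A) = 3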

| | 2 | Theorem 6.13.A Ax = A b gives x which minimizes kAx − bk2 = (Ax − b) (Ax − b).


Figure 6.1. Minimization of the distance between the point b and the plane of vectors of the form Ax.

Figure 6.2. Parabolic fitting by least squares.

By the corollary this is an if and only if statement: every solution of the normal equations minimizes the sum of the squares of the entries of the residual, Σ_{i=1}^m (Ax − b)_i^2. Note here ||x||_2^2 = x^T x = Σ x_i^2. We illustrate this in Figure 6.1, where the minimal line connecting a point to a plane is shown.

Example 6.14. What does the solution to the normal equations minimize in our example? The solution c0, c1, c2 minimizes Σ_{i=1}^m (Ac − y)_i^2 = Σ (p(x_i) − y_i)^2. We can visualize this parabolic least squares fit as in Figure 6.2.

Exam 1

Scores ranged from 36–98, with a median of 66. For this exam, 70–100 is an A-range score, 50–70 is about a B, and below that a C (as long as you are showing involvement in the class). The first two problems went fine; problem four was covered in class, problem five was on the homework, and we will cover the solution of the sixth problem in class next time.


6.3 Lecture 21: October 9, 2013

Need to have a couple classes early because of missing next Friday. So, next Monday and Wednesday we will start at 8:35. We will review problem 6 from the exam, then finish up least squares; cover linear dependence and finally linear transformations.

Exam Review

We review exam problem 6. Given u, v ∈ R^n:

(a) Show that A = I + uv^T has an inverse of the form A^{-1} = I + αuv^T, and find α. We check that AA^{-1} = I:

AA^{-1} = (I + uv^T)(I + αuv^T), (6.3.1a)
= I + αuv^T + uv^T + α u(v^T u)v^T, (6.3.1b)
= I + uv^T (1 + α(1 + v^T u)). (6.3.1c)

This equals I if 1 + α(1 + v^T u) = 0, i.e. α = −1/(1 + v^T u). Thus the Sherman–Morrison formula is

(I + uv^T)^{-1} = I − uv^T / (1 + v^T u). (6.3.2)

(b) Let B = A + α ê_i ê_j^T = A(I + αA^{-1} ê_i ê_j^T), where A is invertible. For the inverse of B,

B^{-1} = (I + αA^{-1} ê_i ê_j^T)^{-1} A^{-1}, (6.3.3a)
= (I − αA^{-1} ê_i ê_j^T / (1 + α ê_j^T A^{-1} ê_i)) A^{-1}. (6.3.3b)

This exists provided 1 + α ê_j^T A^{-1} ê_i = 1 + α (A^{-1})_{ji} ≠ 0, which can be guaranteed by taking α sufficiently small.

Least squares and minimization

Theorem 6.15. x solves A^T Ax = A^T b if and only if x minimizes (Ax − b)^T(Ax − b) = ||Ax − b||_2^2, where ||x||_2^2 = x^T x = Σ_i x_i^2.

Note:

f(x) = f(x1, x2,..., xn) = (Ax − b)^T (Ax − b) = (x^T A^T − b^T)(Ax − b) = x^T A^T Ax − x^T A^T b − b^T Ax + b^T b. (6.3.4)

A scalar equals its own transpose, so b^T Ax = (b^T Ax)^T = x^T A^T b. This simplifies the previous result to

f(x) = x^T A^T Ax − 2x^T A^T b + b^T b. (6.3.5)

This is a quadratic form, and the minimum occurs where ∂f/∂x_i = 0 for every i.

Proof. To prove from the right hand side to the left; suppose x minimizes f(x), then

0 = ∂f/∂x_i, (6.3.6a)
= (∂x^T/∂x_i) A^T Ax + x^T A^T A (∂x/∂x_i) − 2 (∂x^T/∂x_i) A^T b, (6.3.6b)
= 2 ê_i^T A^T Ax − 2 ê_i^T A^T b. (6.3.6c)

This gives us

ê_i^T A^T Ax = ê_i^T A^T b, i.e. (A^T Ax)_i = (A^T b)_i for every i, (6.3.7–6.3.8)

which is exactly A^T Ax = A^T b.

ASIDE: ∂(uv)/∂x_i = (∂u/∂x_i) v + u (∂v/∂x_i). (6.3.9)

To prove the other direction, suppose that x solves A^T Ax = A^T b; we show f(x) ≤ f(y) for every y. Consider

f(y) − f(x) = y^T A^T Ay − 2y^T A^T b − x^T A^T Ax + 2x^T A^T b, (6.3.10a)
= y^T A^T Ay − 2y^T A^T Ax + x^T A^T Ax (using A^T b = A^T Ax), (6.3.10b)
= (Ay − Ax)^T (Ay − Ax), (6.3.10c)
= ||A(y − x)||_2^2 ≥ 0. (6.3.10d)

If A has full column rank (no nontrivial null space), this is strictly positive whenever y ≠ x. So any solution of the normal equations minimizes this norm, i.e. any solution of A^T Ax = A^T b minimizes ||Ax − b||_2^2. Further, if A has full column rank we are guaranteed a unique least squares solution x; if A has a nontrivial null space (r < n) there are infinitely many least squares solutions. □

In Matlab we can do help \ to find out what solution it gives for underdetermined solutions. What does it minimize?
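As a sketch of what happens for a rank-deficient least squares problem (the small system below is made up for illustration), compare backslash with the pseudoinverse:

  A = [1 2; 1 2; 1 2];  b = [1; 1; 2];     % rank(A) = 1, so there are many LS solutions
  x_bs  = A\b                              % Matlab returns a basic solution (with a warning)
  x_min = pinv(A)*b                        % the least squares solution of minimum 2-norm
  [norm(A*x_bs - b), norm(A*x_min - b)]    % equal: both minimize the residual
  [norm(x_bs), norm(x_min)]                % pinv gives the smaller-norm solution

Both answers satisfy the normal equations; they differ by an element of N(A).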


6.4 Homework Assignment 4: Due Monday, October 21, 2013

1. Textbook 4.1.1: Vector spaces, subspaces, fundamental subspaces of a matrix. Determine which of the following subsets of Rn are in fact subspaces of Rn (n > 2).

(a) {x | xi ≥ 0},

(b) {x | x1 = 0},

(c) {x | x1x2 = 0}, n Pn o (d) x j=1 xj = 0 , n Pn o (e) x j=1 xj = 1 ,

(f) {x | Ax = b, where Am×n 6= 0 and bm×1 6= 0}.

2. Textbook 4.1.2 Determine which of the following subsets of Rn×n are in fact subspaces of Rn×n.

(a) The symmetric matrices. (b) The diagonal matrices. (c) The nonsingular matrices. (d) The singular matrices. (e) The triangular matrices. (f) The upper-triangular matrices. (g) All matrices that commute with a given matrix A. (h) All matrices such that A2 = A. (i) All matrices such that tr(A) = 0.

3. Textbook 4.1.6 Which of the following are spanning sets for R3?

(a) 1 1 1 , (b) 1 0 0, 0 0 1 , (c) 1 0 0, 0 1 0, 0 0 1, 1 1 1 , (d) 1 2 1, 2 0 −1, 4 4 1 , (e) 1 2 1, 2 0 −1, 4 4 0 .

4. Textbook 4.1.7 For a vector space V, and for M, N ⊆ V, explain why span(M ∪ N ) = span(M) + span(N ).


5. Textbook 4.2.1 Determine spanning sets for each of the four fundamental subspaces associated with

 1 2 1 1 5 A = −2 −4 0 4 −2 . 1 2 2 4 9

6. Textbook 4.2.3 Suppose that A is a 3 × 3 matrix such that        1 1   −2  R = 2, −1 and N =  1  3 2   0 

span R(A) and N(A), respectively, and consider a linear system Ax = b, where  1 b = −7. 0

(a) Explain why Ax = b must be consistent. (b) Explain why Ax = b cannot have a unique solution.

7. Textbook 4.2.7   A1 | If A = is a square matrix such that N(A1) = R(A2), prove that A must be A2 nonsingular. 8. Textbook 4.2.8 Consider a linear system of equations Ax = b for which y|b = 0 for every y ∈ N(A|). Explain why this means the system must be consistent. 9. Textbook 4.3.1(abc): Linear independence, basis. Determine which of the following sets are linearly independent. For those sets that are linearly dependent, write one of the vectors as a linear combination of the others.        1 2 1  (a) 2, 1, 5  3 0 9  (b) 1 2 3, 0 4 5, 0 0 6, 1 1 1        3 1 2  (c) 2, 0, 1  1 0 0  10. Textbook 4.3.4 Consider a particular species of wild flower in which each plant has several stems, leaves, and flowers, and for each plant let the following hold.


S = the average stem length (in inches). L = the average leaf width (in inches). F = the number of flowers. Four particular plants are examined, and the information is tabulated in the following matrix: SLF #1 1 1 10  #2 2 1 12  A =   #3 2 2 15  #4 3 2 17 For these four plants, determine whether or not there exists a linear relationship be- tween S, L, and F . In other words, do there exist constants α0, α1, α2, and α3 such that α0 + α1S + α2L + α3F = 0? 11. Textbook 4.3.13 Which of the following sets of functions are linearly independent? (a) {sin(x), cos(x), x sin(x)}. (b) {ex, xex, x2ex}. (c) sin2(x), cos2(x), cos(2x) . 12. Textbook 4.4.2 Find a basis for each of the four fundamental subspaces associated with 1 2 0 2 1 A = 3 6 1 9 6 (6.4.1) 2 4 1 7 5

13. Textbook 4.4.8

Let B = {b1, b2,..., bn} be a basis for a vector space V. Prove that each v ∈ V can be expressed as a linear combination of the bi’s, v = α1b1 + α2b2 + ··· + αnbn, in only one way—i.e., the coordinates αi are unique. 14. Textbook 4.5.5 For A ∈ Rm×n, explain why A|A = 0 implies A = 0. 15. Textbook 4.5.8 Is rank(AB) = rank(BA) when both products are defined? Why? 16. Textbook 4.5.14 Pr Prove that if the entries of Fr×r satisfy j=1 |fij| < 1 for each i (i.e., each absolute row sum < 1), then I + F is nonsingular. Hint: Use the triangle inequality for scalars |α + β| ≤ |α| + |β| to show N(I + F) = 0. 17. Textbook 4.5.18 If A is n × n, prove that the following statements are equivalent:


(a)N( A) = N(A2) (b)R( A) = R(A2) (c)R( A) ∩ N(A) = {0}

18. Textbook 4.6.1: Least Squares. Hooke's law says that the displacement y of an ideal spring is proportional to the force x that is applied, i.e. y = kx for some constant k. Consider a spring in which k is unknown. Various masses are attached, and the resulting displacements shown in the figure are observed. Using these observations, determine the least squares estimate for k.

19. Textbook 4.6.2 Show that the slope of the line that passes through the origin in R^2 and comes closest, in the least squares sense, to passing through the points {(x1, y1), (x2, y2),..., (xn, yn)} is given by m = Σ_i x_i y_i / Σ_i x_i^2.

20. Textbook 4.6.6 After studying a certain type of cancer, a researcher hypothesizes that in the short run the number y of malignant cells in a particular tissue grows exponentially with time t, that is, y = α0 e^{α1 t}. Determine least squares estimates for the parameters α0 and α1 from the researcher's observed data given below.

t (days) 1 2 3 4 5 y (cells) 16 27 45 74 122

Hint: What common transformation converts an exponential function into a linear function?


UNIT 7

Linear Transformations

7.1 Lecture 22: October 14, 2013

Theorem 7.1. Given a vector space V, if {u1,..., un} spans V and {v_i}_{i=1}^m ⊂ V with m > n, then {v_i}_{i=1}^m is linearly dependent (there are more vectors than spanning elements).

Proof. Consider Σ_{i=1}^m α_i v_i = 0. Using v_i = Σ_{j=1}^n c_{ij} u_j, this becomes Σ_{j=1}^n (Σ_{i=1}^m α_i c_{ij}) u_j = Σ_j (C^T α)_j u_j = 0. Since C^T α = 0 is a system of n equations in m > n unknowns, it has m − n free variables and therefore a nonzero solution α. For this α ≠ 0, Σ_i α_i v_i = 0, so the set is linearly dependent. □

Any two bases for V have the same number of elements.

Definition 7.2. Let V be a vector space with basis B = {b1,..., bn}. The coordinates of x ∈ V are the scalars c = [c1; ⋮; cn] such that x = Σ_{j=1}^n c_j b_j.

Theorem 7.3. The coordinates of x ∈ V with respect to the basis B are unique; we write [x]_B = [c1; ⋮; cn].

Example 7.4. We take as an example a vector x ∈ R3,

1 x = 2 , (7.1.1a) 3

= 1eˆ1 + 2eˆ2 + 3eˆ3, (7.1.1b) = ˆı + 2ˆ + 3kˆ. (7.1.1c)


with respect to the standard basis S = {ê1,..., ên} of R^n, i.e.

[x]_S = [1; 2; 3]. (7.1.2)

We can use another basis for R^3, say

B = {[1; 1; 0], [1; 1; 1], [2; 0; 0]}. (7.1.3)

This is linearly independent because the matrix with these columns is nonsingular. In this basis,

[x]_B = [−1; 3; −1/2]. (7.1.4)

To see this, find c = [c1; c2; c3] such that

c1 [1; 1; 0] + c2 [1; 1; 1] + c3 [2; 0; 0] = 1ê1 + 2ê2 + 3ê3, (7.1.5)

i.e., in matrix form,

[1 1 2; 1 1 0; 0 1 0] c = [1; 2; 3]. (7.1.6)

Solving for the individual variables: from the third row, c2 = 3; from the second row, c1 = 2 − c2 = −1; from the first row, 2c3 = 1 − c1 − c2 = −1, so c3 = −1/2. (7.1.7)

Summary
• For any vector space V there exists a basis B.
• Any x ∈ V is represented uniquely by a tuple of numbers, the coordinates [x]_B.

• Any x ∈ V is represented uniquely by a tuple of numbers, the coordinates [x]B.


Linear Transformations Definition 7.5. Given the vector spaces U, V, a map T : U → V such that, • T(x + y) = T(x) + T(y) • T(αx) = αT(x) is a linear transformation of U → V. We also recognize that a linear transformation is a linear function on vector spaces. Definition 7.6. A linear transformation U → U is a linear operator on U. Our goal now is two fold: • Show that the set of all linear transformations U → V is a vector space L(U, V). • Find the basis and coordinate unit basis of any T ∈ L(U, V).

Examples of Linear Functions

Example 7.7. T(x) = A_{m×n} x_{n×1}, so T : R^n → R^m; for instance rotation (A = R(θ)), projection, and reflection.
Example 7.8. f(x) = ax, f : R → R.
Example 7.9. D(f) = df/dx, D : P^n → P^{n−1}, or D : C^1 → (set of all functions).
Example 7.10. I(f) = ∫_a^b f(x) dx, I : C^0 → R.
Example 7.11. One final example regarding matrices: T(B_{n×k}) = A_{m×n} B_{n×k}, T : R^{n×k} → R^{m×k}.

Matrix representation of linear transformations

Every linear transformation on finite dimensional spaces has a matrix representation. Suppose T : U → V, B = {u1,..., un} is a basis for U, and B' = {v1,..., vm} is a basis for V. Writing u = Σ_{i=1}^n ξ_i u_i and T(u_i) = Σ_{j=1}^m α_{ji} v_j, the action of T on u is

T(u) = T(Σ_{i=1}^n ξ_i u_i) = Σ_{i=1}^n ξ_i T(u_i) = Σ_{i=1}^n ξ_i Σ_{j=1}^m α_{ji} v_j = Σ_{j=1}^m (Σ_{i=1}^n α_{ji} ξ_i) v_j, (7.1.8)

where the numbers α_{ji} describe the action of T.


Theorem 7.12. The set of all linear transformations T : U → V, denoted L(U, V), is a vector space.

Proof. Given T1, T2 ∈ L(U, V), (T1 + T2)x = T1 x + T2 x defines T1 + T2 ∈ L(U, V), and (αT1)x = αT1(x) gives αT1 ∈ L(U, V). Some other properties of note: the zero transformation 0x = 0 belongs to L(U, V), T1 − T1 = 0, etc. □

Theorem 7.13. Given U with basis B = {u1,..., un} and V with basis B' = {v1,..., vm}, a basis for L(U, V) is {B_{ij}}, i = 1,..., m, j = 1,..., n, where B_{ij} : U → V is defined by B_{ij}(u) = ξ_j v_i for u = Σ_{k=1}^n ξ_k u_k. It follows that dim(L(U, V)) = dim(U) dim(V) = nm.

Proof (linear independence). Consider Σ_{ij} η_{ij} B_{ij} = 0. Applying this to u_k,

0 = Σ_{ij} η_{ij} B_{ij}(u_k) = Σ_i η_{ik} v_i, (7.1.9)

since B_{ij}(u_k) = 0 for j ≠ k and B_{ik}(u_k) = v_i (the coordinates of u_k are [u_k]_B = [0 ··· 1 ··· 0]^T, with the 1 in the k-th position). Since the {v_i} are linearly independent, η_{ik} = 0 for all i and each k. Therefore the B_{ij} are linearly independent. □

7.2 Lecture 23: October 16, 2013

The next major things we are going to try to cover are:

• Basis for L(U, V) coordinates for T ∈ L(U, V)

• Action of T

• Change of coordinates of u ∈ U under change of basis

• Change of coordinates of T ∈ L(U, V) under change of basis

Basis of a linear transformation The linear set, L(U, V) = {T : U → V | T linear transformation} (7.2.1)

Theorem 7.14. Define B_{ij} : U → V by B_{ij}(u) = ξ_j v_i, where B = {u1,..., un} is a basis for U, B' = {v1,..., vm} is a basis for V, and u = Σ_{k=1}^n ξ_k u_k. Then {B_{ij}} is a basis for L(U, V).


Proof. First, we observed above that the B_{ij} are linearly independent. Second, we check that they span. Let T ∈ L(U, V). Then, writing T(u_j) = Σ_{i=1}^m α_{ij} v_i,

T(u) = T(Σ_j ξ_j u_j) = Σ_j ξ_j T(u_j) = Σ_j Σ_{i=1}^m ξ_j α_{ij} v_i = Σ_j Σ_{i=1}^m α_{ij} B_{ij}(u) (7.2.2)

for any u, using B_{ij}(u) = ξ_j v_i. Thus T = Σ_j Σ_i α_{ij} B_{ij}, so {B_{ij}} spans L(U, V). It follows that the coordinates of T are

[T]_{BB'} = (α_{ij}) = [α_{11} α_{12} ··· α_{1n}; α_{21} α_{22} ··· α_{2n}; ⋮; α_{m1} α_{m2} ··· α_{mn}] = [[T(u1)]_{B'} [T(u2)]_{B'} ··· [T(un)]_{B'}]. (7.2.3)



If T : U → U is a linear operator that goes to the same space then [T]BB = [T]B for convenience.

n n−1 dp n Example 7.15. Let D : P → P by D(p) = dx . Our basis is B = {1, x, . . . , x } and we


also have the basis B' = {1, x,..., x^{n−1}} for the image space. Then

[D(1)]_{B'} = [0]_{B'} = [0; 0; ⋮; 0], (7.2.4a)
[D(x)]_{B'} = [1]_{B'} = [1; 0; ⋮; 0], (7.2.4b)
[D(x^2)]_{B'} = [2x]_{B'} = [0; 2; 0; ⋮; 0], (7.2.4c)
⋮
[D(x^n)]_{B'} = [n x^{n−1}]_{B'} = [0; ⋮; 0; n]. (7.2.4d)

This allows us to represent the differentiation operator by the matrix

[D]_{BB'} = [0 1 0 0 ··· 0; 0 0 2 0 ··· 0; 0 0 0 3 ⋱ 0; ⋮ ⋱ ⋮; 0 0 0 0 ··· n]_{n×(n+1)}. (7.2.5)

Example 7.16. Let D : P^n → P^n by D(p) = dp/dx. This is the same as the previous example except that we add a row of zeros at the bottom, giving a square matrix:

[D]_B = [0 1 0 0 ··· 0; 0 0 2 0 ··· 0; 0 0 0 3 ⋱ 0; ⋮ ⋱ ⋮; 0 0 0 0 ··· n; 0 0 0 0 ··· 0]_{(n+1)×(n+1)}. (7.2.6)


We may do this for any operator; for example, we could do it for a projection. What we want is a basis that gives a nice representation of the operator, and highly sparse representations are especially convenient.

Action of a linear transformation

The action of T : U → V in coordinates: recall

T(u) = T(Σ_{j=1}^n ξ_j u_j) = Σ_{j=1}^n ξ_j T(u_j) = Σ_{j=1}^n Σ_{i=1}^m ξ_j α_{ij} v_i = Σ_{i=1}^m (Σ_{j=1}^n α_{ij} ξ_j) v_i = Σ_{i=1}^m (Aξ)_i v_i, (7.2.7)

which gives the coordinates of T(u) in the basis B':

[T(u)]_{B'} = Aξ = [T]_{BB'} [u]_B. (7.2.8)

Thus the action is represented by matrix multiplication. Now return to our example.

Example 7.17. Let D : P^n → P^{n−1} by D(p) = dp/dx, with bases B = {1, x,..., x^n} and B' = {1, x,..., x^{n−1}}. If p(x) = α0 + α1 x + ··· + αn x^n, then D(p(x)) = α1 + 2α2 x + ··· + nαn x^{n−1}, and in coordinates

[D(p)]_{B'} = [α1; 2α2; 3α3; ⋮; nαn] = [0 1 0 0 ··· 0; 0 0 2 0 ··· 0; 0 0 0 3 ⋱ 0; ⋮; 0 0 0 0 ··· n] [α0; α1; α2; α3; ⋮; αn]. (7.2.9)

It follows that [L + T]BB0 = [L]BB0 + [T]BB0 and [αL]BB0 = α [L]BB0 . We may also consider the composition of linear operators. Say L(T(x)) = (LT)(x), also [LT]BB00 = [L]BB0 [T]B0B00 .


Change of Basis

Given a vector space U with two bases B = {u1,..., un} and B' = {v1,..., vn}, the relation between [u]_B and [u]_{B'} is given by

[u]_{B'} = P [u]_B. (7.2.10)

P is called the change of basis matrix from B to B'. Recall that [T(u)]_{B'} = [T]_{BB'} [u]_B; clearly P is just [T]_{BB'} for T = I, i.e. P = [I]_{BB'}. We illustrate with polynomial bases once more.

Example 7.18. Given U = P^2 with bases B = {1, x, x^2} and B' = {1, 1 + x, 1 + x + x^2},

P = [I]_{BB'} = [[1]_{B'} [x]_{B'} [x^2]_{B'}] = [1 −1 0; 0 1 −1; 0 0 1]. (7.2.11)

This holds for any u. For example, the representation of the polynomial p(t) = 3 + 2t + 4t^2 in the basis B' is

[p]_{B'} = P [p]_B = [1 −1 0; 0 1 −1; 0 0 1] [3; 2; 4] = [1; −2; 4]. (7.2.12)

Finally, let U be a vector space with bases B = {u1,..., un} and B' = {v1,..., vn}, and let T : U → U. What is the relation between [T]_B and [T]_{B'}? Let P = [I]_{BB'}. We have

[T(u)]_{B'} = [T]_{B'} [u]_{B'} and [T(u)]_B = [T]_B [u]_B, (7.2.13)

and, applying the change of basis to both u and T(u),

[u]_{B'} = P [u]_B, [T(u)]_{B'} = P [T(u)]_B. (7.2.14)

Combining these, P [T]_B [u]_B = [T]_{B'} P [u]_B ... (7.2.15)

to be continued...

Note: no class Friday.


7.3 Lecture 24: October 21, 2013

Change of Basis (cont.)

Let T : U → U, and let U have bases B = {u1,..., un} and B' = {v1,..., vn}. Recall:

1. A basis for L(U, V) is {B_{ij} : B_{ij}(u) = ξ_j v_i, where u = Σ_k ξ_k u_k}, and the coordinates of T are [T] = [[T u1]_{B'} [T u2]_{B'} ··· [T un]_{B'}].

2. The action of T: [T(u)]_{B'} = [T]_{BB'} [u]_B.

3. Given x ∈ U and two bases B, B' for U, [x]_{B'} = P [x]_B with P = [I]_{BB'}.

4. For T : U → U with two bases B, B' for U, we want to relate [T]_B and [T]_{B'}.

To show property 4,

[Tu]_{B'} = [T]_{B'} [u]_{B'} and [Tu]_B = [T]_B [u]_B. (7.3.1)

But also, with P = [I]_{BB'},

[Tu]_{B'} = P [Tu]_B and [u]_{B'} = P [u]_B. (7.3.2)

So

P [T]_B [u]_B = [Tu]_{B'} = [T]_{B'} [u]_{B'} = [T]_{B'} P [u]_B (7.3.3)

for every u, and we get

[T]_B = P^{-1} [T]_{B'} P. (7.3.4)

The matrix representations of T with respect to different bases are therefore similar matrices.

Definition 7.19. If A = C^{-1}BC for some invertible C, then A and B are similar (A, B, C ∈ R^{n×n}).

Theorem 7.20. Any two similar matrices represent the same linear transformation with respect to two different bases.


Example 7.21. An example illustrating the similarity [T]_B = P^{-1}[T]_{B'}P. Let T ∈ L(U, U) be defined by

Tu = [0 1; −2 3][x; y], where u = x u1 + y u2, (7.3.5)

that is,

Tu = [y; −2x + 3y] = y u1 + (−2x + 3y) u2. (7.3.6)

In basis notation we may write this as

[Tu]_B = M [u]_B. (7.3.7)

Now consider two bases of R^2: the standard basis S = {ê1, ê2} and S' = {[1; 1], [1; 2]}. In the standard basis,

[T]_S = [[Tê1]_S [Tê2]_S] = [0 1; −2 3] = M. (7.3.8)

In the other basis, T[1; 1] = [1; 1] and T[1; 2] = [2; 4] = 2[1; 2], so

[T]_{S'} = [[T[1;1]]_{S'} [T[1;2]]_{S'}] = [1 0; 0 2]. (7.3.9)

This helps us by diagonalizing the operator. Now we want to find P,

P = [I]_{SS'} = [[ê1]_{S'} [ê2]_{S'}] = [2 −1; −1 1], (7.3.10)

and

P^{-1} = [1 1; 1 2]. (7.3.11)


We can verify this:

P^{-1} [T]_{S'} P = [1 1; 1 2][1 0; 0 2][2 −1; −1 1] = [1 2; 1 4][2 −1; −1 1] = [0 1; −2 3]. (7.3.12)

So this checks out.
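The same check in Matlab, with Q holding the basis S' in its columns:

  M = [0 1; -2 3];          % [T]_S
  Q = [1 1; 1 2];           % columns are the basis vectors of S'
  P = inv(Q);               % change of basis matrix from S to S'
  P*M*inv(P)                % [1 0; 0 2] = [T]_{S'}
  inv(P)*[1 0; 0 2]*P       % recovers M, i.e. [T]_S = P^{-1} [T]_{S'} P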

Example 7.22. Let M ∈ L(U, V) defined by [M(u)]S = M [u]S where S is the standard basis. Then

[M]_S = M = [[Mê1]_S [Mê2]_S ··· [Mên]_S], (7.3.13)

and define S' = {q1,..., qn}. With Q = [I]_{S'S} = [q1 q2 ··· qn] (whose columns are the new basis vectors in standard coordinates),

[M]_{S'} = Q^{-1} M Q. (7.3.14)

More generally, if A = Q^{-1}BQ with S = {ê1,..., ên} and S' = {q1,..., qn}, and L(u) = Bu, then [L]_S = B and [I]_{S'S} = Q, so [L]_{S'} = Q^{-1}BQ = A. If T ∈ L(U, U) and X ⊂ U is such that T(X) ⊂ X, where T(X) = {T(x) : x ∈ X}, then X is an invariant subspace of U under T.

(λI − A) v = 0, (7.3.15a) λv = Av. (7.3.15b)

and span{v} is an invariant subspace under A.

7.4 Lecture 25: October 23, 2013

Properties of Special Bases

If B and B' are bases for U and T : U → U, then

[T]_{BB'} = [[T(u1)]_{B'} ··· [T(un)]_{B'}], (7.4.1a)
[T]_B = [[T(u1)]_B ··· [T(un)]_B] = P^{-1} [T]_{B'} P, (7.4.1b)


n And we also have P = (I)BB0 . We consider T on R , T(x) = Ax and [T]S = A. So −1 A = P BP for appropriate B and P, with B = [T]B0 Note: A tuple is an ordered set of numbers. Now we have two goals:

1. Find a basis such that [T]B is simple 2. FInd invariant quantities Example 7.24. tr(P−1BP) = tr(BPP−1) = tr(B)

n n Example 7.25. For T : P → P by T(p) = Dp, 0 1 0 ··· 0 .  .  0 0 2 .  [T] =   (7.4.2) B   .  . 0 n 0 ··· 0 0 tr(T) = 0 Example 7.26. rank(P−1BP) = rank(B) Example 7.27. Nilpotent operator of index k N : U → U such that Nk = 0, but Nk−1 6= 0.  2 k−1 k On the homework we will have to show that x, Nx, N x,..., N x a basis for R and x is defined such that Nk−1(x) 6= 0. So, 0 0 ··· 0 . . 1 0 .. . [N] =   = J (7.4.3) B . .  . .. 0 0 0 ··· 1 0

Example 7.28. An idempotent operator E : U → U has the property E2 = E. This is because these are projection operators which can only return the same answer if done twice.     B = x1,..., xr, y1,..., yn−r . | {z } | {z } BR(E) BN(E) I 0 [E] = r×r (7.4.4) 0 0

Example 7.29. If A has a full set of eˆ-vectors qj, j = 1, . . . , n. Then, Aqj = λjqj with bases S, P. So  [I]PS = q1,..., qn , (7.4.5a) = Q, (7.4.5b) −1 [T]P = Q [T]S Q, (7.4.5c) Λ = Q−1AQ (7.4.5d)

92 7.4. Lecture 25: October 23, 2013 Applied Matrix Theory

So   λ1 0 ··· 0 . .  0 λ .. .  [T] =  2  , (7.4.6a) P  . . .   . .. .. 0  0 ··· 0 λn T(x) = Ax. (7.4.6b)

Invariant Subspaces Let T be a linear operator T : U → U.

Definition 7.30. A subset X ⊂ U is invariant under T if Tx ∈ X for any x ∈ X (or T(X ) ⊂ X ). Also T1x : X → X .

Example 7.31. Given

T(x) = Ax, (7.4.7a) −1 −1 −1 −1  0 −5 −16 −22 =  . (7.4.7b)  0 3 10 14 4 8 12 14

 2  −1 −1  2 X = span {q , q } where q1 =   and q2 =  . Show that X is invariant under T. 1 2  0  −1 0 0 So,

 −1 −15 T(q ) =  , (7.4.8a) 1  −3 0

= q1 + 3q2 ∈ span(X ); (7.4.8b)  0  6 T(q ) =  , (7.4.8c) 2 −4 0

= 2q1 + 4q2 ∈ span(X ). (7.4.8d)

4 4 So for any T(α1q1+α2q2) = α1T(q1)+α2T(q2) ⊂ X . So for T : R → R with T1x : X → X ,

1 2 [T1x] = . (7.4.9) q1,q2 3 4

93 Nitsche and Benner Unit 7. Linear Transformations

 1 0    0 0 Now say we have [T]P , P = q1, q2,   ,   . Then,  0 0  0 1 

1 2 x x 3 4 x x [T] =   (7.4.10) P 0 0 −1 x 0 0 4 x

So we have gained some zero elements. This is since,

1 −1 0  0 T   =   , (7.4.11a) 0  0 0 4 0  −1 0 −12 T   =   (7.4.11b) 0  14 1 14

Now if X , Y are subspaces of U and are invariant under T; T(X ) ⊂ X and T; T(Y) ⊂ Y.   and X + Y = U. Then B = x1,..., xr, y1,..., yn−r .    [T]B = [T(x1)]B ··· [T(xr)]B [T(y1)]B ··· T(yn−r) B , (7.4.12a)   [T1x] 0 = Bx , (7.4.12b) 0 [T ] 1y By = Q−1AQ. (7.4.12c)

7.5 Homework Assignment 5: Due Monday, November 4, 2013

1. Explain how we proved in class that, for any A ∈ Rm×n, the linear A|Ax = Ab is consistent. Do not reproduce all proofs, but outline the train of thought, starting from basic linear algebra facts. 2. For the overdetermined linear system

1 2 1 1 2 x = 1 1 2 2

(a) Is the matrix A rank-deficient or of full rank? What is the rank of A|A? (b) Find all least squares solutions.

94 7.5. HW 5: Due November 4, 2013 Applied Matrix Theory

(c) Find the solution that Matlab returns, using A\b. Also find the least squares solution of minimum norm. Do they agree? (d) What criterion does Matlabs use to choose a solution? (use help mldivide to find out)

3. Textbook 4.7.2: Linear transformations For A ∈ Rn×n, determine which of the following functions are linear transformations.

(a) T(Xn×n) = AX − XA,

(b) T(xn×1) = Ax + b for b 6= 0, (c) T(A) = A|, | (d) T(Xn×n) = (X + X ) /2.

4. Textbook 4.7.6 2 2 For the operator T : R → R defined by T(x, y) = (x + y, −2x + 4y), determine [T]B, 1 1 where B is the basis B = , . 1 2 5. Textbook 4.7.11 Let P be the projector that maps each point v ∈ R2 to its orthogonal projection on the line y = x as depicted in Figure 4.7.4.

Figure 7.1. Figure 4.7.4

(a) Determine the coordinate matrix of P with respect to the standard basis. α (b) Determine the orthogonal projection of v = onto the line y = x. β

6. Textbook 4.7.13

For P2 and P3 (the spaces of polynomials of degrees less than or equal to two and three, t respectively), let S : P2 → P3 be the linear transformation defined by S(p) = 0 p(x) dx. 2 0 2 3 Determine [S]BB0 , where B = {1, t, t } and B = {1, t, t , t }. ´

95 Nitsche and Benner Unit 7. Linear Transformations

7. Textbook 4.8.1: Change of basis Explain why rank is a similarity invariant. 8. Textbook 4.8.2 Explain why similarity is transitive in the sense that A ' B and B ' C implies A ' C. 9. Textbook 4.8.3

A(x, y, z) = (x + 2y − z, −y, x + 7z) is a linear operator on R3.

(a) Determine [A]S , where S is the standard basis. −1 (b) Determine [A]S0 as well as the nonsingular matrix Q such that [A]S0 = Q [A]S Q        1 1 1  for S0 = 0 , 1 , 1 .  0 0 1 

10. Textbook 4.8.11

(a) N is nilpotent of index k when Nk = 0 but Nk−1 6= 0. If N is a nilpotent operator of index n on Rn, and if Nn−1(y) 6= 0, show B = {y, N(y), N2(y),..., Nn−1(y)} is a basis for Rn, and then demonstrate that?

0 0 ··· 0 0 1 0 ··· 0 0   [N] = J = 0 1 ··· 0 0 B   ......  . . . . . 0 0 ··· 1 0

(b) If A and B are any two n × n nilpotent matrices of index n, explain why A ' B. (c) Explain why all n × n nilpotent matrices of index n must have a zero trace and be of rank n − 1.

11. Textbook 4.8.12

2 n r E is idempotent when E = E. For an idempotent operator E on R , let X = {xi}i=1 n−r and Y = {xi}i=1 be bases for R(E) and N(E), respectively.

n (a) Prove that B = X ∪ Y is a basis for R . Hint: Show Exi = xi and use this to deduce that B is linearly independent. I 0 (b) Show that [E] = r . B 0 0 (c) Explain why two n × n idempotent matrices of the same rank must be similar. (d) If F is an idempotent matrix, prove that rank(F) = tr(F).

96 7.5. HW 5: Due November 4, 2013 Applied Matrix Theory

12. Textbook 4.9.3: Invariant subspaces Let T be the linear operator on R4 defined by

T(x1, x2, x3, x4) = (x1 + x2 + 2x3 − x4, x2 + x4, 2x3 − x4, x3 + x4),

and let X = span {eˆ1, eˆ2} be the subspace that is spanned by the first two unit vectors in R4. (a) Explain why X is invariant under T.   (b) Determine T/X . {eˆ1,eˆ2}

(c) Describe the structure of [T]B, where B is any basis obtained from an extension of {eˆ1, eˆ2}. 13. Textbook 4.9.4 Let T and Q be the matrices

−2 −1 −5 −2  1 0 0 −1 −9 0 −8 −2  1 1 3 −4 T =   and Q =    2 3 11 5 −2 0 1 0 3 −5 −13 −7 3 −1 −4 3

(a) Explain why the columns of Q are a basis for R4. (b) Verify that X = span {Q:1, Q:2} and Y = span {Q:3, Q:4} are each invariant sub- spaces under T. (c) Describe the structure of Q−1TQ without doing any computation. (d) Now compute the product Q−1TQ to determine     T/X and T/Y . {Q:1,Q:2} {Q:3,Q:4}

14. Textbook 4.9.7 If A is an n × n matrix and λ is a scalar such that (A − λI) is singular (i.e., λ is an eigenvalue), explain why the associated space of eigenvectors N(A − λI) is an invariant subspace under A. 15. Textbook 4.9.8  −9 4 Consider the matrix A = . −24 11

(a) Determine the eigenvalues of A. (b) Identify all subspaces of R2 that are invariant under A. (c) Find a nonsingular matrix Q such that Q−1AQ is a diagonal matrix.

97 Nitsche and Benner Unit 7. Linear Transformations

98 UNIT 8

Norms

8.1 Lecture 26: October 25, 2013

Homework 5 due Friday

Difinition of norms

Norm acts on a vector space V over R or C.

Definition 8.1. A norm is a function k · k : V → R by : x → kxk such that 1. kxk ≥ 0 for any x ∈ V, and kxk = 0 if and only if x = 0

2. kαxk = |α|kxk

3. kx + yk ≤ kxk + kyk

Vector Norms Some norms:

pPn 2 • kxk2 = i=1 xi which is the 2-norm or the Euclidean norm Pn • kxk1 = i=1 |xi| Pn p 1/p • kxkp = ( i=1 xi )

• kxk∞ = maxi |xi| = limp→∞ kxkp

The two norm

x 2 2 A unit vector is kxk and the unit ball in R {x ∈ R : kxk = 1} We illustrate the unit balls for the three primary norms: kxk2 = 1 which gives a circle, kxk1 = 1 or |x1| + |x2| = 1 which gives a rhombus, kxk∞ = 1 or (x1, x2) such that max(|x1|, |x2|) = 1 which gives a square.

99 Nitsche and Benner Unit 8. Norms

Theorem 8.2. kxk∞ ≤ kxk2 ≤ kxk1 Proof.

kxk = max |xi|, (8.1.1a) ∞ i q 2 = max xi , (8.1.1b) i q 2 = xk , for some k, (8.1.1c) v u n uX 2 ≤ t xi , (8.1.1d) i=1

= kxk2; (8.1.1e) qX 2 = |xi| , (8.1.1f) r X 2 ≤ |xi| , (8.1.1g)

= kxk1. (8.1.1h)

 2 P 2 Our goal is now to prove the triangle inequality for the 2-norm. Note that kxk2 = xi = x|x, where x|y is the standard inner product. Theorem 8.3. The Cauchy–Schwarz inequality (or CBS): |x|y| ≤ kxkkyk

x|y | | Proof. Let α = x|x ; note x y = y x. Also, x|y  x| (αx − y) = x| x − y , (8.1.2a) x|x x|y = x| x − x|y, (8.1.2b) x|x x|y = x|x − x|y, (8.1.2c) x|x = x|y − x|y, (8.1.2d) = 0. (8.1.2e) Further,

2 | 0 ≤ kαx − yk2 = (αx − y) (αx − y) , (8.1.3a) = αx (αx − y) − y (αx − y) , (8.1.3b) = −αy|x + y|y, (8.1.3c) x|y = − y|x + y|y, (8.1.3d) x|x | |x y| 2 = − 2 + kyk2. (8.1.3e) kxk2

100 8.1. Lecture 26: October 25, 2013 Applied Matrix Theory

| 2 |x y| 2 2 | 2 this gives kyk ≥ 2 and therefore kxk kyk ≥ |x y| .  kxk2

Theorem 8.4. kx + yk2 ≤ kxk2 + kyk2 Proof.

2 | kx + yk2 = (x + y) (x + y) , (8.1.4a) = x| + y| (x + y) , (8.1.4b) = x|x + 2x|y + y|y, (8.1.4c) | ≤ kxk2 + 2 x y + kyk2, (8.1.4d)

≤ kxk2 + 2kxk2kyk2 + kyk2, (8.1.4e) 2 = (kxk2 + kyk2) , (8.1.4f) q 2 kx + yk2 ≤ (kxk2 + kyk2) , (8.1.4g)

≤ kxk2 + kyk2. (8.1.4h)



Matrix Norms

Definition 8.5. A is a function k · k : Rn×m → R such that,

1. kAk ≥ 0 for any A ∈ Rn×m, and kAk = 0 if and only if A = 0 2. kαAk = |α|kAk

3. kA + Bk ≤ kAk + kBk

The Frobenius Norm The Frobeius norm is defined s X 2 kAkF = aij . (8.1.5) i,j or

2 X 2 kAkF = kAi,:k2, (8.1.6a) i X 2 = kA:,jk2, (8.1.6b) j X | = aj aj, (8.1.6c) j = tr(A|A). (8.1.6d)

which gives us a convenient way of expressing this norm.

101 Nitsche and Benner Unit 8. Norms

Induced Norms

Given a vector norm on Rn we may define (where sup is the smallest upper bound)

kAxk kAk = sup = sup kAxk. (8.1.7) x∈Rn kxk kxk=1 we may also replace the smallest upper bound (sup) with the maximum (max). We can now take kAk2, kAk1, and kAk∞

8.2 Lecture 27: October 28, 2013

Matrix norms (review)

Definition 8.6. A norm on V

1. kAk ≥ 0 for any A ∈ Rn×m, and kAk = 0 if and only if A = 0

2. kαAk = |α|kAk

3. kA + Bk ≤ kAk + kBk

Frobenius Norm

The Frobenius norm is defined

2 X 2 kAkF = |aij| , (8.2.1a) i X 2 = kAi,:k2, (8.2.1b) i X 2 = kA:,jk2, (8.2.1c) j = tr(A|A), (8.2.1d) = tr(A?A) (8.2.1e) for A ∈ Cn×m. In in the real set A? = A|. Properties of the Frobenius norm:

1. kAxk2 = kxk2kAkF

2. kABkF = kAkFkBkF

102 8.2. Lecture 27: October 28, 2013 Applied Matrix Theory

Proof. Property (1):

X 2 kAxk2 = (Ax)i , (8.2.2a) i X 2 = (Ai,:x) , (8.2.2b) i X 2 2 ≤ kAi,:k2kxk2, (8.2.2c) i 2 X 2 = kxk2 kAi,:k2 . (8.2.2d) i

| {z2 } kAkF

Property (2):

2 X 2 kABkF = k(AB)j,:k2, (8.2.3a) j X 2 = kABj,:k2, (8.2.3b) j X 2 2 ≤ kAkFkBj,:k2, (8.2.3c) j 2 X 2 = kAkF kBj,:k2 . (8.2.3d) j

| {z2 } kBkF

 Example 8.7. 1 2 A = (8.2.4) 0 2

1 0 1 2 A|A = , (8.2.5a) 2 2 0 2 1 2 = (8.2.5b) 2 8

So p kAk = tr(A|A) , (8.2.6a) 2 √ = 9 , (8.2.6b) = 3. (8.2.6c)

which may be called by norm(A, ’fro’) in Matlab.

103 Nitsche and Benner Unit 8. Norms

Induced Matrix Norms

Definition 8.8. For A ∈ Rn×m the induced norm of the matrix is kAxk kAk = max , (8.2.7a) x∈Rn kxk = max kAxk (8.2.7b) kxk=1

Example 8.9. 1 2 A = (8.2.8) 0 2

kAk1 = max kAxk1, (8.2.9a) kxk=1 X = max |(Ax)|. (8.2.9b) kxk=1

This provides a remap of the vector x. For example we may find the image of the points of the corners of the unit rhombus for the x vector. Which can provide a way to find the 1-norm, but this is not the best physically. Returning to the ∞-norm,

kAk∞ = max kAxk∞, (8.2.10a) kxk∞=1

= max max |(Ax)i|. (8.2.10b) kxk∞=1 i

Here we can remap the corners of the unit square to a stretched parallelogram. What is the maximum ∞-norm? From the figure, we can see it is 3. Now we are interested in the mapping of the 2-norm, which is the unit circle.

kAk2 = max kAxk2, (8.2.11a) kxk2=1 ≈ 2.92. (8.2.11b)

ASIDE: Say we have,

2 2 2 2 (Ax)1 + (Ax)2 = (a11x1 + a12x2) + (a21x1 + a22x2) , (8.2.12a) 2 2 2 2 = a11x1 + x1x2 (a11a12 + a21a22) + a22x2, (8.2.12b) = constant. (8.2.12c)

which would give an ellipse.

P Theorem 8.10. kAk = maxj |aij| which gives the maximum column-sum, and kAk = P 1 i 1 maxi j |aij| which is the maximum row-sum.

104 8.2. Lecture 27: October 28, 2013 Applied Matrix Theory

Properties The induced norms of a matrix have similar properties to the Frobenius norm:

kAxk 1. kAxk ≤ kAkkxk since kxk ≤ kAk 2. kABk = kAkkBk (Will be shown in the homework) Example 8.11. The induced norm of the identity matrix is 1; kIk = 1. Proof. X kAk1 = max |(Ax)|, (8.2.13a) kxk =1 1 i | {z } kAxk1

X X = max aijxj , (8.2.13b) kxk =1 1 i j X X ≤ max |aij||xj|, (8.2.13c) kxk =1 1 i j X X = max |xj| |aij|, (8.2.13d) kxk =1 1 i j X X ≤ max |xj| |aij| , (8.2.13e) kxk =1 1 i j | {z } independent of j X X = max max |aij| |xj|, (8.2.13f) kxk =1 j 1 i j | {z } kxk1=1 X = max |aij|. (8.2.13g) j i P P Now find an x such that the upper bound is attained. So let k = i |aik| = maxj i |aij|. Now let x = eˆk, then

kAxk1 = kAeˆkk, (8.2.14a)

= kA:kk1, (8.2.14b) X = |aij|, (8.2.14c) i

= max |aij|, (8.2.14d) j = upper bound. (8.2.14e)

 2 2 2 2 | | Further kAk2 = max kAxk2 such that kxk2 = 1. Then, kAk2 = max (x A Ax) such that x|x = 1. This arrizes Lagrange multipliers, or ∇f = λ∇g.

105 Nitsche and Benner Unit 8. Norms

8.3 Lecture 28: October 30, 2013

The 2-norm

Given the 2-norm kAk = max kAxk we have, f(x) = kAk2 = max (x|A|Ax) such 2 kxk2=1 2 2 that g(x) = x|x = 1 where f(x): Rn → R. This needs Lagrange multipliers, or ∇f = λ∇g. For a minimization problem. ∂UV ∂U ∂V = V + U (8.3.1) ∂xj ∂xj ∂xj

Lemma 8.12. If B is symmetric, ∇ (x|Bx) = 2Bx

Note: ∇ (x|x) = 2x

Proof. To prove this lemma,

∂ ∂ ∂x x|Bx = x|Bx + x|B , (8.3.2a) ∂xj ∂xj ∂xj | | = eˆj Bx + x Beˆj, (8.3.2b) | | | = eˆj Bx + x Beˆj , (8.3.2c) = eˆ| B x + eˆ|B|x, (8.3.2d) j |{z} j =B| | = 2eˆj Bx, (8.3.2e)

= 2 (Bx)j . (8.3.2f)



Proof. Alternatively, we may consider, ! ! ∂ X ∂ X X xi (Bx)i = xi Bikxk , (8.3.3a) ∂xj ∂xj i i k ! ∂ X X = xiBikxk , (8.3.3b) ∂xj i k X X = Bjkxk + xiBij, (8.3.3c) k i X X = Bjkxk + Bkjxk, (8.3.3d) k k X X = Bjkxk + Bjkxk, (8.3.3e) k k

= 2 (Bx)j . (8.3.3f)



106 8.3. Lecture 28: October 30, 2013 Applied Matrix Theory

So, 2A|Ax = 2x, (8.3.4a) A|Ax = λx, (8.3.4b) and the solution (λ, x) is an eigenpair of A|A. Note, for these x, f(x) = x|A|Ax = x|λx = λx|x = λ. Thus, max(f) = λmax = max λk (8.3.5) k | | and λk = eigenvalue of A A. Note further that A A is symmetric so the eigenvalues are real and therefore f(x) ≥ 0 and λk ≥ 0. Example 8.13. Given 1 2 A = (8.3.6) 0 2 and 1 2 A|A = . (8.3.7) 2 8 Then,

|  1 − λ 2 det A A − λI = , (8.3.8a) 2 8 − λ = (1 − λ) (8 − λ) − 4, (8.3.8b) = λ2 − 9λ + 4. (8.3.8c) So, √ 9 ± 81 − 16 λ1,2 = , (8.3.9a) √ 2 9 ± 65 = , (8.3.9b) 2 and √ 9 + 65 λ = . (8.3.10) max 2 Therefore: √ 9 + 65 kAk = ≈ 2.9208 ... (8.3.11) 2 2

Now, kxk∞ ≤ kxk2 ≤ kxk1. This inequality does not hold for matrices. Some properties, (where U|U = I and V|V = I) | • kAk2 = kA k2 | 2 • kA Ak2 = kAk2 A 0 = max (kAk , kBk ) • 0 B 2 2 2 | • kU AUk2 = kAk2 −1 √ 1 • kA k2 = λmin(A|A)

107 Nitsche and Benner Unit 8. Norms

108 UNIT 9

Orthogonalization with Projection and Rotation

9.1 Lecture 28 (cont.)

Inner Product Spaces An inner product space V plus the the inner product.

Definition 9.1. Given a vector space V, an inner product is a function f : V × V → R or C by f(x, y) = hx, yi such that

• hx, yi = hy, xi

• hx, αyi = α hx, yi, note hx, αyi = hy, xαi = α hy, xi = αhy, xi = α hx, yi

• hx + z, yi = hx, yi + hz, yi

• hx, xi ≥ 0 for any x ∈ V

• hx, xi = 0 implies x = 0

Example 9.2. 1. hx, yi = x|y with V = Rn and hx, yi = x∗y with V = Cn, where x∗ = x|.

| | n ∗ ∗ n 2. hx, yiA = x A Ay with V = R and hx, yiA = x A Ay with V = C √ | | This gives us a new norm kxkA = x A Ax = kAxk2. q q b 0 b b 2 3. hf, gi = a f(x)g(x) dx, V = C [a, b] and kfk = a f(x)f(x) dx = a |f(x)| dx ´ ´ ´ b 4. hf, gi = a ω(x)f(x)g(x) dx where ω(x) ≥ 0 ´ | p | 5. hA, Bi = tr(A B) and kAk = tr(A A) = kAkF

109 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation

9.2 Lecture 29: November 1, 2013

Inner Product Spaces Reviewing properties of inner product spaces,

• hx, yi = hy, xi

• hx, αyi = α hx, yi

• hx + z, yi = hx, yi + hz, yi

• hx, xi ≥ 0 for any x ∈ V

• hx, xi = 0 implies x = 0 Now we may define norms kxk = phx, xi . Let’s say we want to define angles between vectors and ky − xk2 = kxk2 + kyk2 − 2kxkkyk cos(θ). Rearranged,

−ky − xk2 + kxk2 + kyk2 cos(θ) = , (9.2.1a) 2kxkkyk hx, xi + hy, yi − hy − x, y − xi = , (9.2.1b) 2kxkkyk hy, xi + hx, yi = , (9.2.1c) 2kxkkyk hx, yi = , (9.2.1d) kxkkyk

only if hx, yi ∈ R. For a more general definition hy, xi + hx, yi = hy, xi + hy, xi = 2 Re(hy, xi). So we would have the problem of the conjugate in finding the angle, but have reduced this issue.

Definition 9.3. The angle between x, y is given by

hx, yi cos(θ) = . (9.2.2) kxkkyk

So, for x ⊥ y means hx, yi = 0.

Note: If the inner product is not a , then hx, yi = 0 means kxk2 + kyk2 = ky − xk2, but not vice-versa.

Example 9.4.  1  4 −2  1 x =   and y =   .  3 −2 −1 −4

110 9.2. Lecture 29: November 1, 2013 Applied Matrix Theory

| | | So x ⊥ y in hx, yi = x y, but x 6⊥y in hx, yiA = x A Ay where,

1 2 0 0 0 1 0 0 A =   . 0 0 1 0 0 0 0 1

Definition 9.5. A set {u1,..., un} is orthonormal if kukk = 1 for any k and huj, uki = 0 for any j 6= k.

Fourier Expansion

Given an orthonormal basis for V we can write x ∈ V as

x = c1u1 + c2u2 + ··· cnun (9.2.3) with hx, uji = cj huj, uji = cj.

n on √1 Example 9.6. Given a series π sin(kx) is orthonormal with respect to the inner k−1 π product −π f(x)g(x) dx. ´ 1−cos(2kx) How do we compute the following integrals? sin(kx) dx = 2 dx So if f ∈ n π √1 P √1 span {sin(kx)} then f = π k=1 ck sin(kx). Thus,´ck = π −π f(x´) sin(kx) dx. In homework we will approximate a line on [−π, π] with the´ sine and cosine Fourier series. This is essentially the 2-norm approximation of the span of the Fourier series. The Gibbs phenomena will be observed with overshoot of the sines and cosines above the function. Thus, orthonormal bases are useful for partial differential equations applications.

Orthogonalization Process (Gramm-Schmidt)

Goal: Given basis {a1,..., an} find an orthonormal basis {u1,..., un} for V. This is the orthogonalization process. Method: find uk such that span {u1,..., un} = span {a1,..., an} for k = 1, . . . , n. Now let’s show the process. k = 1: a1 u1 = ka1k

k = 2: a2 − hu1, a2i u1 u2 = ka2 − hu1, a2i u1k

111 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation

As an example of the orthogonality of u1 and u2 a − hu , a i u hu , a − hu , a i u i 2 1 2 1 = 1 2 1 2 1 , (9.2.4a) ka2 − hu1, a2i u1k `2 1 = hu1, a2 − hu1, a2i u1i , (9.2.4b) `2 1 = [hu1, a2i − u1 hu1, a2i u1] , (9.2.4c) `2   1 = hu1, a2i − hu1, a2i hu1, u1i , (9.2.4d) `2 | {z } 1 = 0. (9.2.4e)

k = 3: ... k = k: ak − hu1, aki u1 − hu2, aki u2 − · · · − huk−1, aki uk−1 uk = . kak − hu1, aki u1 − hu2, aki u2 − · · · − huk−1, aki uk−1k This is the Gramm–Schmidt orthogonalization process. If we want, we can write it as, ∗ (I − Uk−1Uk) ak uk = ∗ . (9.2.5) k(I − Uk−1Uk) akk Here  | |  Uk−1 = u1 ··· uk−1. | |

9.3 Lecture 30: November 4, 2013

Gramm–Schmidt Orthogonalization

Given basis {a1,..., an} find an orthonormal basis {u1,..., un} for that spans the same space. Algorithm,

a1 u1 = , (9.3.1a) ka1k a2 − (u1a2) u1 u2 = , (9.3.1b) `2 (9.3.1c) with using projections,

|  hu1, a2i u1 = u1a2 u1, (9.3.2a) | = u1u1 a2. (9.3.2b) |{z} P11

112 9.3. Lecture 30: November 4, 2013 Applied Matrix Theory

From,

a2 − (u1a2) u1 u2 = , (9.3.3a) ka2 − (u1a2) u1k | (I − u1u1) a2 = | , (9.3.3b) k(I − u1u1) a2k

= P⊥a2. (9.3.3c)

Example 9.7. Given the vectors,

0 −20 −14 a1 = 3 , a2 =  27 , and a3 =  −4 4 11 −2

Then we can find the orthogonal vectors,

0 1 u = 3 . (9.3.4) 1 5   4

Then,

v2 = a2 − hu1, a1i u1, (9.3.5a) −20 −20 0 1 1 = 27 − 0 3 4 27 3 , (9.3.5b)   5     5 11 11 4 −20 0 125 = 27 − 3 ··· (9.3.5c)   25   11 4

*** and

0 1 u = 3 , (9.3.6a) 1 5   4 −20 1 u = −12 , (9.3.6b) 2 25   −9 −15 1 and u = −16 . (9.3.6c) 3 25   12

113 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation

Now rewriting our system,

a1 u1 = , (9.3.7a) `1 a2 − r12u1 u2 = , (9.3.7b) `2 a3 − r13u1 − r23u2 u3 = , (9.3.7c) `3 ··· (9.3.7d)

an − r1nu1 − r2nu2 − · · · − rn−1,nun−1 un = . (9.3.7e) `n where rij = hai, uji. Now in different vector form,

a1 = `1u1, (9.3.8a)

a2 = r12u1 + `2u2, (9.3.8b)

a3 = r13u1 + r23u2 + `3u3, (9.3.8c) ··· (9.3.8d)

an = r1nu1 + r2nu2 + ··· + rn−1,nun−1 + `nun. (9.3.8e)

We can put this in a matrix form. If A is full rank (must have m ≥ n). Since can have at most m linearly independent vectors ai. With A = QR,   `1 r12 r13 r1n      0 `2 r23 ··· r2n | | | | | |    ...  a1 a2 ··· an = u1 u2 ··· un  0 0 `3 r3n . (9.3.9) | | | | | |  . . . .  m×n m×n  ......  0 0 0 ··· ` n n×n where rii = `i 6= 0 > 0 and R is invertible. This uniquely determines the Fourier coefficients of the Fourier expansion of this system. Thus, every matrix A of full rank has a unique decomposition, known as a QR fac- | torization, Am×n = Qm×nRn×n, where R is invertible. What do we know about Q Q? | | | (Q Q)ij = ui uj which is zero for i 6= j and one for i = j. So Q Q = In×n. These are orthogonal matrices. Decompositions of A:

| • Am×n = Qm×nRn×n, where Q Q = I and R is invertible.

• A = LU if |Ak|= 6 0.

• PA = LU always exists.

Now what about QQ|? It will be an m × m matrix, but otherwise we know little about it.

114 9.4. Lecture 31: November 6, 2013 Applied Matrix Theory

Example 9.8. Returning to our example,       0 −20 −14 0 −20/25 −15/25 5 25 r13 3 27 −4 = 3/5 12/25 −16/25 0 `2 r23 (9.3.10) 4 11 −2 4/5 −9/25 12/25 0 0 `3

In this case Q has three linearly independent columns and three linearly independent rows. So Q| has linearly independent columns. And interestingly (Q|)| Q| = QQ| = I. This is an : it is both invertible and has orthogonal columns. In general this is not the case because it is not n × n and QQ| is not necessarily the identity if m > n.

Use A = QR:

Example 9.9. Assume An×n invertible; solve Ax = b. Rewrite

QRx = b, (9.3.11a) Q|QRx = Q|b, (9.3.11b) Rx = Q|b. (9.3.11c)

This system is quick to solve (once Q and R are known).

Example 9.10. Assume Am×n full rank m > n then Ax = b is an overdetermined system and least squares solution satisfies,

A|Ax = A|b, (9.3.12a) R|Q|QRx = R|Q|b, (9.3.12b) R|Rx = R|Q|b, (9.3.12c) Rx = R|−1 R|Q|b, (9.3.12d) Rx = Q|b. (9.3.12e)

Go through this proof and the solutions manual. Then we will see how well SVD can improve things later.

9.4 Lecture 31: November 6, 2013

In homework the reduced QR factorization reffered to is where we can always write Am×n = | Qm×nRn×n where Q Q = I and Rn×n is triangular. This factorization is unique, but we may also x x ··· x  | |  . 0 x .. x QR = q ··· q   (9.4.1)  1 n  ......  | |  . . . .  0 0 ··· x

115 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation

m since {q1,..., qn} is an orthogonal basis for R(A) ⊂ R . Now, x x ··· x 0 x ··· x  . . . .     ......  | | | |  . .    QR = q1 ··· qn qn+1 ··· qm 0 0 ··· x (9.4.2) | | | | 0 0 ··· 0 m×m    .   .  0 0 ··· 0 m×n In this case, the reduced QR is not unique.

Unitary (orthogonal) matrices The unitary refers to the complex case and the orthogonal refers to the real.

Definition 9.11. A is Q ∈ Cn×n such that Q∗Q = I. This means we have Q has n orthogonal columns. Additionally, since Q is square we have n orthogonal rows. n×n | | So, Q ∈ R , with Q Q = QQ = In×n.

Properties Some properties for a unitary Q:

∗ ∗ • Q Q = QQ = In×n

• Q−1 = Q∗

• columns are orthonormal

• rows are orthonormal

∗ • (Qx) Qy = x∗Q∗Qy = x∗y for any x, y. Note: kQxk = kxk, so Q is an isometry. Also, If u, v unitary, then uv is unitary since,

(uv)∗ (uv) = v∗u∗uv, (9.4.3a) = v∗v, (9.4.3b) = I, (9.4.3c) = (uv)(uv)∗ . (9.4.3d)

Example 9.12.Q in full QR factorization of any A. In Matlab, [q, r] = qr(a) (is this full QR?) and [q, r] = qr(a,0) (is this reduced QR?).

Now to compute the QR factorization, the Gramm–Schmidt algorithm is not numerically stable. Thus, small changes in the input matrix values can cause large changes in the result. The alternative is the modified Gramm–Schmidt which improves the stability properties. We

116 9.4. Lecture 31: November 6, 2013 Applied Matrix Theory

will not cover this here, but it is discussed in future courses. A better algorithm is to obtain the QR by premuliplying by orthogonal matrices until it is triangular, or

Qn ··· Q1 A = R, (9.4.4) | {z } Q∗ then A = QR. This is better because it does not use projections, which are not orthogonal. Rotations of orthogonal matrices as well as reflections are useful to introduce zeros. As an example,

Rotation cos(θ) − sin(θ) Example 9.13. Rotation in the xy plane about the origin. So the matrix P = . sin(θ) cos(θ)  cos(θ) sin(θ) Now, P−1 = P = = P|. This again shows that it is orthogonal, par- −θ − sin(θ) cos(θ) ticularly the columns are orthogonal. These are rotations in the plane. Example 9.14. 3D Rotation Rotation in three dimensions about the z-axis: This is very similar, cos(θ) − sin(θ) 0 P = sin(θ) cos(θ) 0 . (9.4.5) 0 0 1 this rotates in the xy plane.

We can further rotate in any plane ij for some vector in Rn; i j  1  i  cos(θ) − sin(θ)    P =  1 , (9.4.6)   j sin(θ) cos(θ)  1   x1  .   .     xi−1    cos(θ)xi − sin(θ)xj    xi+1   .  Px =  .  . (9.4.7)    x   j−1  sin(θ)x − cos(θ)x   i j  x   j+1   .   .  xn

117 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation

This is called a Givens rotation. We can choose our θ such that (Qix)j = 0. So if we have

x x x x x x x x x x x x x x x x     Pθ = x x x x = x x x x. (9.4.8)     x x x x x x x x x x x x 0 x x x

So the QR factorization by Givens rotations,

Pθn ··· Pθ2 Pθ1 A = R. (9.4.9) | {z } Q∗

∗ Note, projections are not orthogonal. We can check this with PP = I. However P(u1) = 0 and this means we have a non-trivial null-space so projections is not invertible. Therefore, this is not invertible.

Reflection Example 9.15. Suppose we have the vectors u and x, where kuk = 1. We want to reflect x across the plane orthogonal to u. We will consider this operation Rx This operation is | also orthogonal. Now we will generalize a vector u⊥ = {v : v u = 0}. So the orthogonal projection onto u⊥; first we know hu, xi,

Px = x − hu, xi u, (9.4.10a) = (I − uu∗) x, (9.4.10b) Rx = (I − 2uu∗) x. (9.4.10c)

where P is the projection onto the subspace and R is the reflection across the subspace. Now R∗ = (I − 2uu∗) = R and R2 = I. This implies that R−1 = R∗ and R is orthogonal.

9.5 Homework Assignment 6: Due Monday, November 11, 2013

1 2 1. Let A = . Find kAk for p = 1, 2, ∞, F. 3 4 p P 2. Show that kAk = max |aij| (Hint: make sure you understand how the analogous ∞ i j formula for kAk1 was derived in class.) kAxk 3. (a) Given a vector norm kxk, prove that the formula kAk = sup kxk defines a matrix x6=0 norm. (This is called the induced matrix norm.) (b) Show that for any induced matrix norm, kAxk ≤ kAkkxk. (c) Prove that any induced matrix norm also satisfies kABk ≤ kAkkBk.

118 9.5. HW 6: Due November 11, 2013 Applied Matrix Theory

4. Consider the formula kAk = max |aij| i,j (a) Show that it defines a matrix norm. (b) Show that it is not induced by a vector norm.

5. Meyer, Exercise 5.2.6 Establish the following properties of the matrix 2-norm.

∗ (a) kAk2 = max |y Ax|, kxk2=1, kyk2=1 ∗ (b) kAk2 = kA k2, ∗ 2 (c) kA Ak = kAk2, A 0 (d) = max {kAk , kBk } (take A, B to be real), 0 B 2 2 2 ∗ ∗ ∗ (e) kU AVk2 = kAk2 when UU = I and V V = I.

−1 q 1 | 6. Show that kA k = where λmin is the smallest eigenvalue of A A. λmin 7. Show that hA, Bi = tr(A∗B) defines an inner product. 8. Meyer, Exercise 5.3.4 For a real inner-product space with k ? k2 = h?, ?i, derive the inequality

kxk2 + kyk2 hx, yi ≤ . Hint: Consider x − y. 2

9. Meyer, Exercise 5.3.5 For n × n matrices A and B, explain why each of the following inequalities is valid.

(a) |tr(B)|2 ≤ n[tr(B∗B)]. (b) tr(B2) ≤ tr(B|B) for real matrices. | tr(A|A)+tr(B|B) (c) tr(A B) ≤ 2 for real matrices. 10. Given 1 0 −1 1 1 2 1 1 A =  , and b =  . 1 1 −3 1 0 1 1 1

(a) Find an orthonormal basis for R(A), using the standard inner product. (b) Find the (reduced) QR decomposition of A. (c) For the matrix Q in (b), compute Q|Q and QQ|. (d) Find the least squares solution of Ax = b, using your results above. (e) Determine the Fourier expansion of b with respect to the basis you found in (a).

11. Explain why the (reduced) QR factorization of a matrix A of full rank is unique.

119 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation

12. Meyer, Exercise 5.5.11 Let V be the inner-product space of real-valued continuous functions defined on the interval [−1, 1], where the inner product is defined by

1 hf, gi = f(x)g(x) dx , ˆ−1

and let S be the subspace of V that is spanned by the three linearly independent 2 polynomials q0 = 1, q1 = x, q2 = x .

(a) Use the Gram–Schmidt process to determine an orthonormal set of polynomials {p0, p1, p2} that spans S. These polynomials are the first three normalized Legendre polynomials.

(b) Verify that pn satisfies Legendres differential equation

(1 − x2)y00 − 2xy0 + n(n + 1)y = 0

for n = 0, 1, 2. This equation and its solutions are of considerable importance in applied mathematics.

9.6 Lecture 32: November 8, 2013

From last time:

Elementary orthogonal projectors Let u, where kuk = 1, then the projection of a vector x onto the sub plane orthogonal ∗ ∗ to u is P⊥x = x − hu, xi u. And P|| = uu and P = I − uu . Now this projector, P, is not orthogonal. This is because an orthogonal matrix has the form Q∗ = Q−1 or Q∗Q = QQ∗ = I. Now,

∗ ∗ ∗ ∗ P⊥ = I − (u ) u , (9.6.1a)

= P⊥. (9.6.1b)

This further gives

P∗P = P2, (9.6.2a) = P, (9.6.2b) 6= I. (9.6.2c)

This property shows that once we project, projection a second time does not change the result. Also, N(P) 6= 0, so the projectors are not invertible. Now the null space of P|| is equal to u⊥, or N(P||) = u⊥. Similarly N(P⊥) = span(u).

120 9.6. Lecture 32: November 8, 2013 Applied Matrix Theory

Elementary reflection Now Rx = x − hu, xi u, and in this case R is orthogonal. So R∗ = R and R∗R = RR∗ = I. Also, (I − 2uu∗)(I − 2uu∗) = I − 2uu∗ − 2uu∗ + 4u (u∗u) u∗, (9.6.3a) = I − 4uu∗ + 4uu∗, (9.6.3b) = I. (9.6.3c) Now use reflectors to compute A = QR. So say we have x x x x x x Ru =   . (9.6.4) x x x x x x

So Rx = (kuk, 0) = kukeˆi. Thus, u = x − kxkeˆi. Doing successive reflections,

RuN ··· Ru2 Ru1 A = R. (9.6.5) | {z } Q This gives us the Householder method.

Complimentary Subspaces of V Definition 9.16. If V = X + Y, where X , Y are subspaces such that X ∩ Y = {0}, which are called complimentary subspaces and V = X ⊕ Y is the direct sum of X , Y. Given the general picture, how do we define the angle between two subspaces? Note: If V = X ⊕ Y then any z ∈ V can be written uniquely as z = x + y, for x ∈ X and y ∈ Y. Further dim(V) = dim(X ) + dim(Y) and BV = BX ∪ BY .

Proof. If z = x1 + y1 = x2 + y2 then x1 − x2 = y1 − y2 ∈ X ∩ Y. So x1 − x2 = y1 − y2 = 0 and X ∩ Y = {0}.  n | Example 9.17. Say we have R = R(A) ⊕ N(A ) for Am×n.

Projectors Definition 9.18. We define general projectors: The projector P onto X along Y is the linear operator such that P(z) = P(x + y) = x. Note: If P projects onto X along Y then P2 = P because P2(x+y) = P(x) = P(x+0) = x = P(z). Now the null space, N(P) = y because P(z) = P(x + y) = x = 0. Further, R(P) = x. Also, R(P) ⊕ N(P) = Rn as we showed in Homework 5. Ultimately, we want to find the Jordan of our matrices. In general R(A)+ n N(A) 6= R . This is obvious if Am×n because they have different dimensions, so this only makes sense if An×n. But even if A is square, let y ∈ N(A) ∩ R(A) then Ay = 0 and y = Az for some z. Then A (Az) = A2z = 0, and we have a non-trivial intersection. So if A2 has a nontrivial null space, then N(A) and R(A) have nontrivial intersection.

121 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation

Example 9.19. Obviously this cannot be an , so say we have

0 1 0 0 A = and A2 = 0 0 0 0

This is an example of a null-potent matrix. But this is only true for projectors.

Theorem 9.20.P is a projector if and only if P2 = P. These are also known as idempotent matrices.

9.7 Lecture 33: November 11, 2013

From last time:

Definition 9.21.P : V → V is a projector if for each X , Y such that V = X ⊕ Y and V(x + y) for any z = x + y ∈ V.

Note: R(V) = X and N(V) = Y.

Projectors

Theorem 9.22.P is a projector if and only if P2 = P. These are also known as idempotent matrices.

Proof. Given the vector space V and the operator P = P2,

R(P) ⊕ N(P) = V, (9.7.1a) | {z } | {z } X Y P(x + y) = Px + Py, (9.7.1b)

= P Px0 , (9.7.1c) |{z} x, some x0

= Px0, (9.7.1d) = x. (9.7.1e)

Going the other way,

z = Pz + (z − Pz), (9.7.2a) |{z} | {z } ∈R(P) ∈N(P) V = R(P) ⊕ N(P). (9.7.2b)



122 9.7. Lecture 33: November 11, 2013 Applied Matrix Theory

Representation of a projector

We discuss the representation of P. Given {m1,..., mr} as a basis for R(P) = X and {n1,..., nn−r} as a basis for N(P) = Y. Then Pmi = mi and Pni = 0. Let B = [M | N]. Then

PB = P[M | N], (9.7.3a) = [M | 0]. (9.7.3b)

[P]s = P, (9.7.4a) = [M | 0]B−1, (9.7.4b) I 0 = [M | N] r×r B−1, (9.7.4c) 0 0 I 0 = B r×r B−1, (9.7.4d) 0 0 −1 = [I]BS [P]B [I]BS . (9.7.4e)

Definition 9.23. For any subspace M ⊂ V, M⊥ = v ∈ V such that v⊥u = 0, u ∈ M .

Theorem 9.24. For any subspace M ⊂ V, V = M ⊕ M⊥

Proof. Given basis {b1,..., bm} of M, choose {bi} orthonormal complement by orthogonal set {bm+1,..., bn} such that {b1,..., bm, bm+1,..., bn} is a basis for V.  | {z } | {z } basis for M basis for M⊥

Example 9.25. Rn = R(A) ⊕ N(A|) where R(A) ⊥ N(A|). An orthogonal projector onto M is PM is I 0 P = [M | N] [M | N]−1, (9.7.5a) M 0 0 M∗M = 0, (9.7.5b) N∗N = 0, (9.7.5c) (9.7.5d)

Where  | |   | |  M = m1 ··· mm and N = nm+1 ··· nn . (9.7.6) | | | | n×m n×(n−m) Note: (M∗M)−1 M∗ I 0 MN = (9.7.7) (N∗N)−1 N∗ 0 I | {z } [M | N]−1

123 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation

and (M∗M)−1 M∗ P = [M | 0] , (9.7.8a) M (N∗N)−1 N∗ = M (M∗M)−1 M∗. (9.7.8b) But if the basis were orthonormal, how does this change the formula? Given any basis {m1,..., mm} for subspace M, orthogonal projector.

∗ −1 ∗ PM = M (M M) M . (9.7.9)

∗ If {m1,..., mm} are orthogonal then M M = I and

∗ PM = MM . (9.7.10) Example 9.26. Elementary orthogonal projectors,

∗ P|| = uu . (9.7.11) and ∗ P⊥ = I − uu (9.7.12) Theorem 9.27. 2 2 kx − PMxk = min kx − yk . (9.7.13) 2 y∈M 2 (we will prove this as an exercise)

| −1 | Note: A (A A) A is the projector onto the range of A, or PR(A) where we assume that A has full rank. The normal equations to solve Ax = b is A|Ax = A|b. (9.7.14) and x = A|A−1 A| b. (9.7.15) | {z } pseudoinverse So,

Ax = A A|A−1 A|b, (9.7.16a)

= PR(A)b. (9.7.16b)

9.8 Lecture 34: November 13, 2013

Projectors We discussed a projector P onto X along Y and also that the projector is idempotent, P2 = P. Further, R(P) = X and N(P) = Y. I 0 [P] = [M | N] r×r [M | N]−1. (9.8.1) S 0 0

124 9.8. Lecture 34: November 13, 2013 Applied Matrix Theory

The orthogonal projector onto M = R(M), where M = [m1 ··· mm] is the basis of M,

P = M (M∗M)−1 M∗ (9.8.2)

The normal equations for Ax = b, with A being a full rank matrix, are

Ax = PR(A)b. (9.8.3)

Projector P is orthogonal, then P∗ = P.

Proof. P is an orthogonal projector,

P = M (M∗M)−1 M∗, (9.8.4a) P∗ = M (M∗M)−1 M∗, (9.8.4b) = P. (9.8.4c) further suppose that P = P2 and P = P∗. Now we want to show that N(P) ⊥ R(P), where it is normal in the standard inner product. Let x ∈ R(P) and y ∈ N(P). Then consider the inner product,

y∗x = y∗Px, (9.8.5a) = ( P∗ y)∗x, (9.8.5b) |{z} P = (Py)∗ x, (9.8.5c) | {z } 0∗ = 0∗x, (9.8.5d) = 0. (9.8.5e)

∗ if {mi} are orthogonal, PM = MM . 

Example 9.28.

∗ P|| = uu , (9.8.6a) ∗ P⊥ = I − uu . (9.8.6b)

V = X ⊕ Y.

Decompositions of Rn

| n | n ⊥ | Given An×n, we know R(A) ⊕ N(A ) = R and R(A ) ⊕ N(A) = R , but R(A) = N(A ). Let B = { u1,..., ur , ur+1,..., un} orthonormal. Further B = { v1,..., vr , vr+1,..., vn} | {z } | {z } | {z } | {z } basis for R(A) basis for N(A|) basis for R(A|) basis for N(A)

125 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation also orthonormal. So,

| |  U AV = UR(A) UN(A|) A VR(A|) VN(A) , (9.8.7a)  |  UR(A)  = | AVR(A|) AVN(A) , (9.8.7b) UN(A|)  |  UR(A)AVR(A|) 0 = | , (9.8.7c) UN(A|)AVN(A) 0 C 0 = r×r . (9.8.7d) 0 0

| | UN(A|)A = A UN(A|), (9.8.8a) = 0. (9.8.8b)

Range Nullspace decomposition of An×n

Theorem 9.29. Rn = R(Ak) ⊕ N(Ak) for some k. This is not necessarily an orthogonal decomposition. The smallest such k is called the index of A. Proof. First, note that R(Ak+1) ⊂ R(Ak) for any k. This is because if y ∈ R(Ak+1), then y = Ak+1z for some z, then y = Ak (Az). Second, R(A) ⊂ R(A2) ⊂ R(A3) ⊂ · · · ⊂ k k+1 k+2 R(A ) = R(A ) = R(A ) = ··· contains equality for some k.  to be continued. . .

9.9 Homework Assignment 7: Due Friday, November 22, 2013

You may use Matlab to compute matrix products, or to reduce a matrix to Row Echelon Form.

1. (a) Let A ∈ Rm×n. Prove R(A) and N(A|) are orthogonal complements of Rm. 1 2 0 (b) Verify this fact for A = 2 4 1 . 1 2 0 2. Prove: If X , Y are subspaces of V such that V = X ⊕ Y, then for any x ∈ V there exists a unique x ∈ X and y ∈ Y such that z = x + y. 3. Prove: If X , Y are subspaces of V such that V = X +Y and dim(X )+dim(Y) = dim(V) then X ∩ Y = {0}. 4. Textbook 5.11.3: 1 2   2 4 Find a basis for the orthogonal complement of M = span  ,   . 0 1  3 6 

126 9.9. HW 7: Due November 22, 2013 Applied Matrix Theory

5. Let P be a projector. Let P0 = I − P.

(a) Show that P0 = I − P is also a projector. It is called the complementary projector of P. (b) Any projector projects a point z ∈ V onto X along Y, where X ⊕ Y = V, by P(z) = P(x + y) = x. What are the X and Y for P and I − P, respectively?

6. Textbook 5.9.1: Let X and Y be subspaces of R3 whose respective bases are        1 1   1  BX = 1, 2 and BY = 2  1 2   3 

(a) Explain why X and Y are complementary subspaces of R3. (b) Determine the projector P onto X along Y as well as the complementary projector Q onto Y along X .  2 (c) Determine the projection of v = −1 onto Y along X . 1 (d) Verify that P and Q are both idempotent. (e) Verify that R(P) = X = N(Q) and N(P) = Y = R(Q).

7. (a) Find the orthogonal projection of b = (4, 8)| onto M = span {u}, where u = (3, 1)|. (b) Find the orthogonal projection of b onto u⊥, for b, u given in (a). (c) Find the orthogonal projection of b = (5, 2, 5, 3)| onto

M = span (3/5, 0, 4/5, 0)|, (0, 0, 0, 1)|, (4/5, 0, 3/5, 0)| .

(Note: the given columns are orthonormal.) (d) Find the orthogonal projection of b = (1, 1, 1)| onto the range of

1 0 A = 2 1 1 0

8. (a) Show that kPk2 ≥ 1 for every projector P 6= 0. When is kPk2 = 1?

(b) Show that kI − Pk2 = kPk2 for all projectors P 6= 0, I. 9. (a) Show that the eigenvalues of a unitary matrix satisfy |λ| = 1. Show by a counter- example that reverse not true. (b) Show that the eigenvalues of a projector are either 0 or 1. Show by a counter- example that the reverse not true. 10. Let u be a unit vector. The elementary reflector about u⊥ is defined to be R = I−2uu∗.

127 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation

(a) Prove that all elementary reflectors are involutory (R2 = I), hermitian, and uni- tary.

(b) Prove that if Rx = µeˆi, then µ = ±kxk2, and that R:i = Reˆi = ±x. 1 | (c) Find the elementary reflector that maps x = 3 (1, −2, −2) onto the x-axis. (d) Verify by direct computation that your reflector in (c) is symmetric, orthogonal, involutory. (e) Extend the vector x in (c), to an orthonormal basis for R3. (Hint: what do you know about the columns of R from parts (a,b) above?)

11. Textbook 5.6.17: Perform the following sequence of rotations in R3 beginning with  1 v0 =  1 −1

1. Rotate v0 counterclockwise 45° around the x-axis to produce v1. 2. Rotate v1 clockwise 90° around the y-axis to produce v2. 3. Rotate v2 counterclockwise 30° around the z-axis to produce v3.

Determine the coordinates of v3 as well as an orthogonal matrix Q such that Qv0 = v3. −2 0 −4 12. (a) Find the index of A =  4 2 4. Find its core-nilpotent decomposition. 3 2 2 (b) A matrix is said to be nilpotent if Ak = 0 for some k. Show that the index of a nilpotent matrix is the smallest k for which Ak = 0. Find its core-nilpotent decomposition. (c) Find the index of a projector that is not the identity. Find its core-nilpotent decomposition. (d) What is the index of the identity?

9.10 Lecture 35: November 15, 2013

Range Nullspace decomposition of An×n

n k k Theorem 9.30. For any An×n and some k; R = R(A ) ⊕ N(A ). The smallest such k is called the index of A.

Example 9.31. Nilpotent matrices have some k such that Nk = 0, R(Nk) = {0}, and N(Nk) = Rn Proof. First, note that R(Ak+1) ⊆ R(Ak) for any k. This is because if y ∈ R(Ak+1), then y = Ak+1z for some z, then y = Ak (Az). Second, R(A) ⊂ R(A2) ⊂ R(A3) ⊂ · · · ⊂ R(Ak) = R(Ak+1) = R(Ak+2) = ··· contains equality for some k. The dimensions decrease

128 9.10. Lecture 35: November 15, 2013 Applied Matrix Theory

if proper. Third, once equality achieved, it is maintained through the rest of the chain. The proof:

R(Ak+2) = R(Ak+1A), (9.10.1a) = AR(Ak+1), (9.10.1b) = AR(Ak), (9.10.1c) = R(Ak). (9.10.1d)

Fourth, N(A0) ⊂ N(A) ⊂ N(A2) ⊂ · · · ⊂ N(Ak) = N(Ak+1) = N(Ak+2) = ···. Why does the nullspace change at the same spot as the columnspace? Because dim(N(Ak)) = n−dim(R(Ak)), so once the dimensions are constant in the columnspace, then the dimensions will be constant for the nullspace. Fifth, R(Ak) ∩ N(Ak) = {0}: Let y ∈ R(Ak) and y ∈ N(Ak), then y = Akx for some x, and Aky = 0. So A2kx = 0 and x ∈ N(A2k) = N(Ak) k so A x = 0. Sixth, R(Ak) + N(Ak) = Rn since the dimensions add up and there is no |{z} y intersection of the two spaces (except for {0}).  Now, how can we factor the matrix?

Corresponding factorization of A

k  k Let {x1,..., xr} be a basis for R(A ) and y ,..., y be a basis for N(A ). Then S =   1 n−r  x1,..., xr, y1,..., yn−r , and we note that X = span {x1,..., xr} and Y = span y1,..., yn−r which are both invariant subspaces. So

C 0  S−1AS = r×r . (9.10.2) 0 N(n−r),(n−r)

Note S−1AkS = (S−1AS)k because the inverse and normal S terms cancel out in the expo- nentiation. Thus,

C˜ 0  S−1AkS = , (9.10.3a) 0 Nk = S−1Ak XY , (9.10.3b) = S−1 AkXAkY , (9.10.3c) = S−1 AkX 0 , (9.10.3d) = S−1AkX 0 . (9.10.3e)

Thus Nk = 0 and N is nilpotent and C is invertible. So we have a core-nilpotent factorization of A. So we have a similarity factorization which always exists. We recall the decomposition for any A ∈ Rn×n = R(A) ⊕ N(A|) = R(A|) ⊕ N(A), corresponding factorization C 0 U|AV = . (9.10.4) 0 0

129 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation

130 UNIT 10

Singular Value Decomposition

10.1 Lecture 35 (cont.)

Singular Value Decomposition

The singular value decomposition is a way to find the orthogonal matrices Un and Vn may be found such that we may diagonalize A. Or   σ1 0 ··· 0 0 ··· 0  .. . . .   0 σ2 . . . .   . . .   . .. .. 0 0 ··· 0  | | |   Um ··· U2U1AV1V2 ··· Vm =  0 ··· 0 σ 0 ··· 0  (10.1.1)  r   0 ··· 0 0 0 ··· 0     ......   ......  0 ··· 0 0 0 ··· 0

Theorem 10.1. For any Am×n there exists orthogonal U and V such that

| Am×n = UDV , (10.1.2a)   σ1 0 ··· 0 0 ··· 0  .. . . .  0 σ2 . . . .  . . .   . .. .. 0 0 ··· 0    | = [U]  0 ··· 0 σ 0 ··· 0 V (10.1.2b) m×m  r  n×n  0 ··· 0 0 0 ··· 0    ......   ......  0 ··· 0 0 0 ··· 0

where σi are real and greater than 0. Further σ1 ≥ σ2 ≥ · · · ≥ σr, where r = rank(A).

Definition 10.2. σi are the singular values of A. Note:

131 Nitsche and Benner Unit 10. Singular Value Decomposition

1. σi are uniquely determined, but U, V are not unique

2. rank(A) = rank(D)

3. kAk2 = kDk2

A 0 = max (kAk , kBk ) 0 B 2 2 2

4. If A is invertible,   σ1 0 ··· 0 . .  .. .   0 σ2 .  | A = U  . . .  V , (10.1.3a)  . .. .. 0  0 ··· 0 σr  1 0 ··· 0  σ1  1 .. .  −1  0 . .  A = V  σ2  U|, (10.1.3b)  . .. ..   . . . 0  0 ··· 0 1 σn  1 0 ··· 0  σn  1 .. .   0 . .  = V˜  σn−1  U˜ |. (10.1.3c)  . .. ..   . . . 0  0 ··· 0 1 σ1

Now K(A) = kAk·kA−1k = σ1 which means that we can have issues with singularities. σn

Example 10.3. Prove kI − Pk2 = kPk2. What is the norm of P and of I − P? From illustration we can use tangents to the unit ball. Then kPωk = k(I − P) ωk needs to be shown.

10.2 Lecture 36: November 18, 2013

We will do review for exam on Friday.

Singular Value Decomposition SVD:

Theorem 10.4. For any Am×n there exists orthogonal U, V such that

| Am×n = Um×mDm×nVn×n, (10.2.1)

132 10.2. Lecture 36: November 18, 2013 Applied Matrix Theory

where   σ1 0 ··· 0 0 ··· 0  .. . . .  0 σ2 . . . .  . . .   . .. .. 0 0 ··· 0   D =  0 ··· 0 σ 0 ··· 0 (10.2.2)  r   0 ··· 0 0 0 ··· 0    ......   ......  0 ··· 0 0 0 ··· 0 m×n

and σi > 0, σ1 ≥ σ2 ≥ · · · ≥ σr > 0.

Notes:

1. kAk = σ , kA−1k = 1 , where A would have to be invertible. 2 1 2 σn The condition number is κ(A) = σ1 . σn

2. r = rank(A).

Qn 3. |det(A)| = i=1 σi.

4. A−1 = VD−1U|.

Existence of the Singular Value Decomposition Proof. We know that there exists U and V such that,

C 0 UA|V = , (10.2.3) 0 0

where Cr×r is invertible. Let x be such that kxk = 1, and

kCk2 = max kCyk2, (10.2.4a) kyk2=1

= kCxk2, (10.2.4b) = σ1, (10.2.4c)

= kAk2. (10.2.4d)

Let y = Cx and further the two orthogonal matrices, [x | X] and [y | Y]. Now, kCxk2

C 0  y|  [y | Y]| [x | X] = Cx CX , (10.2.5a) 0 0 Y|  y|Cx y|CX  = (10.2.5b) Y|Cx Y|CX

133 Nitsche and Benner Unit 10. Singular Value Decomposition

Further,

x|C|Cx y|Cx = , (10.2.6a) kCxk2 kCxk2 = 2 , (10.2.6b) kCxk2

= kCxk2, (10.2.6c) = σ1. (10.2.6d)

Similarly, YCx = 0,

x|C|CX y|CX = , (10.2.7a) kCxk2 x|C|Cxx|X = , (10.2.7b) kCxk2 x|C|Cx = x|X, (10.2.7c) kCxk2 | = σ1 x X , (10.2.7d) |{z} orthogonal = 0. (10.2.7e)

So we have reduced to, σ 0 1 . 0| C˜ We may then repeat this by maximizing the two-norm to get the full singular value decom- position.  Notes:   σ1 0 ··· 0 0 ··· 0  .. . . .  0 σ2 . . . .  . . .   . .. .. 0 0 ··· 0   | Am×n = [U]m×m  0 ··· 0 σ 0 ··· 0 [V ]n×n, (10.2.8a)  r   0 ··· 0 0 0 ··· 0    ......   ......  0 ··· 0 0 0 ··· 0 m×n   σ1 0 ··· 0    |  | | . . − v1 −  0 σ .. .  = u ··· u  2   .  , (10.2.8b)  1 r  . .. ..   .  | |  . . . 0  | m×r − v − 0 ··· 0 σ r r×n r r×r = Uˆ Dˆ Vˆ . (10.2.8c)

134 10.2. Lecture 36: November 18, 2013 Applied Matrix Theory

from trimming out the zeros. Here σ1, . . . , σr are unique, and u1,..., ur and v1,..., vr are unique up to . From the existence of A = UDV|, what can we deduce? We know that U|U = UU| = I and V|V = VV| = I. So,

[AV]:j = [UD]:j , (10.2.9a)  0  .  .   .     0    = U σj , (10.2.9b)    0   .   .  0

= σjuj. (10.2.9c) where (AB):j = AB:j. Now,  σ u , 1 ≤ j ≤ r Av = j j , (10.2.10a) j 0, j > r A| = VD|U|, (10.2.10b)  σ v , 1 ≤ j ≤ r A|U = VD, A|u = j j . (10.2.10c) j 0, j > r So, the four fundamental subspaces are

• R(A) = span {u1,..., ur}

• N(A) = span {vr+1,..., vn} | • R(A ) = span {v1,..., vr} | • N(A ) = span {ur+1,..., um}

|  | | | A A n×n = VD U UDV , (10.2.11a) = VD|DV|, (10.2.11b)  2  σ1 0 ··· 0 0 ··· 0 . . .  2 ......   0 σ2 . . .  . . .   . .. .. 0 0 ··· 0   = V  0 ··· 0 σ2 0 ··· 0 V|, (10.2.11c)  r   0 ··· 0 0 0 ··· 0    ......   ......  0 ··· 0 0 0 ··· 0 n×n |  |  A AV :j = VD D :j , (10.2.11d)  σ1v , j ≤ r A|Av = j j . (10.2.11e) j 0, j > r

135 Nitsche and Benner Unit 10. Singular Value Decomposition

p | | Thus, σj = λs (A A) , for j = 1, . . . , r. Similarly, vj are the eigenvectors of A A for j = 1, . . . , r and vj are orthogonal because eigenvectors of symmetric matrices are orthogonal. To construct the SVD, we will | | 1. find λj, which are the eigenvalues of A A and the eigenvectors of A A, vj.

2. Find u1,..., ur for σjuj = Avj

3. Find complementary orthogonal set ur+1,..., um and vr+1,..., vn.

10.3 Lecture 37: November 20, 2013

Review and correction from last time From last time: C 0 U|AV = (10.3.1) 0 0 Cx Then we said there exists an x such that kCxk = kCk = σ1. Then we let y = Consider, 2 σ1 [x | X] and [y | Y]. In our system (we must correct this from last lecture), since we know that the x is the eigenvector corresponding to the λ and C|Cx = λx. Then x|C|C = x|λ = λx|

x|C|CX y|CX = , (10.3.2a) σ1 λx|X = , (10.3.2b) σ1 = 0. (10.3.2c) SVD will not be on the exam, but will be on the final.

Singular Value Decomposition We know, A = UDV|, (10.3.3a) = Uˆ Dˆ Vˆ | (10.3.3b) Similarly, VA = UD. (10.3.4) This means that  σ u , j ≤ r Av = j j (10.3.5) j 0, j > r Then,  σ v , j ≤ r A|u = j j (10.3.6) j 0, j > r

Thus, vj are called the right singular vectors, the uj are the left singular vectors, and p | σj = λ(A A) are the singular values. Also we may define the four subspaces,

136 10.3. Lecture 37: November 20, 2013 Applied Matrix Theory

• R(A) = span {u1,..., ur}

• N(A) = span {vr+1,..., vn} | • R(A ) = span {v1,..., vr} | • N(A ) = span {ur+1,..., um} So if we have the SVD, it is easy to describe these subspaces. So we will construct the SVD using these facts. Now,

A|A = VD|U|UDV|, (10.3.7a) = VD|DV|, (10.3.7b) A|AV = VD|D. (10.3.7c)

Example 10.5. Given 1 1 A = . (10.3.8) 2 2 Then r = 1 and 1 2 1 1 A|A = , (10.3.9a) 1 2 2 2 5 5 = (10.3.9b) 5 5

A|Av = λv, (10.3.10a) detA|A − λI = 0, (10.3.10b)

5 − λ 5 2 = 25 − 10λ + λ − 25, (10.3.10c) 5 5 − λ = λ2 − 10λ, (10.3.10d) = λ (λ − 10) . (10.3.10e)

| So to find v1 for (A A − λI)v = 0. 5 − 10 5  v = 0, (10.3.11a) 5 5 − 10 −5 5  v  0 1 = , (10.3.11b) 5 −5 v2 0

−5v1 + 5v2 = 0, v2 = v1, (10.3.11c) 1 1 v1 = √ (10.3.11d) 2 1 So 1 2 1 1 1 2 Av1 = √ = √ (10.3.12a) 1 2 2 1 2 4

137 Nitsche and Benner Unit 10. Singular Value Decomposition

Thus, 1 2 u1 = √ , (10.3.13a) 20 4 1 1 = √ (10.3.13b) 5 2

1 1 √ 1 A = √ 10  √ 1 1 , (10.3.14a) 5 2 2 = Uˆ Dˆ Vˆ |, (10.3.14b) √ 1 1 2   10 0 1  1 1 = √ √ , (10.3.14c) 5 2 −1 0 0 2 −1 1 = UDV| (10.3.14d) This is great to do by hand, but is not a very numerically stable way to find the SVD.

Geometric interpretation

n The image of the unit sphere S2 = {x ∈ R , kxk2 = 1} y = Ax, (10.3.15a) = UDV|x, (10.3.15b) U|y = DV|x. (10.3.15c) Let y0 = U|x and x0 = V|x. So y0 = Dx0, (10.3.16a) 0 0 yj = σjxj. (10.3.16b)

2 0 2 | 2 Now kxk2 = 1 and kx k2 = kV xk2 = 1. Thus, 0 2 0 2 0 2 (x1) + (x2) + ··· + (xn) = 1, (10.3.17a)  y0 2  y0 2  y0 2 1 + 2 + ··· + n = 1, (10.3.17b) σ1 σ2 σn

which is a hyperellipse! Viewing the transformation of Axj = σjuj. This shows that the σj give the major and minor axes of the multi-dimensional ellipsoid. There is a nice fact about the SVD. For low rank approximations (the second step may be rationalized easily from the matrix form) A = UDV|, (10.3.18a) r X | = σjujvj . (10.3.18b) j=1

This is a way to write any matrix as a sum of rank 1 matrices. Now the σj decrease, so we may Pk | truncate the series when σj gets close to zero. Let Ak = j=1 σjujvj with rank(Ak) = k.

138 10.4. Lecture 38: November 22, 2013 Applied Matrix Theory

Theorem 10.6. kA − Akk2 = σk+1 and is the best approximation, or

kA − Akk2 = min kA − Bk2. (10.3.19) rank(B)=k From     σ1 σ1  ..   ..   .   .       σk   σk   σ   0   k+1  |   | Ak = U . V − U . V ,  ..   ..       σ   0   r     .   .   ..   ..  0 0 (10.3.20a) 0   ..   .     0     σk+1  = U . V| (10.3.20b)  ..     σ   r   .   ..  0 We will explore the proof and implications of this theorem later.

10.4 Lecture 38: November 22, 2013

Review for Exam 2 From homework, we need to be able to go through proofs like this

• kAk∞ ⇐⇒ kAk1 • Matrix norm • QR unique −1 √ 1 • kA k2 = λmin(A|A)

p | • kAk2 = λmax(A A)

Norms To show that something is a norm (whether matrices or vectors), we must show the following properties,

139 Nitsche and Benner Unit 10. Singular Value Decomposition

1. kxk ≥ 0 for any x and kxk = 0 implies x = 0 2. kαxk = |α|kxk 3. kx + yk = kxk + kyk

Several matrix norms, the induced and Frobenius, have the fourth property,

kABk ≤ kAkkBk (10.4.1)

More major topics The exam covers chapters 4 and 5 (minus the SVD). These are things to know:

• Subspace (closed under addition and )

• Linear transformations (Definition: addition and scalar multiplication)

• Coordinates and change of bases

[x]S = x, (10.4.2a) X = xieˆi = Ix, (10.4.2b) i X = ciui = Uc. (10.4.2c) i

where c = [x]B and this is clearly a problem of inverting a matrix. The formula is

c = [x]B , (10.4.3a)  = [eˆ1]B [eˆ2]B ··· [eˆn]B [x]S . (10.4.3b) | {z } U−1 So we really care about what the representation is of some linear operator for some basis.  [T]B = [T(u1)]B [T(u2)]B ··· [T(un)]B , (10.4.4a)

[T(x)]B = [T]B [x]B . (10.4.4b)

• Change coordinates

0 [T]B ∼ [T]B , (10.4.5a) T = ST0S−1. (10.4.5b)

• Least squares: Ax = b. The normal equations are

A|Axˆ = A|b (10.4.6)

140 10.4. Lecture 38: November 22, 2013 Applied Matrix Theory

This connects with projections because,

Axˆ = A A|A−1 A| b = Pb. (10.4.7) | {z } P⊥R(A) The solution is unique of the matrix is full rank because then A|A is invertible.

• Projectors Defined by P = P2, similarly we have the properties of the complementary projector (I − P). These are orthogonal to each other, and P∗ = P (Unitary matrix is an orthonormal matrix (when real) Q∗Q = I and Q∗ = Q−1. The projector always projects onto its range. The proof of P and (I − P).

• Gram–Schmidt needed to orthogonalize a set of matrices. or P = uu∗ = UU∗ = I−uu∗ • QR othogonalization (using Gram–Schmidt)

Show A = QR is unique, rjj > 0, Q is orthonormal, R is upper triangular. Existence and uniqueness? From the Gramm–Schmidt construction process we know we can get it because we can always construct it. GS was

a1 = r11q1, (10.4.8a)

a2 = r12q1 + r22q2, (10.4.8b) ··· (10.4.8c)

an = r1nq1 + r2nq2 + ··· + r2,nqn. (10.4.8d) Uniqueness: this also shows uniqueness directly because you have these equations and

may invert them. (Invertibility) a1 = r11q1 implies ka1k = kr11q1k = |r11|kq1k and we 1 may find the r11 so then q = a1. Then induction may prove this is true for all the 1 r11 other values of n. First we would show true for n = 1 (all qk are uniquely determined), then show if true for n = k then it’s also still true for n = k + 1. This is done with showing r1,k+1, . . . , rk+1,k+1, qk+1 are uniquely determined.

ak+1 = r1,k+1q1 + ··· + rk,k+1qk + rk+1,k+1qk+1 (10.4.9a)

This is a Fourier series and we may take ak+1, qj = rj,k+1 qj, qj = rj,k+1 for j < k+1 therefore all we have left is to find the vector

rk+1,k+1qk+1 = ak+1 − r1,k+1q1 − · · · − rk,k+1qk (10.4.9b)

and we can do the same argument again to finish with rk+1,k+1 and qk+1

rk+1,k+1qk+1 = kbk, (10.4.10a)

|rk+1,k+1| qk+1 = kbk, (10.4.10b) | {z } 1

|rk+1,k+1| = kbk, (10.4.10c)

rk+1,k+1 = kbk. (10.4.10d)

141 Nitsche and Benner Unit 10. Singular Value Decomposition

For positive rj,j. So we have several decompositions now to work with.

• Invariant subspaces will give a block diagonal form of the matrix. will have class on Wednesday.

10.5 Homework Assignment 8: Due Tuesday, Decem- ber 10, 2013

You may use Matlab to compute matrix products, or to reduce a matrix to Row Echelon Form.

1. Determine the SVDs of the following matrices (by hand calculation).

3 0 (a) 0 −2 0 2 (b) 0 0 0 0 1 1 (c) 1 1

1 2 2. Let 0 2

(a) Use Matlab to find the SVD of A. State U, Σ, V (4-decimal digit format is fine).

(b) In one plot draw the unit circle C and indicate the vectors v1, v2, and in another plot draw the ellipse AC (i.e. the image of the circle under the transformation x → Ax) and indicate the vectors Av1 = σ1u1, Av2 = σ2u2. Use the axis(’square’) command in Matlab to ensure that the horizontal and vertical axes have the same scale.

(c) Find A1, the best rank-1 approximation to A in the 2-norm. Find kA − A1k2.

3. Let A ∈ Rm×n, with rank r. Use the singular value decomposition of A to prove the following.

(a) N(A) and R(A^T) are orthogonal complementary subspaces of R^n.
(b) Properties in 5.2.6 (b, c, d, e): establish the following properties of the matrix 2-norm.
    (b) ‖A‖_2 = ‖A*‖_2,
    (c) ‖A*A‖_2 = ‖A‖_2^2,
    (d) ‖ [ A 0 ; 0 B ] ‖_2 = max{‖A‖_2, ‖B‖_2} (take A, B to be real),
    (e) ‖U*AV‖_2 = ‖A‖_2 when UU* = I and V*V = I.
(c) ‖A‖_F = √(σ_1^2 + σ_2^2 + ··· + σ_r^2).

4. Show that if A ∈ R^{n×n} is symmetric, then σ_j = |λ_j|.

5. Compute the determinants of the matrices given in 6.1.3 (a), 6.1.3 (c), 6.2.1 (b).

(a)  A = [ 1  2  3 ]
         [ 2  4  1 ]
         [ 1  4  4 ]

(b)  A = [  1   2  −3   4 ]
         [  4   8  12  −8 ]
         [  2   3   2   1 ]
         [ −3  −1   1  −4 ]

(c)  [  0  0  −2  3 ]
     [  1  0   1  2 ]
     [ −1  1   2  1 ]
     [  0  2  −3  0 ]

6. (a) Show that if A is invertible, then det(A^{-1}) = 1/det(A).
   (b) Show that for any invertible matrix S, det(SAS^{-1}) = det(A).
   (c) If A is n × n, show that det(αA) = α^n det(A).
   (d) If A is skew-symmetric, show that A is singular whenever n is odd.
   (e) Show by example that in general, det(A + B) ≠ det(A) + det(B).

7. (a) Let A_{n×n} = diag{d_1, d_2, …, d_n}. What are the eigenvalues and eigenvectors of A?
   (b) Let A be a nonsingular matrix and let λ be an eigenvalue of A. Show that 1/λ is an eigenvalue of A^{-1}.
   (c) Let A be an n × n matrix and let B = A − αI for some scalar α. How do the eigenvalues of A and B compare? Explain.
   (d) Show that all eigenvalues of a nilpotent matrix are 0.

8. For each of the two matrices,

 3 2 1 −4 −3 −3 A = A1 =  0 2 0, A = A2 =  0 −1 0 −2 −3 0 6 6 5

determine if they are diagonalizable. If they are, find

(a) a nonsingular P such that P^{-1}AP is diagonal,
(b) A^{100},
(c) e^A.


9. Use diagonalization to solve the system
       dx/dt = x + y,    dy/dt = −x + y,    x(0) = 100,    y(0) = 100.

10. (7.4.1)

Suppose that A_{n×n} is diagonalizable, and let P = [x_1 | x_2 | ··· | x_n] be a matrix whose columns are a complete set of linearly independent eigenvectors corresponding to eigenvalues λ_i. Show that the solution to u′ = Au, u(0) = c, can be written as

u(t) = ξ_1 e^{λ_1 t} x_1 + ξ_2 e^{λ_2 t} x_2 + ··· + ξ_n e^{λ_n t} x_n,

in which the coefficients ξ_i satisfy the algebraic system Pξ = c.

11. (7.5.3) Show that A ∈ R^{n×n} is normal and has real eigenvalues if and only if A is symmetric.

12. (7.5.4) Prove that the eigenvalues of a real skew-symmetric or skew-Hermitian matrix must be pure imaginary numbers (i.e., multiples of i).

13. (7.6.1) Which of the following matrices are positive definite?

    A = [  1  −1  −1 ]      B = [ 20  6  8 ]      C = [ 2  0  2 ]
        [ −1   5   1 ],         [  6  3  0 ],         [ 0  6  2 ].
        [ −1   1   5 ]          [  8  0  8 ]          [ 2  2  4 ]

14. (7.6.4) By diagonalizing the quadratic form 13x^2 + 10xy + 13y^2, show that the rotated graph of 13x^2 + 10xy + 13y^2 = 72 is an ellipse in standard form as shown in Figure 7.2.1 on p. 505.

10.6 Lecture 39: November 27, 2013

We will have one more homework before the end: it will cover the SVD and eigenvalues with diagonalization. We will also cover the Jordan canonical form, but may not put it on the homework. The homework will be due next Friday so that there is time to post solutions before the final. The final is cumulative and will be held on Wednesday.

Singular Value Decomposition

We know that A = UΣV^T for any matrix A, where Σ is a diagonal matrix. We may rearrange:

AV = UΣ,                                               (10.6.1a)
Av_j = σ_j u_j  for j ≤ r,   Av_j = 0  for j > r,      (10.6.1b)

where r = rank(A).


Thus the SVD gives A = ∑_{j=1}^{r} σ_j u_j v_j^T for a matrix of rank r. We may define A_k = ∑_{j=1}^{k} σ_j u_j v_j^T and obtain an approximation of rank k.

Theorem 10.7.

‖A − A_k‖_2 = σ_{k+1}                                  (10.6.2a)
            = min_{rank(B)=k} ‖A − B‖_2.               (10.6.2b)

In words, A_k is a best rank-k approximation of A in the 2-norm.

Proof. The first part follows directly from the SVD: A − A_k = ∑_{j=k+1}^{r} σ_j u_j v_j^T, whose largest singular value is σ_{k+1}. For the second part, assume there is a matrix B which has

rank k and satisfies ‖A − B‖_2 < σ_{k+1}. Then there exists a subspace W with dim(W) = n − k such that Bw = 0 for every w ∈ W (namely W = N(B)). For such a w,

‖Aw‖_2 = ‖(A − B) w‖_2                                 (10.6.3a)
       ≤ ‖A − B‖_2 ‖w‖_2                               (10.6.3b)
       < σ_{k+1} ‖w‖_2.                                 (10.6.3c)

But there is a subspace V with dim(V) = k + 1 such that ‖Aw‖_2 ≥ σ_{k+1} ‖w‖_2 for all w ∈ V, namely V = span{v_1, …, v_{k+1}}. This is impossible: since dim(V) + dim(W) > n, there exists w ≠ 0 in V ∩ W, and for this w we would need both ‖Aw‖_2 < σ_{k+1} ‖w‖_2 and ‖Aw‖_2 ≥ σ_{k+1} ‖w‖_2, a contradiction. This proof is a little more elementary than the proof in the book. □

Thus we can approximate a matrix by lower-rank matrices. This is useful because a rank-k approximation requires storing far fewer numbers, which reduces our cost.
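A short numerical check of Theorem 10.7 in Matlab (the matrix and the choice of k are arbitrary examples, not from the notes):

% Verify ||A - A_k||_2 = sigma_{k+1} for an arbitrary matrix.
A = magic(5);                                 % any 5x5 example matrix
[U, S, V] = svd(A);
k = 2;
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';    % rank-k truncation of the SVD
norm(A - Ak, 2) - S(k+1, k+1)                 % ~0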

SVD in Matlab

Example handed out in class: in Matlab, if you say load clown.mat and then type whos, you will see a matrix X. It may be displayed with image(X). Then we do [U,S,V] = svd(X). The first figure (Figure 10.1) plots the diagonal entries of S, so we see that we can truncate the small singular values. As we increase the rank of the approximation through k = 3, 10, 30 we see a significantly improving image in Figure 10.2. The approximation A_k = U_k Σ_k V_k^T is computed with Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)'. For k = 30 we already have a good approximation which is significantly cheaper to store than the original matrix. Furthermore, Table 10.1 shows that the relative error decreases significantly as k increases.

Listing 10.1. svdimag.m

% application of the SVD to image compression
% from "Applied Numerical Linear Algebra", by J. Demmel, page 114 (SIAM)
load clown.mat
% X is a matrix of pixels of dimension 200 by 320
[U, S, V] = svd(X);
%%
figure(1)
plot(diag(S));
set(gca, 'FontSize', 15)
xlabel('k')
ylabel('\sigma_k')
title('Singular values of X')
%%
figure(2)
ifont = 12;
colormap('gray')
subplot('position', [.07, .54, .40, .40])
k = 3;  image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)');  title('k=3')
set(gca, 'FontSize', ifont)
set(gca, 'XTickLabel', '')
%
subplot('position', [.5, .54, .40, .40])
k = 10;  image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)');  title('k=10')
set(gca, 'FontSize', ifont)
set(gca, 'YTickLabel', '')
set(gca, 'XTickLabel', '')
%
subplot('position', [.07, .06, .40, .40])
k = 30;  image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)');  title('k=30', 'FontSize', ifont)
set(gca, 'FontSize', ifont)
%
subplot('position', [.5, .06, .40, .40])
image(X);  title('original')
set(gca, 'FontSize', ifont)
set(gca, 'YTickLabel', '')

Table 10.1. Relative error of SVD approximation matrix A_k

     k    relative error σ_{k+1}/σ_1    compression ratio 520k/(200 · 320)
     3            0.155                            0.024
    10            0.077                            0.081
    30            0.027                            0.244

146 10.6. Lecture 39: November 27, 2013 Applied Matrix Theory

[Figure: plot of the singular values σ_k of X versus k, titled "Singular values of X".]

Figure 10.1. Singular values σ_k of matrix X versus k.

[Figure: four panels showing the rank-k approximations of the image for k = 3, 10, 30, together with the original image.]

Figure 10.2. Rank k approximations of original image.


UNIT 11

Additional Topics

11.1 Lecture 39 (cont.)

The Determinant

We will quickly cover the essentials of Chapter 6. The determinant is defined as follows.

Definition 11.1.
det(A) = ∑_p σ(p) a_{1p_1} a_{2p_2} ··· a_{np_n},       (11.1.1)
where the sum is over all permutations p = (p_1, p_2, …, p_n) of (1, …, n), and σ(p) is the sign of the permutation,
σ(p) = +1 if an even number of exchanges is needed to obtain p from (1, …, n),
σ(p) = −1 if an odd number of exchanges is needed to obtain p from (1, …, n).     (11.1.2)

If the determinant is non-zero, then Ax = b has a unique solution.

Theorem 11.2. Determinants have several useful properties.
1. Triangular matrices:
   det [ a_11  a_12  ···  a_1n ]
       [  0    a_22  ···  a_2n ]
       [  ⋮          ⋱     ⋮   ]
       [  0     0    ···  a_nn ]  =  ∏_{i=1}^{n} a_ii.   (11.1.3)
2. det(A^T) = det(A).
3. det(AB) = det(A) det(B).
4. If B is obtained from A by

• exchanging row i with row j, then det(B) = −det(A);
• multiplying row i by α, then det(B) = α det(A);
• adding a multiple of row i to row j, then det(B) = det(A).
5. det(A) is a linear function of each row of A, and of each column.
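These properties are easy to sanity-check numerically; a small Matlab sketch (the matrices are arbitrary examples, not from the notes):

% Numerical check of basic determinant properties.
A = [1 2; 3 7];  B = [2 0; 1 4];
det(A') - det(A)             % property 2: ~0
det(A*B) - det(A)*det(B)     % property 3: ~0
det(A([2 1], :)) + det(A)    % exchanging two rows flips the sign: ~0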


11.2 Lecture 40: December 2, 2013

Further details for class: the homework is due Friday; the latest it can be turned in is Tuesday before 4:30 (so solutions can be handed out). The final is on Wednesday at 7:30–9:30. (?) Today we will cover eigenvalues and eigenvectors; on Wednesday we will cover positive-definite matrices; on Friday we will review for the final. Some homework problems may be dropped because they were too involved.

Diagonalizable Matrices

We know that, for square matrices,
A ∼ B                     (11.2.1)
means
A = SBS^{-1}              (11.2.2)
for some nonsingular S. Now we want to know when A ∼ D, where D is a diagonal matrix.

Eigenvalues and eigenvectors

We say (λ, v), with v ≠ 0, is an eigenpair when
Av = λv,                  (11.2.3a)
(A − λI) v = 0,           (11.2.3b)
which holds exactly when v ∈ N(A − λI); a nonzero eigenvector exists only if this null space is nontrivial. Thus we care about det(A − λI) = 0. So,

det(A − λI) = det [ a_11−λ   a_12    ···   a_1n  ]
                  [ a_21    a_22−λ   ···   a_2n  ]
                  [  ⋮        ⋮       ⋱      ⋮   ]
                  [ a_n1     a_n2    ···  a_nn−λ ]                                   (11.2.4a)
            = (a_11 − λ)(a_22 − λ) ··· (a_nn − λ) + powers of λ of degree ≤ n − 2     (11.2.4b)
            = p(λ)                                                                    (11.2.4c)
            = (−1)^n λ^n + (−1)^{n−1} λ^{n−1} (a_11 + a_22 + ··· + a_nn)
              + lower-order terms in λ^k, k ≤ n − 2,  where a_11 + ··· + a_nn = tr(A),  (11.2.4d)
            = (−1)^n (λ − λ_1)(λ − λ_2) ··· (λ − λ_n)                                 (11.2.4e)
            = (−1)^n λ^n + (−1)^{n−1} λ^{n−1} (λ_1 + λ_2 + ··· + λ_n) + l.o.t.        (11.2.4f)
            = (−1)^n [ λ^n − λ^{n−1} (λ_1 + λ_2 + ··· + λ_n) + l.o.t. ],              (11.2.4g)

where the factorization in (11.2.4e) is the fundamental theorem of algebra. Comparing (11.2.4d) and (11.2.4f), we get the following:


• Every n × n matrix A has n eigenvalues (counted with multiplicity).
• The sum ∑ λ_k = tr(A).
• The product ∏ λ_k = p(0) = det(A).
• If A is triangular, det(A − λI) = ∏ (a_ii − λ) = 0, so the roots are simply λ_i = a_ii.

Example 11.3. As a short review, find the eigenvalues and the eigenvectors of

A = [ 1  −1 ]
    [ 1   1 ]

So,

det(A − λI) = | 1−λ   −1  |
              |  1    1−λ |  = (1 − λ)^2 + 1             (11.2.5a)
            = λ^2 − 2λ + 2,                               (11.2.5b)
λ_{1,2} = (2 ± √(4 − 8)) / 2                              (11.2.5c)
        = 1 ± i.                                          (11.2.5d)

Then for λ_1 = 1 + i:
(A − λ_1 I) v = 0:
[ 1−(1+i)    −1     | 0 ;   1   1−(1+i) | 0 ] = [ −i  −1 | 0 ;   1  −i | 0 ]     (11.2.6a)
                                              → [  1  −i | 0 ;  −i  −1 | 0 ]     (11.2.6b)
                                              → [  1  −i | 0 ;   0   0 | 0 ].    (11.2.6c)

So,

v_1 − i v_2 = 0,                          (11.2.7a)
v_1 = i v_2,                              (11.2.7b)
so we may take the eigenvector v_1 = [ i ; 1 ].    (11.2.7c)

Then

λ_2 = 1 − i,                              (11.2.8a)
v_2 = [ −i ; 1 ].                         (11.2.8b)

Note that the eigenvectors v1, v2 are linearly independent.
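The eigenpairs in Example 11.3 can be confirmed numerically (a quick sketch; Matlab normalizes its eigenvectors, so they match v_1, v_2 only up to scaling):

% Check Example 11.3.
A = [1 -1; 1 1];
[V, D] = eig(A);
diag(D)              % eigenvalues 1+i and 1-i (in some order)
norm(A*V - V*D)      % ~0, i.e. A*v_j = lambda_j*v_j for each column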


Note: If A has a linearly independent set of n eigenvectors, then
V = [ v_1 | v_2 | ··· | v_n ]

is invertible and Avj = λjvj. Then, for a diagonal matrix D with the eigenvalues along the diagonal

(AV)_{:,j} = (VD)_{:,j},                  (11.2.9a)
AV = VD,                                  (11.2.9b)
A = VDV^{-1}.                             (11.2.9c)

However, not all matrices are diagonalizable.

Example 11.4. The matrix
A = [ 1  1 ; 0  1 ]

has the double eigenvalue 1: λ_1 = λ_2 = 1. Here,
A − λI = [ 0  1 ; 0  0 ],                 (11.2.10a)
dim(N(A − λI)) = 1,                       (11.2.10b)
so there is only one linearly independent eigenvector.

Example 11.5. The matrix
A = [ 1  0 ; 0  1 ]
also has the double eigenvalue 1: λ_1 = λ_2 = 1. But here,
A − λI = [ 0  0 ; 0  0 ],                 (11.2.11a)
dim(N(A − λI)) = 2,                       (11.2.11b)
and there are two linearly independent eigenvectors,
v_1 = [ 1 ; 0 ]   and   v_2 = [ 0 ; 1 ].

Example 11.6. A nonzero nilpotent matrix, with N^k = 0, does not have a full set of eigenvectors. For example, for a single nilpotent block with ones on the subdiagonal,
A ∼ [ 0  ···  0  0 ]
    [ 1   0      0 ]
    [      ⋱  ⋱    ]
    [ 0  ···  1  0 ].                     (11.2.12a)

So λ1 = λ2 = ··· = λn = 0 and dim(N(A − λI)) = dim(N(A)) = 1.


Theorem 11.7. If A has n distinct eigenvalues, then the corresponding eigenvectors are linearly independent.

Proof. Assume that the {v_k} are linearly dependent. Then one of them can be written as a linear combination of a linearly independent subset of the others: v_k = ∑_{j≠k} c_j v_j, where the {v_j} appearing in the sum are linearly independent. Then,
(A − λ_k I) v_k = (A − λ_k I) ∑_{j≠k} c_j v_j,          (11.2.13a)

0 = λ_k v_k − λ_k v_k                                   (11.2.13b)
  = ∑_{j≠k} c_j (A v_j − λ_k v_j)                       (11.2.13c)
  = ∑_{j≠k} c_j (λ_j − λ_k) v_j,  with each λ_j − λ_k ≠ 0,   (11.2.13d)
so
0 = ∑_{j≠k} α_j v_j   with not all α_j zero.            (11.2.13e)

This, however, means that the set {v_j} is linearly dependent, which contradicts our choice of a linearly independent subset. So the assumption is impossible, and the eigenvectors are linearly independent. □
Now if A = VDV^{-1}, then

A^k = VDV^{-1} VDV^{-1} ··· VDV^{-1}                     (11.2.14a)
    = VD^k V^{-1}.                                        (11.2.14b)

Similarly we can do a power series. This will be useful in solving systems of differential equations.
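A hedged sketch of how the diagonalization is used for matrix powers and the matrix exponential (the matrix below is an arbitrary diagonalizable example, not from the notes):

% Matrix functions via the diagonalization A = V*D*inv(V).
A = [2 1; 1 2];                            % symmetric, hence diagonalizable
[V, D] = eig(A);
A100 = V * D^100 / V;                      % A^100 = V*D^100*V^{-1}
expA = V * diag(exp(diag(D))) / V;         % e^A = V*e^D*V^{-1}
norm(A100 - A^100), norm(expA - expm(A))   % both small relative to the results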


Index

backward substitution, 9
basic columns, 38
basis, 56, 66, 84
bilinear operator, 149
Cauchy–Schwarz inequality, 100
change of basis, 88
column space, 58
complementary projector, 127
complementary subspaces, 121
condition number, 27, 48
consistent system, 36
determinant, 149
diagonal matrix, 150
differentiation, 86
direct sum, 121
eigenvalues, 150
eigenvectors, vi, 150
elementary operations, 15
Euclidean norm, 19
exams, 73, 74
field, 55
finite difference, 2, 44
four fundamental subspaces, 58
Frobenius norm, 101
fundamental theorem of algebra, 65, 150
geometric series, 46
Givens rotation, 118
Gram–Schmidt orthogonalization, 112
homogeneous solutions, 39
Householder method, 121
idempotent matrices, 122
idempotent operator, 92
ill-posed, 20
induced norm, 104
inner product, 109
interpolation, 63
invariant subspace, 91
isometry, 116
Laplace equation, 2
least squares, 69
left null space, 58
linear function, 39
linear system, 1
linear transformation, 83
    action, 83, 87
linearly dependent, 66
linearly independent, 57, 63
lower triangular, 25
lower triangular system, 5
matrix form, 1
matrix norm, 101
minimization, 74
modified Gram–Schmidt, 116
nilpotent matrix, 128
nilpotent operator, 92
nonbasic columns, 38
norm, 47, 99
normal equations, 71
null space, 58
operation count, 9
order, 3
orthogonal projector, 123
orthogonalization, 111
orthonormal, 111
orthonormal basis, 111
partial differential equations, 111
particular solution, 38
periodic boundary conditions, 44
perturbations, 42
pivoting, 19, 22
PLU factorization, 22
projection, 118
QR factorization, 114
rank, 61
reduced row echelon form, 35
reflection, 118
review, 140
rotation, 117
row echelon form, 31
row space, 58
self-similar, 89
Sherman–Morrison formula, 44
singular value decomposition, 131
singular values, 131
smallest upper bound, 102
spanning set, 56
sparsity, 18
submatrices, 26
subspaces, 67
Taylor series, 3
trace, 40
tridiagonal matrix, 18
tuple, 92
Vandermonde matrix, 63
vector form, 1
vector space, 56
well-posed, 20

Figures

1.1 Finite difference approximation of a 1D boundary value problem...... 2

2.1 One-dimensional discrete grids...... 10
2.2 Two-dimensional discrete grids...... 11

3.1 Plot of linear problems and their solutions...... 21

4.1 Geometric illustration of linear systems and their solutions...... 36
4.2 Figures for Textbook problem 3.3.4...... 51

5.1 Basis vector of example solution...... 57
5.2 Interpolating system...... 64

6.1 Minimization of distance between point and a plane...... 73
6.2 Parabolic fitting by least squares ...... 73

7.1 Figure 4.7.4 ...... 95

10.1 Singular values σ_k of matrix X versus k...... 147
10.2 Rank k approximations of original image...... 147


Tables

3.1 Variation of error with the perturbation variable ...... 20

10.1 Relative error of SVD approximation matrix Ak ...... 146


Listings

2.1 code stub for tridiagonal solver ...... 13
10.1 svdimag.m ...... 145
