
Chapter 2 Invariant Subspaces

Reminder: Unless explicitly stated otherwise, we are talking about finite dimensional vector spaces, and about linear transformations and operators between finite dimensional vector spaces. However, we will from time to time explicitly discuss the infinite dimensional case. Similarly, although much of the theory holds for vector spaces over an arbitrary field, we focus (for practical purposes) on vector spaces over the real or complex field.

Definition 2.1. Let V be a (real or complex) vector space and L : V → V be a linear operator over V. We say a subspace W ⊆ V is an invariant subspace of L if for every w ∈ W, Lw ∈ W (we also write LW ⊆ W).

Note that V, {0} (the subspace containing only the zero vector in V), Null(L), and Range(L) are all invariant subspaces of L.

Exercise 2.1. Prove the statement above.

Theorem 2.2. Let V and L be as before, and let W1, W2, W3 be invariant subspaces of L. Then (1) W1 + W2 is an invariant subspace of L, (2) (W1 + W2) + W3 = W1 + ( W2 + W3), (3) W1 + {0} = {0} + W1.

Exercise 2.2. Prove Theorem 2.2. (The set of all invariant subspaces of a linear operator with the binary operation of the sum of two subspaces is a semigroup and a monoid.)

Exercise 2.3. Prove that the sum of invariant subspaces is commutative.

If an invariant subspace of a linear operator, L, is one-dimensional, we can say a bit more about it, and hence we have a special name for non-zero vectors in such a space.

Definition 2.3. We call a nonzero vector v ∈ V an eigenvector if Lv = λv for some scalar λ. The scalar λ is called an eigenvalue. The (ordered) pair, (v, λ ) is called an eigenpair. (Note: this is the generalization from Chapter 1 to cover any linear operator.)

Exercise 2.4. Given a one-dimensional invariant subspace, prove that any nonzero vector in that space is an eigenvector and all such eigenvectors have the same eigenvalue.

Vice versa, the span of an eigenvector is an invariant subspace. From Theorem 2.2 it then follows that the span of a set of eigenvectors, which is the sum of the invariant subspaces associated with each eigenvector, is an invariant subspace.

Example 2.1. As mentioned before, a matrix A ∈ R^{n×n} defines a linear operator over R^n. Consider the real matrix and vector

\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \qquad v = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}.
\]

Then Av = 0 = 0 · v. Hence v is an eigenvector of A and 0 is an eigenvalue of A. The pair (v, 0) is an eigenpair. Note that a matrix with an eigenvalue 0 is singular.
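A quick numerical check of this example; a minimal sketch assuming NumPy is available (the code and printed values are illustrative, not part of the text):

```python
import numpy as np

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
v = np.array([1., -2., 1.])

print(A @ v)                 # [0. 0. 0.]: A v = 0 = 0 * v, so (v, 0) is an eigenpair
print(np.linalg.det(A))      # ~0 (up to rounding): an eigenvalue 0 means A is singular
print(np.linalg.eigvals(A))  # 0 appears among the eigenvalues (up to rounding)
```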

Example 2.2. The following is an example for an infinite dimensional subspace. Let C^∞[a, b] be the set of all infinitely differentiable real functions on the (closed) interval [a, b]. We define the sum of two functions f, g ∈ C^∞[a, b] by h = f + g, where h is the function h(x) ≡ (f + g)(x) = f(x) + g(x) (for x ∈ [a, b]), and for all α ∈ R and f ∈ C^∞[a, b] we define h = αf by h(x) ≡ (αf)(x) = αf(x). Then C^∞[a, b] with this definition of scalar multiplication and vector addition is a vector space (show this). Let L be defined by Lu = u_xx for u ∈ C^∞[a, b]. Then L is a linear operator over C^∞[a, b] and (sin ωx, −ω²) is an eigenpair for any ω ∈ R.

We have Lv = λv ⇔ Lv − λv = 0. We can rewrite the left-hand side of the last expression as (L − λI)v = 0. Since v ≠ 0, the operator (L − λI) is singular (i.e. not invertible). Although in some cases the eigenvalues and eigenvectors of a linear operator are clear by inspection, in general we need some procedure to find them. As all linear operators over finite dimensional vector spaces can be represented as matrices, all we need is a systematic procedure to find the eigenvalues and eigenvectors of matrices (we get our answer for the original operator by the corresponding linear combinations of vectors). Remember that the function that maps linear transformations between finite dimensional vector spaces (given bases for the spaces) to matrices is an isomorphism (i.e. an invertible linear transformation). Given a basis B, let A = [L]_B. Using the linearity of the standard transformation from operator to matrix (given a basis), we also have A − λI = [L − λI]_B
and the matrix A − λI must be singular. Hence, det(A − λI) = 0.

Definition 2.4. The polynomial det(A − λI) is called the characteristic polynomial of A. The (polynomial) equation det(A − λI) = 0 in λ is called the characteristic equation for A (and for L!). The eigenvalues of A (and of L) are the roots of this characteristic equation. The multiplicity of an eigenvalue as a root of this equation is called the algebraic multiplicity of that eigenvalue.

IMPORTANT!! It may seem from the above that eigenvalues depend on the choice of basis, but this is not the case! To show this, we need only pick two different bases for V and show that the corresponding matrix of the transformation for one basis is similar to the matrix of the transformation for the other basis. Let A = [L]_B. Let C also be a basis for V and define B = [L]_C. Furthermore, let X = [I]_{C←B}. Then we have A = X^{-1}BX (recall that this is called a similarity transformation between A and B).

Theorem 2.5. A and B as defined above have the same eigenvalues.

Proof. From A = X−1BX we see that A − λI = X−1BX − λI = X−1(B − λI)X. Hence det (A − λI) = det (X−1)det (B − λI)det (X) and det (A − λI) = 0 ⇔ det (B − λI) = 0.

So, it is indeed fine to define the eigenvalues of an operator over a vector space using any basis for the vector space and its corresponding matrix. In fact, as the characteristic polynomial of an n × n matrix has leading term (−1)^n λ^n (check this), the characteristic polynomials of A and B are equal. So, we can call this polynomial the characteristic polynomial of L without confusion. Note that the proof above does not rely on the fact that A and B are representations of the (same) linear operator L, only that A is obtained from a similarity transformation of B. So, this is a general result for similar matrices. We say that similarity transformations preserve eigenvalues. The standard methods for computing eigenvalues of any but the smallest matrices are in fact based on sequences of cleverly chosen similarity transformations, the limit of which is an uppertriangular (general case) or diagonal matrix (Hermitian or real symmetric case). Take a course in Numerical Linear Algebra for more details!

Eigenvectors of matrices are not preserved under similarity transformations, but they change in a straightforward way. Let A and B be as above and let Av = λv. Then, X^{-1}BXv = λv ⇔ B(Xv) = λ(Xv), so Xv is an eigenvector of B corresponding to λ. If we are interested in computing eigenvectors of an operator L, then again the choice of basis is irrelevant (at least in theory; in practice, it can matter a lot). Let A and B be representations of L with respect to the bases B = {v_1, v_2, ..., v_n} and C = {w_1, w_2, ..., w_n} as above, with the change of coordinate matrix X = [I]_{C←B}. By some procedure we obtain Au = λu, which corresponds to B(Xu) = λ(Xu). Define y = \sum_{i=1}^{n} u_i v_i (so that u = [y]_B); then we have [Ly]_B = [λy]_B, which implies (by the standard isomorphism) that Ly = λy. However, we also have
[y]C = [ I]C←B [y]B = Xu . This gives [ Ly]C = [ λy]C ⇔ B(Xu ) = λ(Xu ). So, computing eigenvectors of B leads to the same eigenvectors for L as using A.
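These facts about similarity transformations are easy to illustrate numerically. A hedged sketch, assuming NumPy and using random test matrices (not from the text): it checks that A = X^{-1}BX and B have the same eigenvalues, and that an eigenvector v of A maps to the eigenvector Xv of B.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 4))          # change-of-coordinates matrix (invertible with prob. 1)
A = np.linalg.inv(X) @ B @ X             # A = X^{-1} B X, a similarity transformation

# same spectrum (sorted to compare, up to rounding)
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(B))))

lam, vecs = np.linalg.eig(A)
v = vecs[:, 0]                           # A v = lam[0] v
print(np.allclose(B @ (X @ v), lam[0] * (X @ v)))   # X v is an eigenvector of B
```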

Theorem 2.6. A linear operator, L, is diagonalizable if and only if there is a basis for the vector space with each basis vector an eigenvector of L. An n × n matrix A is diagonalizable if and only if there is a basis for R^n (respectively C^n) that consists of eigenvectors of A.

(Here again, if the matrix is real, we must be careful to specify whether or not we are considering the matrix transformation as a map on R^n or on C^n. If the former and the eigenpairs are not all real, then we are forced to conclude it is not diagonalizable with respect to R^n, even though it may be if we take it with respect to C^n.)

We often say A is diagonalizable if there exists an invertible matrix U such that U^{-1}AU is diagonal. But clearly, if we set D = U^{-1}AU and rearrange, we get

\[
AU = UD \;\Rightarrow\; [Au_1, \ldots, Au_n] = [d_{11} u_1, d_{22} u_2, \ldots, d_{nn} u_n]
\]

\[
\;\Rightarrow\; Au_i = d_{ii} u_i, \quad i = 1, \ldots, n,
\]

and since the u_i cannot be zero (since U was assumed invertible), the columns of U must be eigenvectors and the diagonal elements of D must be eigenvalues. Furthermore, you should make sure you are able to show that if L(x) = Ax and A is diagonalizable, then if you use the eigenvectors as the (only) basis on the right and left of the linear transformation/matrix transformation picture, you find that the matrix of the transformation is precisely the diagonal matrix containing the eigenvalues.

As mentioned in the previous chapter, a matrix may not be diagonalizable. We now consider similarity transformations to block diagonal form as an alternative. (Refer also to the definition of block diagonal in Chapter 1.)
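The relation AU = UD above is easy to verify numerically; a minimal sketch assuming NumPy (the example matrix is an illustrative choice, not from the text):

```python
import numpy as np

A = np.array([[4., 1.], [1., 4.]])
lam, U = np.linalg.eig(A)       # columns of U are eigenvectors of A
D = np.diag(lam)

print(np.allclose(A @ U, U @ D))                    # column i: A u_i = d_ii u_i
print(np.allclose(np.linalg.inv(U) @ A @ U, D))     # U^{-1} A U is diagonal
```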

Definition 2.7. Let A be a complex or real n × n matrix and let the numbers m_1, m_2, ..., m_s be given such that 1 ≤ m_i ≤ n for i = 1, ..., s and \sum_{i=1}^{s} m_i = n. Furthermore, for i = 1, ..., s let f_i = 1 + \sum_{j=1}^{i-1} m_j (where f_1 = 1), ℓ_i = \sum_{j=1}^{i} m_j, and P_i = {f_i, ..., ℓ_i}. We say that A is a (complex or real) block diagonal matrix with s diagonal blocks of sizes m_1, m_2, ..., m_s, if its coefficients satisfy a_{r,c} = 0 if r and c are not elements of the same index set P_i. (See note below.)

Note that A is a block diagonal matrix if the coefficients outside the diagonal blocks are all zero. The first diagonal block is m_1 × m_1, the second block is m_2 × m_2, and so on. The first coefficient of block i has index f_i; the last coefficient of block i has index ℓ_i.
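The index sets P_i of Definition 2.7 are easy to compute. The following is an illustrative sketch (helper names are my own, and it uses 0-based indices instead of the 1-based indices of the definition) that tests the block diagonal property:

```python
import numpy as np

def block_index_sets(sizes):
    """0-based index sets P_i = {f_i, ..., l_i} for the given block sizes."""
    bounds = np.cumsum([0] + list(sizes))
    return [list(range(bounds[i], bounds[i + 1])) for i in range(len(sizes))]

def is_block_diagonal(A, sizes):
    inside = np.zeros(A.shape, dtype=bool)
    for P in block_index_sets(sizes):
        inside[np.ix_(P, P)] = True       # positions inside a diagonal block
    return bool(np.all(A[~inside] == 0))  # a_{r,c} = 0 outside the blocks

A = np.diag([1., 2., 3., 4.])
A[0, 1] = 5.                              # couples the first two coordinates
print(is_block_diagonal(A, [2, 1, 1]), is_block_diagonal(A, [1, 1, 1, 1]))  # True False
```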

Theorem 2.8. Let L be a linear operator over V, with dim(V) = n, and with invariant subspaces V_1, ..., V_s such that V_1 ⊕ V_2 ⊕ ··· ⊕ V_s = V. Further, let there be bases v_1^{(i)}, ..., v_{m_i}^{(i)} for each V_i (for i = 1, ..., s), and define the ordered set B = {v_1^{(1)}, ..., v_{m_1}^{(1)}, v_1^{(2)}, ..., v_{m_2}^{(2)}, ..., v_{m_s}^{(s)}} (i.e., B is a basis for V). Then A = [L]_B is block-diagonal.

Proof. (This is a sketch of the proof.) As each V_i is an invariant subspace, Lv_j^{(i)} ∈ V_i. Hence, Lv_j^{(i)} = \sum_{k=1}^{m_i} α_k v_k^{(i)}. These coefficients correspond to columns f_i, f_i + 1, ..., ℓ_i and the same rows. So, only the coefficients in the diagonal blocks of A, A_{1,1} ∈ C^{m_1×m_1}, ..., A_{s,s} ∈ C^{m_s×m_s}, can be nonzero.

Exercise 2.5. Write the proof in detail.

Note that a block-diagonal matrix (as a linear operator over R^n or C^n) reveals invariant subspace information quite trivially. Vectors with zero coefficients except those corresponding to one diagonal block obviously lie within an invariant subspace. Moreover, bases for the invariant subspaces can be trivially found. Finally, finding eigenvalues and eigenvectors (generalized eigenvectors) for block diagonal matrices is greatly simplified. For this reason we proceed by working with matrices. However, the standard isomorphism between the n-dimensional vector space V and C^n (or R^n), given a basis B for V, guarantees that the invariant subspaces we find for the matrix [L]_B correspond to invariant subspaces for L, as we observe in the following theorem.

Theorem 2.9. Let A ∈ C^{n×n} be a block diagonal matrix with s diagonal blocks of sizes m_1, m_2, ..., m_s. Define the integers f_i = 1 + \sum_{j=1}^{i-1} m_j (where f_1 = 1) and ℓ_i = \sum_{j=1}^{i} m_j for i = 1, ..., s. Then the subspaces (of C^n)

\[
V_i = \{x \in \mathbb{C}^n \mid x_j = 0 \text{ for all } j < f_i \text{ and } j > \ell_i\} = \mathrm{Span}(e_{f_i}, e_{f_i+1}, \ldots, e_{\ell_i})
\]

for i = 1, ..., s are invariant subspaces of A.

Proof. The proof is left as an exercise.

Exercise 2.6. Prove Theorem 2.9 .

Using the previous theorem we can also make a statement about invariant subspaces of L if the matrix representing L with respect to a particular basis is block diagonal.

Theorem 2.10. Let L be a linear operator over V , with dim( V ) = n and let the ordered set B = {v1, v2, ..., vn} be a basis for V . Furthermore, let A = [ L]B be block-diagonal with block sizes m1, m 2, . . . , m s (in that order). Let fi and `i for i = 1 ,...,s be as defined in Definition 2.7 .

Then the subspaces (of V) V_1, ..., V_s, defined by V_i = Span(v_{f_i}, v_{f_i+1}, ..., v_{ℓ_i}), are invariant subspaces of L and V_1 ⊕ V_2 ⊕ ··· ⊕ V_s = V.

Proof. The proof is left as an exercise.


Exercise 2.7. Prove Theorem 2.10 .

Procedure to find invariant subspaces for L:

1. Get the matrix representation of L first. That is, pick a basis B for V and let A = [L]_B.

2. Find an invertible matrix S such that F := S^{-1}AS is block diagonal with blocks satisfying certain nice properties (nice is TBD).

3. The columns of S will span the invariant subspaces of A (grouping the columns according to the block sizes of F, as we have been doing in the preceding discussion).

4. Use S and B to compose the invariant subspaces (in V) for L.

Now, Step 2 is non-trivial, but we will put this off and just assume it can be done. What remains is HOW do we finish Step 4? The following theorem addresses this issue.

Theorem 2.11. Let L be a linear operator over a complex n-dimensional vector space V, and let B = {b_1, b_2, ..., b_n} be an (arbitrary) basis for V. Let A = [L]_B and let F = S^{-1}AS, for any invertible matrix S ∈ C^{n×n}, be block diagonal with block sizes m_1, m_2, ..., m_s. Furthermore, let f_i = 1 + \sum_{j=1}^{i-1} m_j (f_1 = 1) and ℓ_i = \sum_{j=1}^{i} m_j, and let the ordered basis C = {c_1, c_2, ..., c_n} be defined by c_i = \sum_{j=1}^{n} b_j s_{j,i} for i = 1, ..., n. Then the spaces V_i = Span(c_{f_i}, ..., c_{ℓ_i}) are invariant subspaces of L and V = V_1 ⊕ V_2 ⊕ ··· ⊕ V_s.

Proof. The proof is left as an exercise. Hint: consider I ◦ L ◦ I, and note that with this definition of the C basis, S = [I]_{C←B}.
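When V is realized as C^n and the basis vectors b_j are stored as the columns of a matrix, the basis C of Theorem 2.11 is simply that matrix times S, and grouping its columns by block size gives bases for the invariant subspaces. A minimal sketch assuming NumPy (the function and variable names are illustrative, not from the text); each returned column group spans one V_i:

```python
import numpy as np

def invariant_subspace_bases(B_cols, S, block_sizes):
    """Columns of B_cols hold the basis B; S block-diagonalizes A = [L]_B.
    Returns, per block, the vectors c_{f_i}, ..., c_{l_i} spanning V_i."""
    C = B_cols @ S                              # c_i = sum_j b_j s_{j,i}, stored as columns
    bounds = np.cumsum([0] + list(block_sizes))
    return [C[:, bounds[i]:bounds[i + 1]] for i in range(len(block_sizes))]
```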

We will now consider in some more detail a set of particularly useful and revealing invariant subspaces that span the vector space. Hence, we consider block-diagonal matrices of a fundamental type. Most of the following results hold only for complex vector spaces, which are, from a practical point of view, the most important ones. Next we provide some links between the number of distinct eigenvalues, the number of eigenvectors, and the number of invariant subspaces of a matrix A ∈ C^{n×n} (working over the complex field).

Theorem 2.12. Let λ be an eigenvalue of A (of arbitrary algebraic multiplicity). There is at least one eigenvector v of A corresponding to λ.

Proof. Since A − λI is singular, there is at least one nonzero vector v such that (A − λI)v = 0.

Theorem 2.13. Let λ_1, λ_2, ..., λ_k be distinct eigenvalues and let v_1^{(i)}, ..., v_{m_i}^{(i)} be
independent eigenvectors associated with λi, for i = 1 ,...,k . Then

\[
\{v_1^{(1)}, \ldots, v_{m_1}^{(1)}, v_1^{(2)}, \ldots, v_{m_2}^{(2)}, \ldots, v_1^{(k)}, \ldots, v_{m_k}^{(k)}\}
\]

is an independent set.

Proof. Left as an exercise for the reader (it’s in most textbooks.)

2.1 Toward a Direct Sum Decomposition

In general the above set of eigenvectors does not always give a direct sum decomposition for the vector space V. That is, it is not uncommon that we will not have a complete set of n linearly independent eigenvectors. So we need to think of another way to get what we are after. A complete set of independent eigenvectors for an n-dimensional vector space (n independent eigenvectors for an n-dimensional vector space) would give n 1-dimensional invariant subspaces, each the span of a single eigenvector. These 1-dimensional subspaces form a direct sum decomposition of the vector space V. Hence the representation of the linear operator in this basis of eigenvectors is a block diagonal matrix with each block size equal to one, that is, a diagonal matrix.

In the following we try to get as close as possible to such a block diagonal matrix and hence to such a direct sum decomposition. We will aim for the following two properties. First, we want the diagonal blocks to be as small as possible and we want each diagonal block to correspond to a single eigenvector. The latter means that the invariant subspace corresponding to a diagonal block contains a single eigenvector. Second, we want to make the blocks as close to diagonal as possible. It turns out that bidiagonal, with a single nonzero diagonal right above (or below) the main diagonal (picture?), is the best we can do.

In the following discussion, polynomials of matrices or linear operators play an important role. Note that for a matrix A ∈ C^{n×n} the matrices A^2 ∈ C^{n×n}, A^3 ∈ C^{n×n}, etc. are well-defined and that C^{n×n} (over the complex field) is itself a vector space. Hence, polynomials (of finite degree) in A, expressed as α_0 I + α_1 A + ··· + α_n A^n, are well-defined as well and, for fixed A, are elements of C^{n×n}. Note there is a difference between a polynomial as a function of a free variable of a certain type and the evaluation of a polynomial for a particular choice of that variable. Similarly, for linear operators over a vector space, L : V → V, composition of (or product of) the operator with itself, one or more (but finitely many) times, results in another linear operator over the same space, L^m ≡ L ◦ (L^{m−1}) and L^m : V → V. Indeed, the set of all linear operators over a vector space V (often expressed as L(V)) is itself a vector space (over the same field). (Exercise: Prove this!) A nice property of polynomials in a fixed matrix (or linear operator) is that they commute (in contrast to two general matrices).

Exercise 2.8. Prove this for linear matrix polynomials A − λI and A − µI and
linear operator polynomials L − λI and L − µI.

We will discuss this in greater detail later, but for the moment this is all we need. Remember that in the following we consider the linear operator L over the (finite) n-dimensional vector space V with ordered basis B = {b_1, b_2, ..., b_n} and A = [L]_B. We start by using a matrix decomposition that is quite revealing. We show that every matrix is similar to an uppertriangular matrix with the eigenvalues on the diagonal in any desired order.

Theorem 2.14. Let A ∈ C^{n×n}. There exists a nonsingular S such that T = S^{-1}AS is uppertriangular. T has the same eigenvalues as A with the same multiplicities, and the eigenvalues of T (and A) can appear on the diagonal in any order.

Proof. Note that T = S−1AS is a similarity transformation, which proves that T and A have the same eigenvalues with the same multiplicities. It remains to prove that S exists with the desired properties. We prove this theorem by induction on n, the size of the matrix. We start with n = 2 to make the order of the eigenvalues meaningful. Let A ∈ C2×2. Since the characteristic polynomial is of degree two, A has two eigenvalues, and for each distinct eigenvalue A has at least one eigenvector. Let λ1 be the eigenvalue we want at position (1 , 1) on the diagonal and let x1 be a corresponding eigenvector, so that Ax 1 = λ1x1. Now let y2 be any vector such that the matrix ( x1 y2) is invertible. The existence of y2 follows from the fact that any set of independent vectors in a vector space can be extended to a basis for that vector space. Consider

\[
(x_1\ y_2)^{-1} A\, (x_1\ y_2) = (x_1\ y_2)^{-1} (\lambda_1 x_1\ \ A y_2) = \begin{pmatrix} \lambda_1 & \alpha \\ 0 & \hat{A}_1 \end{pmatrix} = T. \tag{2.1}
\]

So, for S = (x_1\ y_2) we get the desired similarity transformation with the desired eigenvalue λ_1 in the leading position. For the 2 × 2 case the remaining submatrix Â_1 is the second eigenvalue. If Â_1 is distinct from λ_1, we can instead take x_1 to be an eigenvector corresponding to Â_1 and get Â_1 in the leading position of T.

Now, assume that the theorem holds for matrices of dimensions 2 to n − 1 (the induction hypothesis). Next, we prove that, in that case, it also holds for matrices of dimension n. Let A ∈ C^{n×n} and let λ_1 be an eigenvalue of A, and we want λ_1 in position (1, 1) of T. Let x_1 be an eigenvector corresponding to λ_1. Then there exist vectors y_2, ..., y_n such that {x_1, y_2, ..., y_n} is independent. Define the matrix Y = (y_2\ y_3\ \ldots\ y_n) ∈ C^{n×(n−1)}. Then the matrix (x_1\ Y) is invertible, and we proceed
as we did for the 2 × 2 case,

\[
(x_1\ Y)^{-1} A\, (x_1\ Y) = (x_1\ Y)^{-1} (\lambda_1 x_1\ \ AY) = \begin{pmatrix} \lambda_1 & \alpha^T \\ 0 & \hat{A}_{n-1} \end{pmatrix}, \tag{2.2}
\]

where Â_{n−1} is an (n − 1) × (n − 1) (sub)matrix. Note that λ(A) = {λ_1} ∪ λ(Â_{n−1}), where λ(B) indicates the set of eigenvalues (or spectrum) of a matrix B. By the induction hypothesis, there exists a matrix Ẑ_{n−1} ∈ C^{(n−1)×(n−1)} such that

\[
\hat{Z}_{n-1}^{-1} \hat{A}_{n-1} \hat{Z}_{n-1} = T_{n-1}.
\]

Now let
\[
Z = \begin{pmatrix} 1 & 0 \\ 0 & \hat{Z}_{n-1} \end{pmatrix},
\]
then

\[
Z^{-1} (x_1\ Y)^{-1} A\, (x_1\ Y)\, Z = T_n. \tag{2.3}
\]

Note that the matrix S in the theorem is given by S = (x_1\ Y)Z. Both matrices, (x_1\ Y) and Z, are invertible by construction, and the product of two invertible matrices is invertible again.
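The induction in this proof translates almost directly into a recursive computation. The following is a hedged sketch, not the method of the text: it assumes NumPy, uses eig to pick one eigenpair, extends x_1 to a basis via a complete QR factorization, and makes no attempt to control the order of the eigenvalues.

```python
import numpy as np

def triangularize(A):
    """Return S such that S^{-1} A S is uppertriangular (sketch of Theorem 2.14)."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)
    lam, vecs = np.linalg.eig(A)
    x1 = vecs[:, 0]                                          # one eigenvector of A
    Q, _ = np.linalg.qr(x1.reshape(-1, 1), mode='complete')  # unitary, first column ~ x1
    B = Q.conj().T @ A @ Q                                   # = [[lam1, a^T], [0, A_{n-1}]]
    Zsub = triangularize(B[1:, 1:])                          # induction step on the submatrix
    Z = np.eye(n, dtype=complex)
    Z[1:, 1:] = Zsub
    return Q @ Z

A = np.random.default_rng(1).standard_normal((5, 5))
S = triangularize(A)
T = np.linalg.inv(S) @ A @ S
print(np.allclose(np.tril(T, -1), 0, atol=1e-10))            # strictly lower part is ~0
```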

Corollary 2.15. Let A ∈ C^{n×n} have distinct eigenvalues λ_1 with multiplicity m_1, λ_2 with multiplicity m_2, ..., and λ_s with multiplicity m_s (where m_1 + m_2 + ··· + m_s = n). There exists a similarity transformation S^{-1}AS = T such that T is uppertriangular and equal eigenvalues of A are grouped together in diagonal blocks, with the first diagonal block of size m_1 × m_1 with λ_1 on the diagonal, the second diagonal block of size m_2 × m_2 with λ_2 on the diagonal, and so on.

\[
T = \begin{pmatrix}
T_{1,1} & T_{1,2} & \ldots & T_{1,s} \\
O & T_{2,2} & \ldots & T_{2,s} \\
O & O & \ddots & \vdots \\
O & O & \ldots & T_{s,s}
\end{pmatrix}, \tag{2.4}
\]

where T_{i,i} ∈ C^{m_i × m_i} is uppertriangular and its diagonal is λ_i I_{m_i}.

Exercise 2.9. Prove that an uppertriangular matrix T ∈ C^{n×n} has invariant subspaces Span(e_1), Span(e_1, e_2), ..., Span(e_1, ..., e_n).

Next we derive a sequence of similarity transformations that will show that the matrix T is similar to a block diagonal matrix with the m_i × m_i uppertriangular matrices T_{i,i} for i = 1, ..., s on the diagonal.

Theorem 2.16. Let A ∈ Cn×n be the uppertriangular matrix,

\[
A = \begin{pmatrix} L & H \\ O & M \end{pmatrix},
\]

with (uppertriangular) diagonal blocks L ∈ C^{m×m} (1 ≤ m < n) and M ∈ C^{(n−m)×(n−m)} such that λ(L) ∩ λ(M) = ∅. Then there exists a similarity transformation S^{-1}AS such that

\[
S^{-1} A S = \begin{pmatrix} L & O \\ O & M \end{pmatrix}.
\]

Proof. Since we want to preserve the blocks L and M, we consider S of the form

\[
S = \begin{pmatrix} I_m & Q \\ O & I_{n-m} \end{pmatrix},
\]

where I_m ∈ C^{m×m} and I_{n−m} ∈ C^{(n−m)×(n−m)}. Then

\[
S^{-1} A S = \begin{pmatrix} I_m & -Q \\ O & I_{n-m} \end{pmatrix} \begin{pmatrix} L & H \\ O & M \end{pmatrix} \begin{pmatrix} I_m & Q \\ O & I_{n-m} \end{pmatrix} = \begin{pmatrix} L & LQ - QM + H \\ O & M \end{pmatrix},
\]

where the reader should verify that S^{-1} has the form used here. This matrix has the desired property if LQ − QM + H = O. It turns out that for L and M with the properties given (both uppertriangular and their spectra disjoint), for any H, we can pick (solve for) a Q such that this equation is satisfied. Write the equation in the following form

LQ − QM = −H,

and, exploiting the fact that L and M are uppertriangular , solve this equation column by column. For the first column we have

\[
L q_1 - q_1 m_{1,1} = -h_1 \iff (L - m_{1,1} I_m) q_1 = -h_1.
\]

Since m_{1,1} ∈ λ(M) ⇒ m_{1,1} ∉ λ(L), (L − m_{1,1}I) is nonsingular (and uppertriangular) and we can solve the equation for q_1. However, once q_1 is known, we can consider the second column

\[
L q_2 - q_2 m_{2,2} = -h_2 + q_1 m_{1,2} \iff (L - m_{2,2} I_m) q_2 = -h_2 + q_1 m_{1,2}.
\]

Since m_{2,2} ∈ λ(M) ⇒ m_{2,2} ∉ λ(L), (L − m_{2,2}I) is nonsingular (and uppertriangular) and we can solve the equation for q_2. Now, assuming we have solved for columns 1 to k − 1, we proceed for the k-th column as follows:

\[
L q_k - q_k m_{k,k} = -h_k + \sum_{j=1}^{k-1} q_j m_{j,k} \iff (L - m_{k,k} I_m) q_k = -h_k + \sum_{j=1}^{k-1} q_j m_{j,k}.
\]

Again, since the matrix on the left-hand side is nonsingular, we can solve the equation for k = 3, 4, ..., n − m. Hence, we can always obtain Q such that LQ − QM + H = O.

Note that it does not matter whether L or M is singular, as long as λ(L) ∩ λ(M) = ∅.
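The column-by-column construction in the proof is easy to turn into code. A minimal sketch assuming NumPy (the small test matrices are illustrative; L and M must be uppertriangular with disjoint spectra, as in the theorem):

```python
import numpy as np

def decoupling_Q(L, M, H):
    """Solve LQ - QM = -H column by column, as in the proof of Theorem 2.16."""
    m, p = L.shape[0], M.shape[0]
    Q = np.zeros((m, p), dtype=complex)
    I = np.eye(m)
    for k in range(p):
        rhs = -H[:, k] + Q[:, :k] @ M[:k, k]        # uses the already computed columns
        Q[:, k] = np.linalg.solve(L - M[k, k] * I, rhs)
    return Q

L = np.array([[1., 2.], [0., 1.]])
M = np.array([[3., 1.], [0., 4.]])
H = np.array([[1., 0.], [2., 1.]])
Q = decoupling_Q(L, M, H)
print(np.allclose(L @ Q - Q @ M + H, 0))            # True: the coupling block is eliminated
```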

Corollary 2.17. Let A ∈ C^{n×n} have distinct eigenvalues λ_1 with multiplicity m_1, λ_2 with multiplicity m_2, ..., and λ_s with multiplicity m_s (where m_1 + m_2 + ··· + m_s = n). There exists a similarity transformation S^{-1}AS = D such that D is block diagonal and uppertriangular, with the first diagonal block of size m_1 × m_1 with λ_1 on its diagonal, the second diagonal block of size m_2 × m_2 with λ_2 on its diagonal, and so on.

\[
D = \begin{pmatrix}
T_{1,1} & O & \ldots & O \\
O & T_{2,2} & \ldots & O \\
O & O & \ddots & \vdots \\
O & O & \ldots & T_{s,s}
\end{pmatrix},
\]
where T_{i,i} ∈ C^{m_i × m_i} is uppertriangular and its diagonal is λ_i I_{m_i}.

Proof. According to Corollary 2.15 , we can use a similarity transformation to find an uppertriangular matrix T similar to A with equal eigenvalues grouped in diagonal blocks as given in ( 2.4 ). We now apply Theorem 2.16 with L = T1,1, H = ( T1,2 . . . T1,s ), and

\[
M = \begin{pmatrix}
T_{2,2} & \ldots & T_{2,s} \\
 & \ddots & \vdots \\
 & & T_{s,s}
\end{pmatrix},
\]

and compute Q to zero the block row corresponding to H. Next, we apply Theorem 2.16 with

\[
L = \begin{pmatrix} T_{1,1} & O \\ O & T_{2,2} \end{pmatrix}, \qquad
H = \begin{pmatrix} T_{1,3} & \ldots & T_{1,s} \\ T_{2,3} & \ldots & T_{2,s} \end{pmatrix},
\]

and

\[
M = \begin{pmatrix}
T_{3,3} & \ldots & T_{3,s} \\
 & \ddots & \vdots \\
 & & T_{s,s}
\end{pmatrix},
\]

and so on.

This decomposition reveals a number of important properties of the matrix A and special invariant subspaces as well as of any linear operator over a finite dimensional vector space.

Definition 2.18. Let L be a linear operator over the vector space V, and let U be an invariant subspace of L. We define the restriction of L to U as the linear operator L_U : U → U such that for all u ∈ U, L_U u = Lu.

Exercise 2.10. Let L be a linear operator over V and let B = {v_1, v_2, ..., v_n} be a basis for V such that A = [L]_B is block diagonal with diagonal blocks A_{1,1} of size m_1, A_{2,2} of size m_2, ..., A_{s,s} of size m_s (in that order), and let f_i and ℓ_i be defined as in Definition 2.7 for i = 1, ..., s. Let B_i = {v_{f_i}, v_{f_i+1}, ..., v_{ℓ_i}} and let V_i = Span(B_i) for i = 1, ..., s. Show that A_{i,i} = [L_{V_i}]_{B_i}.

Corollary 2.17 shows that every matrix has a decomposition A = SDS^{-1} that reveals (unique) invariant subspaces of dimension m_i = a_{λ_i} (the algebraic multiplicity) associated with each distinct eigenvalue. As every linear operator over a finite dimensional vector space with a given basis is uniquely associated with a matrix, we have the same conclusion about linear operators. We next discuss a few additional properties of matrices and linear operators that follow (almost) directly from this decomposition or the block diagonal form.

To each distinct eigenvalue λ_i of A (L) there corresponds an invariant subspace, V_{λ_i}, of dimension a_{λ_i} such that the restriction of A (L) to that invariant subspace has only the eigenvalue λ_i. There is no invariant subspace of larger dimension with this property. There may be invariant subspaces of smaller dimension with this property, and finding those and characterizing them is the topic of the remainder of this chapter.

Note that each invariant subspace of dimension a_{λ_i} associated with a distinct eigenvalue has at least one eigenvector. The same holds for any linear operator that, with respect to some given basis, has the matrix A as its representation. These invariant subspaces, V_i, associated with the distinct eigenvalues λ_i form a direct sum decomposition of the vector space C^n (or V for L). So, we have achieved part of our quest, a direct sum decomposition of the vector space over which L (or A) is defined into invariant subspaces that are associated with the eigenvalues. The next two examples show that this may or may not be the best we can do. (Perhaps we should explicitly identify A with the linear operator L_A over R^n or C^n or treat them separately - I prefer the first.)

Let L be a linear operator over the complex vector space V, and let U be a finite dimensional invariant subspace of L with dim(U) ≥ 1. Furthermore, let x ∈ U and let m be the smallest integer such that the set of vectors {x, Lx, ..., L^m x} is dependent. So, {x, Lx, ..., L^{m−1}x} is independent. Since U is finite dimensional, such an m always exists and obviously m ≤ dim(U). Since U is an invariant subspace and x ∈ U, Lx ∈ U and (by induction) L^k x ∈ U for any k ∈ N. The following two questions relate to the U, x, and L in this and the previous paragraph.

Exercise 2.11. Prove that for any polynomial p(L) in L, and x in the U defined above, we have p(L)x ∈ U.

Exercise 2.12. Prove that for any polynomial pm−1(L) of degree at most m − 1, except the zero polynomial, pm−1(L)x 6= 0.

However, from the dependence of {x, Lx, ..., L^m x} it follows that there is a nontrivial linear combination of these vectors that sums to zero: α_0 x + α_1 Lx + ··· + α_m L^m x = 0. This is equivalent to (α_0 I + α_1 L + ··· + α_m L^m)x = 0.

Theorem 2.19. Let L and U be as above. Then, there is a vector u ∈ U such that Lu = λu (for some λ). (Every invariant subspace of a complex linear operator contains at least one eigenvector).

Proof. Let x ∈ U and let m be the smallest integer such that the set of vectors {x, Lx, ..., L^m x} is dependent. Hence, there is a nontrivial linear combination such that \sum_{k=0}^{m} α_k L^k x = 0 ⇔ (\sum_{k=0}^{m} α_k L^k)x = 0. By the fundamental theorem of algebra (each complex polynomial can be factored over the complex field) this is equivalent to

\[
\alpha_m (L - r_m I)(L - r_{m-1} I) \cdots (L - r_1 I)x = 0
\]

(for r_1, ..., r_m such that α_m(L − r_m I)(L − r_{m−1}I) ··· (L − r_1 I) = \sum_{k=0}^{m} α_k L^k). From Exercise 2.12 above we know that (L − r_{m−1}I) ··· (L − r_1 I)x ≠ 0, and therefore (L − r_{m−1}I) ··· (L − r_1 I)x is an eigenvector of L with eigenvalue r_m.

In fact, this proof tells us a little more. Since the factors (L − r_i I) commute, we also have

αm(L − rm−1I)( L − rmI)( L − rm−2I) · · · (L − r1I)x = 0,

so that ( L − rmI)( L − rm−2I) · · · (L − r1I)x must be an eigenvector of L with eigenvalue rm−1. Of course, this construction can be repeated to bring each factor

(L − r_i I) to the left to show that \big(\prod_{k \ne i} (L - r_k I)\big) x is an eigenvector of L with eigenvalue r_i. As we have shown above, eigenvectors corresponding to distinct eigenvalues are independent. So, we have the following useful theorem.

Theorem 2.20. Let L and U be as above, and let x ∈ U and m be the smallest
integer such that the set of vectors {x, Lx,..., Lmx} is dependent. Then there exist αm and r1,...,r m such that

\[
\alpha_m (L - r_{m-1} I)(L - r_m I)(L - r_{m-2} I) \cdots (L - r_1 I)x = 0.
\]

Furthermore, each ri is an eigenvalue of L and for each distinct ri there exists ui ∈ U such that Lui = riui.

Proof. (see discussion above)

As we have seen above ( actually this needs to be discussed still ) the number of independent eigenvectors of L is the sum of the geometric multiplicities of each (distinct) eigenvalue. We have also seen that each invariant subspace contains at least one eigenvector. We would like to derive a set of invariant subspaces such that

1. their direct sum is the vector space V , so they can only intersect trivially,

2. they are fundamental ( better term! ) in that none can be split further into the direct sum of invariant subspaces.

To this end we consider invariant subspaces U such that U contains exactly one independent eigenvector. We are still assuming complex vector spaces.

Theorem 2.21. Let L be a linear operator over the complex vector space V. Let U be a finite dimensional invariant subspace of L with dim(U) = s and the property that for any two eigenvectors of L, u_1, u_2 ∈ U, {u_1, u_2} is dependent. Then there exists a vector x ∈ U and a scalar λ such that (L − λI)^s x = 0 and (L − λI)^{s−1} x ≠ 0.

The vector (L − λI)^{s−1} x is the unique independent eigenvector of L in U with eigenvalue λ.

Proof. The first part of the proof follows almost directly from the previous discussion. For any vector y ∈ U, there is a smallest m ≤ s such that {y, Ly, ..., L^m y} is dependent and a nontrivial linear combination α_0 y + α_1 Ly + ··· + α_m L^m y = 0. We can again factor the polynomial in L to obtain

αm(L − rmI)( L − rm−1I)( L − rm−2I) · · · (L − r1I)y = 0 .

However, according to Theorem 2.20 every distinct r_i yields an independent eigenvector u_i ∈ U. As we have by assumption a single eigenvector in U, the polynomial can only have one distinct root λ (with multiplicity m). So the polynomial equation must be α_m (L − λI)^m y = 0, and as α_m ≠ 0 (why?), we have (L − λI)^m y = 0. As proved in Exercise 2.12 we also have that (L − λI)^{m−1} y ≠ 0.

In the second part of the proof we show that there is a vector x ∈ U such that m = s. Assume m < s (otherwise, we are done). As dim(U) = s > m, there
must be a vector z such that {y, Ly, ..., L^{m−1}y, z} is independent. Furthermore, there must be a smallest positive integer ℓ such that {z, Lz, ..., L^ℓ z} is dependent. As for y and m, we have that (L − λI)^ℓ z = 0 and (L − λI)^{ℓ−1} z ≠ 0. Both (L − λI)^{ℓ−1} z and (L − λI)^{m−1} y are eigenvectors and therefore (by assumption) (L − λI)^{ℓ−1} z = α(L − λI)^{m−1} y. Now assume (first) that ℓ ≤ m. From the equation above we have that

\[
\begin{aligned}
(L - \lambda I)^{\ell-2} z &= \alpha (L - \lambda I)^{m-2} y + \beta_1 (L - \lambda I)^{m-1} y, \\
(L - \lambda I)^{\ell-3} z &= \alpha (L - \lambda I)^{m-3} y + \beta_1 (L - \lambda I)^{m-2} y + \beta_2 (L - \lambda I)^{m-1} y, \\
&\;\;\vdots \\
z &= \alpha (L - \lambda I)^{m-\ell} y + \beta_1 (L - \lambda I)^{m-\ell+1} y + \beta_2 (L - \lambda I)^{m-\ell+2} y + \cdots + \beta_{\ell-1} (L - \lambda I)^{m-1} y.
\end{aligned}
\]

But this yields a contradiction, since {y, Ly, ..., L^{m−1}y, z} is independent. Hence ℓ > m must hold. If ℓ < s there must again be a vector z̃ ∈ U such that {z, Lz, ..., L^{ℓ−1}z, z̃} is independent, and we repeat the construction. As s is finite, this repeated construction leads to the desired vector x.

Now let the linear operator L over a complex vector space V have m independent eigenvectors u_1, ..., u_m. Then there are invariant subspaces U_i of L such that each U_i contains u_i and this is the only independent eigenvector in U_i. Furthermore, dim(U_i) = s_i and s_i is maximum.

Theorem 2.22. U_1 ⊕ U_2 ⊕ ··· ⊕ U_m = V.

Proof. TBD

• applications: consider matrices from mass-spring systems, matrices from finite differences, matrices from finite elements, other simple examples, (partial) differential equations

2.2 Invariant Subspace Examples

We now discuss a number of more extensive examples. The first involves simple differential equations.

2.2.1 Differential Equations

Consider again the space C^∞[t_0, t_e] of infinitely differentiable (real) functions on [t_0, t_e], the function y ∈ C^∞[t_0, t_e], and the simple differential equation ẏ = λy for some real λ. If we define L : C^∞[t_0, t_e] → C^∞[t_0, t_e] by Ly ≡ ∂y/∂t ≡ ẏ (verify that L is a linear operator over C^∞[t_0, t_e]), then the differential equation can be written as Ly = λy. So, solutions to this differential equation are eigenvectors of L with eigenvalue λ. The function u(t) = e^{λt} with t ∈ [t_0, t_e] is a solution, and so is ae^{λt} for any a ∈ R. Note that, as always with eigenvectors, the length is not relevant (except that it cannot be 0). Alternatively, e^{λt} is a nontrivial solution to the singular linear
system (L − λI)y = 0, called the homogeneous equation in ODE terms. As always holds for solutions to homogeneous (linear) equations, any function ae^{λt}, for any a ∈ R, is also a solution. Note that these properties follow directly from the linear algebra structure of the problem: (1) the (nonzero) length of eigenvectors is irrelevant and (2) any scalar multiple of a solution to a homogeneous linear equation is also a solution. To identify a unique solution we typically assume an initial condition (but a condition at another point in the interval would work just as well). Given an initial condition y(t_0) = y_0, we use this condition to fix the choice of a in the one-dimensional space of solutions ae^{λt}.

Next, we show that more complicated (coupled) systems of linear differential equations take a very simple and insightful form if we express the problem in the eigenvector basis. Consider the coupled system of linear differential equations

\[
\dot{y} = \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix} y, \quad \text{for } t \in (0, T] \text{ with } y(0) = \begin{pmatrix} 2 \\ 0 \end{pmatrix}.
\]

In this case the linear differential equation is more difficult to solve as the equations for the components of y(t), y_1(t) and y_2(t), are coupled. We now show that in the eigenvector basis the equations become uncoupled and the solution for each component (in this basis) can be given directly, as above. The matrix has the eigenvalues λ_1 = 5, λ_2 = 3 and corresponding eigenvectors v_1 = [1\ 1]^T and v_2 = [1\ −1]^T. Hence we have the eigendecomposition of this matrix:

\[
\begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 5 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}, \quad \text{where} \quad \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{-1}.
\]

Let the component vector of y(t) in the basis [v_1\ v_2] be [α(t)\ β(t)]^T. Then the component vector of ẏ in the eigenvector basis is [α̇\ β̇]^T, and the differential equation above becomes

\[
\begin{aligned}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} \dot{\alpha} \\ \dot{\beta} \end{pmatrix}
&= \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 5 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{-1} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \iff \\
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} \dot{\alpha} \\ \dot{\beta} \end{pmatrix}
&= \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 5 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} \iff \\
\begin{pmatrix} \dot{\alpha} \\ \dot{\beta} \end{pmatrix}
&= \begin{pmatrix} 5 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix},
\end{aligned}
\]

and we are back to two (uncoupled) equations that can be solved independently. Note that this is just the same differential equation written in the basis defined by the eigenvectors. Computing the decomposition of y(0) along the eigenvectors gives the initial conditions for α and β:

\[
\begin{pmatrix} \alpha(0) \\ \beta(0) \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix} \begin{pmatrix} 2 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]


So, the solution for α is given by α(t) = e^{5t} and the solution for β is given by β(t) = e^{3t}. Multiplying each by the corresponding eigenvector gives the solution in the original basis (the canonical basis for R^2):

\[
y(t) = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} \alpha(t) \\ \beta(t) \end{pmatrix} = \begin{pmatrix} e^{5t} + e^{3t} \\ e^{5t} - e^{3t} \end{pmatrix}.
\]

Note that the eigenvalue with largest real part gives the dominant component or mode of the solution.

We now generalize this example for a diagonalizable matrix (we discuss the nondiagonalizable case in the next chapter). Consider the differential equation ẏ = Ay for t ∈ (0, T] and y(0) = y_0 for y : [0, T] → R^n and A ∈ R^{n×n}. Note that, although A is a real matrix, we may need to factor the characteristic equation of A over the complex field, resulting (potentially) in complex eigenvalues and eigenvectors. Let the eigendecomposition of A be A = VΛV^{-1}, where V is the matrix with the (typically normalized) eigenvectors as columns and Λ is a diagonal matrix with the eigenvalues on the diagonal (in the order corresponding to the order of the eigenvectors in V). Let y(t) in the eigenvector basis (the columns of V) be represented by η, that is, y(t) = Vη(t). Again substituting this and the eigendecomposition of A into the differential equation gives

\[
V\dot{\eta} = V \Lambda V^{-1} V \eta = V \Lambda \eta \iff \dot{\eta} = \Lambda \eta.
\]

This equation has componentwise solutions η_i(t) = η_{0,i} e^{λ_i t}, where the initial values η_{0,i} are obtained from

\[
y_0 = y(0) = V\eta(0) = V\eta_0 \iff \eta_0 = V^{-1} y_0.
\]
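The 2 × 2 example above can be checked numerically. A minimal sketch assuming NumPy/SciPy (the time t is an arbitrary illustrative choice): it computes y(t) = V diag(e^{λ_i t}) V^{-1} y_0 and compares it with the closed-form solution and with the matrix exponential discussed next.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[4., 1.], [1., 4.]])
y0 = np.array([2., 0.])

lam, V = np.linalg.eig(A)               # eigendecomposition A = V diag(lam) V^{-1}
eta0 = np.linalg.solve(V, y0)           # eta_0 = V^{-1} y_0

t = 0.7
y_eig = V @ (np.exp(lam * t) * eta0)    # y(t) = V diag(e^{lam t}) eta_0
y_expm = expm(t * A) @ y0               # the same via the matrix exponential e^{At}
y_exact = np.array([np.exp(5 * t) + np.exp(3 * t),
                    np.exp(5 * t) - np.exp(3 * t)])

print(np.allclose(y_eig, y_exact), np.allclose(y_expm, y_exact))   # True True
```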

The solution in the eigenvector basis V is now given by η(t) = diag(e^{λ_i t}) η_0 (we use diag(x) or diag(x_i) to indicate a diagonal matrix with the coefficients of x on the diagonal). We obtain the solution in the standard basis again by multiplying by the matrix of basis vectors (here, the eigenvectors) V: y(t) = V diag(e^{λ_i t}) V^{-1} y_0.

The matrix V diag(e^{λ_i t}) V^{-1} is often denoted as e^{At} (called the matrix exponential). This notation is inspired by the simple form that polynomials in diagonalizable matrices take. If A = VΛV^{-1}, then A^k = VΛ^k V^{-1}. Consider the polynomial p(t) = \sum_{k=0}^{n} p_k t^k applied to A. We have

\[
\begin{aligned}
p(A) &= p_0 I + p_1 A + p_2 A^2 + \cdots + p_n A^n && (2.5)\\
&= p_0 VV^{-1} + p_1 V\Lambda V^{-1} + p_2 V\Lambda^2 V^{-1} + \cdots + p_n V\Lambda^n V^{-1} && (2.6)\\
&= V\left( p_0 I + p_1 \Lambda + p_2 \Lambda^2 + \cdots + p_n \Lambda^n \right) V^{-1} && (2.7)\\
&= V \mathrm{diag}(p(\lambda_k))\, V^{-1}. && (2.8)
\end{aligned}
\]

Note that the eigenvectors of p(A) are the eigenvectors of A and the polynomial is simply applied to each eigenvalue individually. Noting that

\[
e^{\lambda_k t} = 1 + \lambda_k t + \frac{1}{2}(\lambda_k t)^2 + \frac{1}{6}(\lambda_k t)^3 + \cdots,
\]
we see that

\[
\begin{aligned}
V\,\mathrm{diag}(e^{\lambda_i t})\,V^{-1} y_0 &= V\left( I + t\Lambda + \tfrac{1}{2}(t\Lambda)^2 + \cdots \right) V^{-1} y_0 && (2.9)\\
&= \left( I + tA + \tfrac{1}{2}(tA)^2 + \cdots \right) y_0, && (2.10)
\end{aligned}
\]

which is the series that generates the exponential function (note that this series converges for any value of λ_k t). So, it is the exponential function applied to tA, hence the notation e^{At}. We will discuss polynomials of matrices and applications of these in more detail later.

In many cases, we are not so much interested in the details of the solution y(t) given above, but only in the simple question (simple to ask) whether y(t) → 0 for t → ∞, or whether components of y may grow without bound or remain bounded while not going to zero. This addresses the question of stability. We will discuss some aspects here briefly and come back to the issue in more detail after defining how to measure magnitudes or lengths of vectors in a vector space.

In the example given above of a system of two coupled ordinary differential equations we see that both solution components blow up for increasing t,

\[
y = \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^{5t} + \begin{pmatrix} 1 \\ -1 \end{pmatrix} e^{3t}.
\]

However, for large t it is the component (1\ 1)^T e^{5t} that dominates. This is easily seen by writing the solution as

\[
y(t) = e^{5t} \begin{pmatrix} 1 + e^{-2t} \\ 1 - e^{-2t} \end{pmatrix}.
\]

This is generally the case, assuming that the initial solution has a component in the direction of the eigenvector corresponding to the eigenvalue with the largest real part.

Consider the autonomous dynamical system ẏ(t) = f(y(t)), with y(0) = y_0, where y(t) : (0, ∞] → R^n and f : R^n → R^n. If f(y_0) = 0, there is no change, ẏ = 0, and the system is in steady (or stationary) state, also called in equilibrium. Stability addresses what happens if we perturb the system slightly from this steady state. If we perturb the state (sufficiently) slightly in an arbitrary direction, will the system return to the equilibrium (the equilibrium is asymptotically stable), will the system stay close to the equilibrium but not necessarily return to it (the equilibrium is stable), or will the system move further and further away from the equilibrium (the equilibrium is unstable)? Consider a perturbation from y_0 and let the perturbed state of the system be represented by ỹ(t) = y(t) + ε(t) = y_0 + ε(t). Then, assuming the Jacobian of f is Lipschitz continuous (details later), we have

\[
\dot{\tilde{y}} = \dot{\varepsilon} = f(y_0 + \varepsilon(t)) = f(y_0) + J(y_0)\,\varepsilon(t) + O\Big(\sum_{i,j=1}^{n} \varepsilon_i \varepsilon_j\Big), \tag{2.11}
\]


where

\[
J(y) = \begin{pmatrix}
\frac{\partial f_1(y)}{\partial y_1} & \frac{\partial f_1(y)}{\partial y_2} & \ldots & \frac{\partial f_1(y)}{\partial y_n} \\
\frac{\partial f_2(y)}{\partial y_1} & \frac{\partial f_2(y)}{\partial y_2} & \ldots & \frac{\partial f_2(y)}{\partial y_n} \\
\vdots & & & \vdots \\
\frac{\partial f_n(y)}{\partial y_1} & \frac{\partial f_n(y)}{\partial y_2} & \ldots & \frac{\partial f_n(y)}{\partial y_n}
\end{pmatrix}.
\]

We can discuss the nonlinear term O(\sum_{i,j=1}^{n} ε_i ε_j) in more detail in chapter 4, after we introduce norms. However, it should be clear that the analysis of stability depends first of all on the properties of the Jacobian J (at the equilibrium), as for small ε the nonlinear term will be very small. Assuming, in addition, that the Jacobian is diagonalizable, we consider the system of first order equations ε̇ = Jε (so, we assume the nonlinear term is negligible for small enough ε) with initial condition ε(0) = ε_0, and proceed as above. Let J = VΛV^{-1} be the eigendecomposition of the Jacobian, and let the representation or coordinate vector of ε in the basis V be given by η, ε = Vη. Then the solution components for η are given by η_k(t) = e^{λ_k t} η_{0,k}. Since we are interested in (in)stability under arbitrary perturbations, we assume all components of η_0 are nonzero. Hence, for all components of η to go to zero (asymptotic stability), we need that Re(λ_k) < 0 for all k = 1, ..., n. For stability, we need that no components grow while some (possibly all, possibly none) decay. Hence, we need that Re(λ_k) ≤ 0 for all k = 1, ..., n. Components with zero real part of the eigenvalue λ_k are balancing on the boundary of instability, and in this case the effect of the nonlinear term in (2.11) will play a part. Finally, for instability we need that at least one component grows without bound, which means there is at least one k such that Re(λ_k) > 0.

We will show later that for systems that satisfy the conditions given for this autonomous system, the analysis of the linear component is actually sufficient for both asymptotically stable and unstable equilibria. For the linearly stable but not asymptotically stable case, analysis of the nonlinear term is required.

It is important to realize that in many problems of practical importance the 'finite' dimension n can be very, very large, and computing the complete eigendecomposition would be a hopeless task (and when still doable, often extremely expensive). However, our analysis shows that this is not necessary. All we need to know are the eigenvalues with real part near zero or (much) larger than zero. In practice, this means that we need only compute a modest number of eigenvalues of a very large matrix. That turns out to be a manageable task. We will discuss this briefly in section (Krylov methods for linear systems and eigenvalue problems). For detailed information, see books on numerical analysis or large-scale eigenvalue problems. Should we collect references somewhere?
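The linearized stability criteria above reduce to inspecting the real parts of the Jacobian's eigenvalues. A hedged sketch assuming NumPy (the classifier and the damped-oscillator Jacobian are illustrative assumptions, not from the text):

```python
import numpy as np

def classify_equilibrium(J, tol=1e-12):
    """Classify an equilibrium from the eigenvalues of the Jacobian (linear analysis only)."""
    re = np.linalg.eigvals(J).real
    if np.all(re < -tol):
        return "asymptotically stable"          # all Re(lambda_k) < 0
    if np.any(re > tol):
        return "unstable"                       # some Re(lambda_k) > 0
    return "stable (linearly); nonlinear terms decide"

# damped oscillator x'' + 0.1 x' + x = 0 written as a first-order system
J = np.array([[0., 1.], [-1., -0.1]])
print(classify_equilibrium(J))                  # asymptotically stable
```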

Discretizing a Differential Equation

(We could also put this earlier.)

Next we consider a numerical approach, finite difference discretization, to analyze or solve differential equations that is much more generally applicable than
symbolic approaches. In order to emphasize similarities and differences with symbolic approaches we consider, at first, a very simple equation. Discretization of PDEs is one of the major sources of practical matrix problems, and we can use this to provide us with meaningful (higher dimensional) linear operators and (larger) matrices to demonstrate the theory introduced and show how this theory is used in solving very important problems in science and engineering. After introducing inner products and related theory (linear and bilinear forms), we will discuss another approach to discretization, finite elements, that relies even more on the linear algebra. In deriving effective discretization and numerical methods, various concerns regarding convergence and error bounds are important, but they are more in the domain of analysis, and we do not consider them here. An advanced numerical analysis course would answer these questions. First we prove a result that will help us put finite difference discretization in perspective for these notes.

Theorem 2.23. The composition of two linear transformations, L : U → V and M : V → W, (M ◦ L) : U → W, defined as (M ◦ L)u = M(Lu) (for u ∈ U), is a linear transformation. Analogously, for two matrices B ∈ C^{m×n} and A ∈ C^{ℓ×m} the composition AB ∈ C^{ℓ×n} defines a linear transformation.

Proof. The proof is left as an exercise.

Exercise 2.13. Prove Theorem 2.23. Prove that the composition of a sequence of linear transformations L_n ◦ L_{n−1} ◦ ··· ◦ L_1, where L_i : V_i → V_{i+1}, is again a linear transformation.

Consider the partial differential equation u_t = u_xx + f(x) for t ∈ (0, T] and x ∈ [0, 1] (the heat equation) or, slightly more generally if the rate of heat conduction is not constant, u_t = (a(x)u_x)_x + f(x), and in two spatial dimensions plus time, u_t = ∇·(a(x, y)∇u) + f(x, y), where (x, y) ∈ [0, 1] × [0, 1]. We can rewrite these equations in the form of linear operator equations. For the first equation, we get (∂_t − ∂_x^2)u = f.

To discretize the first equation we consider the unknown values of the function u(x, t) at the set of grid points (x_j, t_n) defined by x_j = jΔx for j = 0, ..., J and Δx = 1/J, and t_n = nΔt for n = 0, ..., N and Δt = 1/N. Define the vector û_n = (u(x_0, nΔt), u(x_1, nΔt), ..., u(x_J, nΔt)).

Exercise 2.14. Show that the mapping L(u) = û_n (for given n) is a linear mapping.

This process is called sampling (especially in terms of signals and pictures). The same holds for sampling on a set of arbitrary (x, t) points (an irregular grid/sampling). Occasionally we will also consider a matrix û of u values in space and time with û_{j,n} = u(jΔx, nΔt), or the vector û_k = u(jΔx, nΔt) with k = (n − 1)J + j (just a linear reordering of the matrix above).

The next step in finite difference discretization is to replace all derivatives by
linear combinations of (unknown) function values on the regular grid. The point being that after this only the unknown function values are unknowns in the discrete problem, and we do not need the values of the various derivatives of u on the grid. We define a number of finite difference operators and the reader should verify that (1) these finite difference operators are linear operators and that (2) other (more elaborate) finite difference operators can be defined similarly. Given ∆ t and ∆x, we define the following forward and backward differences in space (x) and time (t)

∆+xu(x, t ) = u(x + ∆ x, t ) − u(x, t ), (2.12)

∆−xu(x, t ) = u(x, t ) − u(x − ∆x, t ), (2.13)

∆+tu(x, t ) = u(x, t + ∆ t) − u(x, t ), (2.14)

∆−tu(x, t ) = u(x, t ) − u(x, t − ∆t), (2.15)

and the following central differences in space and time

δxu(x, t ) = u(x + (1 /2)∆ x, t ) − u(x − (1 /2)∆ x, t ), (2.17)

∆0xu(x, t ) = u(x + ∆ x, t ) − u(x − ∆x, t ), (2.18)

δtu(x, t ) = u(x, t + (1 /2)∆ t) − u(x, t − (1 /2)∆ t), (2.19)

∆0tu(x, t ) = u(x, t + ∆ t) − u(x, t − ∆t). (2.20)

We can compose further difference operators from these. An important one is

\[
\begin{aligned}
\delta_x^2 u(x,t) = \delta_x(\delta_x u) &= \delta_x\big(u(x + \tfrac{1}{2}\Delta x, t) - u(x - \tfrac{1}{2}\Delta x, t)\big) \\
&= \big(u(x+\Delta x,t) - u(x,t)\big) - \big(u(x,t) - u(x-\Delta x,t)\big) \\
&= u(x+\Delta x,t) - 2u(x,t) + u(x-\Delta x,t). && (2.22)
\end{aligned}
\]

Using Taylor approximation (assuming that u has enough derivatives) we can show that these difference operators (with a proper weighting) provide approximations to derivatives. For Δx^{-1} Δ_{+x} u(x, t) we have

\[
\begin{aligned}
\Delta x^{-1}\,\Delta_{+x} u(x,t) &= \Delta x^{-1}\big(u(x+\Delta x,t) - u(x,t)\big) && (2.23)\\
&= \Delta x^{-1}\Big(u(x,t) + \Delta x\, u_x(x,t) + \tfrac{1}{2}\Delta x^2\, u_{xx}(\tilde{x},t) - u(x,t)\Big) && (2.24)\\
&= u_x(x,t) + \tfrac{1}{2}\Delta x\, u_{xx}(\tilde{x},t), && (2.25)
\end{aligned}
\]
for some x̃ ∈ (x, x + Δx).


The weighted finite difference operator (1/Δx²) δ_x² u(x, t) approximates the second derivative in x (below, u, u_x, etc. are evaluated at (x, t)),

\[
\begin{aligned}
\delta_x^2 u(x,t) &= u(x-\Delta x,t) - 2u(x,t) + u(x+\Delta x,t)\\
&= \Big(u - \Delta x\, u_x + \tfrac{\Delta x^2}{2} u_{xx} - \tfrac{\Delta x^3}{6} u_{xxx} + \tfrac{\Delta x^4}{24} u_{xxxx}(\tilde{x}_+,t)\Big)\\
&\quad + \Big(u + \Delta x\, u_x + \tfrac{\Delta x^2}{2} u_{xx} + \tfrac{\Delta x^3}{6} u_{xxx} + \tfrac{\Delta x^4}{24} u_{xxxx}(\tilde{x}_-,t)\Big) - 2u\\
&= \Delta x^2\, u_{xx} + O(\Delta x^4).
\end{aligned}
\]

Dividing by ∆ x2 gives

\[
\frac{1}{\Delta x^2}\,\delta_x^2 u(x,t) = u_{xx} + O(\Delta x^2). \tag{2.26}
\]
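The O(Δx²) behaviour in (2.26) is easy to observe numerically. A minimal sketch assuming NumPy (the test function sin and the evaluation point are illustrative choices):

```python
import numpy as np

x = 0.3
exact = -np.sin(x)                    # u_xx for u = sin at the point x
for dx in [1e-1, 1e-2, 1e-3]:
    approx = (np.sin(x - dx) - 2 * np.sin(x) + np.sin(x + dx)) / dx**2
    print(dx, abs(approx - exact))    # the error shrinks roughly by 100x per step (O(dx^2))
```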

Evaluating these finite differences at the grid points results in equations involving only the unknown values of u(x, t) at the grid points, u(jΔx, nΔt). Consider now the simple equation u_t = u_xx for t ∈ (0, T] and x ∈ (0, 1), with boundary conditions u(x, 0) = u_0(x) (initial condition) and u(0, t) = 0 and u(1, t) = 0. We discretize as discussed above, with the values û_{j,0} ≡ u(jΔx, 0) = u_0(jΔx) given and û_{0,n} = u(0, nΔt) = 0 and û_{J,n} = u(1, nΔt) = 0. We use a forward finite difference to approximate u_t and the second order central difference ((1/Δx²) δ_x² u) to approximate u_xx. We evaluate the expression u_t = u_xx at the point (jΔx, nΔt) and write out the approximations.

\[
\frac{u(j\Delta x, n\Delta t + \Delta t) - u(j\Delta x, n\Delta t)}{\Delta t} = \frac{1}{\Delta x^2}\big(u(j\Delta x - \Delta x, n\Delta t) - 2u(j\Delta x, n\Delta t) + u(j\Delta x + \Delta x, n\Delta t)\big) \iff
\]
\[
\frac{\hat{u}_{j,n+1} - \hat{u}_{j,n}}{\Delta t} = \Delta x^{-2}\big(\hat{u}_{j-1,n} - 2\hat{u}_{j,n} + \hat{u}_{j+1,n}\big) \iff \tag{2.27}
\]
\[
\hat{u}_{j,n+1} = \hat{u}_{j,n} + \frac{\Delta t}{\Delta x^2}\big(\hat{u}_{j-1,n} - 2\hat{u}_{j,n} + \hat{u}_{j+1,n}\big). \tag{2.28}
\]

Since û_{j,0} is given, we have a simple update formula that lets us compute the values of û_{j,n+1} for j = 1, ..., J − 1 from those of û_{j,n}. We can also write this as a matrix iteration. We have

\[
\hat{u}_{n+1} = \hat{u}_n + \frac{\Delta t}{\Delta x^2} A \hat{u}_n, \tag{2.29}
\]

where û_n contains the values û_{j,n} for j = 1, ..., J − 1 (so excluding the end points where the solution is given by the boundary conditions), and A is the tridiagonal matrix containing the coefficients of (2.28) in each row (empty spaces indicate zeros)
\[
A = \begin{pmatrix}
-2 & 1 & & & & \\
1 & -2 & 1 & & & \\
 & 1 & -2 & 1 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1 & -2 & 1 \\
 & & & & 1 & -2
\end{pmatrix}.
\]
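Putting the pieces together, the sketch below (NumPy assumed; the grid sizes, time step, and the initial condition sin(πx) are illustrative choices, with Δt/Δx² small enough for this explicit scheme to remain stable) builds the tridiagonal matrix A and runs the update (2.29). For this initial condition the exact solution is e^{−π²t} sin(πx), so the final error can be checked directly.

```python
import numpy as np

J, N, T = 50, 2000, 0.1
dx, dt = 1.0 / J, T / N
x = np.linspace(0.0, 1.0, J + 1)

# tridiagonal A acting on the interior points (boundary values stay 0)
A = (np.diag(-2.0 * np.ones(J - 1))
     + np.diag(np.ones(J - 2), 1)
     + np.diag(np.ones(J - 2), -1))

u = np.sin(np.pi * x[1:-1])                 # initial condition on the interior grid points
for n in range(N):
    u = u + (dt / dx**2) * (A @ u)          # the update (2.29)

u_exact = np.exp(-np.pi**2 * T) * np.sin(np.pi * x[1:-1])
print(np.max(np.abs(u - u_exact)))          # small discretization error
```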