
Introduction to Linear Algebra

Tyrone L. Vincent

Engineering Division, Colorado School of Mines, Golden, CO E-mail address: [email protected] URL: http://egweb.mines.edu/~tvincent

Contents

Chapter 1. Review of Vectors and Matrices
  1. Useful Notation
  2. Vectors and Matrices
  3. Basic Operations
  4. Useful Properties of the Basic Operations
Chapter 2. Vector Spaces
  1. Vector Space Definition
  2. Linear Independence and Basis
  3. Change of Basis
  4. Norms, Dot Products, Orthonormal Basis
  5. QR Decomposition
Chapter 3. Projection Theorem
Chapter 4. Matrices and Linear Mappings
  1. Solutions to Systems of Linear Equations
Chapter 5. Square Matrices, Eigenvalues and Eigenvectors
  1. Matrix Exponential
  2. Other Matrix Functions
Appendix A. Appendix A

CHAPTER 1

Review of Vectors and Matrices

1. Useful Notation

1.1. Common Abbreviations. In this course, as in most branches of mathematics, we will often work with sets of mathematical objects. For example, there is the set of natural numbers, which begins 1, 2, 3, .... This set is often denoted N, so that 2 is a member of N but π is not. To specify that an object is a member of a set, we use the symbol ∈, meaning "is a member of". For example, 2 ∈ N. Some of the sets we will use are

R        real numbers
C        complex numbers
R^n      n-dimensional vectors of real numbers
R^(m×n)  m × n dimensional real matrices

For these common sets, particular notation will be used to identify members, namely lower case for a scalar or vector, and upper case for a matrix. The following table also includes some common operations:

x            vector or scalar
⟨x, y⟩       inner product between vectors x and y
A            matrix
A^T          transpose of A
A^(-1)       inverse of A
det(A), |A|  determinant of A

To specify a set, we can also use a bracket notation. For example, to specify E as the set of all positive even numbers, we can say either

E = {2, 4, 6, 8, ...}

when the pattern is clear, or use a : symbol, which means "such that":

E = {x ∈ N : mod(x, 2) = 0}.

This can be read "the set of natural numbers x such that x is divisible evenly by 2". When talking about sets, we will often want to say that a property holds for every member of the set, or for at least one member. In these cases, the symbol ∀, meaning "for all", and the symbol ∃, meaning "there exists", are useful. For example, suppose I is the set of numbers consisting of the IQs of the people in this class. Then

∀x ∈ I, x > 110

means that all students in this class have IQ greater than 110, while

∃x ∈ I : x > 110

means that at least one student in the class has IQ greater than 110.

We will also be concerned with functions. Given a set X and a set Y, a function f from X to Y maps an element of X to an element of Y, and is denoted f : X → Y. The set X is called the domain, and f(x) is assumed to be defined for every x ∈ X. The range, or image, of f is the set of y for which f(x) = y for some x:

Range(f) = {y ∈ Y : ∃x ∈ X such that y = f(x)}.

If Range(f) = Y, then f is called "onto". If there is only one x ∈ X such that y = f(x), then f is called "one to one".
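These set and quantifier notations have direct analogues in code. A small Python sketch (the IQ values and the helper names are made up for illustration):

```python
# A finite stand-in for the set I of IQs in the class (made-up numbers).
I = {95, 104, 111, 120, 133}

# "for all x in I, x > 110" and "there exists x in I with x > 110"
for_all = all(x > 110 for x in I)   # False: 95 is a counterexample
exists = any(x > 110 for x in I)    # True: e.g. 120

# Range (image) of a function f : X -> Y on a finite domain
X = {1, 2, 3, 4}
f = lambda x: x % 2                 # maps into Y = {0, 1}
image = {f(x) for x in X}           # {0, 1}, so f is onto Y = {0, 1}
print(for_all, exists, image)
```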

2. Vectors and Matrices

You are probably already familiar with vectors and matrices from previous courses in mathematics or physics. We will find matrices and vectors very useful when representing dynamic systems mathematically; however, we will need to be able to manipulate and understand these objects at a fairly deep level. Some texts use bold face for vectors and matrices, but ours does not, and I will not use that convention here or during class. I will, however, use lower case letters for vectors and upper case letters for matrices. A vector, an n-tuple of real (or sometimes complex) numbers, is represented as:

    [ x1 ]
    [ x2 ]
x = [ .  ]
    [ .  ]
    [ xn ]

so that x is a vector, and the xi are each scalars. We will use the notation x ∈ R^n to show that x is a length n vector of real numbers (or x ∈ C^n if the elements of x are complex). Sometimes we will want to index vectors as well, which can be confusing: is xi the vector xi or the ith element of the vector x? To make the difference clear, we will reserve the notation [x]_i to indicate the ith element of x. As an example, consider the following illustration of addition and scalar multiplication for vectors:

          [ [x1]_1 + [x2]_1 ]          [ α[x1]_1 ]
x1 + x2 = [ [x1]_2 + [x2]_2 ]    αx1 = [ α[x1]_2 ]
          [       .         ]          [    .    ]
          [ [x1]_n + [x2]_n ]          [ α[x1]_n ]

A matrix is an m × n array of scalars:

    [ a11 a12 ... a1n ]
A = [ a21 a22 ... a2n ]
    [  .   .  ...  .  ]
    [ am1 am2 ... amn ]

We use the notation A ∈ R^(m×n) to indicate that A is an m × n matrix. Addition and scalar multiplication are defined the same way as for vectors.

3. Basic Operations

You should already be familiar with most of the basic operations on vectors and matrices listed in this section.

3.1. Transpose. Given a matrix A ∈ R^(m×n), the transpose A^T ∈ R^(n×m) is found by flipping all terms across the diagonal. That is, if

    [ a11 a12 ... a1n ]
A = [ a21 a22 ... a2n ]
    [  .   .  ...  .  ]
    [ am1 am2 ... amn ]

then

      [ a11 a21 ... am1 ]
A^T = [ a12 a22 ... am2 ]
      [  .   .  ...  .  ]
      [ a1n a2n ... amn ]

Note that if the matrix is not square (m ≠ n), then the "shape" of the matrix changes. We can also apply the transpose to a vector x ∈ R^n by considering it to be an n by 1 matrix. In this case, x^T is the 1 by n matrix:

x^T = [ [x]_1  [x]_2  ...  [x]_n ]

3.2. Inner (dot) product. In three dimensional space, we are familiar with vectors as indicating direction. The inner product is an operation that allows us to tell whether two vectors are pointing in a similar direction. We will use the notation ⟨x, y⟩ for the inner product between x and y. In other courses, you may have seen this called the dot product, with notation x · y. The notation used here is more common in signal processing and control systems. The inner product of x, y ∈ R^n is defined to be the sum of the products of the elements:

⟨x, y⟩ = Σ_{i=1}^n [x]_i [y]_i = x^T y

Recall that if x and y are vectors, the angle θ between them can be found using the formula

cos θ = ⟨x, y⟩ / (√⟨x, x⟩ √⟨y, y⟩)

Note that the inner product satisfies the following rules (inherited from the transpose):

⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
⟨αy, z⟩ = α⟨y, z⟩

3.3. Matrix-vector multiplication. Suppose we have an m × n matrix A

    [ a11 a12 ... a1n ]
A = [ a21 a22 ... a2n ]
    [  .   .  ...  .  ]
    [ am1 am2 ... amn ]

and a length n vector x1. Note that the number of columns of A is the same as the length of x1. Multiplication of A and x1 is defined as follows:

x2 = Ax1    (3.1)

where

[x2]_1 = a11[x1]_1 + a12[x1]_2 + ... + a1n[x1]_n    (3.2a)
[x2]_2 = a21[x1]_1 + a22[x1]_2 + ... + a2n[x1]_n    (3.2b)
. . .                                               (3.2c)

[x2]_m = am1[x1]_1 + am2[x1]_2 + ... + amn[x1]_n    (3.2d)

Note that the result x2 is a length m vector (m is the number of rows of A). The notation (3.1) is a compact representation of the system of linear algebraic equations (3.2). Note that A defines a mapping from R^n to R^m; thus, we can write A : R^n → R^m. This mapping is linear. We can also consider a matrix to be a collection of vectors. For example, if we group the vectors x1, x2, ..., xp into a matrix

M = [ x1 x2 ... xp ]

and define the vector

    [ α1 ]
a = [ α2 ]
    [ .  ]
    [ αp ]

then all linear combinations of x1, x2, ..., xp are given by

y = Ma = α1 x1 + α2 x2 + ... + αp xp

3.4. Matrix-matrix multiplication. If matrix A : R^n → R^m and matrix B : R^m → R^p, we can find the mapping C : R^n → R^p which is the composition of A and B:

C = BA

[c]_ij = Σ_k [b]_ik [a]_kj

That is, the i,j element of C is the dot product of the ith row of B with the jth column of A. The dimension of C is p × n. This can also be thought of as B mapping a column of A at a time: that is, the first column of C is B times the first column of A. Clearly, two matrices can be multiplied only if they have compatible dimensions. Unlike scalars, the order of multiplication is important: if A and B are square matrices, AB ≠ BA in general. The identity matrix

    [ 1 0 ... 0 ]
I = [ 0 1 ... . ]
    [ . . ... 0 ]
    [ 0 0 ... 1 ]

is a square matrix with ones along the diagonal and zeros elsewhere. If the size is important, we will denote it via a subscript, so that Im is the m × m identity matrix. The identity matrix is the multiplicative identity for matrix multiplication, in that AI = A and IA = A (where I has compatible dimensions with A).
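As a concrete sketch of the operations above (plain Python lists, no libraries; the helper names are ours):

```python
def inner(x, y):
    # <x, y> = sum_i [x]_i [y]_i
    return sum(xi * yi for xi, yi in zip(x, y))

def matvec(A, x):
    # [Ax]_i = row i of A dotted with x
    return [inner(row, x) for row in A]

def matmul(B, A):
    # [BA]_ij = (row i of B) . (column j of A)
    cols = [list(c) for c in zip(*A)]  # columns of A
    return [[inner(row, col) for col in cols] for row in B]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(inner([1, 2, 3], [4, 5, 6]))  # 32
print(matvec(A, [1, 1]))            # [3, 7]
print(matmul(B, A))                 # [[3, 4], [1, 2]]
print(matmul(A, B))                 # [[2, 1], [4, 3]] -- AB != BA in general
```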

3.5. Block Matrices. Matrices can also be defined in blocks, using other matrices. For example, suppose A ∈ R^(m×n), B ∈ R^(m×p), C ∈ R^(q×n) and D ∈ R^(q×p). Then we can "block fill" an (m + q) by (n + p) matrix X as

X = [ A B ]
    [ C D ]

Often we will want to specify some blocks as zero. We will denote a block of zeros as simply 0; the dimension can be worked out from the other matrices. For example, in

X = [ A 0 ]
    [ C D ]

the zero block must have the same number of rows as A and the same number of columns as D. Matrix multiplication of block matrices uses the same rules as regular matrices, except as applied to the blocks. Thus

[ A1 B1 ] [ A2 B2 ]   [ A1 A2 + B1 C2   A1 B2 + B1 D2 ]
[ C1 D1 ] [ C2 D2 ] = [ C1 A2 + D1 C2   C1 B2 + D1 D2 ]

3.6. Determinant. If A is a scalar matrix, that is, A ∈ R^(1×1), then the determinant of A is just equal to the scalar itself. For higher dimensions, the determinant is defined recursively. If A ∈ R^(n×n), then

det(A) = Σ_{i=1}^n [a]_ij c_ij    (for any fixed column j)

where c_ij is the ijth cofactor, which is, possibly times -1, the determinant of an (n-1) × (n-1) matrix (this is what makes the definition recursive). In particular:

c_ij = (-1)^(i+j) det(M_ij)

where M_ij is the (n-1) × (n-1) submatrix created by deleting the ith row and jth column from A. If A ∈ R^(2×2), then det(A) = a11 a22 - a21 a12 (check that this matches the definition).

3.7. Inverse. Given a square matrix A ∈ R^(n×n), the inverse of A, denoted A^(-1), is the unique matrix (when it exists) such that AA^(-1) = I. The inverse can be calculated as

A^(-1) = (1 / det(A)) C^T

where [C]_ij = c_ij, the ijth cofactor of A. C^T is also called the adjugate of A. The inverse exists whenever det(A) ≠ 0.
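A direct, if inefficient, Python sketch of the recursive cofactor expansion and the adjugate formula A^(-1) = C^T / det(A), using exact rational arithmetic:

```python
from fractions import Fraction

def det(A):
    # Recursive cofactor expansion along the first column.
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        minor = [row[1:] for row in A[:i] + A[i+1:]]  # delete row i, column 0
        total += (-1) ** i * A[i][0] * det(minor)
    return total

def inverse(A):
    # [A^(-1)]_ij = c_ji / det(A)  (transpose of the cofactor matrix)
    n = len(A)
    d = Fraction(det(A))
    def cof(i, j):
        minor = [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]
        return (-1) ** (i + j) * det(minor)
    return [[cof(j, i) / d for j in range(n)] for i in range(n)]

A = [[1, 4], [1, 1]]
print(det(A))      # 1*1 - 1*4 = -3
print(inverse(A))  # [[-1/3, 4/3], [1/3, -1/3]]
```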

4. Useful Properties of the Basic Operations

Transpose
- (A^T)^T = A
- (A + B)^T = A^T + B^T
- (AB)^T = B^T A^T

Determinants
- det(AB) = det(BA) = det(A) det(B)
- det(A^T) = det(A)
- det(I) = 1

Determinants for Block Matrices
- Very useful: the determinant of a block triangular matrix is the product of the determinants of the diagonal blocks. In particular, if A and D are square,

  det [ A B ] = det [ A 0 ] = det(A) det(D)
      [ 0 D ]       [ C D ]

- If A and D are square and D^(-1) exists, then

  det [ A B ] = det(A - B D^(-1) C) det(D)
      [ C D ]

Inverse
- A A^(-1) = A^(-1) A = I
- (A^T)^(-1) = (A^(-1))^T
- (AB)^(-1) = B^(-1) A^(-1)

Inverses for Block Matrices
- If A and D are square and invertible,

  [ A B ]^(-1)   [ A^(-1)   -A^(-1) B D^(-1) ]
  [ 0 D ]      = [ 0         D^(-1)          ]

- If A and D are square, and D and Δ = A - B D^(-1) C are invertible,

  [ A B ]^(-1)   [ Δ^(-1)             -Δ^(-1) B D^(-1)                   ]
  [ C D ]      = [ -D^(-1) C Δ^(-1)    D^(-1) + D^(-1) C Δ^(-1) B D^(-1) ]

4.1. Exercises.

(1) Show that det(A^(-1)) = 1/det(A)
(2) Let

  N = [ Im  A ]    Q = [ Im  0 ]    P = [ Im  -A ]
      [ 0   In ]       [ -B  In ]       [ B    In ]

(a) Explain why det(N) = det(Q) = 1
(b) Compute NP and QP
(c) Show that det(NP) = det(Im + AB) and det(QP) = det(In + BA)

(d) Show that det(Im + AB) = det(In + BA)
(3) Using the results of problem 2, explain why (Im + AB)^(-1) exists if and only if (In + BA)^(-1) exists. Show, by verifying the properties of the inverse, that (Im + AB)^(-1) = Im - A(In + BA)^(-1) B. (That is, multiply the right hand side by Im + AB and show that you get the identity.)
(4) Verify the block inversion equations.

CHAPTER 2

Vector Spaces

1. Vector Space Definition

Definition 1. A vector space (X, F) consists of a set of elements X, called vectors, and a field F (such as the real numbers), which satisfy the following conditions:

(1) To every pair of vectors x1 and x2 in X, there corresponds a vector x3 = x1 + x2 in X.
(2) Addition is commutative: x1 + x2 = x2 + x1
(3) Addition is associative: (x1 + x2) + x3 = x1 + (x2 + x3)
(4) X contains a vector, denoted 0, such that 0 + x = x for every x in X
(5) To every x in X there is a vector -x in X such that x + (-x) = 0
(6) To every α in F and every x in X, there corresponds a vector αx in X
(7) Scalar multiplication is associative: for any α, β in F and any x in X, α(βx) = (αβ)x
(8) Scalar multiplication is distributive with respect to vector addition: α(x1 + x2) = αx1 + αx2
(9) Scalar multiplication is distributive with respect to scalar addition: (α + β)x = αx + βx
(10) For any x in X, 1x = x.

You can verify that R^n (or C^n) is a vector space. It is interesting to see that other mathematical objects also qualify as vector spaces. For example:

Example 1. X = R_n[s], the set of all polynomials with real coefficients with degree less than n; F = R; with addition and scalar multiplication defined in the usual way:

if x1 = a1 s^(n-1) + a2 s^(n-2) + ... + an
   x2 = b1 s^(n-1) + b2 s^(n-2) + ... + bn,
then x1 + x2 = (a1 + b1) s^(n-1) + (a2 + b2) s^(n-2) + ... + (an + bn)
     k x1 = k a1 s^(n-1) + k a2 s^(n-2) + ... + k an

We can show that this is a vector space by verifying that it satisfies the 10 conditions:

(1) Given any x1 = a1 s^(n-1) + a2 s^(n-2) + ... + an and x2 = b1 s^(n-1) + b2 s^(n-2) + ... + bn, we see that x1 + x2 = (a1 + b1) s^(n-1) + (a2 + b2) s^(n-2) + ... + (an + bn) is indeed a polynomial of degree less than n, so x1 + x2 is in X
(2) obvious from definition of addition
(3) obvious from definition of addition
(4) Select x = 0 as the zero vector


n 1 n 2 n 1 n 2 (5) Given x = a1s +a2s + +an; select x = a1s a2s    an    n 1 n 2 n 1 (6) Given x = a1s + a2s + + an; we see that ax = aa1s + n 2    aa2s + + aan is a polynomial of degree less than n; so that ax is in    (7) obviousX from de…nition of scalar multiplication (8) obvious from de…nition of addition and scalar multiplication (9) obvious from de…nition of addition and scalar multiplication (10) select x = 1 as the unit vector

1.1. Exercises.

(1) Show that X = C, the set of all continuous functions, is a vector space with F = R, with addition and scalar multiplication defined as: x1 = f(t), x2 = g(t), x1 + x2 = f(t) + g(t), ax1 = af(t). This can be shown to be a vector space in the same way as above.
(2) Show that X = C^n, the set of all n-tuples of complex numbers, is a vector space with F = C, the field of complex numbers
(3) Show that X = R^n, F = C is not a vector space.
(4) Determine whether X = {x : ẍ + ẋ + 1 = 0}, with F = R, is a vector space.

2. Linear Independence and Basis

2.1. Linear Independence.

Definition 2. A linear combination of the vectors x1, x2, ..., xp is a sum of the form α1 x1 + α2 x2 + ... + αp xp.

                 [ α1 ]
[ x1 x2 ... xp ] [ α2 ]
                 [ .  ]
                 [ αp ]

A vector x is said to be linearly dependent upon a set S of vectors if x can be expressed as a linear combination of vectors from S. A vector x is said to be linearly independent of S if it is not linearly dependent on S. A set of vectors is said to be a linearly independent set if each vector in the set is linearly independent of the remainder of the set. This definition immediately leads to the following tests:

Theorem 1. A set of vectors S = {x1, x2, ..., xp} is linearly dependent if there exist αi, with at least one αi ≠ 0, such that

α1 x1 + α2 x2 + ... + αp xp = 0

Theorem 2. A set of vectors S = {x1, x2, ..., xp} is linearly independent if and only if

α1 x1 + α2 x2 + ... + αp xp = 0

implies αi = 0, i = 1, 2, ..., p.

Example 2. Consider the set of vectors

     [ 2 ]        [ 1 ]        [ 0 ]
x1 = [ 4 ]   x2 = [ 1 ]   x3 = [ 1 ]
     [ 6 ]        [ 2 ]        [ 1 ]

This set is linearly dependent, for if we select α1 = -1, α2 = 2 and α3 = 2, we have

α1 x1 + α2 x2 + α3 x3 = 0

Example 3. Consider the set of vectors

     [ 2 ]        [ 0 ]
x1 = [ 4 ]   x2 = [ 0 ]

This set is linearly dependent, for if we select α1 = 0 and α2 = 1, then

α1 x1 + α2 x2 = 0

Note that the zero vector is linearly dependent on all other vectors. The maximal number of linearly independent vectors in a vector space is an important characteristic of that vector space.

Definition 3. The maximal number of linearly independent vectors in a vector space is called the dimension of the vector space.

Example 4. Show that the dimension of the vector space (R^2, R) is 2.

Note that the vectors [1 0]' and [0 1]' are linearly independent. Thus the dimension of (R^2, R) is greater than or equal to 2. Given three vectors x1 = [a b]', x2 = [c d]', x3 = [e f]', we have

α1 x1 + α2 x2 + α3 x3 = 0

if α3 = -1, and α1 and α2 are solutions to the system of equations

α1 a + α2 c = e
α1 b + α2 d = f

which always has at least one solution when x1 and x2 are linearly independent (and if x1 and x2 are linearly dependent, the set is already linearly dependent). Thus no set of three vectors is linearly independent, and the dimension of (R^2, R) is less than 3, implying that the dimension of (R^2, R) is 2.

2.2. Basis.

Definition 4. A set of linearly independent vectors from a vector space (X, F) is a basis for X if every vector in X can be expressed as a unique linear combination of these vectors.

It is a fact that in an n-dimensional vector space, any set of n linearly independent vectors qualifies as a basis. We have seen that there are many different mathematical objects which qualify as vector spaces. However, all n-dimensional vector spaces (X, R) have a one to one correspondence with the vector space (R^n, R) once a basis has been chosen. Suppose e1, e2, ..., en is a basis for X. Then for all x in X,

x = [ e1 e2 ... en ] α

where α = [ α1 α2 ... αn ]' and the αi are scalars. Thus the vector x can be identified with the unique vector α in R^n. Consider the vector space (R3[s], R), where R3[s] is the set of all real polynomials of degree less than 3. This vector space has dimension 3, with one basis as e1 = 1, e2 = s, e3 = s^2. The vector x = 2s^2 + 3s - 1 can be written as

                   [ -1 ]
x = [ e1 e2 e3 ]   [  3 ]
                   [  2 ]

so that the representation with respect to this basis is [ -1 3 2 ]'. However, if we choose the basis e'1 = 1, e'2 = s - 1, e'3 = s^2 - s (verify that this set of vectors is independent),

x = 2s^2 + 3s - 1 = 4 + 5(s - 1) + 2(s^2 - s)

                      [ 4 ]
  = [ e'1 e'2 e'3 ]   [ 5 ]
                      [ 2 ]

so that the representation of x with respect to this basis is [ 4 5 2 ]'.
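The dependence claimed in Example 2 is easy to check numerically; a minimal sketch:

```python
def lin_comb(alphas, vectors):
    # alpha_1 x_1 + ... + alpha_p x_p, with vectors stored as lists
    n = len(vectors[0])
    return [sum(a * v[i] for a, v in zip(alphas, vectors)) for i in range(n)]

x1, x2, x3 = [2, 4, 6], [1, 1, 2], [0, 1, 1]
print(lin_comb([-1, 2, 2], [x1, x2, x3]))  # [0, 0, 0] -> linearly dependent
```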

2.3. Standard basis. For R^n, the standard basis consists of the unit vectors that point in the direction of each axis:

     [ 1 ]        [ 0 ]             [ 0 ]
i1 = [ 0 ]   i2 = [ 1 ]   ...  in = [ . ]
     [ . ]        [ . ]             [ 0 ]
     [ 0 ]        [ 0 ]             [ 1 ]

2.4. Exercises.

(1) Find the dimension of the vector space given by all (real) linear combinations of

     [ 1 ]        [ 3 ]        [ 4 ]
x1 = [ 2 ]   x2 = [ 2 ]   x3 = [ 4 ]
     [ 3 ]        [ 2 ]        [ 5 ]

That is,

X = {x : x = α1 x1 + α2 x2 + α3 x3, αi ∈ R}

This is called the vector space spanned by {x1, x2, x3}.
(2) Show that the space of all solutions to the differential equation

ẍ + 3ẋ + x = 0,  t ≥ 0

is a 2-dimensional vector space. (Verify the properties of a vector space.)

3. Change of Basis

For the polynomial example, since the vectors are polynomials, which are mathematical objects quite different from n-tuples of numbers, the separation between vectors and their representations with respect to a basis is fairly clear. This becomes more complicated when we consider the "native" vector space (R^n, R). When n = 2, it is natural to visualize these vectors in the plane, as shown in Figure 1. In order to represent the vector x, we need to choose a basis. The most natural basis for (R^2, R) is

     [ 1 ]        [ 0 ]
i1 = [ 0 ]   i2 = [ 1 ]

Figure 1. A two-dimensional real vector space

In this basis, we have the following representation for x:

    [ 2 ]             [ 2 ]
x = [ 3 ] = [ i1 i2 ] [ 3 ]

Note that the vector and its representation look identical. However, if we choose a different basis, say

     [ 2 ]        [ 1 ]
e1 = [ 1 ]   e2 = [ 2 ]

then

    [ 2 ]             [ 1/3 ]
x = [ 3 ] = [ e1 e2 ] [ 4/3 ]

so the representation of x in this basis is [ 1/3 4/3 ]'. We have seen that a vector x can have different representations for different bases. A natural desired operation would be to transform between one basis and another. Suppose a vector x has representation α with respect to [ e1 e2 ... en ] and representation α' with respect to [ e'1 e'2 ... e'n ], so that

x = [ e1 e2 ... en ] α = [ e'1 e'2 ... e'n ] α'    (3.1)

What is the relationship between α and α'? The answer is most easily found by finding the relationship between the bases themselves. Each basis vector has a representation in the other basis. That is, there exist pi such that

ei = [ e'1 e'2 ... e'n ] pi

If we group the vectors ei into a matrix, we can write

[ e1 e2 ... en ] = [ e'1 e'2 ... e'n ] [ p1 p2 ... pn ]

                                       [ p11 p12 ... p1n ]
                 = [ e'1 e'2 ... e'n ] [ p21 p22 ... p2n ]
                                       [  .   .  ...  .  ]
                                       [ pn1 pn2 ... pnn ]

                 = [ e'1 e'2 ... e'n ] P    (3.2)

where we see that the matrix P takes the vectors pi as its columns. Substituting (3.2) into (3.1), we get

[ e'1 e'2 ... e'n ] P α = [ e'1 e'2 ... e'n ] α'

Since the representation of a vector with respect to a basis is unique, we must have α' = P α. Thus, in order to transform from basis 1 ([ e1 e2 ... en ]) to basis 2 ([ e'1 e'2 ... e'n ]), we form the matrix P, whose ith column is the representation of basis 1 vector i (ei) with respect to basis 2 ([ e'1 e'2 ... e'n ]). It turns out that P will always be an invertible matrix, so that α = P^(-1) α', and we must have

P^(-1) = Q

where the ith column of Q is the representation of basis 2 vector i (e'i) with respect to basis 1 ([ e1 e2 ... en ]).

4. Norms, Dot Products, Orthonormal Basis

4.1. Vector Norms. A vector norm, denoted ||x||, is a real valued function of x which is a measure of its length. You are probably already familiar with the common norm defined by the Euclidean length of a vector, but in fact there are many possibilities. A valid norm satisfies the following properties:

(1) (Always positive unless x = 0) ||x|| ≥ 0 for every x, and ||x|| = 0 implies x = 0
(2) (Homogeneity) ||αx|| = |α| ||x|| for scalar α
(3) (Triangle inequality) ||x1 + x2|| ≤ ||x1|| + ||x2||

The most common vector norms are the following:

4.1.1. 1-norm. The 1-norm is the sum of the absolute values of the elements of x:

||x||_1 := Σ_{i=1}^n |[x]_i|

4.1.2. 2-norm. The 2-norm corresponds to Euclidean distance, and is the square root of the sum of squares of the elements of x:

||x||_2 := √( Σ_{i=1}^n ([x]_i)^2 )

Note that the sum of squares of the elements can also be written as x^T x. Thus

||x||_2 = √(x^T x)

4.1.3. ∞-norm. The ∞-norm is simply the largest component of x (in absolute value):

||x||_∞ = max_i |[x]_i|

4.2. Dot products and projection. As discussed earlier, the dot product between two vectors is given by

⟨x, y⟩ = Σ_{i=1}^n [x]_i [y]_i = x^T y

Note that

||x||_2 = √⟨x, x⟩

If two vectors have a dot product of zero, then they are said to be orthogonal. A set of vectors {xi} which are pairwise orthogonal and have unit 2-norm are said to be orthonormal, and will satisfy

⟨xi, xj⟩ = xi^T xj = { 1  i = j
                     { 0  i ≠ j

The projection of one vector (say x) on another (say y) is given by

z = (⟨x, y⟩ / ||y||^2) y

The vector z points in the same direction as y, but the length is chosen so that the difference between z and x is orthogonal to y:

⟨z - x, y⟩ = ⟨(⟨x, y⟩ / ||y||^2) y - x, y⟩
           = (⟨x, y⟩ / ||y||^2) ⟨y, y⟩ - ⟨x, y⟩
           = ⟨x, y⟩ - ⟨x, y⟩
           = 0

since ⟨y, y⟩ = ||y||^2.

4.3. Orthonormal Basis - Gram-Schmidt Procedure. An orthonormal basis is a vector space basis which is also orthonormal. Operations are often much easier when vectors are defined using an orthonormal basis. The Gram-Schmidt procedure can be used to transform a general basis into an orthonormal basis. It does so by building up the orthonormal basis one vector at a time.
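A numeric sketch of the projection formula, checking the orthogonality property just derived:

```python
def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(x, y):
    # projection of x onto y: z = (<x, y> / ||y||^2) y
    c = inner(x, y) / inner(y, y)
    return [c * yi for yi in y]

x, y = [3.0, 4.0], [1.0, 0.0]
z = project(x, y)
residual = [xi - zi for xi, zi in zip(x, z)]
print(z)                   # [3.0, 0.0]
print(inner(residual, y))  # 0.0 -> x - z is orthogonal to y
```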

Suppose we had a basis of two vectors {e1, e2}. We can make an orthonormal basis as follows. Set the first basis vector to point in the same direction as e1, but with unit length:

q1 = e1 / ||e1||

We need to pick a second vector which is orthogonal to q1, but spans the same space as {e1, e2}. This can be done by subtracting from e2 the part of e2 which points in the same direction as q1. Let

u2 = e2 - ⟨q1, e2⟩ q1

Then

⟨q1, u2⟩ = ⟨q1, e2 - ⟨q1, e2⟩ q1⟩
         = ⟨q1, e2⟩ - ⟨q1, e2⟩ ⟨q1, q1⟩
         = ⟨q1, e2⟩ - ⟨q1, e2⟩
         = 0

since ⟨q1, q1⟩ = 1. Thus u2 is orthogonal to q1. We can get an orthonormal set by letting q2 = u2 / ||u2||. Yet,

[ q1 q2 ] = [ e1 e2 ] [ 1/||e1||   -⟨q1, e2⟩/(||e1|| ||u2||) ]
                      [ 0           1/||u2||                 ]

which is clearly an invertible change of basis. The general procedure is as follows. Let {e1, ..., en} be a basis. Let

u1 = e1                                 q1 = u1 / ||u1||
u2 = e2 - ⟨q1, e2⟩ q1                   q2 = u2 / ||u2||
.                                       .
un = en - Σ_{k=1}^{n-1} ⟨qk, en⟩ qk     qn = un / ||un||

The orthonormal basis given by {q1, ..., qn} spans the same space as {e1, ..., en}.

5. QR Decomposition

The Gram-Schmidt procedure can be viewed as a matrix decomposition. Let

E = [ e1 e2 ... en ]

be a matrix with columns made up of n independent vectors ei. Then the relationship between the orthonormal vectors qi obtained via the Gram-Schmidt procedure and the original vectors can be written as

                                    [ ||u1||  ⟨q1,e2⟩  ...  ⟨q1,en⟩     ]
[ e1 e2 ... en ] = [ q1 q2 ... qn ] [ 0       ||u2||   ...  ⟨q2,en⟩     ]
                                    [ .       .        ...  ⟨q(n-1),en⟩ ]
                                    [ 0       0        ...  ||un||      ]

or

E = QR

where Q is a matrix with orthonormal columns, and R is an upper triangular matrix. Since Q has orthonormal columns, you can verify that QQ^T = I, implying that Q^(-1) = Q^T. A matrix whose transpose is also its inverse is called an orthonormal matrix, and satisfies QQ^T = Q^T Q = I (so its rows are also orthonormal). It turns out that the Gram-Schmidt procedure as described in the last section is not very well conditioned numerically, meaning that small errors will accumulate as the algorithm progresses. However, much more numerically stable algorithms are available using Householder or Givens transformations. We will examine the former, but both are covered in detail in textbooks on numerical linear algebra, such as Golub, G. H. and C. F. Van Loan, Matrix Computations, Johns Hopkins Press, 1989. Consider the following problem: we have a vector x, and we would like to find an orthonormal matrix P such that

      [ α ]
P x = [ 0 ] = α i1
      [ . ]
      [ 0 ]

where α is an arbitrary number. It turns out that a matrix of the form

P = I - 2vv^T

will do the job, where v is restricted to be unit length (||v|| = 1). First, let's check that P is indeed orthonormal for any such v:

PP^T = (I - 2vv^T)(I - 2vv^T)^T
     = (I - 2vv^T)(I - 2vv^T)
     = I - 4vv^T + 4(vv^T vv^T)
     = I - 4vv^T + 4(v ||v||^2 v^T)
     = I - 4vv^T + 4vv^T
     = I

where we have used the fact that ||v|| = 1. Now, let's see if we can indeed pick an appropriate v:

P x = (I - 2vv^T) x = x - 2v(v^T x)

Let's pick v = (x + ||x|| i1) / ||x + ||x|| i1||. Then

P x = x - (2 / ||x + ||x|| i1||^2) (x + ||x|| i1) ((x + ||x|| i1)^T x)
    = x - (2 / ||x + ||x|| i1||^2) (x + ||x|| i1) (||x||^2 + ||x|| [x]_1)

Note that, since ||i1|| = 1,

||x + ||x|| i1||^2 = ||x||^2 + 2||x|| i1^T x + ||x||^2 ||i1||^2 = 2(||x||^2 + ||x|| [x]_1)

so that

P x = x - (x + ||x|| i1) = -||x|| i1

and the desired transformation occurs with α = -||x||. You can verify in a similar manner that another possible choice for v is (x - ||x|| i1) / ||x - ||x|| i1||. In practice, one would choose whichever of x + ||x|| i1 or x - ||x|| i1 has the larger norm, to avoid dividing by a small number. Now, a QR decomposition can be accomplished as follows.

(1) Given E ∈ R^(n×n),

E = [ e1 e2 ... en ]

pick v1 = (e1 + ||e1|| i1) / ||e1 + ||e1|| i1||. Apply P1 = I - 2 v1 v1^T to get

P1 E = [ α1  *   ...  *  ]
       [ 0   ê2  ...  ên ]

where α1 is an arbitrary number, 0 is a zero vector of length n-1, and ê2, ..., ên are arbitrary vectors of length n-1.

(2) Pick v2 = (ê2 + ||ê2|| i1) / ||ê2 + ||ê2|| i1|| and apply

P2 = [ 1  0             ]
     [ 0  I - 2 v2 v2^T ]

to get

P2 P1 E = [ α1  *   *  ...  *  ]
          [ 0   α2  *  ...  *  ]
          [ 0   0   (arbitrary vectors of length n-2) ]

Note that because of the way we chose P2, the first column of P1 E remains the same, and we zero out the correct parts of the second column.

(3) Continue in this manner until, with P = Pn P(n-1) ... P1, we get PE = R, where R is an upper triangular matrix. Then with Q = P^T, E = QR.
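A sketch of a single Householder reflection, checking that P = I - 2vv^T with v = (x + ||x|| i1) / ||x + ||x|| i1|| sends x to -||x|| i1:

```python
import math

def householder_apply(x):
    # Form v = (x + ||x|| i1) / ||x + ||x|| i1||, then return
    # P x = x - 2 v (v^T x), which should equal -||x|| i1.
    nx = math.sqrt(sum(xi * xi for xi in x))
    w = [xi + (nx if i == 0 else 0.0) for i, xi in enumerate(x)]
    nw = math.sqrt(sum(wi * wi for wi in w))
    v = [wi / nw for wi in w]
    vtx = sum(vi * xi for vi, xi in zip(v, x))
    return [xi - 2 * vi * vtx for vi, xi in zip(v, x)]

print(householder_apply([3.0, 4.0]))  # approximately [-5.0, 0.0]
```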

The keys to this approach are that at each step, the modification of E involves an orthonormal matrix, and the specification of this orthonormal matrix is well conditioned when the vector used to form v has norm away from zero.

CHAPTER 3

Projection Theorem

The close connection between the inner product and the 2-norm comes into play in vector minimization problems that involve the 2-norm. Suppose we have a matrix A and a vector y. We would like to find the vector x which gets mapped through A to a vector which is as close as possible to y. That is, we have the following problem:

min_x ||Ax - y||_2    (0.1)

Useful facts:

(1) When A is a matrix, there is always a solution to this minimization problem.
(2) When A is an arbitrary linear operator, there is always a solution to this minimization problem if the image (or range space) of A is closed.
(3) The solution can be found using dot products.

Let's try to understand the minimization problem.

Theorem 3. (Projection Theorem) x* is a minimizer of (0.1) if and only if ⟨y - Ax*, Ax⟩ = 0 for all x.

Proof. (if) Suppose x* satisfies ⟨y - Ax*, Ax⟩ = 0 for all x. Let x be another vector in X. Then

||Ax - y||^2 = ||A(x* + x - x*) - y||^2
            = ||Ax* - y + A(x - x*)||^2
            = ||Ax* - y||^2 + 2⟨Ax* - y, A(x - x*)⟩ + ||A(x - x*)||^2

Now, x - x* is a vector in X, so that ⟨Ax* - y, A(x - x*)⟩ = 0, and

||Ax - y||^2 = ||Ax* - y||^2 + ||A(x - x*)||^2

Since ||A(x - x*)||^2 ≥ 0, we have ||Ax - y|| ≥ ||Ax* - y||. Thus x* is a minimizer.

(only if) Now, suppose x̂ does not satisfy ⟨y - Ax̂, Ax⟩ = 0 for some x ∈ X, e.g. ⟨y - Ax̂, Axd⟩ = c ≠ 0. Then, for a scalar ε,

||A(x̂ + ε xd) - y||^2 = ||Ax̂ - y||^2 + 2ε⟨Ax̂ - y, Axd⟩ + ε^2 ||Axd||^2
                      = ||Ax̂ - y||^2 - 2εc + ε^2 ||Axd||^2

which is smaller than ||Ax̂ - y||^2 for a small enough ε of the same sign as c; that is, x̂ is not a minimizer. ∎
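The orthogonality condition of the Projection Theorem can be checked numerically. A sketch for a small least squares problem (a hypothetical 3-by-2 A, solving the 2-by-2 normal equations A^T A x = A^T y by Cramer's rule):

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def lstsq_2col(A, y):
    # Solve A^T A x = A^T y for a matrix with exactly two columns.
    c1 = [row[0] for row in A]
    c2 = [row[1] for row in A]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    a, b, c = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    r1, r2 = dot(c1, y), dot(c2, y)
    det = a * c - b * b
    return [(r1 * c - r2 * b) / det, (a * r2 - b * r1) / det]

A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [0.0, 1.0, 1.0]
x = lstsq_2col(A, y)  # best-fit line: [intercept, slope]
r = [yi - ri for yi, ri in zip(y, matvec(A, x))]
# The residual y - Ax is orthogonal to both columns of A:
print(sum(a * b for a, b in zip(r, [row[0] for row in A])))  # ~0
print(sum(a * b for a, b in zip(r, [row[1] for row in A])))  # ~0
```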


CHAPTER 4

Matrices and Linear Mappings

A matrix is an m × n array of scalars that represents a linear mapping from one vector space to another. If we group the vectors x1, x2, ..., xp into a matrix

M = [ x1 x2 ... xp ]

and define the vector

    [ α1 ]
a = [ α2 ]
    [ .  ]
    [ αp ]

then all linear combinations of x1, x2, ..., xp are given by

y = Ma = α1 x1 + α2 x2 + ... + αp xp

The space

S = {y : y = Ma, a ∈ R^p}

is called the range space of M.

Definition 5. The range space (or just range) of M ∈ R^(m×p) is the subset of R^m to which vectors are mapped by M, that is

R(M) = {y : y = Mx, x ∈ R^p}

In matrix notation, we can say that the column vectors of M are linearly independent if and only if Ma = 0 implies a = 0. If a matrix does not consist of linearly independent column vectors, then there exist a ≠ 0 such that Ma = 0. This set of vectors is a subspace of R^p, and is called the null space.

Definition 6. The null space of M is the subset of R^p which is mapped to the zero vector, that is

N(M) = {x : 0 = Mx, x ∈ R^p}

The dimension of the range space is called the rank of a matrix.

Theorem 4. The rank of M is given by the maximal number of linearly independent columns (or rows).

0.0.1. MATLAB. MATLAB has commands to find the range space, null space and rank of a matrix:
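In the same spirit as MATLAB's rank, a pure-Python sketch of rank via Gaussian elimination (a minimal illustration, not a robust implementation):

```python
def rank(A, tol=1e-9):
    # Row-reduce a copy of A and count the nonzero pivots.
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    r = 0  # current pivot row
    for col in range(n):
        if r == m:
            break
        # pick the row with the largest entry in this column (partial pivoting)
        pivot = max(range(r, m), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < tol:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, m):
            f = A[i][col] / A[r][col]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

M = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],   # 2x the first row
     [0.0, 1.0, 1.0]]
print(rank(M))  # 2
```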

orth, null and rank.

0.0.2. Exercises.


1. Solutions to Systems of Linear Equations

Given vector spaces X and Y, a linear operator A : X → Y, and the equation

Ax = y    (1.1)

we would like to:

(1) Determine if there is a solution for a particular y
(2) Determine if there is a solution for every y
(3) Determine if the solution is unique
(4) If the solution exists, find it

1.0.3. Existence and uniqueness of a solution. Existence: y must be in the range space of A. Uniqueness: the null space of A must contain only the zero vector.

1.0.4. Finding the solution, Case 1: A invertible. Apply the definition of the inverse of A: x = A^(-1) y.

1.0.5. Finding the solution, Case 2: A not invertible.

CHAPTER 5

Square Matrices, Eigenvalues and Eigenvectors

If A is a square matrix, i.e. A ∈ R^(n×n), then A maps vectors back to the same space. These matrices can be characterized by their eigenvalues and eigenvectors.

Definition 7. Given a matrix A ∈ R^(n×n) (or C^(n×n)), if there exist a scalar λ and a vector x ≠ 0 such that

Ax = λx

then λ is an eigenvalue of A, and x is an eigenvector of A.

To find eigenvalues and eigenvectors, we can use the concept of rank. If λ is an eigenvalue of A, then there exists x ≠ 0 such that

Ax - λx = 0  or  (A - λI)x = 0

From above, an x ≠ 0 only exists if the rank of (A - λI) is less than n. Recall that the rank of a square matrix is less than n if and only if the determinant of the matrix is zero. This implies that the eigenvalues of A satisfy

det(A - λI) = 0

Example 5. Let

A = [ 1 4 ]
    [ 1 1 ]

Then

A - λI = [ 1-λ  4   ]
         [ 1    1-λ ]

det(A - λI) = (1-λ)(1-λ) - 4 = λ^2 - 2λ - 3 = (λ - 3)(λ + 1)

Thus the eigenvalues are 3 and -1. For λ = -1, we find the corresponding eigenvector via

(A + I)x = 0

[ 2 4 ] [ x1 ]
[ 1 2 ] [ x2 ] = 0

which has solution x = [ 2 -1 ]'. Note that there are many possible solutions; each can be obtained from another through a scale factor.

For λ = 3, we find the corresponding eigenvector via

(A - 3I)x = 0

[ -2  4 ] [ x1 ]
[  1 -2 ] [ x2 ] = 0

which has solution x = [ 2 1 ]'.

1. Matrix Exponential

Given A ∈ R^(n×n), we define the matrix exponential as follows:

e^A := I + A + A^2/2! + A^3/3! + ... = Σ_{k=0}^∞ A^k / k!

Using the definition, you can verify the following properties:

- e^0 = I, where 0 is a matrix of zeros.
- (e^A)^(-1) = e^(-A)
- d/dt e^(At) = A e^(At) = e^(At) A

For the last property, note that

d/dt Σ_{k=0}^∞ (At)^k / k! = Σ_{k=1}^∞ k A^k t^(k-1) / k!
                           = Σ_{ℓ=0}^∞ (ℓ+1) A^(ℓ+1) t^ℓ / (ℓ+1)!
                           = A Σ_{ℓ=0}^∞ A^ℓ t^ℓ / ℓ!
                           = A e^(At)

and similarly for e^(At) A.
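A numeric sketch for this chapter's 2-by-2 example: eigenvalues from the characteristic polynomial, and a truncated-series matrix exponential used to check (e^A)^(-1) = e^(-A):

```python
import math

def eig2(A):
    # Roots of det(A - lambda I) = lambda^2 - tr(A) lambda + det(A), 2x2 only.
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(tr * tr - 4 * det)  # assumes real eigenvalues
    return (tr + disc) / 2, (tr - disc) / 2

def expm(A, terms=30):
    # e^A = sum_k A^k / k!, truncated after `terms` terms.
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[sum(term[i][m] * A[m][j] for m in range(n)) / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

A = [[1.0, 4.0], [1.0, 1.0]]
print(eig2(A))  # (3.0, -1.0), matching Example 5
E = expm(A)
Eneg = expm([[-a for a in row] for row in A])
prod = [[sum(E[i][m] * Eneg[m][j] for m in range(2)) for j in range(2)]
        for i in range(2)]
print(prod)     # approximately the identity matrix
```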

2. Other Matrix Functions

Note that the definition of the matrix exponential corresponds with the usual scalar definition, that is,

e^x = 1 + x + x^2/2! + ...
e^A = I + A + A^2/2! + ...

In general, a matrix function is defined using its series expansion. In particular, we define the matrix natural log

ln(A) = (A - I) - (A - I)^2/2 + (A - I)^3/3 - (A - I)^4/4 + ...

It turns out that for a given matrix, the value of the matrix function can be found using a finite expansion as well. The key result is called the Cayley-Hamilton Theorem, which will be stated without proof:

Theorem 5. (Cayley-Hamilton) Given an n-dimensional square matrix A, let

a(λ) = det(λI - A) = λ^n + α1 λ^(n-1) + ... + αn = 0

be the characteristic equation. Then a(A) = 0, that is,

A^n + α1 A^(n-1) + ... + αn I = 0

(A satisfies its own characteristic equation.) In particular, this indicates that

A^n = -α1 A^(n-1) - ... - αn I
A^(n+1) = -α1 A^n - α2 A^(n-1) - ... - αn A
        = -α1 (-α1 A^(n-1) - ... - αn I) - α2 A^(n-1) - ... - αn A
        = (α1^2 - α2) A^(n-1) + ... + α1 αn I

and in general, A^k for k ≥ n can be written as a linear combination of the terms I, A, A^2, ..., A^(n-1). Thus for a given A and a given matrix function f(·), there is an n-1 degree polynomial representation of f(A), that is,

f(A) = β1 A^(n-1) + β2 A^(n-2) + ... + βn I

Although we will not prove this in detail, since both A and the eigenvalues λi have the property that a(λi) = a(A) = 0, and because a(·) is what is used to simplify f(A), we can use the function values at the eigenvalues to evaluate the matrix function more easily. Given an n-dimensional matrix A and function f(·), to evaluate f(A) when the eigenvalues of A are simple (not repeated), perform the following steps:

- Find the eigenvalues of A, that is, λ1, ..., λn
- Solve the following equations for β1 through βn:

f(λ1) = β1 λ1^(n-1) + β2 λ1^(n-2) + ... + βn
f(λ2) = β1 λ2^(n-1) + β2 λ2^(n-2) + ... + βn
.
f(λn) = β1 λn^(n-1) + β2 λn^(n-2) + ... + βn

- Find f(A) = β1 A^(n-1) + β2 A^(n-2) + ... + βn I

When the eigenvalues are repeated, there will not be enough equations to solve for the βi, so the additional conditions

d/dλ f(λ)|_{λi} = d/dλ (β1 λ^(n-1) + β2 λ^(n-2) + ... + βn)|_{λi}
.
d^m/dλ^m f(λ)|_{λi} = d^m/dλ^m (β1 λ^(n-1) + β2 λ^(n-2) + ... + βn)|_{λi}

are added, where m is the index of the repeated eigenvalue λi.

Example 6. Find

ln [ √3/2  -1/2 ]
   [ 1/2   √3/2 ]

The characteristic equation is

det [ √3/2 - λ  -1/2      ]
    [ 1/2       √3/2 - λ  ] = (√3/2 - λ)^2 + 1/4
                            = λ^2 - √3 λ + 1

and

λ = √3/2 ± √(3/4 - 1) = √3/2 ± j/2 = e^(±jπ/6)

Now

ln(e^(jπ/6)) = j(π/6 + 2πk),   k = ..., -1, 0, 1, ...
ln(e^(-jπ/6)) = j(-π/6 + 2πk),  k = ..., -1, 0, 1, ...

Note that there are multiple solutions; this is because the function ln is not one to one. The solution of interest depends on the context. For now, let's pick the solutions with k = 0. Then

jπ/6 = β1 e^(jπ/6) + β2
-jπ/6 = β1 e^(-jπ/6) + β2

or

[ jπ/6  ]   [ √3/2 + j/2   1 ] [ β1 ]
[ -jπ/6 ] = [ √3/2 - j/2   1 ] [ β2 ]

This has solution

[ β1 ]   [ √3/2 + j/2   1 ]^(-1) [ jπ/6  ]   [ π/3     ]
[ β2 ] = [ √3/2 - j/2   1 ]      [ -jπ/6 ] = [ -√3 π/6 ]

and

ln(A) = (π/3) A - (√3 π/6) I
      = [ √3 π/6 - √3 π/6   -π/6              ]
        [ π/6                √3 π/6 - √3 π/6  ]
      = [ 0     -π/6 ]
        [ π/6    0   ]

For an example with repeated roots, see Example B.2 in the text. Note that, because ln(e^x) = x, we have ln(e^A) = A, and the matrix log is the inverse function of the matrix exponential.
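As a check on Example 6, exponentiating the computed logarithm by truncated series should recover the original rotation matrix; a sketch:

```python
import math

def expm2(A, terms=30):
    # Truncated series e^A = sum_k A^k / k! for a 2x2 matrix A.
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[sum(term[i][m] * A[m][j] for m in range(2)) / k
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

L = [[0.0, -math.pi / 6], [math.pi / 6, 0.0]]  # ln(A) from Example 6
print(expm2(L))  # approximately [[sqrt(3)/2, -1/2], [1/2, sqrt(3)/2]]
```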

Appendix A
