Christian Parkinson GRE Prep: Diff. Eq. & Lin. Alg. Notes

Week 4: Differential Equations & Linear Algebra Notes

After calculus, the most common topic on the math subject GRE is linear algebra, followed by differential equations. Since the latter is a more natural continuation of calculus (and since it will take much less time), I cover it first.

Differential Equations

Definition 1 (Differential Equation). An ordinary differential equation is an equation of the form F(x, y, y', y'', . . . , y^(n)) = 0 for some F : R^(n+2) → R, where y^(k) denotes the kth derivative of y with respect to the independent variable x. The order of the differential equation is equal to the highest order derivative of y which appears; the above equation is nth order (assuming F does actually depend on its last variable). Note, we often replace y with u, which can be thought to stand for "unknown." The goal, then, is to find the unknown function which satisfies the equation.

On the math subject GRE, questions are mostly limited to equations where the highest order derivative of the unknown function can be isolated; that is, equations of the form y^(n) = f(x, y, y', . . . , y^(n−1)). Many differential equations can be solved by simply recalling facts from calculus.

Example 2. Solve the differential equations (a) u' = u, (b) y'' = −4y.

Solution. For (a), we can phrase the differential equation as a question: "What function will remain the same upon differentiating?" We recall from calculus that u(x) = e^x is the function which is invariant under differentiation. Indeed, this is a solution to the equation in (a). For (b), asking a similar question, we reason that y should be a function which, upon twice differentiating, returns the negative of itself (with a constant multiple). You may recall from calculus that the function sin(x) returns its negative after twice differentiating; to account for the constant multiple, you can see that y(x) = sin(2x) is a solution.
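If you have a computer handy while studying, a computer algebra system can confirm answers like these. The sketch below is my own addition (not part of the original notes) and assumes sympy is installed.

```python
import sympy as sp

x = sp.symbols('x')
u, y = sp.Function('u'), sp.Function('y')

# (a) u' = u: expect a family equivalent to u(x) = C*exp(x)
print(sp.dsolve(sp.Eq(u(x).diff(x), u(x)), u(x)))

# (b) y'' = -4y: expect a family equivalent to A*sin(2x) + B*cos(2x)
print(sp.dsolve(sp.Eq(y(x).diff(x, 2), -4*y(x)), y(x)))

# Direct check that sin(2x) satisfies y'' = -4y
assert sp.simplify(sp.diff(sp.sin(2*x), x, 2) + 4*sp.sin(2*x)) == 0
```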

A natural question would be to ask if we have found all solutions to a given equation. Of course, in the above example, we did not. For (a), any constant times e^x would work just as well, so a full family of solutions would be u(x) = Ce^x for an arbitrary constant C ∈ R. For (b), besides simply multiplying by a constant, we see that y(x) = cos(2x) works as well. Indeed, combinations of the two also work, so the full family of solutions is given by y(x) = A sin(2x) + B cos(2x) for arbitrary constants A, B ∈ R. Both these equations have special structure which is defined here.

Definition 3 (Linearity). A differential equation

F(x, y, y', . . . , y^(n)) = 0

is said to be linear if the function F is linear in the arguments (y, y', . . . , y^(n)); that is,

F(x, αy + z, αy' + z', . . . , αy^(n) + z^(n)) = αF(x, y, y', . . . , y^(n)) + F(x, z, z', . . . , z^(n))

for all n times differentiable y, z : R → R and all α ∈ R.

In the terminology of linear algebra, a differential equation is linear if its solution set forms a (possibly affine) subspace of the vector space of continuous functions on R. Intuitively, a differential equation is linear if the unknown function and its derivatives do not appear in any non-linear way. In symbols, a linear nth order differential equation has the form

a_n(x) y^(n) + ··· + a_2(x) y'' + a_1(x) y' + a_0(x) y = f(x)

for some functions a_k, f : R → R (k = 0, . . . , n).

Typically, we also require that a_n(x) ≠ 0 for any x, so that we may divide by a_n(x) and eliminate the coefficient of y^(n). If a_n(x) = 0 at some x, this x is called a singular point of the differential equation. Solving differential equations with singular points can be quite difficult and is beyond the scope of these notes.

First-order linear equations are always solvable analytically (at least up to evaluating integrals). We build toward the general solution of a first-order linear equation in a few steps.

Definition 4 (Separability). A first-order differential equation is called separable if it is of the form

y' = f(x)/g(y).

That is, separable equations have the variables on the right hand side separated into two different functions.

Proposition 5 (Solution to Separable Equations). The solution to

y' = f(x)/g(y)

is given implicitly by the pair of integrals

∫ g(y) dy = ∫ f(x) dx.

Formally, we can arrive at this by manipulating differentials:

dy/dx = f(x)/g(y)  =⇒  g(y) dy = f(x) dx  =⇒  ∫ g(y) dy = ∫ f(x) dx.

Of course, this actually depends on the chain rule.

Example 6. Solve the differential equation y' = (1 + y^2)(4x^3 + 2x).

Solution. We see that

∫ dy/(1 + y^2) = ∫ (4x^3 + 2x) dx  =⇒  arctan y = x^4 + x^2 + C  =⇒  y = tan(x^4 + x^2 + C).

It can be readily checked that this function does satisfy the differential equation.
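As a sanity check (my addition), substituting the claimed solution back into the equation with sympy gives a zero residual:

```python
import sympy as sp

x, C = sp.symbols('x C')
y = sp.tan(x**4 + x**2 + C)

# y should satisfy y' = (1 + y^2)(4x^3 + 2x); the residual should simplify to 0
residual = sp.diff(y, x) - (1 + y**2) * (4*x**3 + 2*x)
print(sp.simplify(residual))
```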

From this, we can easily solve any equation of the form y' = p(x)y. We see that such an equation has the solution

y(x) = e^{∫ p(x) dx}.

This observation gives us a method for solving any first order linear equation.

Proposition 7 (Method of Integrating Factor). The solution to the first-order linear equation

y' + p(x)y = q(x)

is given by

y(x) = e^{−∫ p(x) dx} ( ∫ e^{∫ p(x) dx} q(x) dx ).

This formula deserves some motivation. As with separable equations (or even simpler equations of the form y'(x) = f(x)), we see that solving a differential equation is akin to "integrating the equation." With this in mind, it would be ideal if the left hand side of the above equation were a perfect derivative so we could just integrate it away. In general, there will not be Y(x) such that Y'(x) = y'(x) + p(x)y; i.e., the left hand side will not be a perfect derivative. However, we can fix that by cleverly multiplying by some other function μ(x) which we call an integrating factor. Indeed, our new equation will read

μ(x)y'(x) + μ(x)p(x)y(x) = μ(x)q(x).

Now, if μ' = p(x)μ, then the left hand side will read μ(x)y'(x) + μ'(x)y(x), which is the derivative of μ(x)y(x). Thus we simply choose μ satisfying μ' = p(x)μ. However, from the above, we know that μ(x) = e^{∫ p(x) dx} satisfies that equation. Thus we see

e^{∫ p(x) dx} y' + p(x) e^{∫ p(x) dx} y = e^{∫ p(x) dx} q(x)  =⇒  d/dx [ e^{∫ p(x) dx} y(x) ] = e^{∫ p(x) dx} q(x),

whence integrating gives

e^{∫ p(x) dx} y(x) = ∫ e^{∫ p(x) dx} q(x) dx  =⇒  y(x) = e^{−∫ p(x) dx} ( ∫ e^{∫ p(x) dx} q(x) dx )

which is the solution listed above.

Example 8. Solve the equation

y'(x) + (1/x) y(x) = cos(x^2).

Solution. According to the formula, we should multiply the equation by μ(x) = e^{∫ (1/x) dx} = e^{ln(x)} = x. Doing this gives

x y'(x) + y(x) = x cos(x^2)  =⇒  d/dx (x y(x)) = x cos(x^2).

Integrating yields

x y(x) = sin(x^2)/2 + C  =⇒  y(x) = sin(x^2)/(2x) + C/x

where C is an arbitrary constant.

It may irk you that all of these solutions have this lingering arbitrary constant C. This has something to do with the fact that differentiation eliminates constants, so when we solve the equation (think of "integrating the equation") we need to reintroduce constants that may have been eliminated. Oftentimes, along with a differential equation, a value y(x_0) = y_0 is specified. This value can be used to solve for the constant C. For example, along with the above equation, the question may have specified that y should satisfy y(√π) = 1. We found our general solution, so substituting x = √π gives

y(√π) = sin(π)/(2√π) + C/√π.

We are told this value should be 1, so this gives

C/√π = 1  =⇒  C = √π.

Thus the function y which solves the equation and satisfies y(√π) = 1 is given by

y(x) = sin(x^2)/(2x) + √π/x.

Such questions which give an equation and specify a value for the unknown function at a point are called Initial Value Problems or Boundary Value Problems depending on the context (and the way in which the values of the unknown function are specified). We won't address this point any further.
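Here is a short sympy sketch (my addition, not the author's) that solves Example 8 and imposes the initial condition y(√π) = 1 in one step; the ics argument handles the constant C for us.

```python
import sympy as sp

x = sp.symbols('x', positive=True)
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x) + y(x)/x, sp.cos(x**2))

# General solution: expect something equivalent to y(x) = sin(x^2)/(2x) + C/x
print(sp.dsolve(ode, y(x)))

# Initial value problem y(sqrt(pi)) = 1: expect sin(x^2)/(2x) + sqrt(pi)/x
print(sp.dsolve(ode, y(x), ics={y(sp.sqrt(sp.pi)): 1}))
```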

This gives a solution to all linear first-order equations. With separation of variables we can also solve some non-linear equations. Most non-linear equations will not be solvable analytically. However, even without solving, we can still determine some of the behavior of solutions to certain equations.

Definition 9 (Autonomous Equations). A first-order differential equation is called autonomous if it is of the form y' = f(y). Note that all autonomous equations are separable and thus can, in principle, be solved by integration. However, that integration could be too difficult to be feasibly performed. Even so, we can use the equation to decide what potential solutions might look like.

Definition 10 (Equilibrium Solutions). Consider the first order autonomous equation y' = f(y).

Suppose that y_0 ∈ R is such that f(y_0) = 0. Then the constant function y(x) = y_0 is an equilibrium solution to the above equation.

It is not difficult to see that if we specify y(x_0) = y_0 for some x_0 where y_0 is a root of f, then y(x) = y_0 for all x > x_0; that is, if y hits an equilibrium point, it will stay on that equilibrium point forever. By a standard uniqueness argument (so long as f is differentiable), this shows that no solution can cross an equilibrium solution. That is, if y_0 is an equilibrium point and y(x_0) < y_0, then y(x) < y_0 for all x > x_0. A general rule of thumb for autonomous equations is that all solutions either (1) blow up to positive or negative infinity as x → ±∞ or (2) tend toward an equilibrium solution as x → ±∞. The function f and a specified value will tell you which regime your solution falls into. We demonstrate this in an example before moving on.

Example 11. Draw a solution to y' = (y − 1)(4 − y) such that (a) y(0) = 1, (b) y(0) = 3, (c) y(0) = 5.

Solution. The general procedure is to identify what sign f takes on either side of each equilibrium point; this will determine whether y increases or decreases in that region. This will also tell you whether an equilibrium point is stable or unstable. Note that all solutions tend to an equilibrium point here. The point y_0 = 4 is a stable equilibrium point while y_0 = 1 is an unstable equilibrium point.

(Figure 1: solution curves for Example 11.)
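The sign analysis can be automated: at a root y_0 of f, the sign of f'(y_0) decides stability (negative means stable, positive means unstable). A minimal sketch of this test, my addition:

```python
import sympy as sp

yv = sp.symbols('y')
f = (yv - 1) * (4 - yv)               # right hand side of y' = f(y)

for y0 in sp.solve(f, yv):             # equilibria are the roots of f
    slope = sp.diff(f, yv).subs(yv, y0)
    kind = "stable" if slope < 0 else ("unstable" if slope > 0 else "inconclusive")
    print(y0, kind)                    # expect: 1 unstable, 4 stable
```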

We move on to higher-order linear, constant coefficient equations. Note that anything from this section works just as well for first-order equations though it is unnecessary since we have the integrating factor method for such equations.

We first consider equations of the form

a_n y^(n) + a_{n−1} y^(n−1) + ··· + a_1 y' + a_0 y = 0     (1)

where a_0, a_1, . . . , a_n ∈ R, a_n ≠ 0. There is one strategy that always works for such equations, up to your ability to find roots of polynomials. Intuitively, we could think that we need to find solutions which don't change much upon differentiation, since the desired solution seemingly will only be multiplied by constants when differentiated. Thus we look for a solution of the form y = e^{rx} where r is a constant. Plugging this in, we arrive at

a_n r^n + a_{n−1} r^(n−1) + ··· + a_1 r + a_0 = 0.

This is satisfied if r is a root of the above polynomial. The polynomial has n roots up to multiplicity, and we can then superpose the corresponding solutions to construct the general solution.

Example 12. Find the general solution to the differential equation y'' − 2y' − 3y = 0.

Solution. Substituting in y = e^{rx}, we arrive at

r^2 − 2r − 3 = 0  =⇒  (r + 1)(r − 3) = 0  =⇒  r = −1, 3.

This shows that both e^{−x} and e^{3x} are solutions. Thus by linearity, the general solution is

y(x) = C_1 e^{−x} + C_2 e^{3x}.
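Since the whole method reduces to finding roots of the characteristic polynomial, a numerical root-finder reproduces the answer; a quick numpy check (my addition):

```python
import numpy as np

# Characteristic polynomial r^2 - 2r - 3 of y'' - 2y' - 3y = 0
print(np.roots([1, -2, -3]))  # expect the roots 3 and -1
```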

This method raises an immediate red flag: what if the polynomial in r does not have real roots? This is easily handled using Euler’s identity.

Example 13. Find the general solution of y'' + 4y' + 13y = 0.

Solution. Again, substituting y = e^{rx}, we arrive at

r^2 + 4r + 13 = 0  =⇒  (r + 2)^2 + 9 = 0  =⇒  r = −2 ± 3i.

This would give solutions of the form

y(x) = C_1 e^{(−2+3i)x} + C_2 e^{(−2−3i)x}.

By Euler's identity, we see this is equivalent to

y(x) = e^{−2x}(C_1 cos(3x) + C_2 sin(3x)).

You may think the final constants C_1 and C_2 must now be allowed to be complex. This is sort of true; however, if real initial conditions are specified, it will always turn out that C_1, C_2 ∈ R.

There is a slightly more subtle hiccup that one can encounter when using this method. Consider the equation y'' − 2y' + y = 0. Using this method, we would find that r^2 − 2r + 1 = 0  =⇒  (r − 1)^2 = 0  =⇒  r = 1. This yields the solution y(x) = Ce^x. However, you may have noticed that with all the previous second-order equations, there were two solutions with two arbitrary constants (i.e., the solution space was a two-dimensional vector space). This is true generally: the solution space of an nth order linear equation where the right hand side is zero forms an n-dimensional vector space. So for this latter equation we are missing a solution. This solution can be recovered using a method called variation of parameters, but what you will find is that in this case, the general solution is not a constant multiple of e^x, but rather a linear polynomial times e^x:

y(x) = (C_1 + C_2 x) e^x.

This generalizes upwards for roots of higher multiplicity.

This method only deals with homogeneous equations like (1): equations which have zero on the right hand side. There are also methods for non-homogeneous equations, those of the form

a_n y^(n) + a_{n−1} y^(n−1) + ··· + a_1 y' + a_0 y = f(x)

for some given function f. These methods can be very complicated even in relatively simple cases (low order, "nice" functions). However, if f has a special form then they can be very easy. We state this in a couple propositions and finish with a brief example.

Proposition 14 (Solutions of Non-Homogeneous Equations). Any solution to the equation

a_n y^(n) + a_{n−1} y^(n−1) + ··· + a_1 y' + a_0 y = f(x)

takes the form y = y_p + y_h where

a_n y_p^(n) + a_{n−1} y_p^(n−1) + ··· + a_1 y_p' + a_0 y_p = f(x)

and y_h is the general solution of

a_n y_h^(n) + a_{n−1} y_h^(n−1) + ··· + a_1 y_h' + a_0 y_h = 0.

The function y_p is called the particular solution and the function y_h is called the homogeneous solution. Thus this proposition tells us that we can always break a solution into a particular and a homogeneous part.

Proposition 15 (Method of Undetermined Coefficients). Consider the equation

a_n y^(n) + a_{n−1} y^(n−1) + ··· + a_1 y' + a_0 y = f(x).

Suppose that f is not in the span of the homogeneous solutions for the equation. Then

(a) if f(x) = α sin(λx) + β cos(λx) where λ ≠ 0 and at least one of α, β is non-zero, then y_p(x) = A sin(λx) + B cos(λx) for some constants A, B,

(b) if f(x) = αe^{λx} for α, λ ≠ 0, then y_p(x) = Ae^{λx} for some constant A,

(c) if f(x) = α_m x^m + ··· + α_1 x + α_0 where m ∈ N and α_m ≠ 0, then y_p(x) = A_m x^m + ··· + A_1 x + A_0 for some constants A_0, A_1, . . . , A_m.

This gives us a way of solving non-homogeneous equations if the forcing functions are of a certain form.

Example 16. Find the general solution of the equation y'' + 25y = x^2.

Solution. We must find the general homogeneous solution and then find a single particular solution. For the homogeneous solution, we try y_h = e^{rx}. This will be a homogeneous solution if

r^2 + 25 = 0  =⇒  r = ±5i.

This gives a homogeneous solution: y_h(x) = C_1 cos(5x) + C_2 sin(5x). By the above proposition, it suffices to look for a particular solution of the form y_p(x) = Ax^2 + Bx + C. Substituting this into the equation gives

2A + 25Ax^2 + 25Bx + 25C = x^2.

Matching the coefficients on each power of x gives A = 1/25, B = 0, C = −2/625. Thus the general solution is given by

y(x) = y_p(x) + y_h(x) = x^2/25 − 2/625 + C_1 cos(5x) + C_2 sin(5x).
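Once again it is easy to confirm the answer by substitution; the sympy check below is my addition.

```python
import sympy as sp

x, C1, C2 = sp.symbols('x C1 C2')
y = x**2/25 - sp.Rational(2, 625) + C1*sp.cos(5*x) + C2*sp.sin(5*x)

# The residual of y'' + 25y = x^2 should simplify to 0
print(sp.simplify(sp.diff(y, x, 2) + 25*y - x**2))
```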

Linear Algebra

The most basic goal of linear algebra is to solve systems of linear equations. Since these linear systems can be expressed in vector form, it is worthwhile to build up some theory surrounding matrices and vectors; we will do that momentarily. First we review how to solve these systems. The most general technique for solving these equations is known as Gaussian elimination (also called row reduction) with back substitution, wherein we simply eliminate variables by strategically combining the equations and then solve the equations in reverse order. We demonstrate these sorts of computations in a few examples and then move on to theoretical aspects of linear algebra.

Example 17. Solve for (x, y, z) in the following system of equations:

x + y + z = 5
x + 2y − z = 9
−2x + 4y + 3z = −3

Solution. In the above, we could subtract the first equation from the second equation and add twice the first equation to the third equation to arrive at

x + y + z = 5
y − 2z = 4
6y + 5z = 7

Now we can subtract 6 times the second equation from the third and we will have

x + y + z = 5
y − 2z = 4
17z = −17

Now the last equation depends only on z and can easily be solved: z = −1. Substituting this value into the second equation, we find y = 2, and using both these values in the first equation, we have x = 4. Thus our solution is (x, y, z) = (4, 2, −1). Of course, we can write such a system in matrix-vector form as

[ 1  1  1 ] [x]   [ 5 ]
[ 1  2 −1 ] [y] = [ 9 ]
[−2  4  3 ] [z]   [−3]

or more succinctly in augmented matrix form, where we can perform the same elimination steps as before. This is given by

[ 1  1  1 |  5 ]      [ 1  1  1 |  5 ]      [ 1  1  1 |   5 ]
[ 1  2 −1 |  9 ]  −→  [ 0  1 −2 |  4 ]  −→  [ 0  1 −2 |   4 ]
[−2  4  3 | −3 ]      [ 0  6  5 |  7 ]      [ 0  0 17 | −17 ]

where it is understood that each column represents a variable and the entries to the right of the divider represent the values on the right hand side of the equation.
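For concreteness, the same system can be solved numerically; the snippet below (my addition) feeds the coefficient matrix and right hand side from Example 17 to numpy.

```python
import numpy as np

A = np.array([[ 1, 1,  1],
              [ 1, 2, -1],
              [-2, 4,  3]], dtype=float)
b = np.array([5, 9, -3], dtype=float)

print(np.linalg.solve(A, b))  # expect [ 4.  2. -1.]
```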

Many questions on the math subject GRE boil down to solving linear systems but there are some intricacies. Above we found a unique solution to our system of equations. This may not always happen. We demonstrate this with two more brief examples before moving to theory.

Example 18. Solve the linear systems

x + y + z = 5
x + 2y − z = 9
3x + 5y − z = 23

and

x + y + z = 5
x + 2y − z = 9
−2x − 5y + 4z = 10

Solution. For the first, we row reduce to see

[ 1  1  1 |  5 ]      [ 1  1  1 | 5 ]      [ 1  1  1 | 5 ]
[ 1  2 −1 |  9 ]  −→  [ 0  1 −2 | 4 ]  −→  [ 0  1 −2 | 4 ]
[ 3  5 −1 | 23 ]      [ 0  2 −4 | 8 ]      [ 0  0  0 | 0 ]

Re-writing this in equation form, the third equation is 0x + 0y + 0z = 0, which is automatically satisfied for any (x, y, z). This shows that we really only have two equations while there are three variables. Thus the best we can do is solve for two variables in terms of the third. Indeed, letting z = t for some arbitrary t ∈ R, we find that y = 4 + 2t and x = 1 − 3t. Thus the solutions are given by the set of vectors (x, y, z) = (1 − 3t, 4 + 2t, t) where t ∈ R. In this case, since z is unconstrained, it is called a free variable. For the second system above, if we subtract three times the second equation from the first, we arrive at −2x − 5y + 4z = −22. This directly contradicts the third equation and so this system is inconsistent; there are no solutions.
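The three possible outcomes (unique, infinitely many, none) can be detected by comparing the rank of A with the rank of the augmented matrix [A | b]; a sketch using sympy (my addition, and the function name classify is just illustrative):

```python
import sympy as sp

def classify(A, b):
    """Classify the system Av = b as 'unique', 'infinite', or 'none' via ranks."""
    A = sp.Matrix(A)
    aug = A.row_join(sp.Matrix(b))
    if A.rank() < aug.rank():
        return "none"                      # b is not in the column space of A
    return "unique" if A.rank() == A.cols else "infinite"

print(classify([[1, 1, 1], [1, 2, -1], [-2, 4, 3]], [5, 9, -3]))    # unique
print(classify([[1, 1, 1], [1, 2, -1], [3, 5, -1]], [5, 9, 23]))    # infinite
print(classify([[1, 1, 1], [1, 2, -1], [-2, -5, 4]], [5, 9, 10]))   # none
```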

With these three examples we have witnessed three different scenarios. The first system admitted a unique solution, the second had infinitely many and the third had zero. It turns out these are the only three possibilities. We see this shortly.

Now we move on to some theory. The concepts of greatest concern in undergraduate linear algebra (matrix algebra and the structure of R^n) are specific cases of more general concepts which we cover first, though we will soon see that it is enough to study R^n.

Definition 19 (Vector Space). A vector space is a set V paired with a scalar field F and equipped with two operations

+ : V × V → V, (x, y) ↦ x + y
· : F × V → V, (α, x) ↦ α · x (also written αx)

such that (V, +) is an additive group and + and · are compatible (in the sense of several axioms such as distributivity and associativity which I am much too lazy to reproduce here). In particular, since (V, +) is an additive group it must have an additive identity which we name 0, and each element x ∈ V must have an additive inverse which we denote by −x. When referencing a vector space, we often only refer to the set V, especially when there is no ambiguity about which underlying field F we are using.

Typically the field F is either R or C; for most of our discussion we consider it to be R. The prototypical example of a vector space is R^n: the set of real n-tuples with pointwise addition and pointwise scalar multiplication. Other examples include the set of matrices under pointwise addition and scalar multiplication, or the set of continuous functions from a topological space into C.

Any non-trivial vector space has smaller vector spaces embedded inside of it. These will be of interest for various reasons later on so we define the concept now.

Definition 20 (Subspace). A subset of a vector space is called a subspace if it contains the zero vector and is closed under addition and scalar multiplication.

That is, a subspace is a vector space which is contained in some larger vector space. The easiest examples of subspaces are the coordinate axes in R^n. Indeed, consider the subsets of R^2 given by {(x, 0) : x ∈ R} and {(0, y) : y ∈ R}. These are both subspaces of R^2. Every vector space has two trivial subspaces: the set {0} and the space itself. Subspaces can be combined to make other subspaces in various ways. For example, intersecting two subspaces or taking the direct sum of two subspaces will yield another subspace.

One interesting thing about R^n is that all vectors in R^n can be expressed as linear combinations of a very small set of vectors. Indeed, if x = (x_1, x_2, . . . , x_n) ∈ R^n then

x = x_1 e_1 + x_2 e_2 + ··· + x_n e_n

where e_k is a coordinate vector: one which has zeros everywhere except a 1 in the kth entry. Moreover, this representation is unique. There are no other scalars (α_1, . . . , α_n) such that x = α_1 e_1 + ··· + α_n e_n. It is reasonable to ask if other vector spaces have a similar property: can we always find a relatively small set whose linear combinations can form any vector in

the space? Indeed, we can. We build toward this.

Definition 21 (Span). Let V be a vector space and M ⊂ V . The span of M (denoted by span(M)) is the smallest subspace of V which contains M; that is, it is the intersection of all subspaces of V which contain M. Equivalently, the span of M is the set of all linear combinations of elements in M. This alternative definition is especially useful when M = {v1, . . . vn} is a finite subset where we can write

span{v_1, . . . , v_n} = {α_1 v_1 + ··· + α_n v_n : α_1, . . . , α_n ∈ F}.

We say a vector space V is spanned by {v_1, . . . , v_n} if v_1, . . . , v_n ∈ V and V = span{v_1, . . . , v_n} (alternatively, we call the vectors a spanning set for V).

According to this definition, the vectors {e_1, . . . , e_n} in R^n are a spanning set for R^n. One deficiency in this definition is that we could always add more vectors to a spanning set. Indeed, {0, v, e_1, . . . , e_n} is also a spanning set for R^n (where v ∈ R^n is arbitrary). However, the zero vector and this arbitrary v are not needed in order to span R^n since they themselves can be written as linear combinations of the other vectors in the set; including them would ruin the uniqueness of the representation of any vector x since x = α·0 + 0·v + x_1 e_1 + ··· + x_n e_n for any α ∈ R. We would like to eliminate such redundancies.

Definition 22 (Linear Independence). A set of vectors {v_1, . . . , v_n} in a vector space V is said to be linearly independent if for α_1, . . . , α_n ∈ F,

α_1 v_1 + ··· + α_n v_n = 0  =⇒  α_1 = ··· = α_n = 0.

Otherwise, if we can find α_1, . . . , α_n ∈ F, not all zero, such that

α_1 v_1 + ··· + α_n v_n = 0,

then the vectors are said to be linearly dependent.

Intuitively, linearly independent sets have no redundancies; no one vector in the set can be written as a linear combination of the others. Thus we have uniqueness of representations of elements in the span of linearly independent vectors. From this definition, it is easy to see that the coordinate vectors {e_1, . . . , e_n} in R^n form a linearly independent set which also spans R^n.

Definition 23 (Basis). A basis for a vector space is a linearly independent spanning set. Equivalently, {v_i}_{i∈I} is a basis for a vector space V if every vector v ∈ V can be represented uniquely as a finite linear combination of vectors in {v_i}_{i∈I}:

v = α_{i_1} v_{i_1} + ··· + α_{i_n} v_{i_n}, for some n ∈ N, i_1, . . . , i_n ∈ I, α_{i_1}, . . . , α_{i_n} ∈ F.

Bases are very important because they allow us to write all vectors in a space in terms of a relatively small set of vectors. It isn't clear a priori that all vector spaces have a basis.

However this is indeed a theorem.

Theorem 24. Every vector space has a basis.

This theorem is deceptive in that it seems very strong and useful, but in practice it is essentially useless. The theorem is actually equivalent to the axiom of choice and so the proof is entirely non-constructive. So while this theorem ensures that every vector space has a basis, it gives no advice on how to find such a basis, and indeed, attempting to identify a basis could be a fruitless endeavor. In one special case it is very easy to find a basis: when the basis consists of a finite number of vectors. Indeed, in this case, we may use a basis to somehow define the size of the space. First, it should be noted that bases are far from unique. For example, both the sets {(1, 0), (0, 1)} and {(1, 1), (−1, 1)} are bases for R^2. This could potentially be troublesome; if we want to use bases to determine the size of a space, we should be certain that all bases are actually the same size.

Theorem 25 (Invariance of Size of a Basis). If two sets each form a basis for the same vector space, then the sets have the same cardinality.

Definition 26 (Dimension). The dimension of a vector space is defined to be the number of elements in any basis for the space. If this is finite, we call the space finite dimensional; otherwise the space is infinite dimensional. For a vector space V, this is written as dim V.

Thus for example R^n is n-dimensional since the basis {e_1, . . . , e_n} has n elements. For the remainder of these notes, we focus on finite dimensional vector spaces (as is common in undergraduate linear algebra); for a survey of results in infinite dimensional vector spaces, one could study linear functional analysis.

In finite dimensional vector spaces, it is very easy to find a basis due to the next theorem.

Theorem 27. In any n-dimensional vector space, any set of n linearly independent vectors forms a basis and any spanning set consisting of n vectors forms a basis.

This is a great result because it shows that we only need to check one property in the definition of a basis. The following theorem has a similar flavor.

Theorem 28 (Size of Spanning/Linearly Independent Sets). In an n-dimensional vector space, any linearly independent set can have at most n elements and any spanning set has at least n elements. In particular, any spanning set is at least as large as any linearly independent set.

This theorem makes a lot of sense intuitively: if there are only n dimensions and you have more than n vectors, then one of them must be a combination of the others. Similarly, if there are n dimensions then fewer than n vectors cannot span the space since some dimension will be missing. This theorem tells us that a basis can be viewed as a maximal linearly independent set or a minimal spanning set.

With all these definitions, we move on to what is the largest topic in linear algebra: maps between vector spaces. For various reasons, linear transformations between vector spaces are of special interest.

Definition 29 (Linear Transformation). Suppose that V,W are vector spaces over the same field F. A linear transformation T : V → W is a map such that

T(αx + y) = αT(x) + T(y), for all x, y ∈ V, α ∈ F.

We will focus mainly on the case V = W, so the map is a linear transformation which takes a vector space to itself, though for several results this is not necessary. When V = W, the most obvious example of a linear transformation is the identity transformation I : V → V for which I(v) = v for all v ∈ V. We typically denote this transformation by I, sometimes with an indication of the underlying space; e.g., I_V. As another example, every matrix induces a linear transformation.

Proposition 30 (Matrices as Linear Transformations). Every matrix A ∈ R^{n×m} induces a linear transformation T_A : R^m → R^n by the rule

T_A(x) = Ax,  x ∈ R^m.

In view of this proposition, we typically blur the lines between a matrix and the linear transformation that it induces. For example, we often refer to matrices as linear transformations with the understanding that the transformation we are discussing is given by x ↦ Ax.

Taking a categorical view, we note that for a linear transformation T : V → W , the set of vectors {T (v): v ∈ V } ⊂ W is itself a subspace of W ; that is, T does actually map a vector space to a vector space which shows that linear transformations are the morphisms in the category of vector spaces. In fact, each linear transformation has two natural subspaces associated with it.

Definition 31 (Range & Null Space). Suppose V,W are vector spaces and T : V → W is a linear map. We define the null space of T by

N(T ) = {v ∈ V : T (v) = 0}

and the range (or image) of T by

R(T ) = {T (v): v ∈ V }.

Thus the null space of T is a subspace of V and the range of T is a subspace of W .

Thinking more about structure which linear transformations preserve, we can use them to identify spaces which are essentially the same. We do this in the following way.

Definition 32 (Inverse of a Transformation). Suppose that V, W are vector spaces and that a linear transformation T : V → W is one-to-one and onto. Then there is a linear transformation T^{−1} : W → V such that

T^{−1}(T(v)) = v and T(T^{−1}(w)) = w, for all v ∈ V, w ∈ W.

Such T^{−1} is called the inverse of T and T is called invertible. Note, this is half-definition and half-proposition; we are asserting that this inverse map is actually a linear map.

Definition 33 (Isomorphic Spaces). Two vector spaces are said to be isomorphic if there is an invertible linear map between them. When vector spaces V, W are isomorphic, we write V ≅ W.

Isomorphic spaces share essentially all the same structure. For example, isomorphic spaces have the same dimension (in fact, an invertible linear transformation will map a basis of one space to a basis of the other). The next theorem justifies focusing only on Rn rather than generic vector spaces V .

Theorem 34 (Classification of Finite Dimensional Vector Spaces). Any two n-dimensional vector spaces over the same field are isomorphic. In particular, if V is an n-dimensional vector space over R, then V ≅ R^n.

This tells us that essentially any linear algebra property that we can prove for Rn will hold in any n-dimensional vector space (we can simply translate between the two by finding an isomorphism). Given this, we drop any sort of generality and focus only on Rn for the majority of the notes.

Another curious simplification from undergraduate linear algebra is that, rather than deal with general linear maps T , we only deal with matrices. This is justified by the following.

Theorem 35 (Classification of Linear Transformations). Let T : Rm → Rn be a linear transformation. Then there is a matrix A ∈ Rn×m such that

T(x) = Ax,  x ∈ R^m.

This further blurs the lines between matrices and linear transformations by providing a converse to Proposition 30: not only does every matrix induce a linear transformation, we also have that every linear transformation is induced by a matrix. Finding the matrix which induces a given transformation is easy; the columns of the matrix are given by T(e_k) where e_k is a coordinate vector (in view of this, we can think of an invertible matrix as simply changing the basis for the underlying space; in a different basis, a linear transformation will have a different matrix associated with it). With this, we drop generality even further and discuss matrices for a while. Any of the above definitions for linear transformations are adaptable to matrices; e.g., we will use R(A) to mean the range of the linear transformation which A induces.

Given a matrix, we would like to nail down exactly how it transforms R^n by answering questions like "what does the range of A look like?" The first realization to note is that the range of A is exactly the span of its columns. Indeed, for A ∈ R^{n×m}, any y in the range of A has the form

y = Ax = x_1 A_1 + x_2 A_2 + ··· + x_m A_m

where x is some vector in R^m and A_k are the columns of A. For this reason, the range of A is often called the column space of A.

Definition 36 (Rank). The rank of a matrix is defined to be the dimension of the range of the matrix. Equivalently, the rank is the number of linearly independent columns in the matrix. For a matrix A, we write this as rank(A).

The rank of an n × m matrix is at most n because the matrix maps into R^n which is n-dimensional, but the rank is also at most m since the matrix has m columns. A matrix is said to be full rank if the rank is as large as possible; i.e., A ∈ R^{n×m} is full rank if rank(A) = min{n, m}. The rank of a matrix tells you a lot about how the matrix transforms R^m. The only rank 0 matrix is the zero matrix; it maps all vectors to the point {0}. A rank 1 n × m matrix maps R^m to a line in R^n. A rank 2 matrix maps R^m to a plane in R^n, etc. Intuitively, the rank should tell you something about the invertibility of a matrix. A transformation which maps all of R^m onto a line (where m ≥ 2) will not be invertible since many points must be mapped to the same point on the line.

Definition 37 (Nullity). The nullity of a matrix is defined to be the dimension of the null space of the matrix. Equivalently, the nullity is the maximum number of elements in a linearly independent set which is annihilated by the matrix. For a matrix A, we write this as null(A).

As an extension of the above discussion, if the nullity of the matrix is non-zero, then that matrix maps a non-trivial subspace onto the zero vector, which means the transformation is non-invertible. Thus there seems to be some relationship between rank and nullity; roughly, higher rank means larger range, which means fewer vectors are mapped to zero, which means lower nullity. This line of reasoning is made precise in the following theorem.

Theorem 38 (Rank-Nullity Theorem). Suppose that A ∈ R^{n×m}. Then rank(A) + null(A) = m. Equivalently,

dim R(A) + dim N(A) = dim R^m.
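A quick empirical check of the theorem on the rank-2 matrix from Example 18 (my addition):

```python
import sympy as sp

A = sp.Matrix([[1, 1, 1],
               [1, 2, -1],
               [3, 5, -1]])

rank = A.rank()
nullity = len(A.nullspace())                     # dimension of the null space
print(rank, nullity, rank + nullity == A.cols)   # expect 2 1 True
```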

We now define what it means for a matrix to be invertible and relate this to the rank and nullity of a matrix.

Definition 39 (Identity Matrix & Inverse Matrices). The identity matrix of dimension n is the matrix I ∈ R^{n×n} with entries δ_ij = 1 if i = j and δ_ij = 0 for i ≠ j.

This matrix induces the identity transformation on Rn. A matrix A ∈ Rn×n is said to be invertible (or nonsingular) if there is a matrix B ∈ Rn×n such that AB = BA = I. In this case, we call B the inverse matrix of A and we write it as A−1. If no such matrix exists, then A is called singular.

The power of the inverse of a matrix is that it gives us a concrete way to solve linear systems. Indeed, for a system of equations like those in Example 17, we can write the equations in the form Av = b for some matrix A, vector of unknowns v, and a given vector b. If A is invertible and we know the inverse A^{−1}, then we see

Av = b  =⇒  A^{−1}Av = A^{−1}b  =⇒  Iv = A^{−1}b  =⇒  v = A^{−1}b

so we have solved the equation for v. Because of this, it is very helpful to identify exactly which matrices are invertible and which are not. For example, from the above it seems like an n × n matrix can be invertible only if it has full rank; this turns out to be true. Towards the end of this document, we will give a fairly comprehensive list of concrete conditions which can be easily checked and are equivalent to invertibility of a matrix.

The above shows that invertibility is related to solvability of the system Av = b. We saw in Examples 17 and 18 that not all systems are uniquely solvable and some are insolvable. The previous paragraph seems to show that Av = b is uniquely solvable when A is invertible. This is a sufficient but not necessary condition for unique solvability. Indeed, the system may be uniquely solvable even when the matrix A is not square. We summarize this in a proposition.

Proposition 40 (Solvability of Av = b). Suppose that A ∈ Rn×m and that b ∈ Rn is given. Then exactly one of the following is true regarding the system of equations Av = b.

(i) There is a unique solution v ∈ Rm. [This will occur when b ∈ R(A) and N(A) = {0} which is always true for an invertible matrix A ∈ Rn×n.]

(ii) There are infinitely many solutions v ∈ Rm. [This will occur when b ∈ R(A) and N(A) is non-trivial because then we can find a single solution v and add to it any element of the null space of A.]

(iii) There is no solution v ∈ R^m. [This will occur when b ∉ R(A).]

It is useful to phrase the above proposition in terms of the rank and nullity of A; we leave this as an exercise to the reader. We can conclude from this that when A is invertible, Ax = 0 has only the trivial solution x = 0. Indeed, this property is actually equivalent to invertibility when A ∈ R^{n×n}.

In line with the above discussion, it seems like a geometric way to determine whether a matrix is invertible is to see if it maps n-dimensional subspaces to other n-dimensional subspaces. To motivate this, we could consider volume in R^3. In some rough sense, a shape which is truly 3-dimensional should have some positive volume, while shapes which are 1-dimensional or 2-dimensional (embedded in R^3) have zero volume. Thus to decide whether a matrix is invertible, we could check if it maps sets of positive volume to sets of positive volume. This is essentially what the determinant of the matrix measures.

Definition 41 (Determinant). Suppose that A ∈ R^{n×n}. The determinant of the matrix A is defined by

det A = Σ_{σ∈S_n} sign(σ) a_{1σ(1)} a_{2σ(2)} ··· a_{nσ(n)}

where S_n is the group of permutations of {1, . . . , n}. This is also sometimes written as |A| (though we will stick to the former notation: det A).

At first this formula is somewhat mysterious, but we rarely use this definition to calculate a determinant. To actually calculate a determinant, we induct upwards. By the formula, we see that

det [ a_11  a_12 ] = a_11 a_22 − a_12 a_21.
    [ a_21  a_22 ]

Thus we can take the determinant of a 2 × 2 matrix. The following proposition allows us to take the determinant of larger matrices.

Proposition 42 (Cofactor Expansion). Suppose that A ∈ R^{n×n} and fix j ∈ {1, . . . , n}. Then

det(A) = Σ_{i=1}^{n} (−1)^{i+j} a_{ij} det(A^{(ij)})

where A^{(ij)} is the matrix obtained by removing the ith row and jth column from A. Likewise, for fixed i ∈ {1, . . . , n},

det(A) = Σ_{j=1}^{n} (−1)^{i+j} a_{ij} det(A^{(ij)}).

These formulas relate the determinant of an n × n matrix to that of a series of (n − 1) × (n − 1) matrices.
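The cofactor expansion translates directly into a short recursive routine. The sketch below (my addition) expands along the first row; it is exponential-time, but fine for the small matrices that show up on the GRE.

```python
def det(A):
    """Determinant of a square matrix (list of rows) by cofactor expansion
    along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete row 0, column j
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                      # expect -2
print(det([[1, 1, 1], [1, 2, -1], [-2, 4, 3]]))   # expect 17
```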

The geometric interpretation of the determinant is crucial to seeing how this number relates to invertibility.

Proposition 43 (Geometric Interpretation of Determinant). Let [0, 1]n denote the unit cube in Rn, let A ∈ Rn×n and let A([0, 1]n) denote the image of the unit cube under the transformation given by A. Then

|det A| = Vol(A([0, 1]^n)).

In words, the matrix A will transform the unit cube into some n-dimensional parallelepiped; the absolute value of the determinant of A gives the volume of that parallelepiped.

By our above reasoning then, if the determinant of A is zero, then A squishes the unit cube onto some lower dimensional parallelepiped and thus A should fail to be invertible; this turns out to be true.

It is worthwhile to memorize some of the useful properties of the determinant; we list these here.

Proposition 44 (Properties of the Determinant).

(1) det(I) = 1

(2) If two rows of a matrix are interchanged, the determinant changes sign.

(3) If one row of a matrix is multiplied by a constant, the determinant is also multiplied by that constant. In particular, for A ∈ R^{n×n}, det(αA) = α^n det(A) for α ∈ R.

(4) Adding a multiple of one row to another row will not change the determinant.

(5) The determinant is multiplicative; i.e., det(AB) = det(A) det(B) when A, B ∈ Rn×n. Having defined the determinant, we are equipped to define eigenvalues and eigenvectors but there is one more important value to define regarding a matrix which will come back later.

Definition 45 (Trace). The trace of a matrix is the sum of the diagonal elements of the matrix. In symbols, for a matrix A ∈ R^{n×n}, the trace is defined by

tr A = Σ_{i=1}^{n} a_{ii}

where a_{ij} are the entries of the matrix.

The trace doesn’t carry quite the weight that the determinant does (for example, it is unrelated to invertibility) but it is a linear functional on Rn×n (a functional on a vector space is a map which takes the space to the underlying field) and you will be expected to know about it for the math subject GRE. We state its properties.

Proposition 46 (Properties of the Trace). Let A, B ∈ R^{n×n} and α ∈ R. Then

(a) [Linearity.] tr(αA + B) = α tr A + tr B

(b) [Cyclic Property.] tr(AB) = tr(BA) and more generally the trace is invariant under cyclic permutations so e.g.

tr(A_1 A_2 A_3 A_4) = tr(A_2 A_3 A_4 A_1) = tr(A_3 A_4 A_1 A_2) = tr(A_4 A_1 A_2 A_3)

It is important to note what this last property does not imply: the trace is not invariant under arbitrary permutations. Thus, when taking the trace of a product of several matrices, one cannot swap the matrices into an arbitrary order with impunity.

With this, we move on to eigenvalues and eigenvectors of a matrix which carry geometric information about how a matrix A transforms Rn.

Definition 47 (Eigenvalues & Eigenvectors). Let A ∈ Rn×n. An eigenvalue of A is a scalar λ ∈ C, such that Av = λv for some nonzero vector v ∈ Cn. In this case, such v is called an eigenvector corresponding to the eigenvalue λ.

The eigenvectors (if they are real) show the directions in which A simply stretches or squishes vectors and the amount of stretch is given by λ; if we know these directions, we may be able to deduce how A deforms other vectors. Complex eigenvalues and eigenvectors correspond, roughly, to rotation rather than stretching. Practically, we see that an eigenvalue/eigenvector pair satisfies

(A − λI)v = 0

and since v 6= 0, this shows that λ is an eigenvalue of A iff A − λI is singular. Recalling that a matrix is singular iff it has zero determinant, we see that the eigenvalues of A are the roots of the equation det(λI − A) = 0.

The function p_A(λ) = det(λI − A) is called the characteristic polynomial of A; it is a monic polynomial of degree n in λ. This explains why we need to consider λ ∈ C; although p_A(λ) is a real polynomial, it may have complex roots. Counted with algebraic multiplicity, an n × n matrix has n eigenvalues. To relate this to invertibility, consider: if zero is an eigenvalue of A, then there is a non-zero solution to Av = 0 and so A is not invertible. The following proposition relates the eigenvalues of a matrix to its determinant and trace.

Proposition 48. Let A ∈ R^{n×n} and let λ_1, λ_2, . . . , λ_n be the n eigenvalues of A. Then

det A = λ_1 λ_2 ··· λ_n and tr A = λ_1 + λ_2 + ··· + λ_n.

Making this proposition even more explicit, when A ∈ Rn×n, n ≥ 2, the characteristic polynomial of A is given by

p_A(λ) = λ^n − tr(A) λ^{n−1} + ··· + (−1)^n det(A).

That is, the determinant of A is the constant term in its characteristic polynomial (possibly with a sign change) and the trace is, up to sign, the coefficient attached to λ^{n−1}. Thus, for example, when A is 2 × 2,

p_A(λ) = λ^2 − tr(A)λ + det(A).
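These identities are easy to check numerically on a concrete matrix; the numpy sketch below is my addition.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lam = np.linalg.eigvals(A)
print(lam.sum(), np.trace(A))          # both equal 5 (trace = sum of eigenvalues)
print(lam.prod(), np.linalg.det(A))    # both equal 5, up to rounding (det = product)

# The eigenvalues are the roots of lambda^2 - tr(A)*lambda + det(A)
print(np.roots([1, -np.trace(A), np.linalg.det(A)]), lam)
```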

With all these definitions, we state the largest theorem of undergraduate linear algebra. This theorem gives several ways to check whether a matrix is invertible or singular. We have already discussed most of these, but it is nice to have them listed in one location. Many of the following conditions are simply re-statements of each other, but I list them for completeness. This theorem displays the many relationships between the preceding topics.

Theorem 49 (Invertible Matrix Theorem). Let A ∈ Rn×n. The following are equiva- lent:

(1) A is invertible

(2) A is right-invertible (i.e., there is B ∈ Rn×n such that AB = I)

(3) A is left-invertible (i.e., there is C ∈ R^{n×n} such that CA = I)

(4) A is full rank (i.e., rank A = n)

(5) The columns of A form a linearly independent set

(6) The columns of A span R^n (i.e., R(A) = R^n)

(7) A has zero nullity (i.e., null A = 0)

(8) A has a trivial null space (i.e. N(A) = {0})

(9) A induces an injective (one-to-one) transformation

(10) A induces a surjective (onto) transformation

(11) A induces a bijective transformation

(12) Av = 0 has only the trivial solution v = 0

(13) Av = b is uniquely solvable for each b ∈ R^n

(14) det A ≠ 0

(15) 0 is not an eigenvalue of A

Of course, this is not an exhaustive list. There are plenty more equivalent conditions but these give a solid overview of what it means for a matrix to be invertible.

This concludes a vast majority of the linear algebra material which is included on the math subject GRE. For completeness, we discuss a few more topics: those related to similarity and diagonalizability.

Definition 50 (Similarity). Two matrices A, B ∈ R^{n×n} are called similar if there is an invertible matrix P ∈ R^{n×n} such that A = PBP^{−1}.

Similarity is an equivalence relation on Rn×n. At first glance, this definition may not seem very meaningful but similar matrices share many nice properties in common.

Proposition 51. Similar matrices have the same characteristic polynomial; hence the same eigenvalues, the same determinant and the same trace.

In view of this, similar matrices can be seen as representing the same linear transformation with respect to different bases for R^n. For a matrix A ∈ R^{n×n}, one goal may be to find a matrix which is similar to A but has a much simpler form; this would help us identify key features about A (especially regarding how it transforms R^n) without doing much heavy lifting.

Definition 52 (Diagonal Matrix). A matrix D ∈ R^{n×n} with entries (d_{ij})_{i,j=1}^{n} is said to be diagonal if d_{ij} = 0 when i ≠ j.

Diagonal matrices are incredibly simple and easy to analyze; they transform R^n by stretching the coordinate axes. Thus each standard basis vector e_k is an eigenvector of a diagonal matrix and the corresponding eigenvalue is the kth diagonal element of the matrix. From this, we see that the determinant of a diagonal matrix is the product of the diagonal elements and a diagonal matrix is invertible iff it has no zeros on its diagonal.

Definition 53 (Diagonalizability). A matrix is called diagonalizable if it is similar to a diagonal matrix.

Since similar matrices share the same eigenvalues, if a matrix is diagonalizable, then it is similar to the diagonal matrix with its eigenvalues as the diagonal elements. This reasoning seems, at first glance, to be reversible. Suppose that A ∈ R^{n×n} has eigenvalues λ_1, . . . , λ_n with corresponding eigenvectors v_1, . . . , v_n. We can form a matrix P which has columns given by v_1, . . . , v_n and we will see

AP = [Av_1 | ··· | Av_n] = [λ_1 v_1 | ··· | λ_n v_n] = PD

where D is the diagonal matrix with λ_1, . . . , λ_n on the diagonal. This yields A = PDP^{−1}, which shows A is diagonalizable. This "proof" is exactly correct if A has n linearly independent eigenvectors. Each distinct eigenvalue has at least one eigenvector, and eigenvectors corresponding to different eigenvalues are linearly independent. However, if a matrix has a repeated eigenvalue, this eigenvalue may fail to have multiple corresponding eigenvectors and thus A may have fewer than n linearly independent eigenvectors. An example of such a matrix is

A = [ 3  1  0 ]
    [ 0  3  1 ]
    [ 0  0  4 ]

where the eigenvalue 3 is repeated but only has one corresponding eigenvector. Avoiding this peculiarity, the above proof is valid. We restate this fact as a theorem.

Theorem 54. A real n × n matrix is diagonalizable iff its eigenvectors form a basis for Rn.
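sympy can confirm that the example matrix A above is defective (too few independent eigenvectors) and hence not diagonalizable; a sketch, my addition:

```python
import sympy as sp

A = sp.Matrix([[3, 1, 0],
               [0, 3, 1],
               [0, 0, 4]])

# eigenvects() returns (eigenvalue, algebraic multiplicity, list of eigenvectors)
for val, mult, vecs in A.eigenvects():
    print(val, mult, len(vecs))   # eigenvalue 3 has multiplicity 2 but only 1 eigenvector

print(A.is_diagonalizable())      # expect False
```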

In some sense, most matrices are diagonalizable (precisely: the diagonalizable matrices form a dense subset of R^{n×n} with respect to any norm on R^{n×n}). However, given a matrix, it can be difficult to decide quickly whether it is diagonalizable. For large n, finding the eigenvalues and eigenvectors of an n × n matrix is very difficult, so it is useful to characterize large classes of diagonalizable matrices in other ways. This requires a bit more machinery.

Definition 55 (Transpose). Let A ∈ R^{n×m} have entries a_{ij} for 1 ≤ i ≤ n, 1 ≤ j ≤ m. The transpose of A is the matrix B ∈ R^{m×n} with entries b_{ij} = a_{ji} for 1 ≤ i ≤ m, 1 ≤ j ≤ n. This matrix is denoted A^t.

Thus if A is a linear map from R^m to R^n, then A^t is a map from R^n back to R^m. It is immediate from the definition that the transpose is additive and will respect scalar multiplication, but it is not clear how it will interact with other operations.

Proposition 56 (Properties of the Transpose). Let A, B be real matrices. Then

(a) (A^t)^t = A,

(b) (cA)^t = cA^t for all c ∈ R,

(c) (A + B)^t = A^t + B^t when A, B have the same dimensions,

(d) (AB)^t = B^t A^t when A, B have dimensions compatible with multiplication,

(e) det A = det(A^t) when A is square,

(f) (A^t)^{−1} = (A^{−1})^t when A is square and invertible (in particular, A^t is invertible iff A is invertible).

More generally, for any finite dimensional vector spaces V, W and any linear transformation T : V → W, we can define an adjoint transformation T* : W* → V*, where V*, W* are the continuous dual spaces of V, W respectively. In the terminology of category theory, the map which takes any vector space to its dual and transposes any linear map is a contravariant functor on the category of vector spaces. To gain more context regarding the meaning of transposition, one can read about dual spaces; this is outside of the scope of these notes.

Definition 57 (Symmetric Matrix). A matrix A ∈ Rn×n is said to be symmetric if A = At.

Symmetric matrices (and their more general analog, self-adjoint operators) have many desirable properties; these are highlighted when studying inner product spaces. There are two properties which are pertinent to our discussion.

Proposition 58. The eigenvalues of a real symmetric matrix are real.

Theorem 59 (Spectral Theorem (Real Symmetric Case)). Let A ∈ R^{n×n}. A is symmetric if and only if there is a diagonal matrix D ∈ R^{n×n} and a matrix U ∈ R^{n×n} such that U^{−1} = U^t and A = UDU^t.

Matrices satisfying U^t = U^{−1} are called orthogonal matrices. These correspond to rotations and reflections of R^n (i.e., linear transformations that preserve angles between vectors and do not change the length of vectors, though they may change orientation; interestingly, such matrices form a Lie group). Thus we often say that symmetric matrices are orthogonally diagonalizable.

Finally, we demonstrate two applications of diagonalizability since, as of yet, these definitions are almost completely unmotivated.

One large application of diagonalizability is in the theory of systems of differential equa- tions. Indeed, if y = (y1, . . . , yn) are unknown functions satisfying

y' = Ay

for some constant matrix A ∈ R^{n×n} and A = PDP^{−1} is a diagonal decomposition of A, then the functions u = (u_1, . . . , u_n) defined by u = P^{−1}y satisfy

u' = Du,

which is a very easy system to solve. Reversing the substitution will yield the solution y to the original equation.

Another, simpler application of diagonalizability is finding large powers of matrices. Given a matrix A, it may be useful in applications to find A^k for large powers k. Performing repeated matrix multiplication is tedious and can even be computationally infeasible if A is large enough. However, if A = PDP^{−1} is a diagonal decomposition of A, then we notice that

A^2 = PDP^{−1}PDP^{−1} = PD^2P^{−1},  A^3 = PD^2P^{−1}PDP^{−1} = PD^3P^{−1},  etc.

In general, we have A^k = PD^kP^{−1} for any integer k ≥ 0 (and any k < 0 if A is invertible). This greatly simplifies the computation since taking the kth power of a diagonal matrix is as easy as raising each diagonal entry to the kth power.
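A small numpy sketch (my addition) comparing P D^k P^{-1} against repeated multiplication for a diagonalizable matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # symmetric, hence diagonalizable

eigvals, P = np.linalg.eig(A)          # columns of P are eigenvectors of A
k = 10
Dk = np.diag(eigvals ** k)             # D^k: raise each diagonal entry to the kth power
Ak = P @ Dk @ np.linalg.inv(P)

print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # expect True
```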

Topics which sometimes appear on the math subject GRE and which were excluded here for brevity include normed spaces and inner product spaces. The main idea is that we can add much more structure to a vector space by specifying a way to measure length and angles between vectors. For a discussion of normed spaces, inner product spaces and much more, one could look to Peter Petersen’s linear algebra book which is freely available online.