
CHAPTER I

LINEARITY: BASIC CONCEPTS AND EXAMPLES

In this chapter we start with the concept of general linear spaces, whose elements are called vectors, to "set up the stage". Then we introduce "actors" called linear mappings, which act upon vectors. In the mathematical literature, "vector spaces" is synonymous with "linear spaces", and these terms will be used interchangeably. Likewise, "linear transformations", "linear mappings" and simply "linear maps" are synonymous.

§1. Linear Spaces and Linear Maps

1.1. A vector space is an entity containing objects called vectors. A vector is usually conceived to be something which has a magnitude and a direction, so that it can be drawn as an arrow:

You can add or subtract two vectors:

You can also multiply vectors by scalars:

Such things should be familiar to you. However, we should not be so narrow-minded as to think that only those objects represented geometrically by arrows in a 2D or 3D space can be regarded as vectors. As long as we have a collection of objects among which two algebraic operations, called addition and scalar multiplication, can be performed, so that certain rules of such operations are obeyed, we may regard this collection as a vector space and call the objects in this collection vectors. Our definition of vector spaces is so general that we encounter vector spaces almost every day and almost everywhere. Examples of vector spaces include many spaces of functions, spaces of matrices, spaces of sequences, etc. (in addition to the well-known 3D space in which vectors are represented as arrows). The universality and the omnipresence of vector spaces is one good reason for placing linear algebra in the position of paramount importance in basic mathematics.

1.2. After this propaganda, we come to the technical side of the definition. We start with a collection V of objects called vectors, designated by block letters u, v, etc. Suppose that V is equipped with two algebraic operations:

1. Addition, allowing us to add two vectors u and v to obtain their sum u + v.

2. Scalar multiplication, allowing us to multiply a vector v by a scalar a to form av.

Then V will be legitimately called a vector space if these operations obey a set of "natural" axioms. The complete set of axioms is listed in Appendix A at the end of the present chapter. There is no need to memorize them, but here we mention some to show that they are indeed very natural.

u + v = v + u (addition is commutative)

u + (v + w) = (u + v) + w (addition is associative)

u + 0 = u (0 is the zero vector)

a(u + v) = au + av (scalar multiplication distributes over addition)

We are a bit vague about “scalars” in the above definition. What are scalars? The obvious answer is: they are numbers. But what sort of numbers are they? If we allow scalars to be complex numbers, then we say that V is a complex vector space, or a vector space over (the complex field) C. If we restrict scalars to real numbers, then V is called a real vector space, or a vector space over (the real field) R.

More generally, scalars are taken from something called a field. If the "field of scalars" is denoted by F, we call V a vector space over F. In recent years, finite fields have become an important subject due to their vast applications, such as cryptography and coding theory. We only briefly describe fields other than R and C in Appendix B at the end of the present chapter. In this course we only consider vector spaces over R or C.

1.3. Examples. The best way to understand the concept of vector spaces is to go through a large number of examples and work on them in the future.

Example 1.3.1. R^n, a real vector space. The vectors in R^n are n-tuples of real numbers. A typical vector may be written as

v = (v1, v2, . . . , vn),

where v1, v2, . . . , vn are real numbers. Two n-tuples are equal only when their corresponding components are equal: for u = (u1, . . . , un) and v = (v1, . . . , vn) in R^n, u = v only when u1 = v1, u2 = v2, . . . , un = vn. The addition and scalar multiplication in this space are defined in the componentwise manner: for u = (u1, u2, . . . , un), v = (v1, v2, . . . , vn) in R^n and a in R (that is, a is a real number),

u + v = (u1 + v1, u2 + v2, . . . , un + vn), au = (au1, au2, . . . , aun).

These two algebraic operations are quite simple and natural. Notice that, when n = 1, we have the vector space R1, which can be identified with R. This shows that R itself can be considered as a real vector space.

Example 1.3.2. C^n, a complex vector space. The space C^n consists of all n-tuples of complex numbers. Everything in this space works in the same way as in the space R^n of the previous example: all we have to do is replace real scalars by complex ones. Sometimes we would like to make a statement for both spaces R^n and C^n. To avoid repetition, we use the letter F for R or C, or even a general field of scalars. We write F^n for the space of all n-tuples of scalars in F. Using the identification of F^1 with F, we see that scalars can be regarded as vectors, if we wish.

Example 1.3.3. Mmn(F), the space of m × n matrices over F.

A vector in F^n is formed by arranging n scalars in a row. A variant of F^n is to form a vector by arranging mn scalars in an m × n matrix. Denote by Mmn(F) the set of all m × n matrices with entries in F. With the usual addition and scalar multiplication, Mmn(F) is a vector space. A vector in this space is actually a matrix. In the future, to simplify our notation, we will write Mmn for Mmn(F) if the field F we are working with is understood.

Example 1.3.4. Direct product.

This example tells us how to construct a new vector space from given vector spaces. Let V1, V2, . . . , Vn be vector spaces over the same field F. Consider the set V of all n-tuples of the form v = (v1, v2, . . . , vn), where v1 is a vector in V1, v2 is a vector in V2, etc. The addition and the scalar multiplication for V are defined in the same fashion as the corresponding operations of R^n described in Example 1.3.1: for u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn) in V , and for a ∈ F,

u + v = (u1 + v1, u2 + v2, . . . , un + vn),

au = (au1, au2, . . . , aun).

This space V is a generalization of the previous example of F^n, obtained by replacing the numbers in the entries of n-tuples by vectors. If we take V1 = V2 = · · · = Vn = F, then we recover F^n. The vector space V constructed in this way is called the direct product of V1, V2, . . . , Vn, and the usual notation to express this is V = V1 × V2 × · · · × Vn, or V = Π_{k=1}^n Vk.

Example 1.3.5. Space of functions.

This example is a space of functions with a common domain, say X (some nonempty set). A function f on X is just a way to assign to each point x in X a value denoted by f(x). This value could be a scalar or a vector. Since scalars can be regarded as vectors, it is enough to consider the vector-valued case. Take any vector space V with F as its field of scalars. Denote by F(X, V ) the set of all functions f from X to V : f assigns to each point x in X a vector denoted by f(x) in V . Given "vectors" (which are functions in this case) f and g in F(X, V ), the sum f + g and the scalar multiple af of f by a scalar a ∈ F are formally defined as follows

(f + g)(x) = f(x) + g(x), (af)(x) = a·f(x), for all x ∈ X.

If we treat x as a variable symbol so that f is written as f(x), then the sum of the "vectors" f(x) and g(x) in F(X, V ) is simply f(x) + g(x). In the special case with V = C and

X = N = {0, 1, 2, 3, . . .},

each f ∈ F(N, C) represents a sequence (of complex numbers):

{f(n)}_{n≥0} ≡ (f(0), f(1), f(2), . . .)

Conversely, each sequence {an}_{n≥0} of complex numbers determines a function f on N, given by f(n) = an. Thus we can identify the space F(N, C) with the space of sequences, which will be denoted by S. For a = {an}_{n≥0} and b = {bn}_{n≥0} in S, we define the sum a + b and the scalar multiple λa of a by a complex number λ as follows:

a + b = (a0, a1, a2,... ) + (b0, b1, b2,... ) = (a0 + b0, a1 + b1, a2 + b2,... )

λa = λ(a0, a1, a2,... ) = (λa0, λa1, λa2,... )

We will need the sequence space S to study difference equations and recursion relations.

Example 1.3.6. The space of all polynomials.

We specialize the previous example by taking X = C, the complex plane, and V = C, considered as a complex vector space. In some sense, the space F(C, C) is too big. We should look at something smaller inside. Recall that a polynomial is a function p on C which can be written in the form

p(x) = a0 + a1x + a2x^2 + a3x^3 + · · · + anx^n,

where a0, a1, . . . , an are certain complex numbers and n is a certain positive integer. It is clear that the sum of two polynomials is also a polynomial, and scalar multiples of polynomials are polynomials. Thus, if we denote by P the set of all polynomials, we can define the sum of two polynomials and a scalar multiple of a polynomial in the same way as we do for functions, giving a linear structure to P.

1.4. A smaller vector space "sitting" inside a bigger one, such as P in F(C, C) in the last example, is a very common phenomenon. To describe this in formal language we introduce the following:

Definition 1.4.1. A (nonempty) subset M of a vector space V satisfying the following condition is called a subspace of V :

(S) For all u and v in M, and for all scalars a and b, au + bv is in M.

According to this definition, P is a subspace of F(C, C). A subspace of some vector space is a vector space in its own right.

Example 1.4.2. Pn, the space of polynomials of degree ≤ n.

Recall that a polynomial of degree n is a function of the form

p(x) = a0 + a1x + a2x^2 + · · · + anx^n     (P)

with an ≠ 0. If we drop the condition an ≠ 0 here, we cannot tell the exact degree of p(x); all we can say is that the degree of p(x) is at most n. Denote by Pn the subset of P consisting of polynomials of degree at most n, that is, polynomials of the form given in (P). It is clear that condition (S) in the definition of subspace above is satisfied for M = Pn and V = P. Therefore Pn is a subspace of P and hence is itself a vector space.

1.5. Now we consider the concept of linear mappings, or linear transformations. By a mapping (or simply a map) or a transformation we usually mean a way of sending objects from one space into another. A transformation T from one vector space V to another W (over the same field) is linear if the following identity holds:

T(αx + βy) = αTx + βTy     (LT)

for all vectors x and y in V and all scalars α and β. Notice that we deliberately use the letters x, y instead of u, v for vectors, and α, β instead of a, b for scalars, in order to broaden the scope of our notation. Also, following the usual custom in linear algebra, we omit the round brackets in T(x) and simply write T(x) as Tx. The linearity condition (LT) can be split into two:

T(x + y) = Tx + Ty     (LT1)

for all x and y in V , and

T(αx) = αTx     (LT2)

for all vectors x in V and for all scalars α. Identity (LT1) is a special case of (LT) obtained by setting α = β = 1 in (LT). It says: the transformation T preserves the operation of addition. The following figure helps to illuminate its meaning:

Identity (LT2) is another special case of (LT), obtained by setting β = 0. It says: T preserves scalar multiplication. You should draw a figure of arrows similar to the one above to clarify its meaning. Condition (LT) can be replaced by conditions (LT1) and (LT2) together (although we don't see any advantage in doing so). Indeed, we can check that (LT) is a consequence of this pair of conditions:

T(αx + βy) = T(αx) + T(βy)   (by (LT1))
= αTx + βTy   (applying (LT2) twice).

By mathematical induction, we can establish the following extension of (LT):

T(α1x1 + α2x2 + · · · + αnxn) = α1Tx1 + α2Tx2 + · · · + αnTxn.

We note that (LT) is just the case n = 2 written in a different way.

1.6. Examples of linear transformations are so numerous that you can find them almost everywhere, almost any time. Here we consider a few.

Example 1.6.1. Sampling functions

Let F_S be a linear space of functions (allowed to take complex values) defined on a set S. Pick some points s1, s2, . . . , sn in S as "observation sites". The observed values of a function in F_S are arranged in a row as a vector in C^n, denoted by Tf:

T f = (f(s1), f(s2), . . . , f(sn)).

Then the transformation T sending f (from F_S) to Tf (in C^n) is linear. To show the linearity of T , we need to check the identity T(af + bg) = aTf + bTg for all f, g in F_S and scalars a, b. The left hand side of this identity is, according to the definition of T ,

T (af + bg) = ((af + bg)(s1), (af + bg)(s2),..., (af + bg)(sn))

= (af(s1) + bg(s1), af(s2) + bg(s2), . . . , af(sn) + bg(sn))

= a(f(s1), f(s2), . . . , f(sn)) + b(g(s1), g(s2), . . . , g(sn)),

which is aTf + bTg, that is, the right hand side. This proves the linearity of T .
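As an editorial aside, here is a minimal numerical sketch of this sampling map; the observation sites and the functions f, g below are arbitrary choices, not taken from the text.

```python
# Sketch: the sampling map T f = (f(s1), ..., f(sn)) is linear.
import numpy as np

sites = np.array([0.0, 0.5, 1.0, 2.0])      # hypothetical observation sites

def T(f):
    """Sample the function f at the chosen sites, giving a vector in C^n."""
    return np.array([f(s) for s in sites])

f, g = np.sin, np.exp
a, b = 2.0, -3.0
lhs = T(lambda x: a * f(x) + b * g(x))      # T(af + bg)
rhs = a * T(f) + b * T(g)                   # aTf + bTg
print(np.allclose(lhs, rhs))                # True
```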

Example 1.6.2. Ta x = x · a

Recall that the inner product (or the scalar product, or the dot product) of two vectors x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) in R^n is given by

x · y = Σ_{k=1}^n x_k y_k (= x1y1 + x2y2 + · · · + xnyn).

Fix an arbitrary vector a in R^n. Then the transformation Ta from R^n to R sending a vector x in R^n to x · a is linear. To prove this, we have to check Ta(αx + βy) = αTax + βTay. This can be done as follows:

Ta(αx + βy) = (αx + βy) · a = Σ_{k=1}^n (αx_k + βy_k)a_k = α Σ_{k=1}^n x_k a_k + β Σ_{k=1}^n y_k a_k = α x · a + β y · a = αTax + βTay.

Aside: As you can see, checking things like this is rather routine. The important thing to learn is: how to present it neatly and correctly.

Example 1.6.3. Let a be a fixed vector in R^3. Then the transformation Ca from R^3 into R^3 itself given by Ca(x) = x × a (the cross product of x and a) is linear. The linearity of Ca can be proved in the same fashion as that of the previous examples.

Example 1.6.4. A differential operator

Recall that Pn stands for the space of all polynomials of degree at most n. For p ∈ Pn, let Dp = dp/dx, the derivative of p. Then D : Pn → Pn is linear. Why? Well, the linearity of D is manifested by the identity D(ap + bq) = a Dp + b Dq, where p, q ∈ Pn and a, b are scalars. But this identity is nothing but another way to write down the well-known equality

d/dx (ap(x) + bq(x)) = a d/dx p(x) + b d/dx q(x).

Example 1.6.5. Translation

Same space as in the previous example: Pn. Let h be a fixed real number. Consider the transformation T : Pn → Pn sending a polynomial p(x) ∈ Pn to the polynomial p(x + h). Here p(x + h) is of course obtained by replacing x in p(x) by x + h. In other words, it is p evaluated at x + h. For instance, if p(x) is 1 + x + x^3 and h = 1, then T(p) is the polynomial 1 + (x + 1) + (x + 1)^3 = 1 + x + 1 + (x^3 + 3x^2 + 3x + 1) = 3 + 4x + 3x^2 + x^3. We claim that T is linear. To prove this, we have to verify T(ap + bq) = aTp + bTq. What is T(ap + bq)? It is ap + bq evaluated at x + h, namely, ap(x + h) + bq(x + h). Now Tp and Tq are the polynomials p(x + h) and q(x + h) respectively, so aTp + bTq is also ap(x + h) + bq(x + h). Hence T is linear.
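As an editorial aside, a short symbolic check of this translation operator is sketched below; the polynomial q and the scalars a, b are arbitrary choices added for illustration.

```python
# Sketch: T(p)(x) = p(x + 1) expands as computed above, and T is linear.
import sympy as sp

x = sp.symbols('x')
h = 1

def T(p):
    return sp.expand(p.subs(x, x + h))       # substitute x -> x + h

p = 1 + x + x**3
q = 2 - x**2
print(T(p))                                   # x**3 + 3*x**2 + 4*x + 3
a, b = 5, -2
print(sp.simplify(T(a*p + b*q) - (a*T(p) + b*T(q))) == 0)   # True
```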

Example 1.6.6. Shift S

This is the operator, denoted by S, on the space S of sequences, defined by

S(a0, a1, a2,... ) = (a1, a2, a3,... ).

If we write a = (a0, a1, a2, . . .), then (Sa)n = a_{n+1}. In many books on difference equations, the last equality is written as San = a_{n+1}. Strictly speaking, this is incorrect. But we accept it and interpret it as the correct one, namely (Sa)n = a_{n+1}. It is not hard to check that S is indeed linear.

"Nonexample" 1.6.7. Let V = P, the space of all polynomials, and let T be the transformation from V into itself sending a polynomial to its square: Tp = p^2. Then T is not linear. One way to see this is by noticing T(−x) = (−x)^2 = x^2. If T were linear, then T(−x) should be −T(x) = −x^2, instead of x^2. Aside: This is by no means the only way to prove that the T given here is nonlinear. You may also argue that T(2 · 1) = T(2) = 2^2 = 4, which is not the same as 2T(1) = 2. Another argument: T(1 + x) = (1 + x)^2 = 1 + 2x + x^2, which is not the same as T(1) + T(x) = 1 + x^2. There are many ways to tell that T is not linear. One is as good as another. Don't waste your time by presenting more than one.

We should mention a term widely used in the scientific literature: linear operator. By a linear operator, or simply an operator, on a vector space V we mean a linear transformation from V into V itself. Thus Examples 1.6.3, 1.6.4, 1.6.5 and 1.6.6 above are linear operators. Another term also widely used: linear functional (or covector in the physics and engineering literature). If V is a vector space over F, by a linear functional of V we mean a linear transformation from V into F^1 ≡ F. Thus a linear functional of V is a function φ : V → F such that

φ(a1v1 + a2v2) = a1φ(v1) + a2φ(v2)

for all v1, v2 ∈ V and a1, a2 ∈ F. The mapping Ta in Example 1.6.2 is an example of a linear functional of R^n.

1.7. Given vector spaces U and V over the same field, we denote by L(U, V ) the set of all linear mappings from U to V . For S, T in L(U, V ) and a scalar a, we define the sum S + T and the scalar multiple aS by putting

(S + T)x = Sx + Tx, (aS)x = a Sx, for all x ∈ U.     (1.7.1)

We check that both S + T and aS are linear. To simplify our presentation, we let R = aS + bT and check its linearity. Take u1, u2 ∈ U and scalars c1 and c2. We have to show R(c1u1 + c2u2) = c1Ru1 + c2Ru2. Indeed

R(c1u1 + c2u2) = (aS + bT )(c1u1 + c2u2)

= a S(c1u1 + c2u2) + b T (c1u1 + c2u2) [because of (1.7.1)]

= a(c1Su1 + c2Su2) + b(c1T u1 + c2T u2) [because S, T are linear]

= c1(aSu1 + bT u1) + c2(aSu2 + bT u2) = c1Ru1 + c2Ru2.

With addition and scalar multiplication defined here, L(U, V ) becomes a vector space. According to the notation introduced above, given a vector space V , the symbol L(V, V ) stands for the set of all linear operators on V . This symbol looks a bit clumsy and hence we will rewrite it simply as L(V ). On the other hand, L(V, F), the set of all linear functionals, will be denoted by V ′ (or V ∗ in some books), and is called the dual space of V . We summarize our notation here:

L(U, V ) = the set of all linear transformations from U to V.
L(V ) = the set of all linear operators on V.
V ′ = the set of all linear functionals of V = the dual space of V.

We have seen that L(U, V ) is a vector space under a natural way of defining addition and scalar multiplication. A fortiori, L(V ) and V ′ are vector spaces. Actually, L(V ) is more than just a vector space. Besides addition and scalar multiplication, it has a third operation: composition. The composite, or the product ST, of operators S, T ∈ L(V ) is defined by putting

(ST )v = S(T v). (1.7.2)

Aside: To apply ST to a vector v, we apply T to v first to get T v, followed by applying S. Writing (1.7.2) as (ST )(v) = S(v)T (v) is a horrendous mistake.

To give quick examples: let V = P, the space of all polynomials, and let D, M, T on V be given by D(p(x)) = p′(x) ≡ (d/dx)p(x), M(p(x)) = xp(x) and T(p(x)) = p(x + 1). Then (MD)(p(x)) = M(p′(x)) = xp′(x), (DM)(p(x)) = D(xp(x)) = p(x) + xp′(x), (TM)(p(x)) = T(xp(x)) = (x + 1)p(x + 1), (MT)(p(x)) = M(p(x + 1)) = xp(x + 1), (DT)(p(x)) = D(p(x + 1)) = p′(x + 1) and (TD)(p(x)) = T(p′(x)) = p′(x + 1).

As you can see, MD is not the same as DM (also MT ≠ TM). Thus, in general, the products ST and TS are different. In case ST and TS are the same, i.e. ST = TS, we say that S and T commute, or T commutes with S. For example, the operators D and T given above commute.
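As an editorial aside, the compositions above can be spot-checked symbolically; the test polynomial below is an arbitrary choice.

```python
# Sketch: MD differs from DM, while D and T commute, on a sample polynomial.
import sympy as sp

x = sp.symbols('x')
D = lambda p: sp.diff(p, x)                   # differentiation
M = lambda p: sp.expand(x * p)                # multiplication by x
T = lambda p: sp.expand(p.subs(x, x + 1))     # translation p(x) -> p(x+1)

p = 1 - 2*x + x**3
print(sp.expand(M(D(p)) - D(M(p))))           # -x**3 + 2*x - 1, i.e. -p, so MD != DM
print(sp.expand(D(T(p)) - T(D(p))))           # 0, so D and T commute
```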

To justify our definition of the product ST for S, T ∈ L(V ) given above, we must check its linearity. Again, it is a routine matter. For v1, v2 ∈ V and a1, a2 ∈ F,

(ST )(a1v1 + a2v2) = S(T (a1v1 + a2v2)) = S(a1T v1 + a2T v2)

= a1S(T v1) + a2S(T v2) = a1(ST )v1 + a2(ST )v2.

We have the following elementary properties concerning the three operations (addition, scalar multiplication, composition) among operators: for R, S, T ∈ L(V ) and a scalar a,

(RS)T = R(ST),
R(S + T) = RS + RT,
(R + S)T = RT + ST,
a(ST) = (aS)T = S(aT).

These are verified in a routine manner; e.g., to show (R + S)T = RT + ST , we compute:

((R + S)T )x = (R + S)(T x) = R(T x) + S(T x) = RT x + ST x = (RT + ST )x.

There are two special linear operators on V worth mentioning: the zero operator O and the identity operator I: O sends every vector to the zero vector and I sends every vector to itself, that is, for all v ∈ V , Ov = 0 and Iv = v.

We say that a linear mapping T from a vector space V to a vector space W is invertible if there is a linear mapping S from W to V such that ST = I_V and TS = I_W. Here I_V stands for the identity operator on V . In the future we simply write I for I_V, so that the identities ST = I_V and TS = I_W become ST = I and TS = I. The S in this case is uniquely determined by T and will be denoted by T⁻¹. Thus ST = I and TS = I become T⁻¹T = I and TT⁻¹ = I. To give a quick example, consider the linear operator T on the space P of all polynomials given by T(p(x)) = p(x + h), where h is a constant. Then T is invertible and its inverse T⁻¹ is given by T⁻¹(p(x)) = p(x − h). The following fact is basic:

Proposition 1.7.1. If linear operators S and T on a vector space V are invertible, then ST is also invertible, with (ST)⁻¹ = T⁻¹S⁻¹.

To prove this, we check directly that T⁻¹S⁻¹ is the inverse of ST :

(ST)(T⁻¹S⁻¹) = S T T⁻¹ S⁻¹ = S I S⁻¹ = S S⁻¹ = I,
(T⁻¹S⁻¹)(ST) = T⁻¹ S⁻¹ S T = T⁻¹ I T = T⁻¹ T = I.

The reversal of the order of S and T in the identity (ST)⁻¹ = T⁻¹S⁻¹ has the following analogy, attributed to the famous mathematician Hermann Weyl: we put on socks first before putting on shoes; but when taking them off, we remove the shoes first.
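As an editorial aside, here is a quick numerical illustration of Proposition 1.7.1; the operators are represented by randomly generated matrices, which are invertible for generic choices (an assumption of this sketch, not a statement from the text).

```python
# Sketch: (ST)^{-1} = T^{-1} S^{-1} for two invertible matrices S and T.
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4))

lhs = np.linalg.inv(S @ T)                   # (ST)^{-1}
rhs = np.linalg.inv(T) @ np.linalg.inv(S)    # T^{-1} S^{-1}
print(np.allclose(lhs, rhs))                 # True
```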

EXERCISE SET I.1.

Review Questions: Do I grasp the basic concepts of vector spaces and linear mappings? Am I able to write down properly the formal definitions of linear mappings, linear operators and linear functionals (as a professional mathematician does, at which I would not be embarrassed if it were in print)? What are the major examples of vector spaces and linear maps described in this section? Do I recognize the following symbols and understand perfectly well what they stand for?

R^3, C^2, F^5, M_{4,5}, P_3, f + g, 2f − 3g,
L(V, W), S + T, ST, ST − TS, L(V ), V ′.

Why do we need the abstract conceptual framework of vector spaces and linear mappings? What would we miss if we restrict ourselves to the standard spaces R^n and C^n instead of working with this abstract notion?

Drills

1. Write down the general form of a vector in each of the following vector spaces:
(a) F^4 (F = R or C), (b) P_2, (c) P_4, (d) R^3 × P_1.

2. Find u + (−2)v, where u, v ∈ V , in each of the following cases:
(a) V = C^2, u = (2 + 3i, 3 − 2i) and v = (1 + i, 1 − i).
(b) V = P_2; u and v are the polynomials 1 − 2x + x^2 and 1 + x − x^2 respectively.
(c) V = F(R, C); u and v are the functions cos x + i sin x and cos x − i sin x respectively.

3. True or false:
(a) A set {0} consisting of a single 0, with addition and scalar multiplication defined in the following way, is a vector space: 0 + 0 = 0, a·0 = 0.
(b) All polynomials of the form a + (1 + a)x + bx^2 form a vector space under the usual addition and scalar multiplication.

(c) P_2 is a subspace of P_3.
(d) If U is a subspace of V and if V is a subspace of U, then U = V .
(e) If U is a subspace of V and if W is a subspace of U, then W is a subspace of V .

4. Let p(x) = x^2 − x + 2 and q(x) = 2x^2 + 1. Find: p(x + 1), p(1), p(q(x)), xp(x), p(x)^2, p(x^2), (p + q)(x), q(x + p(x)).

5. In each of the following cases, is the given transformation from one vector space into another linear? Why? (Questions like this should be answered in the following way:

if your answer is "Yes", then you have to prove it. If your answer is "No", you should disprove it by pointing out one instance where the linearity fails; only one is needed, so don't waste your time finding or writing down another.)
(a) T : P → P given by T(p(x)) = p(x^2).
(b) T : P → P given by T(p(x)) = p(x)^2.
(c) T : P → R given by T(p(x)) = p(1); (here, of course, p(1) stands for the value of p at 1.)
(d) T : P → R given by T(p(x)) = p(1) + 1.
(e) The map T : M_{2,2} → M_{3,3} (Reminder: M_{m,n} stands for the vector space of all m × n matrices) given by T(X) = AXB, where A and B are fixed matrices of sizes 3 × 2 and 2 × 3 respectively.
(f) The map T : M_{2,2} → M_{2,2} given by T(X) = X + A, where A is a fixed 2 × 2 matrix. (♠ Caution: Be careful about the way you put down your answer.)
(g) T : M_{2,2} → M_{2,2} given by T(X) = X^2.
(h) V is a vector space and v is a fixed vector in V ; T : L(V ) → V is given by T(X) = X(v); (here, of course, X ∈ L(V ), i.e. X is an operator on V , and X(v) is the vector in V obtained by applying X to v.)

Exercises

1. Let R, S, and T be linear operators on a vector space V . Simplify the following expressions:
(a) R(S + T) − S(T + R) − (R − S)T.
(b) S(S⁻¹T + TS⁻¹) + (S⁻¹T + TS⁻¹)S + S⁻¹(ST − TS) − (ST − TS)S⁻¹. (S is assumed to be invertible.)
(c) [[R, S], T] + [[S, T], R] + [[T, R], S]; (for linear operators P, Q on V , their commutator [P, Q] is defined to be the linear operator PQ − QP.)

2. On the linear space P of all polynomials, define the operators M, D and U by putting M(p(x)) = xp(x), D(p(x)) = p′(x) (the derivative of p(x)), U(p(x)) = p(x + 1). Compute
(a) [M, D] (= MD − DM).
(b) UMU⁻¹ − M; (notice that U⁻¹(p(x)) = p(x − 1)).

3. Let S and T be two linear operators satisfying the relation STS = S. (In this situation

we may call T a generalized inverse of S.) Let P = ST , Q = TS and T0 = QT P.

Verify that (a) P^2 = P, (b) Q^2 = Q, (c) PS = S, (d) SQ = S, (e) ST0S = S, (f) T0 = TST and (g) T0ST0 = T0.

4. In each of the following cases, find S^2, T^2, ST and TS for S, T ∈ L(V ).
(a) V = R^2; S((x1, x2)) = (x2, x1), T((x1, x2)) = (1/2)(x1 + x2, x1 + x2).
(b) V = P_1; S(p(x)) = p′(x), T(p(x)) = p(0).

(c) V = M_{2,2} (the space of all 2 × 2 matrices); S(X) = AX, T(X) = XB, where A = [[0, 1], [0, 0]] and B = [[0, 0], [0, 1]] (rows listed).

Problems

1*. Prove that, for linear operators S, T on a vector space V , if S, T and S + T are invertible, then S⁻¹ + T⁻¹ is also invertible.

2*. (This is a very hard problem.) Let S and T be linear operators on a vector space V . Prove that I − ST is invertible if and only if I − TS is invertible.

§2. Linear Transformations and Matrices

2.1. The main goal of the present section is to study the relation between linear mappings and matrices. We show that every m × n matrix A naturally gives rise to a linear mapping M_A : F^n → F^m, and vice versa. Then we show that, given coordinate systems in vector spaces V and W, every linear mapping T : V → W can be represented by a matrix. The device set up here tells us that matrices can be regarded as linear mappings, and problems about linear mappings can often be reduced to problems about matrices.

To understand linear transformations better, let us look at the simplest situation, in which both the domain and the range are the one-dimensional space F^1 ≡ F; here F is either R or C. Let T : F → F be linear. Then, for every x ∈ F, Tx = T(x · 1) = xT1. Certainly T1 ∈ F, i.e. T1 is just a scalar. Why don't we give it a name? Let's call it a. Thus Tx = ax. On the other hand, it is easy to check that a transformation T from F to F of the form Tx = ax, where a is a fixed scalar, is linear. We have shown that linear transformations from F to F are exactly those of the form T(x) = ax. This gives a thorough description of such transformations. There is nothing more we can say!

Once we succeed in tackling a special case, we should move to a more interesting general situation, in which the domain of a given linear transformation is F^n and its range is in F^m. The problem now is to describe all such transformations. As you will see, the answer turns out to be similar to the one for our special case: the scalar a is replaced by an m × n matrix, say A, the vector x is replaced by an n × 1 column vector x, and the "outcome" Tx is the m × 1 column vector Ax. Thus linear transformations from F^n to F^m are exactly those T which can be put in the form Tx = Ax, where A is a fixed m × n matrix.

2.2. But the above answer seems to have a serious conflict with the previous convention: a vector in F^n used to be a row, i.e. something like x = (x1, x2, . . . , xn), instead of a column:

x = [ x1 ]
    [ x2 ]
    [ ⋮  ]
    [ xn ]     (2.2.1)

The rationale for our previous convention is clear: the column in (2.2.1) is awkward to write. Worse: its outstanding look draws unwarranted attention! This prompts us to adopt the following rule: something in a row surrounded by the round brackets "(" and ")" is the same thing as the transpose of this row (which becomes a column) surrounded by the square brackets "[" and "]", such as

(cow, pig, dog, cat) = [ cow ]
                       [ pig ]
                       [ dog ]
                       [ cat ]

In order to work under this rule, we have to be very careful about the brackets. This rule works well, but is too "brutal". We try to avoid applying it directly as much as possible. Certainly we may regard the column (2.2.1) as the transpose of a row and put x = [x1 x2 · · · xn]^⊤, or x = [x1, x2, . . . , xn]^⊤ with commas for clarity, and sometimes we shall do so in the future. But this still looks rather clumsy.

2.3. We proclaim:

Every matrix is associated with a God-given linear transformation.

In detail, given an m × n matrix A with entries in the field F, we define a linear transformation M_A from F^n to F^m simply by putting

M_A x = Ax     (M)

for all x = [x1, x2, . . . , xn]^⊤ ∈ F^n. The transformation M_A defined in this way may be called multiplication by A. (That is why we choose the symbol M_A to denote it.) The linearity of M_A follows immediately from some well-known properties of matrix algebra: for all u and v in F^n and a, b ∈ F, we have

M_A(au + bv) = A(au + bv) = aAu + bAv = aM_Au + bM_Av.

To give a quick example, suppose A = [[2, 3], [4, 5]] (rows listed). Then

M_A x = M_A [x1, x2]^⊤ = [[2, 3], [4, 5]] [x1, x2]^⊤ = [2x1 + 3x2, 4x1 + 5x2]^⊤.

Hence M_A is a linear operator on R^2 sending (x1, x2) to (2x1 + 3x2, 4x1 + 5x2).
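As an editorial aside, the same quick example can be reproduced in numpy; the test vector is an arbitrary choice.

```python
# Sketch: the matrix A = [[2, 3], [4, 5]] acts on columns by multiplication,
# giving the operator M_A described above.
import numpy as np

A = np.array([[2, 3],
              [4, 5]])
x = np.array([1, -1])          # an arbitrary test vector (x1, x2)
print(A @ x)                   # [-1 -1], i.e. (2*1 + 3*(-1), 4*1 + 5*(-1))
```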

The converse of the above proclamation is also true:

Theorem 2.3.1. Every linear transformation T from F^n to F^m is of the form M_A for some m × n matrix A, i.e. there exists an m × n matrix A such that Tx = Ax for all x ∈ F^n. Furthermore, the matrix A here is uniquely determined by T .

This theorem tells us that there is a one-to-one correspondence between the set L(F^n, F^m) of all linear transformations from F^n to F^m and the set M_{m,n}(F) of all m × n matrices with entries in F:

A ∈ M_{m,n}(F)  ←→  T = M_A ∈ L(F^n, F^m).

Before we embark on the proof of this theorem, let us take a look at the linear transformations Ta : R^n → R and Ca : R^3 → R^3 of Examples 1.6.2 and 1.6.3 in the previous section, defined by Ta(x) = a · x and Ca(x) = a × x respectively. According to the above theorem, there are matrices A and B of sizes 1 × n and 3 × 3 respectively such that Tax = Ax and Cax = Bx. What are A and B? Well, let us write

Ta x = a1x1 + a2x2 + · · · + anxn = [a1 a2 · · · an] [x1, x2, . . . , xn]^⊤,

Ca x = [a2x3 − a3x2, a3x1 − a1x3, a1x2 − a2x1]^⊤ = [[0, −a3, a2], [a3, 0, −a1], [−a2, a1, 0]] [x1, x2, x3]^⊤.

Therefore A = [a1 a2 · · · an], while B is the skew-symmetric matrix given as

B = [[0, −a3, a2], [a3, 0, −a1], [−a2, a1, 0]].

Notice that B^⊤ = −B; (here B^⊤ is the transpose of B).

2.4. Now we turn to the proof of Theorem 2.3.1 above. There are two parts in the conclusion of the theorem: first, there is a matrix A with the property that Tx = Ax, and second, such a matrix is completely determined by T . The first part concerns the existence of A and the second part its uniqueness. To prove the existence part, we have to find A (which seems to be hiding somewhere). To prove the uniqueness part, it is enough to find out in what way T determines A. Normally we prove the existence part first, because normally we think: if it didn't exist, what would be the point of proving (or even talking about) its uniqueness? However, logically they are independent claims and it doesn't matter which comes first. Here we prove the uniqueness part first; this strategy of proof is at odds with our normal thinking. There are two reasons for taking this strategy. First, the uniqueness part is easier. Second, the proof of the uniqueness part actually points out a way to find A. It helps us to figure out how to prove the existence part! (You may not be used to this unusual way of thinking. But people usually become smart when they begin to think in an unusual way.)

Proof of Theorem 2.3.1. Suppose that T x = Ax and we have to prove that A is

uniquely determined by T . Let A1,A2,...,An be columns of A so that

A = [A1 A2 · · · An].

It is easy to check that, for x = (x1, x2, . . . , xn) ≡ [x1 x2 · · · xn]^⊤ ∈ F^n, T(x) is given by

Ax = [A1 A2 · · · An] [x1, x2, . . . , xn]^⊤ = A1x1 + A2x2 + · · · + Anxn.     (2.4.1)

Let us find the result of T acting on the standard basis vectors:

e1 = (1, 0,..., 0), e2 = (0, 1,..., 0),..., en = (0, 0,..., 1).

Putting x = e1 = (1, 0, 0,..., 0) in (2.4.1), i.e. x1 = 1, x2 = 0, x3 = 0 etc., we obtain T e1 = A1. In the same way we can obtain T e2 = A2 etc. Now T completely determines

Te1 = A1, Te2 = A2, . . . , Ten = An,

which in turn determine the matrix A = [A1 A2 · · · An].

Next we prove the "existence part". Let T : F^n → F^m be a linear transformation. We have to find a matrix A such that Tx = Ax for all x ∈ F^n. Let A be the m × n matrix with Tek as its kth column (1 ≤ k ≤ n). In other words, A = [A1 A2 · · · An], where Ak = Tek for 1 ≤ k ≤ n. We have to check Tx = Ax in order to show that the A obtained in this way will do the job. Let us "recycle" the computation given in (2.4.1) and write

Ax = A1x1 + A2x2 + · · · + Anxn := x1A1 + x2A2 + · · · + xnAn,

(x1A1 is the usual way of writing a scalar multiple of a column vector, and A1x1 is the correct way when x1 is regarded as a 1 × 1 matrix), from which it follows that

Ax = x1Te1 + x2Te2 + · · · + xnTen = T(x1e1 + x2e2 + · · · + xnen) = Tx.

Here we have used the linearity of T and the following elementary manipulation:

x1e1 + x2e2 + · · · + xnen = x1(1, 0, . . . , 0) + x2(0, 1, . . . , 0) + · · · + xn(0, 0, . . . , 1) = (x1, x2, . . . , xn) = x.

Hence T = M_A. The proof of Theorem 2.3.1 is complete. From the proof here, we see that

Fact. The columns of an m × n matrix A are M_Ae1, M_Ae2, . . . , M_Aen, where e1, e2, . . . , en are the standard basis vectors.
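As an editorial aside, this Fact also gives a practical recipe: feed the standard basis vectors to any linear map and stack the outputs as columns. The map below is an arbitrary illustrative choice, not one from the text.

```python
# Sketch: recover the matrix of a linear map by applying it to e_1, ..., e_n.
import numpy as np

def T(x):
    """A linear map on R^3 chosen for illustration: reverse and double."""
    return 2.0 * x[::-1]

n = 3
I = np.eye(n)
A = np.column_stack([T(I[:, k]) for k in range(n)])   # columns are T(e_k)
print(A)
x = np.array([1.0, 2.0, 3.0])
print(np.allclose(A @ x, T(x)))                        # True
```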

2.5. Now we use the fact stated at the end of the last subsection to determine the matrices of some linear operators on vector spaces of the form F^n.

Example 2.5.1. Permutation matrices

Let π be a permutation of the set [n] ≡ {1, 2, . . . , n}; in other words, π is a bijection of [n]. Consider the map Tπ : F^n → F^n given by

Tπ(x1, x2, . . . , xn) = (xπ(1), xπ(2), . . . , xπ(n)).

Then it is routine to check that Tπ is linear. Hence, by Theorem 2.3.1, there is a unique n × n matrix P such that Tπx = Px, called the permutation matrix associated with π. Now we want to give a specific description of P . According to the fact stated above, the first column of P is given by Tπe1. For x = e1, we have x1 = 1, x2 = 0, x3 = 0, etc. Its image is Tπx = (xπ(1), xπ(2), . . . , xπ(n)), where the kth entry xπ(k) is 1 precisely when π(k) = 1, or k = π⁻¹(1). Thus, except for the π⁻¹(1)th entry, which is 1, all of the other entries of Tπe1

are 0. Hence Tπe1 = e_{π⁻¹(1)}. In the same way we have Tπe2 = e_{π⁻¹(2)}, Tπe3 = e_{π⁻¹(3)}, etc. So we have

P = [e_{π⁻¹(1)} e_{π⁻¹(2)} · · · e_{π⁻¹(n)}].

To give a specific example, let n = 3 and let π be given by π(1) = 2, π(2) = 3 and π(3) = 1. Then π⁻¹(1) = 3, π⁻¹(2) = 1, and π⁻¹(3) = 2. Thus Tπ(x1, x2, x3) = (x2, x3, x1) and the permutation matrix for Tπ is

P = [e_{π⁻¹(1)} e_{π⁻¹(2)} e_{π⁻¹(3)}] = [e3 e1 e2] = [[0, 1, 0], [0, 0, 1], [1, 0, 0]].

Notice that each row and each column has exactly one entry equal to 1, and the remaining entries are zeros. (Remark: Permutation matrices play an important role in the study of doubly stochastic matrices, which are useful for establishing some matrix inequalities. It turns out that all doubly stochastic matrices form a convex set, and the permutation matrices turn out to be exactly the so-called "extreme points" of this convex set.)
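As an editorial aside, the n = 3 construction above can be checked mechanically; the test vector is an arbitrary choice.

```python
# Sketch: build P from pi via (column k of P) = e_{pi^{-1}(k)}, then check T_pi x = P x.
import numpy as np

pi = {1: 2, 2: 3, 3: 1}                       # the permutation of Example 2.5.1
pi_inv = {v: k for k, v in pi.items()}
n = 3
I = np.eye(n, dtype=int)
P = np.column_stack([I[:, pi_inv[k] - 1] for k in range(1, n + 1)])
print(P)                                      # [[0 1 0], [0 0 1], [1 0 0]]

x = np.array([10, 20, 30])                    # arbitrary test vector
Tpi_x = np.array([x[pi[k] - 1] for k in range(1, n + 1)])   # (x_{pi(1)}, x_{pi(2)}, x_{pi(3)})
print(np.array_equal(P @ x, Tpi_x))           # True
```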

Example 2.5.2. Rotations

For a fixed real number θ, let T ≡ Tθ be the operator on R^2 sending an arbitrary vector v to the vector Tθv obtained by turning v through the angle θ in the anticlockwise direction:

It is not too hard to see that Tθ is a linear operator on R^2. The question is: what is the 2 × 2 matrix inducing this linear operator? Let

A = [[a, b], [c, d]]

be the matrix inducing T ≡ Tθ: Tv = Av for all v ∈ R^2. Letting e1 = (1, 0) (= [1, 0]^⊤) and e2 = (0, 1), we have Te1 = (a, c) and Te2 = (b, d). On the other hand, from the following figure, we see that Te1 = (cos θ, sin θ), Te2 = (−sin θ, cos θ):

We conclude that the matrix which induces Tθ is

Aθ ≡ [[cos θ, −sin θ], [sin θ, cos θ]].     (2.5.1)

It is a priori clear that the result of rotating a vector by an angle α, followed by a rotation through an angle β, is the same as a single rotation through the angle α + β. Putting this in mathematical symbols, we have TαTβ = Tα+β. The matrices inducing the operators in this identity obey the same relation: Aα+β = AαAβ. From this relation and (2.5.1) above, it follows immediately that

cos(α + β) = cos α cos β − sin α sin β,
sin(α + β) = cos α sin β + sin α cos β,

which are well-known (but by no means obvious) identities in trigonometry.
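As an editorial aside, the matrix identity behind these formulas is easy to spot-check numerically; the angles below are arbitrary choices.

```python
# Sketch: A_{alpha + beta} = A_alpha A_beta for the rotation matrices (2.5.1).
import numpy as np

def A(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

alpha, beta = 0.7, 1.9
print(np.allclose(A(alpha + beta), A(alpha) @ A(beta)))   # True
```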

2.6. In this section we discuss a bookkeeping device to label vectors by columns of numbers and to represent linear transformations by matrices, relative to some coordinate system, or more precisely, a basis. By means of such a device, a problem about vectors or linear transformations is converted to the corresponding problem about columns and matrices, which is in general easier to manipulate. Bases in linear algebra serve the same purpose as frames of reference in physics.

Definition 2.6.1. An ordered set of vectors b1, b2,..., bn in a vector space V is called a basis if each vector v in V can be written in a unique way as

v = v1b1 + v2b2 + ... + vnbn (2.6.1)

for some scalars v1, v2, . . . , vn. (Remark: Notice that there are two ingredients in this definition: the possibility to write (2.6.1) and the uniqueness of such an expression.)

With a fixed basis B = {b1, b2, . . . , bn} in a vector space, each vector v in V determines (in a unique manner) the scalars v1, v2, . . . , vn via (2.6.1) above. These scalars are called the coordinates of v relative to the basis B. We will arrange them into a matrix with a single column, denoted by [v]_B, and call it the column representation or the coordinate vector of v relative to B:

[v]_B = (v1, v2, . . . , vn) ≡ [v1 v2 . . . vn]^⊤.

The coordinates (or the column representation) of a vector depend on our choice of basis B at the outset. The subscript B in [v]_B emphasizes this dependence. For convenience sometimes we drop this subscript and write [v] if our choice of basis is understood.

Example 2.6.2. Standard examples of bases:

1. The standard basis for Fn. The vectors

e1 = (1, 0, 0, 0,..., 0), e2 = (0, 1, 0, 0,..., 0),..., en = (0, 0, 0, 0,..., 1)

form a basis of F^n, called the standard basis (or the natural basis, or the usual basis) of F^n. The kth vector ek in this basis has 1 in the kth entry and 0 elsewhere. With respect to this basis, the column representation of a vector v = (v1, v2, . . . , vn) in F^n is [v] = [v1, v2, . . . , vn]^⊤; (convince yourself that this is the case.)

2. The standard basis for Pn. The monomials

1, x, x^2, . . . , x^n

form a basis of Pn. With respect to this basis, the column representing a polynomial p(x) = a0 + a1x + a2x^2 + · · · + anx^n in Pn is given by [p] = [a0, a1, a2, . . . , an]^⊤.

Example 2.6.3. (a) Find the column representation of (1, 3) in R^2 relative to the basis B = {(1, 1), (1, −1)}. (b) Find the column representation of x^3 relative to the basis E = {1, x − 1, (x − 1)^2, (x − 1)^3} in P_3.

Solution. (a) Suppose [(1, 3)]_B = [a, b]^⊤. Then we have (1, 3) = a(1, 1) + b(1, −1), which gives 1 = a + b, 3 = a − b. Thus a = 2, b = −1. So the answer is [(1, 3)]_B = [2, −1]^⊤.

(b) Let [x^3]_E = (a0, a1, a2, a3). Then

x^3 = a0 + a1(x − 1) + a2(x − 1)^2 + a3(x − 1)^3.

We have to find a0 to a3. There are several ways to do so. Here is one. Let y = x − 1. Then x = y + 1 and (y + 1)^3 = a0 + a1y + a2y^2 + a3y^3. Now (y + 1)^3 = 1 + 3y + 3y^2 + y^3. Thus 1 + 3y + 3y^2 + y^3 = a0 + a1y + a2y^2 + a3y^3. So, by comparing coefficients of powers of y, we have a0 = 1, a1 = 3, a2 = 3, a3 = 1. Thus [x^3]_E = (1, 3, 3, 1), which is our answer.
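As an editorial aside, part (a) of this example can be redone numerically: the coordinates solve a small linear system whose columns are the basis vectors.

```python
# Sketch: coordinates of (1, 3) relative to B = {(1, 1), (1, -1)}.
import numpy as np

B = np.column_stack([(1, 1), (1, -1)])    # basis vectors as columns
v = np.array([1, 3])
coords = np.linalg.solve(B, v)
print(coords)                              # [ 2. -1.], i.e. [(1, 3)]_B = [2, -1]^T
```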

Once we have a basis B = (b1, b2, . . . , bn) in a vector space V over the field F, we can define a linear mapping T from V to F^n (or simply write T : V → F^n) by putting Tv = [v]_B. The linearity of T means that the identity

[au + bv] = a[u] + b[v]     (2.6.2)

holds for all vectors u, v in V and all scalars a, b in F; (for simplicity, we write [v] for [v]_B). To see this, write [u] = (u1, u2, . . . , un) and [v] = (v1, v2, . . . , vn). Then u = u1b1 + u2b2 + · · · + unbn and v = v1b1 + v2b2 + · · · + vnbn. Hence

au + bv = a(u1b1 + u2b2 + · · · + unbn) + b(v1b1 + v2b2 + · · · + vnbn) = (au1 + bv1)b1 + (au2 + bv2)b2 + · · · + (aun + bvn)bn,

and hence

[au + bv] = (au1 + bv1, au2 + bv2, . . . , aun + bvn)

= a(u1, u2, . . . un) + b(v1, v2, . . . vn) = a[u] + b[v].

Notice that T is invertible. Its inverse T⁻¹ simply sends any (u1, u2, . . . , un) in F^n to u = u1b1 + u2b2 + · · · + unbn in V .

2.7. We have seen that, by introducing a basis to a vector space, we can “label” a vector in this space by a bunch of numbers arranged in a column, giving us the column

representation of this vector. Next we explain how to "label" a linear mapping by a bunch of numbers arranged into a rectangular array (of course here we mean a matrix). Let us start with a linear transformation T from a vector space V to a vector space W . Suppose that 𝒱 = {v1, v2, . . . , vn} is a basis of V and 𝒲 = {w1, w2, . . . , wm} is a basis of W . We can use a matrix to represent T . This matrix is called the representing matrix of T , or the matrix representing T , or just the matrix of T , relative to the bases 𝒱 and 𝒲, and is denoted by [T]_𝒱^𝒲, or simply [T].

(The subscript and the superscript in [T]_𝒱^𝒲 emphasize the dependence of this matrix on these two bases and, when their presence is understood, we simply write [T] for this matrix.) The matrix [T] is constructed column by column in the following way. To find its first column, we apply T to the first basis vector v1 in 𝒱 to get Tv1. As Tv1 is a vector in W , we can express it in a unique way as a linear combination of the basis vectors in 𝒲, say

Tv1 = t11w1 + t21w2 + · · · + tm1wm.

The coefficients of this linear combination, namely, t11, t21 etc. will fill up the first column

of [T]. To find the second column of [T], we apply T to v2 to get Tv2 in W and express it as a linear combination of the vectors in 𝒲, say

Tv2 = t12w1 + t22w2 + · · · + tm2wm.

Then fill up the second column with the coefficients of this linear combination. The other columns of [T] are obtained in a similar fashion. Thus, we come up with

[T] = [[t11, t12, . . . , t1n], [t21, t22, . . . , t2n], . . . , [tm1, tm2, . . . , tmn]],

where the jth column of [T] (consisting of t1j, t2j, . . . , tmj) comes from

Tvj = t1jw1 + t2jw2 + · · · + tmjwm.

In other words, the jth column is just [Tvj]_𝒲, the column representing Tvj relative to the basis 𝒲 in W . In case V = W and 𝒱 = 𝒲, T is a linear operator on V and the matrix [T] ≡ [T]_𝒱^𝒱 is a square matrix. In this case we will write [T]_𝒱 instead of [T]_𝒱^𝒱.

Example 2.7.1. (a) Let T be the operator on V = P_2 sending p(x) to p(x + 1). Find the matrix [T] of this operator relative to the standard basis B = {1, x, x^2}. (b) Find the matrix of the differential operator D defined on P_2 by D(p) = p′ (the derivative of p), relative to the standard basis B. (c) Consider the linear map M from P_1 to P_2 defined by the recipe M(p(x)) = xp(x). In P_1 we take the standard basis B = {1, x}, but in P_2 we take the basis C = {1, 1 + x, (1 + x)^2}. Find the matrix representation relative to these bases.

Solution: (a) and (b) Since T(1) = 1, T(x) = x + 1 = 1 + x, T(x^2) = (x + 1)^2 = 1 + 2x + x^2 and D(1) = 0, D(x) = 1, D(x^2) = 2x, we have

[T]_B = [[1, 1, 1], [0, 1, 2], [0, 0, 1]],   [D]_B = [[0, 1, 0], [0, 0, 2], [0, 0, 0]].

(c) It is easy to get M(1) = x, M(x) = x^2. But, in order to get the matrix [M]_B^C, we have to write M(1) = x and M(x) = x^2 as linear combinations of 1, 1 + x, (1 + x)^2;

in other words, we have to find a0, a1, a2 and b0, b1, b2 in the following identities: x = a0 + a1(1 + x) + a2(1 + x)^2, x^2 = b0 + b1(1 + x) + b2(1 + x)^2. Let s = 1 + x, so that x = s − 1. Then x = −1 + s = −1 + (1 + x) and x^2 = (s − 1)^2 = 1 − 2s + s^2 = 1 − 2(1 + x) + (1 + x)^2. The matrix [M] ≡ [M]_B^C can be read off from these identities:

[M] = [[−1, 1], [1, −2], [0, 1]].
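As an editorial aside, part (a) of this example can be reproduced symbolically by reading off the columns from T applied to each standard basis vector of P_2; the use of sympy's Poly to extract coefficients is an implementation choice for this sketch.

```python
# Sketch: the matrix [T]_B of T(p)(x) = p(x + 1) relative to B = {1, x, x^2}.
import sympy as sp

x = sp.symbols('x')
basis = [sp.Integer(1), x, x**2]

def T(p):
    return sp.expand(p.subs(x, x + 1))

cols = []
for b in basis:
    coeffs = sp.Poly(T(b), x).all_coeffs()[::-1]       # [a0, a1, a2, ...]
    coeffs += [0] * (len(basis) - len(coeffs))         # pad with zeros
    cols.append(coeffs)

print(sp.Matrix(cols).T)     # Matrix([[1, 1, 1], [0, 1, 2], [0, 0, 1]])
```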

Relative to a basis of V , for S, T ∈ L(V ), a, b ∈ F and v ∈ V , we have

[aS + bT] = a[S] + b[T], [ST] = [S][T], [Sv] = [S][v].     (2.7.1)

This tells us that the representation by matrices captures the essence of operators. But we defer the proof of this fact to Chapter 3, where the necessary notation will be used systematically.

EXERCISE SET I.2.

Review Questions: What is the meaning of “the linear transformation induced by a matrix”? What is the basic fact concerning such induced transformations? How to find the matrix which induces a given linear transformation (in principle)? Do I understand each of the following concepts?

basis, column representation of vectors, matrix representation of operators

Can I describe the procedure for finding the column representation of vectors and the matrix representation of linear transformations to any first-year student?

Drills

1. Write down the matrices which induce the following operators on F^3:
(a) T , sending (x1, x2, x3) to (x1 + x2 − x3, x1 − x2, x2 + x3).
(b) D, sending (x1, x2, x3) to (−x1, 2x2, x3).
(c) R, sending (x1, x2, x3) to (x1 cos θ + x3 sin θ, x2, −x1 sin θ + x3 cos θ).
(d) (with F = R) Q, sending (x1, x2, x3) to (1, 2, 3) × (x1, x2, x3).

2. Find the 2 × 2 real matrix A such that its induced operator M_A projects each vector v in R^2 orthogonally onto the line passing through the origin and (1, 1), as indicated in the left figure below:

3. Find the 3 × 3 matrix A such that M_A is the 120° rotation in R^3 about the axis through the origin and the point (1, 1, 1), indicated in the right figure above.

4. In each of the following cases, write down the matrix A of the operator T ≡ M_A on R^2 satisfying the given conditions:
(a) Te1 = (1, 1) and T^2 = O.
(b) Te1 = (2, 2) and T^2 = T.
(c) Te1 = (cos θ, sin θ) and T^2 = I, where the given angle θ satisfies 0 < θ < π/2.

6. Give a basis for each of the following vector spaces:
(a) P_3, (b) R^2, (c) C^3, (d) P_1 × P_2, (e) M_{2,2}.

7. In each of the following cases, write down a basis for the kernel ker(T) and the range T(V ) of the given linear transformation T :
(a) T : P_2 → P_2, Tp = p′, the derivative of p.
(b) T : P_2 → P_3, T(p(x)) = xp(x).
(c) T : F^2 → F^2, T = M_A, where A = [[1, 1], [1, 1]].
(d) T : R^2 → P_2, T([a, b]^⊤) = a + bx^2.
(e) T : M_{2,2} → M_{2,2}, T(X) = BX, where B = [[1, 0], [1, 0]].
(f) T : M_{2,2} → M_{2,2}, T(X) = XB, where B is the same as the one in (e) above.
(g) T : M_{2,2} → M_{2,2}, T(X) = BX − XB, where B is the same as the one in (e).

8. In each of the following cases, find the column representation of the vector v relative to the given basis B in V . (The justification of B being a basis is not required.)
(a) V = R^2, v = (2, 3), B = {(1, 0), (1, 1)}.
(b) V = C^2, v = (2 + 2i, 0), B = {b1, b2} with b1 = (1 + i, 1 − i), b2 = (1 − i, 1 + i).
(c) V = P_1, v is the polynomial 2 + 3x and B = {1 + x, 1 − x}.
(d) V = P_2, v is 1 + x + x^2, B = {1 + x, 1 + x^2, x + x^2}.
(e) V = M_{2,2}, v is [[5, 2], [0, 1]] and B = {[[1, 0], [0, 1]], [[1, 0], [0, −1]], [[0, 1], [0, 0]], [[0, 0], [1, 0]]}.

(d) V , W , B and C are the same as those in (c), T(p(x)) = p(x^2).
(e) V = W = M_{2,2}, T(X) = AX − XA with A = [[1, 0], [1, 0]], and B and C are the standard basis of M_{2,2}:

B = C = {[[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [1, 0]], [[0, 0], [0, 1]]}.

Exercises

1. Use Theorem 2.3.1 to describe (a) the correspondence between the vector spaces L(F^n, F) and M_{1,n}, and (b) the correspondence between the vector spaces L(F, F^n) and M_{n,1}.

2*. Show that a 2 × 2 matrix A induces a rotation of R^2 if and only if it is of the form A = [[a, −b], [b, a]] with det(A) ≡ a^2 + b^2 = 1. Check that A⁻¹ = A^⊤.

3*. According to special relativity, a boost is a linear operator on R^2 induced by a matrix of the form [[a, b], [b, a]] with a^2 − b^2 = 1 and a > 0. Show that the product of two boosts is a boost.

4*. Let v = (a, b) be a fixed nonzero vector in R^2. Find the matrix A which induces the projection P onto the one-dimensional subspace spanned by v, as indicated in the left figure below:

5*. Let Lα be the line through the origin in the plane R^2 such that the angle between Lα and the horizontal axis is α. (a) Find the 2 × 2 matrix which induces the mirror reflection Tα about Lα indicated in the right figure above. (b) Show that the product TαTβ of two such reflections is a rotation Rθ. Find the angle θ of rotation in terms of α and β.

6*. Let n = (n1, n2, n3) be a unit vector in R^3 (|n|^2 ≡ n1^2 + n2^2 + n3^2 = 1) and let H be the plane in R^3 through the origin and perpendicular to n. Let Tn be the mirror reflection with respect to H, indicated in the following figure:

(a) Show that Tn v = v − 2(v · n)n for all v ∈ R^3. (b) Write down the matrix which induces this reflection. (c) Suppose that m = (m1, m2, m3) is another unit vector and

Tm is the reflection defined in the similar fashion. Show that the vector w ≡ n × m is invariant for TmTn, that is, TmTnw = w.

7*. Find the 3 × 3 matrix A which induces the 60° rotation T in R^3 about the axis through the origin and the point (1, 1, 1). Compute A^2. Explain why your answer for A^2 is the expected one.

§3. Linear Equations

3.1. Let V and W be vector spaces over the field F of scalars and let T : V → W be a linear mapping. Let b be a vector in W . Consider the equation

T x = b (3.1.1)

We say that a vector v in V is a solution to this equation if T v = b.

Example 3.1.1. System of linear equations

Let A be an m × n matrix over F and let b be a vector in F^m, say

A = [[a11, a12, . . . , a1n], [a21, a22, . . . , a2n], . . . , [am1, am2, . . . , amn]],   b = [b1, b2, . . . , bm]^⊤.

Let T = M_A, that is, the linear map T : F^n → F^m given by Tx = Ax. Then equation (3.1.1), that is, Tx = b, is the following system of linear equations

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
  · · ·                                    (3.1.2)
am1x1 + am2x2 + · · · + amnxn = bm

The reader is assumed to be familiar with some general method for solving this system. When the spaces V and W are finite dimensional, using coordinate systems, it is possible to convert a general equation (3.1.1) into this form. Thus, (3.1.1) can in principle be solved.
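As an editorial aside, a tiny numerical instance of (3.1.2) is sketched below; the matrix and right-hand side are arbitrary choices.

```python
# Sketch: solving T x = b with T = M_A amounts to solving the system A x = b.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)                          # [0.8 1.4]
print(np.allclose(A @ x, b))      # True
```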

Example 3.1.2. Interpolation problem

Let us briefly recall Example 1.6.1. Let F_S be a linear space of functions (allowed to take complex values) defined on a set S, and let s1, s2, . . . , sn be some selected points in S. Consider the linear transformation T : F_S → C^n defined by

T f = (f(s1), f(s2), . . . , f(sn)).

Given b = (b1, b2, . . . , bn), a solution to the equation Tf = b is a function f in F_S satisfying f(s1) = b1, f(s2) = b2, . . . , f(sn) = bn. Finding such a solution is called an interpolation problem.
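As an editorial aside, here is a minimal numerical sketch of one such interpolation problem; the sites, the prescribed values, and the choice of F_S as polynomials of degree less than n are all assumptions made only for this illustration.

```python
# Sketch: solve T f = b when f is a polynomial c0 + c1 x + c2 x^2; the
# conditions f(s_k) = b_k form a square (Vandermonde) linear system.
import numpy as np

sites = np.array([0.0, 1.0, 2.0])            # s1, s2, s3
b = np.array([1.0, 3.0, 7.0])                # prescribed values b1, b2, b3

V = np.vander(sites, increasing=True)        # rows (1, s_k, s_k^2)
coeffs = np.linalg.solve(V, b)
print(coeffs)                                 # [1. 1. 1.]  ->  f(x) = 1 + x + x^2
print(np.allclose(np.polyval(coeffs[::-1], sites), b))   # f(s_k) = b_k
```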

Example 3.1.3. Linear ordinary differential equations

Consider the space V = F(R, C) of differentiable functions and the operator T on V given by Ty = dy/dt − ay, where a is a fixed complex number and t is the variable of the "unknown" function y. For any f in V , the equation Ty = f (which has the same form as (3.1.1)) is a linear ODE (ordinary differential equation):

dy/dt − ay = f(t).     (3.1.3)

To solve this, we take two steps. First, consider the corresponding homogeneous equation

dy/dt − ay = 0.     (3.1.4)

We can solve this equation by the method called separation of variables as follows. Multiply (3.1.4) by dt to rewrite it as dy − ay dt = 0, or dy = ay dt. Next, divide both sides by y to get dy/y = a dt. Now, put an integral sign on each side to arrive at ∫ dy/y = ∫ a dt. Thus we have ln y = at + c, where c is a constant, or y = e^{at+c} = Ce^{at}, where C = e^c is also a constant. The expression

y = Ce^{at}     (3.1.5)

is the general solution of the homogeneous equation (3.1.4). The second step we take is to look for a special solution of (3.1.3) by using a method called variation of constants (or variation of parameters, according to some books). Following this method, we seek a solution of (3.1.3) of the form y = u(t)e^{at}, which is obtained from (3.1.5) by changing the constant C into a function u(t). For y = u(t)e^{at}, we have

dy/dt − ay = u′(t)e^{at} + u(t)(ae^{at}) − au(t)e^{at} = u′(t)e^{at}.

Thus, (3.1.3) becomes u′(t)e^{at} = f(t), or u′(t) = e^{−at}f(t), which gives u(t) = ∫ e^{−at}f(t) dt. A particular solution of (3.1.3) is y(t) = (∫_0^t e^{−as}f(s) ds) e^{at} = ∫_0^t e^{a(t−s)}f(s) ds. The expression y = Ce^{at} + ∫_0^t e^{a(t−s)}f(s) ds is the general solution of (3.1.3), according to a general principle explained in the next subsection. Although this is a closed formula for solving (3.1.3), finding the integral on its right hand side may still cause some difficulty. In §3.5 we will describe a practical method for solving inhomogeneous equations.

Example 3.1.4. Systems of linear ordinary differential equations

We may extend equation (3.1.3) by changing a into a matrix A and y into a vector-valued

function y. Let A = [aij] be an n × n matrix over C. Consider the vector space V = F(R, C^n) of (smooth) functions defined on the real line taking values in C^n. Let f ∈ V be given. For convenience, we use the letter t as the real variable. Thus, a "vector" in V can be written as y(t) = (y1(t), y2(t), . . . , yn(t)). For each j, the jth component of y(t) is yj(t), which is a function of the real variable t. Write yj′(t) for the derivative of yj(t) and let y′(t) = (y1′(t), y2′(t), . . . , yn′(t)). Define a linear operator T on V by putting Ty = y′ − Ay. Given f(t) = (f1(t), f2(t), . . . , fn(t)), the equation Ty = f becomes y′ − Ay = f, or

y′(t) = Ay(t) + f(t).     (3.1.6)

This is the “vector form” of the general system of linear ordinary differential equations with constant coefficients. If we spell out all components of this equation, we have

y1′ = a11y1 + a12y2 + · · · + a1nyn + f1(t)
y2′ = a21y1 + a22y2 + · · · + a2nyn + f2(t)
  · · ·                                    (3.1.7)
yn′ = an1y1 + an2y2 + · · · + annyn + fn(t)

We may consider the more general case, where the matrix entries aij (also called coefficients) are nonconstant, that is, they are allowed to be functions of t: aij = aij(t). In this case A = [aij] is a matrix-valued function of t and, to emphasize this, we write A = A(t) = [aij(t)]. Equation (3.1.6) becomes y′(t) = A(t)y(t) + f(t). When the coefficients aij are nonconstant, the basic theory of linear ordinary differential equations does not change much, but the solutions usually can no longer be expressed explicitly in terms of elementary functions.

Example 3.1.5. Higher order linear ordinary differential equations

In practice we often deal with higher order equations of the form

y^(n) + a_{n−1}y^(n−1) + · · · + a2y^(2) + a1y′ + a0y = f(t)     (3.1.8)

where y^(k) stands for the kth order derivative of y. We can rewrite (3.1.8) in the form Ty = f as follows. Let D be the operator of taking the derivative; (we can write D = d/dt if we wish). Introduce the polynomial p(λ) = λ^n + a_{n−1}λ^{n−1} + · · · + a1λ + a0, where the coefficients ak come from (3.1.8). Let

T = p(D) ≡ D^n + a_{n−1}D^{n−1} + · · · + a1D + a0I.     (3.1.9)

Then (3.1.8) can be rewritten as Ty = f. Equation (3.1.8) can also be converted into a system of first order equations. Thus, in theory, equations of the form (3.1.8) are covered by systems of equations of the form (3.1.7). This can be seen by introducing the new functions

y1 = y, y2 = y′, . . . , yn = y^(n−1). We give an example to see how this can be done. The general case will be given in Appendix C. Consider the equation

D^3y − 2D^2y + 3Dy − 5y = (t − 2)e^t.     (3.1.10)

Introduce the new functions y1 = y, y2 = Dy, y3 = D^2y. Then Dy3 = D^3y = 5y − 3Dy + 2D^2y + (t − 2)e^t. Hence

Dy1 = y2, Dy2 = y3, and Dy3 = 5y1 − 3y2 + 2y3 + (t − 2)e^t.

We can rewrite (3.1.10) as y′ = Ay + f, where

y = [y1, y2, y3]^⊤,   A = [[0, 1, 0], [0, 0, 1], [5, −3, 2]],   f = [0, 0, (t − 2)e^t]^⊤.

Example 3.1.6. Linear difference equations

A general linear difference equation of order N can be written as

a0 y_k + a1 y_{k+1} + a2 y_{k+2} + · · · + aN y_{k+N} = b_k     (3.1.11)

Recall from Example 1.6.6 that on the space of all sequences y = (y0, y1, y2, . . .), the shift operator S is defined by (Sa)_k = a_{k+1}. Notice that (Iy)_k = y_k, (Sy)_k = y_{k+1}, (S^2y)_k = (S(Sy))_k = (Sy)_{k+1} = y_{k+2}, (S^3y)_k = (S^2(Sy))_k = (Sy)_{k+2} = y_{k+3}, etc. In general,

(S^m y)_k = y_{k+m}.     (3.1.12)

Let p(λ) = a_0 + a_1 λ + a_2 λ^2 + ··· + a_N λ^N (here we use λ as the variable) and

T = p(S) = a_0 I + a_1 S + a_2 S^2 + ··· + a_N S^N.        (3.1.13)

Then, in view of (3.1.12), we have

(T y)_k = a_0 (Iy)_k + a_1 (Sy)_k + a_2 (S^2 y)_k + ··· + a_N (S^N y)_k
        = a_0 y_k + a_1 y_{k+1} + a_2 y_{k+2} + ··· + a_N y_{k+N},

which is the left hand side of (3.1.11). Thus we have (T y)_k = b_k for all k. Therefore

T y = b, with y = (y0, y1,...) and b = (b0, b1,...). We have shown that (3.1.11) can be written as T y = b, where T is given in (3.1.13).
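To make the operator picture concrete, here is a small Python sketch (not from the text) that applies T = p(S) to a finite portion of a sequence and returns the values (T y)_k = a_0 y_k + ··· + a_N y_{k+N}; the coefficients, the sample sequence and the helper name apply_T are arbitrary illustrative choices.

    import numpy as np

    a = [1.0, -3.0, 2.0]                   # coefficients a_0, a_1, a_2 of p (made up)
    y = np.arange(10, dtype=float) ** 2    # a sample sequence y_k = k^2

    def apply_T(a, y):
        # (T y)_k = sum_j a_j * y_{k+j}; only indices k with k+N in range are kept
        N = len(a) - 1
        return sum(a[j] * y[j:len(y) - N + j] for j in range(N + 1))

    print(apply_T(a, y))   # the values b_k = (T y)_k for k = 0, ..., len(y)-1-N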

We mention that there are many other important types of linear equations which can be put in the form (3.1.1) but are not treated here, such as linear partial differential equations.

3.2. Now we begin with some elementary theory of the equation

T x = b. (3.2.1)

Here T is a linear mapping from V to W. If we replace the right hand side b by the zero vector 0, we get the corresponding homogeneous equation

T x = 0.        (3.2.2)

All solutions to T x = 0 form a vector space, called the kernel of T and denoted by ker T. Indeed, if u, v are in ker T, then T u = 0 and T v = 0, and hence

T (au + bv) = aT u + bT v = 0

(for any scalars a, b), which entails au + bv ∈ ker T. In mathematical symbols, we write

ker T = { x ∈ V : T x = 0 }.

Now suppose that v0 ∈ V is a solution to (3.2.1), that is, T v0 = b. We claim: the solution set of (3.2.1) is v0 + ker T ≡ { v0 + x : x ∈ ker T }; in other words, the general solution to (3.2.1) is the sum of the particular solution v0 and the general solution to the homogeneous equation (3.2.2). Indeed, if x ∈ ker T, then

T (v0 + x) = T v0 + T x = b + 0 = b,

showing that v0 + x is indeed a solution to (3.2.1). On the other hand, if v is another solution of (3.2.1), then, letting x = v − v0, we have v = v0 + x and

T x = T (v − v0) = T v − T v0 = b − b = 0,

telling us that v = v0 + x with x in ker T. This explains why sometimes (but not always) we solve (3.2.1) in two steps: step one, find the general solution to its corresponding homogeneous equation; step two, find a particular solution to (3.2.1).
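For a finite-dimensional illustration (not part of the text), the following Python sketch exhibits the solution set of a small matrix equation Ax = b as one particular solution plus the kernel (null space) of A; the matrix and right hand side are made-up data.

    import numpy as np
    from scipy.linalg import null_space, lstsq

    # hypothetical data: a 2x3 matrix of rank 2, so the kernel is one-dimensional
    A = np.array([[1.0, 2.0, 3.0],
                  [0.0, 1.0, 1.0]])
    b = np.array([6.0, 2.0])

    x_p, *_ = lstsq(A, b)     # minimum-norm solution: one particular solution of A x = b
    K = null_space(A)         # columns span ker A

    # every x_p + K @ c solves A x = b; check one random combination
    c = np.random.randn(K.shape[1])
    print(np.allclose(A @ (x_p + K @ c), b))   # expected: True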

Example 3.2.1. Consider the differential equation y′ − y = x. A particular solution to this equation is yp = −1 − x, as we can check directly (a method for finding such a particular solution will be discussed in §3.5 below). The general solution to the homogeneous equation y′ − y = 0 can be found by the method of separation of variables: rewrite this equation as dy/y = dx and integrate: ∫ dy/y = ∫ dx, which gives ln y = x + c, or y = Ce^x, where C = e^c. From our above discussion we see that y = Ce^x − 1 − x is the general solution to y′ − y = x.

Whether equation (3.2.1) has a solution depends on the vector b on the right hand side. We say that b is in the range of T if (3.2.1) has a solution. Thus the range of T

is the set of all vectors b in W for which (3.2.1) has a solution, that is, there exists v in V such that T v = b. We denote by T (V ) the range of T. Thus, in math symbols,

T (V ) = { y ∈ W : there exists v ∈ V such that T v = y } = { T v : v ∈ V }.

Notice that T (V ) is a subspace of W . To see this, take y1, y2 in T (V ). Then there

exist v1 and v2 such that T v1 = y1 and T v2 = y2. For any scalars a1, a2, we have

a1y1 + a2y2 = a1T v1 + a2T v2 = T (a1v1 + a2v2),

showing that a1y1 + a2y2 is indeed in T (V ). In the rest of the present section, we discuss some methods of solving linear equations of the types mentioned in the examples of §3.1.

3.3. Denote by P the space of all polynomials and let s1, s2, . . . , sn be distinct points in the complex plane. Let T : P → C^n be the map given by

T p = (p(s1), p(s2), . . . , p(sn)).

We are asked to solve T p = b for given b = (b1, b2, . . . , bn). The corresponding homogeneous equation is T p = 0, or (p(s1), p(s2), . . . , p(sn)) = 0. Thus, p is a solution to T p = 0 if and only if s1, s2, . . . , sn are roots of p; in other words,

Q(x) ≡ (x − s1)(x − s2) ··· (x − sn)

is a factor of p(x). Thus the general solution to T p = 0 is p(x) = Q(x)f(x), where f(x) is any polynomial. Denote by Qk(x) the polynomial obtained from Q(x) by deleting

the factor x − sk; in other words, Qk(x) is that polynomial such that the identity

Q(x) = (x − sk)Qk(x)

holds. Notice that Qk(sj) = 0 for all j ≠ k and Qk(sk) ≠ 0. Let Lk(x) = Qk(x)/Qk(sk). Then we have Lk(sj) = 0 for j ≠ k and Lk(sk) = 1. (Using the Kronecker delta δjk, we can write Lk(sj) = δjk.) Recall the standard basis of C^n:

e1 = (1, 0, 0,..., 0, 0), e2 = (0, 1, 0,..., 0, 0),... en = (0, 0, 0,..., 0, 1).

We have proved that TLk = ek for all k. Thus,

b = Σ_{k=1}^n bk ek = Σ_{k=1}^n bk T Lk = T ( Σ_{k=1}^n bk Lk ),

showing that the Lagrange polynomial L(x) = Σ_{k=1}^n bk Lk(x) is a special solution to the interpolation problem T p = b. The general solution to this problem can be written as p(x) = L(x) + Q(x)f(x), where f(x) is an arbitrary polynomial and L(x), Q(x) are the polynomials given above.
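The construction of the polynomials Lk and of L(x) = Σ bk Lk(x) translates directly into code. The following Python sketch (an illustration, not part of the text; the helper name lagrange_basis is ours) builds the Lagrange basis with numpy.poly1d for distinct nodes s1, . . . , sn and checks that L(sk) = bk.

    import numpy as np

    s = [1.0, 2.0, 3.0]        # distinct interpolation points s_1, ..., s_n
    b = [3.0, 6.0, 13.0]       # prescribed values b_1, ..., b_n (data of Example 3.3.1)

    def lagrange_basis(s, k):
        # L_k(x) = Q_k(x) / Q_k(s_k), where Q_k omits the factor (x - s_k)
        Qk = np.poly1d([1.0])
        for j, sj in enumerate(s):
            if j != k:
                Qk *= np.poly1d([1.0, -sj])
        return Qk / Qk(s[k])

    L = sum(b[k] * lagrange_basis(s, k) for k in range(len(s)))
    print(L)                     # expect the polynomial 2 x^2 - 3 x + 4
    print([L(sk) for sk in s])   # expect the values in b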

Example 3.3.1. Solve the following Lagrange interpolation problem: find a polynomial p of degree 2 such that p(1) = 3, p(2) = 6, p(3) = 13.

Solution. Using the above notation, we have T p = (p(1), p(2), p(3)) and b = (3, 6, 13). We are asked to solve T p = b. Now Q(x) = (x − 1)(x − 2)(x − 3), Q1(x) = (x − 2)(x − 3), Q2(x) = (x − 1)(x − 3) and Q3(x) = (x − 1)(x − 2), with Q1(1) = 2, Q2(2) = −1 and Q3(3) = 2. So L1(x) = (1/2)(x − 2)(x − 3), L2(x) = −(x − 1)(x − 3) and L3(x) = (1/2)(x − 1)(x − 2). The Lagrange polynomial is

L(x) = (3/2)(x − 2)(x − 3) − 6(x − 1)(x − 3) + (13/2)(x − 1)(x − 2) = 2x^2 − 3x + 4.

So the general solution of our problem is 2x^2 − 3x + 4 + (x − 1)(x − 2)(x − 3)f(x), where f(x) is any polynomial.

3.4. A common situation occurring in a homogeneous equation T x = 0 is that T = p(S), where p is a polynomial and S is an operator considerably simpler than T. Example 3.1.5 with the operator in (3.1.9) and Example 3.1.6 with the operator in (3.1.13) are typical of this situation. Let λ1, λ2, . . . , λr be all the roots of the polynomial p, with multiplicities m1, m2, . . . , mr respectively. Then we have

p(x) = a(x − λ1)^{m1} (x − λ2)^{m2} ··· (x − λr)^{mr},

and hence T = p(S) = a(S − λ1 I)^{m1} (S − λ2 I)^{m2} ··· (S − λr I)^{mr}. We claim that solving

p(S)x = 0        (3.4.1)

can be reduced to solving each of

(S − λ1 I)^{m1} x = 0,   (S − λ2 I)^{m2} x = 0,   . . . ,   (S − λr I)^{mr} x = 0.        (3.4.2)

Proposition 3.4.1. With the above notation, the general solution to p(S)x = 0 can be written as x = x1 + x2 + ··· + xr, where xk (1 ≤ k ≤ r) is the general solution to the k-th equation in (3.4.2), namely (S − λk I)^{mk} x = 0.

Before proving the above proposition, we give some examples to illustrate this.

Example 3.4.2. Find the general solution to the differential equation y′′ − y′ − 2y = 0.

Solution. Rewrite the equation as p(D)y = 0, where p(x) = x^2 − x − 2 = (x − 2)(x + 1) and D is the differentiation operator given by Dy = y′. Reduce p(D)y = 0 to the two equations (D − 2I)y = 0 and (D + I)y = 0, or y′ − 2y = 0 and y′ + y = 0, which have general solutions y = Ce^{2t} and y = Ce^{−t} respectively (here we use t for the variable of the function y). Hence the general solution to y′′ − y′ − 2y = 0 is y = C1 e^{2t} + C2 e^{−t}.
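As a quick sanity check (not in the text), the following Python sketch finds the roots of the characteristic polynomial with numpy and verifies symbolically with sympy that C1 e^{2t} + C2 e^{−t} satisfies y′′ − y′ − 2y = 0.

    import numpy as np
    import sympy as sp

    print(np.roots([1, -1, -2]))     # roots of x^2 - x - 2: expect 2 and -1

    t, C1, C2 = sp.symbols('t C1 C2')
    y = C1 * sp.exp(2 * t) + C2 * sp.exp(-t)
    print(sp.simplify(sp.diff(y, t, 2) - sp.diff(y, t) - 2 * y))   # expect 0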

Example 3.4.3. Find the general solution to the differential equation y′′ + y = 0.

Solution. Rewrite the equation as p(D)y = 0, where p(x) = x^2 + 1 = (x − i)(x + i) and D is the differentiation operator given by Dy = y′. Reduce p(D)y = 0 to the two equations (D − iI)y = 0 and (D + iI)y = 0, which have general solutions y = Ce^{it} and y = Ce^{−it} respectively. The general solution to y′′ + y = 0 is y = C1 e^{it} + C2 e^{−it}. Using Euler's formula e^{it} = cos t + i sin t and e^{−it} = cos t − i sin t, we can put it in another form: y = A cos t + B sin t.

Example 3.4.4. Solve the difference equation y_{n+2} − y_{n+1} − 2y_n = 0.

Solution. Rewrite the equation as p(S)y = 0, where p(x) = x^2 − x − 2 = (x − 2)(x + 1) and S is the shift operator given by (Sy)_n = y_{n+1}. Reduce p(S)y = 0 to the two equations

(S − 2I)y = 0 and (S + I)y = 0, or y_{n+1} − 2y_n = 0 and y_{n+1} + y_n = 0, which have general solutions y_n = a·2^n and y_n = a(−1)^n respectively. Hence the general solution to y_{n+2} − y_{n+1} − 2y_n = 0 is y_n = a·2^n + b(−1)^n.
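A minimal numerical check (an illustration, not part of the text): iterate the recurrence y_{n+2} = y_{n+1} + 2y_n in Python and compare with a·2^n + b(−1)^n, where a and b are fitted to the first two terms.

    import numpy as np

    y0, y1 = 1.0, 5.0                 # arbitrary initial data
    # fit y_n = a*2^n + b*(-1)^n to n = 0, 1:  a + b = y0,  2a - b = y1
    a, b = np.linalg.solve([[1, 1], [2, -1]], [y0, y1])

    y = [y0, y1]
    for n in range(8):
        y.append(y[-1] + 2 * y[-2])   # the recurrence y_{n+2} = y_{n+1} + 2 y_n

    closed = [a * 2**n + b * (-1)**n for n in range(len(y))]
    print(np.allclose(y, closed))     # expected: True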

Now we return to the proof of Proposition 3.4.1. Since (S − λk I)^{mk} is a factor of T = p(S), solutions to (S − λk I)^{mk} x = 0 are also solutions to p(S)x = 0. Hence expressions of the form x = x1 + x2 + ··· + xr described in the proposition are solutions to T x = 0. Next we prove that all solutions to T x = 0 can be put in the form x = x1 + x2 + ··· + xr as described in the proposition (this is the hard part). Let

p1(x) = (x − λ1)^{m1},   p2(x) = (x − λ2)^{m2} ··· (x − λr)^{mr}.

Then p1(x) and p2(x) are coprime (because they don't have any root in common) and hence there are polynomials q1(x) and q2(x) such that p1(x)q1(x) + p2(x)q2(x) = 1. Suppose that

v is a solution to T x = 0. Let v1 = p2(S)q2(S)v and v2 = p1(S)q1(S)v. Then

v = Iv = (p1(S)q1(S) + p2(S)q2(S))v = v2 + v1 = v1 + v2

with p1(S)v1 = p1(S)p2(S)q2(S)v = a^{−1} p(S)q2(S)v = a^{−1} q2(S)p(S)v = a^{−1} q2(S)T v = 0, and similarly we have p2(S)v2 = 0. Thus v = v1 + v2, where v1 is a solution to p1(S)x = 0 and v2 is a solution to p2(S)x = 0. Now the proof can be completed by induction on r.
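The key algebraic step, writing 1 = p1 q1 + p2 q2 for coprime p1 and p2, can be computed with the extended Euclidean algorithm. A small sympy sketch (illustrative, not from the text), using the factors p1 = x − 2 and p2 = x + 1 from Example 3.4.2:

    import sympy as sp

    x = sp.symbols('x')
    p1 = x - 2
    p2 = x + 1

    # extended Euclidean algorithm: q1*p1 + q2*p2 = gcd(p1, p2) = 1
    q1, q2, g = sp.gcdex(p1, p2, x)
    print(q1, q2, g)                         # expect q1 = -1/3, q2 = 1/3, g = 1
    print(sp.expand(q1 * p1 + q2 * p2))      # expect 1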

3.5. For a linear operator T on a linear space V, sometimes we can find a particular solution of T x = b by a judicious choice of a subspace M such that b belongs to M and M is invariant under T in the sense that T (M) ⊆ M, that is, for all v in M, T v is also in M. This method will become clear by going through some examples. (Our choice of M in these examples will be illuminated in §1.3 of the next chapter.)

Example 3.5.1. Find the indefinite integral ∫ e^x sin x dx by using functions of the form a e^x sin x + b e^x cos x.

Solution. This integral is a solution to the equation Du = e^x sin x, where D = d/dx. Our choice of subspace M consists of functions of the form u = a e^x sin x + b e^x cos x. Now

Du = D(a e^x sin x + b e^x cos x) = a(e^x sin x + e^x cos x) + b(e^x cos x − e^x sin x)
   = (a − b) e^x sin x + (a + b) e^x cos x.

Thus, if u is indeed a solution to Du = e^x sin x, then we set a − b = 1 and a + b = 0, which gives a = 1/2 and b = −1/2. So u = (1/2) e^x sin x − (1/2) e^x cos x is a solution to Du = e^x sin x and

∫ e^x sin x dx = (e^x/2)(sin x − cos x) + C.

This is originally a question in calculus, usually answered by integration by parts. As you see, it is easier to answer this by using linear algebra.
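A short sympy check of this computation (illustrative, not from the text): solve the 2 × 2 system for a and b and differentiate the result.

    import sympy as sp

    x, a, b = sp.symbols('x a b')
    u = a * sp.exp(x) * sp.sin(x) + b * sp.exp(x) * sp.cos(x)

    # Du = (a - b) e^x sin x + (a + b) e^x cos x must equal e^x sin x
    sol = sp.solve([sp.Eq(a - b, 1), sp.Eq(a + b, 0)], [a, b])
    print(sol)                                                   # expect {a: 1/2, b: -1/2}

    u0 = u.subs(sol)
    print(sp.simplify(sp.diff(u0, x) - sp.exp(x) * sp.sin(x)))   # expect 0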

Example 3.5.2. Find a particular solution to y′′ + y′ + y = x^2 by using the subspace P2, which consists of functions of the form a + bx + cx^2.

Solution. Here our choice of the subspace is P2. Suppose that y = a + bx + cx^2 is a particular solution. Then

x^2 = y′′ + y′ + y = 2c + (b + 2cx) + (a + bx + cx^2) = (2c + b + a) + (2c + b)x + cx^2,

which gives 2c + b + a = 0, 2c + b = 0 and c = 1. So c = 1, b = −2 and a = 0. We conclude that y = x^2 − 2x is a particular solution.

The method described here is not limited to finding a particular solution to a nonhomogeneous equation, as shown in the following example.

Example 3.5.3. Find the general solution to y′′ − 2y′ + y = x^2.

Solution. Using the method described in the last example, we can find a particular solution to this equation: yp = x^2 + 4x + 6. Next we solve the homogeneous equation y′′ − 2y′ + y = 0, which can be rewritten as (D^2 − 2D + I)y = 0, or (D − I)^2 y = 0. Let z = (D − I)y. Then we have (D − I)z = 0, with z = ce^x as its general solution. Next we write (D − I)y = z as y′ − y = ce^x. Choose the subspace M of functions of the form axe^x + be^x. If y = axe^x + be^x is a solution to y′ − y = ce^x, then, from y′ − y = (ae^x + axe^x + be^x) − (axe^x + be^x) = ae^x we get ae^x = ce^x and hence a = c. Thus y = cxe^x + be^x (b arbitrary) is a solution to y′ − y = ce^x and hence is also a solution to y′′ − 2y′ + y = 0. Since c is arbitrary, the general solution to the homogeneous equation is y = axe^x + be^x, and therefore the general solution to the original equation y′′ − 2y′ + y = x^2 is y = axe^x + be^x + x^2 + 4x + 6.
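The "look in an invariant subspace" recipe amounts to solving a small linear system for the unknown coefficients. Here is a sympy sketch (illustrative, not from the text; the helper name particular is ours) that recovers the polynomial particular solutions used in Examples 3.5.2 and 3.5.3.

    import sympy as sp

    x, a, b, c = sp.symbols('x a b c')
    y = a + b * x + c * x**2                      # trial element of P2

    def particular(lhs_of, rhs):
        # match coefficients of lhs_of(y) - rhs as a polynomial in x and solve for a, b, c
        eqs = sp.Poly(sp.expand(lhs_of(y) - rhs), x).all_coeffs()
        sol = sp.solve(eqs, [a, b, c])
        return y.subs(sol)

    print(particular(lambda u: sp.diff(u, x, 2) + sp.diff(u, x) + u, x**2))       # Example 3.5.2
    print(particular(lambda u: sp.diff(u, x, 2) - 2*sp.diff(u, x) + u, x**2))     # Example 3.5.3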

Once we know in which subspace we should look for a solution, a complicated expression for the operator T in the equation T x = b does not intimidate us, as shown in the following example.

Example 3.5.4. Let T be the operator on the space of functions given by T f(x) = xf′(x) + f(x + 1) + f(0)x^2. Find a (special) solution to T f(x) = 1 − 2x + 5x^2.

Solution. Since the right hand side is in P2 and since, as we can check, P2 is an invariant subspace of T, it is natural to look for a solution in P2. So we set f(x) = a + bx + cx^2. Then

T f(x) = x(b + 2cx) + a + b(x + 1) + c(x + 1)^2 + ax^2 = (a + b + c) + (2b + 2c)x + (3c + a)x^2.

In order to satisfy the equation T f(x) = 1 − 2x + 5x^2, we set a + b + c = 1, 2b + 2c = −2 and 3c + a = 5, which gives a = 2, b = −2 and c = 1. Hence f(x) = 2 − 2x + x^2 is a solution to the equation T f(x) = 1 − 2x + 5x^2.
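A short verification of Example 3.5.4 in sympy (illustrative, not from the text):

    import sympy as sp

    x = sp.symbols('x')
    f = 2 - 2 * x + x**2

    # T f(x) = x f'(x) + f(x + 1) + f(0) x^2
    Tf = x * sp.diff(f, x) + f.subs(x, x + 1) + f.subs(x, 0) * x**2
    print(sp.expand(Tf))          # expect 5*x**2 - 2*x + 1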

In the next chapter we provide some clues to the “right guess” for solving T x = b under certain circumstances, by dimensional considerations. In Chapter 3 we describe a general method of solving systems of equations, called diagonalization.

EXERCISE SET I.3.

Review Questions: What are linear equations and their solutions? What are the major examples of linear equations? What is the relation between solutions of T x = b and solutions of its homogeneous equation T x = 0? How do we use the factorization of a polynomial p(x) to solve a higher order linear differential equation p(D)y = 0 or difference equation p(S)y = 0?

Drills

1. Convert each of the following differential equations into a system of first order equations of the form y′ = Ay + f in vector notation.

(a) y′′ + 2y′ − 3y = 1 + t.
(b) y′′ + y = sin t.
(c) y^{(3)} − 2y^{(2)} + 3y′ − y = 0.
(d) y′′ + ty′ + (cos t)y = sin t.

2. Convert each of the following difference equations into a system of first order equations

of the form y_{n+1} = Ay_n + f in vector notation.

(a) y_{n+2} + 2y_{n+1} − 3y_n = 1 + t.
(b) y_{n+2} + y_n = sin t.

(c) y_{n+3} − 2y_{n+2} + 3y_{n+1} − y_n = 1.
(d) y_{n+2} − 2n y_{n+1} + 3e^{−n} y_n = n^2.

3. Solve each of the following Lagrange interpolation problems.

(a) p ∈ P1, p(−1) = 3, p(2) = 9.
(b) p ∈ P1, p(−1) = 2 − i, p(i) = 4 − i.
(c) p ∈ P2, p(0) = 1, p(1) = 3, p(2) = 7.
(d) p ∈ P3, p(−1) = −4, p(0) = 1, p(1) = 0, p(2) = −5.

4. Solve each of the following homogeneous linear differential equations.

(a) y′′ − 2y′ − 3y = 0.
(b) y′′ − 2y′ + 2y = 0.
(c) y′′ − 4y′ + 4y = 0.
(d) y^{(4)} − y = 0.

5. Solve each of the following homogeneous linear difference equations.

(a) y_{n+2} − 2y_{n+1} − 3y_n = 0.
(b) y_{n+2} − 2y_{n+1} + 2y_n = 0.
(c) y_{n+2} − 4y_{n+1} + 4y_n = 0.
(d) y_{n+4} − y_n = 0.

6. Use the method of linear algebra as suggested in the present section to find the following indefinite integrals.

(a) ∫ (4 + x + 5x^2 − x^3) e^x dx. (Hint: Use the linear space of functions which can be expressed in the form a e^x + b x e^x + c x^2 e^x + d x^3 e^x.)

(b) ∫ (2 sin^2 x − cos^2 x + sin x cos x) e^x dx. (Hint: Use the linear space of functions which can be expressed in the form a e^x sin^2 x + b e^x cos^2 x + c e^x sin x cos x.)

7. Use the method suggested in the present section to solve each of the following linear differential equations.

(a) y′′ − 2y′ − 3y = 8 − 4t − 3t^2. (Hint: Use the linear space of functions which can be expressed in the form a + bt + ct^2.)
(b) y′′ − 2y′ + 2y = e^t. (Hint: Use the linear space of functions which can be expressed in the form a e^t.)
(c) y′′ − 4y′ + 4y = te^t − 4e^t. (Hint: Use the linear space of functions which can be expressed in the form a e^t + b t e^t.)
(d) y^{(4)} − y = cos t − sin t. (Hint: Use the linear space of functions which can be expressed in the form a sin t + b cos t.)

8. Use the method suggested in the present section to solve each of the following linear difference equations.

(a) y_{n+2} − 2y_{n+1} − 3y_n = 2 + 4n − 4n^2.
(b) y_{n+2} − 2y_{n+1} + 2y_n = n + 1.
(c) y_{n+2} − 4y_{n+1} + 4y_n = 2^{n+2}.
(d) y_{n+4} − y_n = 1. (Hint: Try y_n = an + b.)

Appendices for Chapter I

Appendix A*: Axioms for vector spaces

By a vector space V over a field F we mean a set V with an algebraic structure endowed by the following two operations

1. Addition, allowing us to add two vectors u and v to obtain their sum u + v.
2. Scalar multiplication, allowing us to multiply a vector v by a scalar a to form av.

such that the following 8 axioms are satisfied:

(V1) u + v = v + u (for all u, v in V ).
(V2) u + (v + w) = (u + v) + w (for all u, v, w in V ).
(V3) There is a unique object 0, called the zero vector, such that u + 0 = u for all u.
(V4) For each u in V , there is a v in V such that u + v = 0. (We will call v the negative of u and denote it by −u.)
(V5) (a + b)u = au + bu (for all u in V and for all scalars a, b).
(V6) a(u + v) = au + av (for all u, v in V and for all scalars a).
(V7) a(bu) = (ab)u (for all u in V and for all scalars a, b).
(V8) 1u = u (for all u in V ).

There is a systematic way to read these rules governing the operations (addition and scalar multiplication) of a vector space. (V1) to (V4) concern addition only. (V1) and (V2) are the commutative law and the associative law respectively, the same as those governing addition of numbers. (V3) and (V4) concern the existence of the zero vector and of the “negative” of a vector. Any algebraic system with an addition operation satisfying (V1) to (V4) is called an abelian group. (V5) and (V6) involve both operations; they resemble the distributive law. (V7) and (V8) concern scalar multiplication only. (V7) resembles the associative law and (V8) is a rule for normalizing this operation.

We can derive elementary properties of vector space operations from (V1)–(V8) such as

(1) a 0 = 0. (Here 0 is the zero vector given in (V3).)
(2) 0 v = 0.
(3) If av = 0, then either a = 0 or v = 0.
(4) (−1)v = −v. (Here −v is the negative of v described in (V4).)

To prove (1), notice that from (V3) we have 0 + 0 = 0 (u + 0 = u holds for u = 0). Hence

a 0 = a (0 + 0) = a 0 + a 0

by (V6). Adding −(a 0) to both sides and applying the associative law (V2) on the right-hand side, we obtain 0 = a 0, giving (1). The proof of (2) goes in the same manner:

0 v = (0 + 0) v = 0 v + 0 v

by (V5), and then add −(0 v) to both sides. The proof of (3) is a bit more subtle. In case a = 0, there is nothing to prove. So we may (and we do) assume that a ≠ 0. Thus it is legitimate to consider a^{−1} and multiply both sides of av = 0 by a^{−1}. Thus

a^{−1}(av) = a^{−1} 0 = 0

in view of (1) that we have proved a minute ago. On the other hand, by (V7),

a^{−1}(av) = (a^{−1}a) v = 1 v,

which is v by (V8). Hence v = 0, as desired. Assertion (4) says that (−1)v is the negative of v. This means, according to (V4), v + (−1)v = 0. The last identity can be checked as follows:

v + (−1)v = 1 v + (−1)v = (1 + (−1))v = 0 v = 0.

Hence (4) is valid.

We only consider natural examples of vector spaces. But we have to be aware that weird-looking vector spaces do exist. Here is an example of a weird space, suggested by special relativity. Let V consist of the real numbers v strictly between −1 and 1: −1 < v < 1. We use ⊕ and ⊙ to indicate the two basic operations in V , defined as follows to distinguish them from the usual addition and multiplication of real numbers: for u, v ∈ V and a ∈ R,

u ⊕ v = (u + v)/(1 + uv),     a ⊙ u = ((1 + u)^a − (1 − u)^a)/((1 + u)^a + (1 − u)^a).

For checking (V1) to (V8), you need to have a large piece of paper, write small, and a lot of patience.
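Short of a large piece of paper, one can at least spot-check a few of the axioms numerically. A Python sketch (illustrative, not from the text; the helper names vadd and smul are ours):

    import random

    def vadd(u, v):                 # the operation ⊕
        return (u + v) / (1 + u * v)

    def smul(a, u):                 # the operation ⊙
        p, q = (1 + u) ** a, (1 - u) ** a
        return (p - q) / (p + q)

    random.seed(0)
    for _ in range(1000):
        u, v, w = (random.uniform(-0.99, 0.99) for _ in range(3))
        a, b = random.uniform(-3, 3), random.uniform(-3, 3)
        assert abs(vadd(u, v) - vadd(v, u)) < 1e-12                        # (V1)
        assert abs(vadd(u, vadd(v, w)) - vadd(vadd(u, v), w)) < 1e-12      # (V2)
        assert abs(smul(a + b, u) - vadd(smul(a, u), smul(b, u))) < 1e-9   # (V5)
    print("spot checks passed")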

Appendix B*: Fields

Besides R and C, we briefly mention other fields. First, the smallest field in R is the field of all rational numbers, denoted by Q. Recall that a rational number is a number

which can be written as a fraction of integers, that is, it can be expressed as m/n, where m and n are integers with n ≠ 0. Between Q and C there are many so-called algebraic number fields. A simple example of an algebraic number field is Q(√2), consisting of numbers of the form a + b√2, where a and b are rational numbers. Algebraic number theory is a big industry having many mathematicians as workers, so to speak. Beyond C there is the field of rational functions. Recall that a rational function is a function which can be expressed as p(x)/q(x), where p(x) and q(x) are polynomials. Going further are the so-called algebraic function fields. They can be defined by purely algebraic means via “field extensions”, or by analytic means, so that they appear as fields of “meromorphic functions on Riemann surfaces”.

Now we describe finite fields, which have found many practical uses in recent years, such as in coding and cryptography. The simplest finite field is F2, which consists of two elements denoted by 0 and 1. The addition and multiplication in F2 are given by 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, 1 + 1 = 0, 1 × 0 = 0 × 1 = 0 × 0 = 0, 1 × 1 = 1. More generally, for any prime number p, we can define the finite field Fp, which has p elements. Every finite field can be written as Fq, where q = p^n with p a prime number and n a positive integer. An interesting aspect of the theory is that Fq can be regarded as a linear

space over Fp, and the basic theory of linear algebra is a good guide to understanding the finite field Fq. Because of the multiplication operation of a finite field, each element in Fq is naturally associated with a linear operator on this vector space, and a large portion of the operator theory in linear algebra applies.
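For a concrete feel of arithmetic in Fp, here is a small Python sketch (illustrative, not from the text) that prints the addition and multiplication tables modulo a prime p; with p = 2 it reproduces the tables for F2 given above.

    p = 2   # any prime will do

    add = [[(i + j) % p for j in range(p)] for i in range(p)]
    mul = [[(i * j) % p for j in range(p)] for i in range(p)]

    print("addition mod", p)
    for row in add:
        print(row)
    print("multiplication mod", p)
    for row in mul:
        print(row)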

A fancy way to do number theory is to introduce the so-called “valuation fields”, such as the field of p-adic numbers. Mathematically it is one of the most fascinating and challenging research areas. It seems to me that, though greatly respected, it is practically useless. However, some experts in number theory have started to speculate about its links to some of the deepest mysteries of our universe.

Appendix C*: Converting higher order linear ODEs to systems of first order ODEs

In Example 3.1.5 we considered higher order equations of the form

y^{(n)} + a_{n−1} y^{(n−1)} + ··· + a_2 y^{(2)} + a_1 y′ + a_0 y = f(t).        (3.1.8)

Introduce the new functions y1 = y, y2 = y′, . . . , yn = y^{(n−1)}. Then we have

y1′ = y2,   y2′ = y3,   . . . ,   y_{n−1}′ = yn,
yn′ = −a_0 y1 − a_1 y2 − ··· − a_{n−1} yn + f(t).

We can rewrite (3.1.8) as y′ = Ay + f, where

A =
    [  0      1      0     ···     0       ]
    [  0      0      1     ···     0       ]
    [  ⋮      ⋮      ⋮             ⋮       ]
    [  0      0      0     ···     1       ]
    [ −a_0   −a_1   −a_2   ···   −a_{n−1}  ]

with

    [ y1 ]   [ y         ]        [ 0 ]
y = [ y2 ] = [ y′        ] ,  f = [ ⋮ ]
    [ ⋮  ]   [ ⋮         ]        [ 0 ]
    [ yn ]   [ y^{(n−1)} ]        [ f ]

(As we shall see, the matrix A has many interesting properties.) In practice there is no apparent advantage in converting a higher order equation into a system of first order equations. But, in developing the theory, a system of first order equations is easier to deal with than a higher order equation.
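A small Python helper (illustrative, not from the text; the function name companion is ours) that builds this matrix from the coefficients a_0, . . . , a_{n−1}:

    import numpy as np

    def companion(a):
        # a = [a_0, a_1, ..., a_{n-1}] from y^(n) + a_{n-1} y^(n-1) + ... + a_0 y = f(t)
        n = len(a)
        A = np.zeros((n, n))
        A[:-1, 1:] = np.eye(n - 1)     # the superdiagonal of ones
        A[-1, :] = -np.asarray(a)      # last row: -a_0, -a_1, ..., -a_{n-1}
        return A

    # coefficients of equation (3.1.10): y''' - 2y'' + 3y' - 5y = (t - 2)e^t
    print(companion([-5.0, 3.0, -2.0]))
    # expected last row: [5, -3, 2], matching the matrix in Example 3.1.5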

In the same vein we can convert a higher order linear difference equation to a system of first order linear difference equations.
