CHAPTER I
LINEARITY: BASIC CONCEPTS AND EXAMPLES
In this chapter we start with the concept of general linear spaces, whose elements are called vectors, to set up the stage. Then we introduce the “actors”, called linear mappings, which act upon vectors. In the mathematical literature, “vector space” is synonymous with “linear space”, and these terms will be used interchangeably. Likewise, “linear transformation”, “linear mapping”, and simply “linear map” are synonymous.
§1. Linear Spaces and Linear Maps
1.1. A vector space is an entity containing objects called vectors. A vector is usually conceived to be something which has a magnitude and direction, so that it can be drawn as an arrow:
You can add or subtract two vectors:
You can also multiply vectors by scalars:
Such things should be familiar to you. However, we should not be so narrow-minded as to think that only those objects represented geometrically by arrows in a 2D or 3D space can be regarded as vectors. As long as we have a collection of objects among which two algebraic operations, called addition and scalar multiplication, can be performed, so that certain rules of these operations are obeyed, we may regard this collection as a vector space and call the objects in this collection vectors. Our definition of vector spaces is so general that we encounter vector spaces almost every day and almost everywhere. Examples of vector spaces include many spaces of functions, spaces of polynomials, spaces of sequences, etc. (in addition to the well-known 3D space in which vectors are represented as arrows). The universality and omnipresence of vector spaces is one good reason for placing linear algebra in a position of paramount importance in basic mathematics.
1.2. After this propaganda, we come to the technical side of the definition. We start with a collection V of objects called vectors, designated by block letters u, v, etc. Suppose that V is equipped with two algebraic operations:
1. Addition, allowing us to add two vectors u and v to obtain their sum u + v.
2. Scalar multiplication, allowing us to multiply a vector v by a scalar a to form av.
Then V will be legitimately called a vector space if these two operations obey a set of “natural” axioms. The complete set of axioms will be listed in Appendix A at the end of the present chapter. There is no need to memorize them, but here we mention some to show that they are indeed very natural.
u + v = v + u (addition is commutative)
u + (v + w) = (u + v) + w (addition is associative)
u + 0 = u (adding the zero vector changes nothing)
a(u + v) = au + av (scalar multiplication distributes over addition)
We are a bit vague about “scalars” in the above definition. What are scalars? The obvious answer is: they are numbers. But what sort of numbers are they? If we allow scalars to be complex numbers, then we say that V is a complex vector space, or a vector space over (the complex field) C. If we restrict scalars to real numbers, then V is called a real vector space, or a vector space over (the real field) R.
More generally, scalars are taken from something called a field. If the “field of scalars” is denoted by F, we call V a vector space over F. In recent years, finite fields have become an important subject due to their vast applications in areas such as cryptography and coding theory. We briefly describe fields other than R and C in Appendix B at the end of the present chapter. In this course we only consider vector spaces over R or C.
1.3. Examples. The best way to understand the concept of vector spaces is to go through a large number of examples and to work with them in the future.
Example 1.3.1. Rn, a real vector space. The vectors in Rn are n-tuples of real numbers. A typical vector may be written as
v = (v1, v2, . . . , vn),
where v1, v2, . . . , vn are real numbers. Two n-tuples are equal only when their corresponding components are equal: for u = (u1, . . . , un) and v = (v1, . . . , vn) in Rn, u = v only when u1 = v1, u2 = v2, . . . , un = vn. The addition and scalar multiplication in this space are defined in the componentwise manner: for u = (u1, u2, . . . , un), v = (v1, v2, . . . , vn) in Rn and a in R (that is, a is a real number),
u + v = (u1 + v1, u2 + v2, . . . , un + vn), au = (au1, au2, . . . , aun).
These two algebraic operations are quite simple and natural. Notice that, when n = 1, we have the vector space R1, which can be identified with R. This shows that R itself can be considered as a real vector space.
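Aside: for readers who like to experiment, the componentwise operations of Rn can be mirrored on a computer. The following minimal Python sketch (using the NumPy library; the names u, v, a are ours, purely illustrative) is not part of the formal development:

import numpy as np

# Vectors in R^3 represented as NumPy arrays; addition and
# scalar multiplication act componentwise, exactly as defined above.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
a = 2.0

print(u + v)   # [5. 7. 9.]   componentwise sum
print(a * u)   # [2. 4. 6.]   componentwise scalar multiple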
Example 1.3.2. Cn, a complex vector space. The space Cn consists of all n-tuples of complex numbers. Everything in this space works in the same way as in the space Rn of the previous example: all we have to do is replace real scalars by complex numbers. Sometimes we would like to make a statement about both spaces Rn and Cn. To avoid repetition, we use the letter F for R or C, or even a general field of scalars. We write Fn for the space of all n-tuples of scalars in F. Using the identification of F1 with F, we see that scalars can be regarded as vectors, if we wish.
Example 1.3.3. Mmn(F), the space of m × n matrices over F.
A vector in Fn is formed by arranging n scalars in a row. A variant of Fn is to form a vector by arranging mn scalars in an m × n matrix. Denote by Mmn(F) the set of all m × n matrices with entries in F. With the usual addition and scalar multiplication, Mmn(F) is a vector space. A vector in this space is actually a matrix. In the future, to simplify our notation, we will write Mmn for Mmn(F) when the field F we are working with is understood.
Example 1.3.4. Direct product. This example tells us how to construct a new vector space from given vector spaces. Let V1, V2, . . . , Vn be vector spaces over the same field F. Consider the set V of all n-tuples of the form v = (v1, v2, . . . , vn), where v1 is a vector in V1, v2 is a vector in V2, etc. The addition and the scalar multiplication for V are defined in the same fashion as the corresponding operations of Rn described in Example 1.3.1: for u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn) in V, and for a in F,
u + v = (u1 + v1, u2 + v2, . . . , un + vn),
au = (au1, au2, . . . , aun).
This space V is a generalization of the previous example Fn, obtained by replacing the numbers in the entries of n-tuples by vectors. If we take V1 = V2 = · · · = Vn = F, then we recover Fn. The vector space V constructed in this way is called the direct product of V1, V2, . . . , Vn, and the usual notation for it is V = V1 × V2 × · · · × Vn, or V = Π_{k=1}^{n} Vk.

Example 1.3.5. Space of functions
This example is a space of functions with a common domain, say X (some nonempty set). A function f on X is just a way to assign to each point x in X a value denoted by f(x). This value could be a scalar or a vector. Since scalars can be regarded as vectors, it is enough to consider the vector-valued case. Take any vector space V with F as its field of scalars. Denote by F(X, V) the set of all functions f from X to V: f assigns to each point x in X a vector denoted by f(x) in V. Given “vectors” (which are functions in this case) f and g in F(X, V), the sum f + g and the scalar multiple af of f by a scalar a ∈ F are formally defined as follows:
(f + g)(x) = f(x) + g(x), (af)(x) = a f(x), for all x ∈ X.

If we treat x as a variable symbol so that f is written as f(x), then the sum of “vectors” f(x) and g(x) in F(X, V) is simply f(x) + g(x). In the special case with V = C and

X = N = {0, 1, 2, 3, . . . },

each f ∈ F(N, C) represents a sequence (of complex numbers):

{f(n)}_{n≥0} ≡ (f(0), f(1), f(2), . . . ).

Conversely, each sequence {an}_{n≥0} of complex numbers determines a function f on N, given by f(n) = an. Thus we can identify the space F(N, C) with the space of sequences, which will be denoted by S. For a = {an}_{n≥0} and b = {bn}_{n≥0} in S, we define the sum a + b and the scalar multiple λa of a by a complex number λ as follows:
a + b = (a0, a1, a2,... ) + (b0, b1, b2,... ) = (a0 + b0, a1 + b1, a2 + b2,... )
λa = λ(a0, a1, a2,... ) = (λa0, λa1, λa2,... )
We will need the sequence space S to study difference equations and recursion relations.

Example 1.3.6. The space of all polynomials
We specialize the previous example by taking X = C, the complex plane, and V = C, considered as a complex vector space. In some sense, the space F(C, C) is too big. We should look at something smaller inside it. Recall that a polynomial is a function p on C which can be written in the form

p(x) = a0 + a1x + a2x2 + a3x3 + · · · + anxn,

where a0, a1, . . . , an are certain complex numbers and n is a certain positive integer. It is clear that the sum of two polynomials is also a polynomial, and scalar multiples of polynomials are polynomials. Thus, if we denote by P the set of all polynomials, we can define the sum of two polynomials and a scalar multiple of a polynomial in the same way as we do for functions, giving a linear structure to P.

1.4. A smaller vector space “sitting” inside a bigger one, such as P in F(C, C) in the last example, is a very common phenomenon. To describe this in formal language we introduce the following:
Definition 1.4.1. A (nonempty) subset M of a vector space V satisfying the following condition is called a subspace of V: (S) For all u and v in M, and for all scalars a and b, au + bv is in M.
According to this definition, P is a subspace of F(C, C). A subspace of a vector space is a vector space in its own right.
Example 1.4.2. Pn, the space of polynomials of degree ≤ n.
Recall that a polynomial of degree n is a function of the form
p(x) = a0 + a1x + a2x2 + · · · + anxn (P)

with an ≠ 0. If we drop the condition an ≠ 0 here, we cannot tell the exact degree of p(x); all we can say is that the degree of p(x) is at most n. Denote by Pn the subset of P consisting of polynomials of degree at most n, that is, polynomials of the form given in (P). It is clear that condition (S) in the definition of subspace above is satisfied for M = Pn and V = P. Therefore Pn is a subspace of P and hence is itself a vector space.
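Aside: a quick computational illustration of condition (S) for M = Pn. If we represent a polynomial in Pn by its coefficient vector (a0, a1, . . . , an), then au + bv is again a coefficient vector of the same length, so Pn is closed under both operations. A minimal Python sketch (the helper name combine is ours):

import numpy as np

n = 3  # work in P_3: coefficient vectors (a0, a1, a2, a3)

def combine(a, b, p, q):
    # Coefficients of a*p(x) + b*q(x); the result has the same
    # length n+1, so it again represents an element of P_n.
    return a * p + b * q

p = np.array([1.0, 0.0, 2.0, 0.0])   # 1 + 2x^2
q = np.array([0.0, 1.0, 0.0, 1.0])   # x + x^3
print(combine(2.0, -1.0, p, q))      # coefficients of 2p(x) - q(x)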
1.5. Now we consider the concept of linear mappings, or linear transformations. By a mapping (or simply a map) or a transformation we usually mean a way of sending objects from one space into another. A transformation T from one vector space V to another W (over the same field) is linear if the following identity holds:

T(αx + βy) = αTx + βTy (LT)

for all vectors x and y in V and all scalars α and β. Notice that we deliberately use the letters x, y instead of u, v for vectors, and α, β instead of a, b for scalars, in order to broaden the scope of our notation. (Also, following the usual custom in linear algebra, we omit the round brackets and simply write T(x) as Tx.) The linearity condition (LT) can be split into two:
T(x + y) = Tx + Ty (LT1)

for all x and y in V, and

T(αx) = αTx (LT2)

for all vectors x in V and for all scalars α. Identity (LT1) is a special case of (LT), obtained by setting α = β = 1 in (LT). It says: the transformation T preserves the operation of addition. The following figure helps to illuminate its meaning:
Identity (LT2) is another special case of (LT), obtained by setting β = 0. It says: T preserves scalar multiplication. You should draw a figure of arrows similar to the one above to clarify its meaning. Condition (LT) can be replaced by conditions (LT1) and (LT2) together (although we don't see any advantage in doing so). Indeed, we can check that (LT) is a consequence of this pair of conditions:
T(αx + βy) = T(αx) + T(βy) (by (LT1))
           = αTx + βTy (applying (LT2) twice).
By mathematical induction, we can establish the following extension of (LT ):
T(α1x1 + α2x2 + · · · + αnxn) = α1Tx1 + α2Tx2 + · · · + αnTxn.

We note that (LT) is just the case n = 2 written in a different way.
1.6. Examples of linear transformations are so many that you can find them almost everywhere, almost any time. Here we consider a few.
Example 1.6.1. Sampling functions
Let F_S be a linear space of functions (allowed to take complex values) defined on a set S. Pick some points s1, s2, . . . , sn in S as “observation sites”. The observed values of a function f in F_S are arranged in a row as a vector in Cn, denoted by Tf:
T f = (f(s1), f(s2), . . . , f(sn)).
Then the transformation T sending f (from F_S) to Tf (in Cn) is linear. To show the linearity of T, we need to check the identity T(af + bg) = aTf + bTg for all f, g in F_S and scalars a, b. The left-hand side of this identity is, according to the definition of T,
T (af + bg) = ((af + bg)(s1), (af + bg)(s2),..., (af + bg)(sn))
= (af(s1) + bg(s1), af(s2) + bg(s2), . . . , af(sn) + bg(sn))
= a(f(s1), f(s2), . . . , f(sn)) + b(g(s1), g(s2), . . . , g(sn)) which is aT f + bT g, that is, the right hand side. This proves the linearity of T .
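Aside: the linearity check above can also be carried out numerically. A minimal Python sketch of the sampling map (the sites 0, 0.5, 1 and the functions sin, cos are our arbitrary choices):

import numpy as np

sites = [0.0, 0.5, 1.0]   # observation sites s_1, s_2, s_3

def T(f):
    # Sample the function f at the chosen sites: Tf = (f(s_1), ..., f(s_n)).
    return np.array([f(s) for s in sites])

f, g = np.sin, np.cos
a, b = 2.0, 3.0

lhs = T(lambda x: a * f(x) + b * g(x))   # T(af + bg)
rhs = a * T(f) + b * T(g)                # aTf + bTg
print(np.allclose(lhs, rhs))             # True: linearity holds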
Example 1.6.2. Ta(x) = x · a
Recall that the inner product (or the scalar product, or the dot product) of two vectors x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) in Rn is given by

x · y = Σ_{k=1}^{n} xkyk (= x1y1 + x2y2 + · · · + xnyn).

Fix an arbitrary vector a in Rn. Then the transformation Ta from Rn to R sending a vector x in Rn to x · a is linear. To prove this, we have to check Ta(αx + βy) = αTax + βTay. This can be done as follows:

Ta(αx + βy) = (αx + βy) · a = Σ_{k=1}^{n} (αxk + βyk)ak
            = α Σ_{k=1}^{n} xkak + β Σ_{k=1}^{n} ykak = α x · a + β y · a = αTax + βTay.

Aside: As you can see, checking things like this is rather routine. The important thing to learn is how to present it neatly and correctly.
Example 1.6.3. Let a be a fixed vector in R3. Then the transformation Ca from R3 into R3 itself, given by Ca(x) = x × a (the cross product of x and a), is linear. The linearity of Ca can be proved in the same fashion as in the previous examples.
Example 1.6.4. A differential operator
Recall that Pn stands for the space of all polynomials of degree at most n. For p ∈ Pn, let Dp = dp/dx, the derivative of p. Then D : Pn → Pn is linear. Why? Well, the linearity of D is manifested by the identity D(ap + bq) = a Dp + b Dq, where p, q ∈ Pn and a, b are scalars. But this identity is nothing but another way to write down the well-known equality

(d/dx)(ap(x) + bq(x)) = a (d/dx) p(x) + b (d/dx) q(x).
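Aside: in the coefficient representation of Pn, the operator D takes a simple concrete form: it sends (a0, a1, . . . , an) to (a1, 2a2, . . . , nan, 0). A minimal Python sketch (this encoding is our own device, not part of the text):

import numpy as np

def D(p):
    # Derivative on P_n: (a0, a1, ..., an) -> (a1, 2*a2, ..., n*an, 0).
    k = np.arange(1, len(p))
    return np.append(k * p[1:], 0.0)

p = np.array([1.0, 1.0, 0.0, 1.0])   # 1 + x + x^3
print(D(p))                           # [1. 0. 3. 0.], i.e. 1 + 3x^2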
Example 1.6.5. Translation
Same space as in the previous example: Pn. Let h be a fixed real number. Consider the transformation T : Pn → Pn sending a polynomial p(x) ∈ Pn to the polynomial p(x + h). Here p(x + h) is of course obtained by replacing x in p(x) by x + h; in other words, it is p evaluated at x + h. For instance, if p(x) is 1 + x + x3 and h = 1, then T(p) is the polynomial 1 + (x + 1) + (x + 1)3 = 1 + x + 1 + (x3 + 3x2 + 3x + 1) = 3 + 4x + 3x2 + x3. We claim that T is linear. To prove this, we have to verify T(ap + bq) = aTp + bTq. What is T(ap + bq)? It is ap + bq evaluated at x + h, namely, ap(x + h) + bq(x + h). Now Tp and Tq are the polynomials p(x + h) and q(x + h) respectively, so aTp + bTq is also ap(x + h) + bq(x + h). Hence T is linear.
Example 1.6.6. The shift operator S
This is the operator, denoted by S, on the space S of sequences, defined by
S(a0, a1, a2,... ) = (a1, a2, a3,... ).
If we write a = (a0, a1, a2, . . . ), then (Sa)_n = a_{n+1}. In many books on difference equations, the last equality is written as Sa_n = a_{n+1}. Strictly speaking, this is incorrect. But we accept it and interpret it as the correct one, namely (Sa)_n = a_{n+1}. It is not hard to check that S is indeed linear.
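Aside: using the identification of S with F(N, C), the shift operator can be written down in one line. A minimal Python sketch (sequences encoded as functions on N, our own device):

def S(a):
    # Shift operator: (Sa)(n) = a(n+1).
    return lambda n: a(n + 1)

a = lambda n: n ** 2                  # the sequence 0, 1, 4, 9, ...
print([S(a)(n) for n in range(4)])    # [1, 4, 9, 16]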
“Nonexample” 1.6.7. Let V = P, the space of all polynomials, and let T be the transformation from V into itself sending a polynomial to its square: Tp = p2. Then T is not linear. One way to see this is by noticing T(−x) = (−x)2 = x2. If T were linear, then T(−x) should be −T(x) = −x2, instead of x2. Aside: This is by no means the only way to prove that the T given here is nonlinear. You may also argue that T(2 · 1) = T(2) = 2^2 = 4, which is not the same as 2T(1) = 2. Another argument: T(1 + x) = (1 + x)2 = 1 + 2x + x2, which is not the same as T(1) + T(x) = 1 + x2. There are many ways to tell that T is not linear. One is as good as another. Don't waste your time by presenting more than one.
We should mention a term widely used in scientific literature: linear operator. By a linear operator, or simply an operator, on a vector space V we mean a linear transformation from V into V itself. Thus Examples 1.6.3, 1.6.4, 1.6.5 and 1.6.6 above are linear operators. Another term also widely used: linear functional (or covector in the physics and engineering literature). If V is a vector space over F, by a linear functional of V we mean a linear transformation from V into F1 ≡ F. Thus a linear functional of V is a function φ : V → F such that
φ(a1v1 + a2v2) = a1φ(v1) + a2φ(v2)
for all v1, v2 ∈ V and a1, a2 ∈ F. The mapping Ta in Example 1.6.2 is an example of a linear functional of Rn.
1.7. Given vector spaces U and V over the same field, we denote by L(U, V) the set of all linear mappings from U to V. For S, T in L(U, V) and a scalar a, we define the sum S + T and the scalar multiple aS by putting
(S + T)x = Sx + Tx, (aS)x = a Sx, for x ∈ U. (1.7.1)

We check that both S + T and aS are linear. To simplify our presentation, we let R = aS + bT and check its linearity. Take u1, u2 ∈ U and scalars c1 and c2. We have to show R(c1u1 + c2u2) = c1Ru1 + c2Ru2. Indeed,
R(c1u1 + c2u2) = (aS + bT )(c1u1 + c2u2)
= a S(c1u1 + c2u2) + b T (c1u1 + c2u2) [because of (1.7.1)]
= a(c1Su1 + c2Su2) + b(c1T u1 + c2T u2) [because S, T are linear]
= c1(aSu1 + bT u1) + c2(aSu2 + bT u2) = c1Ru1 + c2Ru2.
With the addition and scalar multiplication defined here, L(U, V) becomes a vector space.
According to the notation introduced above, given a vector space V, the symbol L(V, V) stands for the set of all linear operators on V. This symbol looks a bit clumsy and hence we will rewrite it simply as L(V). On the other hand, L(V, F), the set of all linear functionals, will be denoted by V′ (or V* in some books), and is called the dual space of V. We summarize our notation here:
L(U, V) = the set of all linear transformations from U to V.
L(V) = the set of all linear operators on V.
V′ = the set of all linear functionals of V = the dual space of V.
We have seen that L(U, V) is a vector space under a natural way of defining addition and scalar multiplication. A fortiori, L(V) and V′ are vector spaces. Actually, L(V) is more than just a vector space. Besides addition and scalar multiplication, it has a third operation: composition. The composite, or the product ST, of operators S, T ∈ L(V) is defined by putting
(ST )v = S(T v). (1.7.2)
Aside: To apply ST to a vector v, we apply T to v first to get T v, followed by applying S. Writing (1.7.2) as (ST )(v) = S(v)T (v) is a horrendous mistake.
To give quick examples: let V = P, the space of all polynomials, and let D, M, T on V be given by D(p(x)) = p′(x) ≡ (d/dx)p(x), M(p(x)) = xp(x) and T(p(x)) = p(x + 1). Then (MD)(p(x)) = M(p′(x)) = xp′(x), (DM)(p(x)) = D(xp(x)) = p(x) + xp′(x), (TM)(p(x)) = T(xp(x)) = (x + 1)p(x + 1), (MT)(p(x)) = M(p(x + 1)) = xp(x + 1), (DT)(p(x)) = D(p(x + 1)) = p′(x + 1) and (TD)(p(x)) = T(p′(x)) = p′(x + 1).
As you can see, MD is not the same as DM (also MT ≠ TM). Thus, in general, the products ST and TS are different. In case ST and TS are the same, i.e. ST = TS, we say that S and T commute, or T commutes with S. For example, the operators D and T given above commute.
To justify our definition of the product ST for S, T ∈ L(V) given above, we must check its linearity. Again, it is a routine matter. For v1, v2 ∈ V and a1, a2 ∈ F,
(ST )(a1v1 + a2v2) = S(T (a1v1 + a2v2)) = S(a1T v1 + a2T v2)
= a1S(T v1) + a2S(T v2) = a1(ST )v1 + a2(ST )v2.
We have the following elementary properties concerning the three operations (addition, scalar multiplication, composition) among operators: for R, S, T ∈ L(V) and a scalar a,

(RS)T = R(ST),
R(S + T) = RS + RT,
(R + S)T = RT + ST,
a(ST) = (aS)T = S(aT).
They are verified in a routine manner; e.g., to show (R + S)T = RT + ST, we compute:
((R + S)T )x = (R + S)(T x) = R(T x) + S(T x) = RT x + ST x = (RT + ST )x.
There are two special linear operators on V worth mentioning: the zero operator O and the identity operator I. O sends every vector to the zero vector and I sends every vector to itself; that is, for all v ∈ V, Ov = 0 and Iv = v.
We say that a linear mapping T from a vector space V to a vector space W is invertible if there is a linear mapping S from W to V such that ST = I_V and TS = I_W. Here I_V stands for the identity operator on V. In the future we simply write I for I_V, so that the identities ST = I_V and TS = I_W become ST = I and TS = I. The linear map S in this case is uniquely determined by T and will be denoted by T−1. Thus ST = I and TS = I become T−1T = I and TT−1 = I. To give a quick example, consider the linear operator T on the space P of all polynomials given by T(p(x)) = p(x + h), where h is a constant. Then T is invertible and its inverse T−1 is given by T−1(p(x)) = p(x − h). The following fact is basic:

Proposition 1.7.1. If linear operators S and T on a vector space V are invertible, then ST is also invertible, with (ST)−1 = T−1S−1.

To prove this, we check directly that T−1S−1 is the inverse of ST:
(ST)(T−1S−1) = ST T−1S−1 = S I S−1 = SS−1 = I,
(T−1S−1)(ST) = T−1S−1ST = T−1 I T = T−1T = I.
The reversal of the order of S and T in the identity (ST)−1 = T−1S−1 has the following analogy, attributed to the famous mathematician Hermann Weyl: we put on socks before putting on shoes; but when taking them off, we remove the shoes first.
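Aside: the identity (ST)−1 = T−1S−1 can also be sanity-checked numerically for operators on F3, i.e. for 3 × 3 matrices (anticipating the correspondence of Section 2; the random matrices below are our arbitrary example and are generically invertible):

import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(3, 3))
T = rng.normal(size=(3, 3))

lhs = np.linalg.inv(S @ T)
rhs = np.linalg.inv(T) @ np.linalg.inv(S)   # note the reversed order
print(np.allclose(lhs, rhs))                # True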
EXERCISE SET I.1.
Review Questions: Do I grasp the basic concepts of vector spaces and linear mappings? Am I able to write down properly the formal definitions of linear mappings, linear operators and linear functionals (as a professional mathematician would, so that I would not be embarrassed if it were in print)? What are the major examples of vector spaces and linear maps described in this section? Do I recognize the following symbols and understand perfectly well what they stand for?
R3, C2, F5, M4,5, P3, f + g, 2f − 3g,
L(V, W), S + T, ST, ST − TS, L(V), V′

Why do we need the abstract conceptual framework of vector spaces and linear mappings? What would we miss if we restricted ourselves to the standard spaces Rn and Cn instead of working with this abstract notion?
Drills
1. Write down the general form of a vector in each of the following vector spaces:
(a) F4 (F = R or C), (b) P2, (c) P4, (d) R3 × P1.
2. Find u + (−2)v, where u, v ∈ V, in each of the following cases:
(a) V = C2, u = (2 + 3i, 3 − 2i) and v = (1 + i, 1 − i).
(b) V = P2; u and v are the polynomials 1 − 2x + x2 and 1 + x − x2 respectively.
(c) V = F(R, C); u and v are the functions cos x + i sin x and cos x − i sin x respectively.
3. True or false:
(a) A set {0} consisting of a single element 0, with addition and scalar multiplication defined in the following way, is a vector space: 0 + 0 = 0, a·0 = 0.
(b) All polynomials of the form a + (1 + a)x + bx2 form a vector space under the usual addition and scalar multiplication.
(c) P2 is a subspace of P3.
(d) If U is a subspace of V and if V is a subspace of U, then U = V.
(e) If U is a subspace of V and if W is a subspace of U, then W is a subspace of V.
4. Let p(x) = x2 − x + 2 and q(x) = 2x2 + 1. Find: p(x + 1), p(1), p(q(x)), xp(x), p(x)2, p(x2), (p + q)(x), q(x + p(x)).
5. In each of the following cases, is the given transformation from one vector space into another linear? Why? (Questions like this should be answered in the following way: if your answer is “Yes”, then you have to prove it. If your answer is “No”, you should disprove it by pointing out one instance where the linearity fails; only one is needed, so don't waste your time finding or writing down another.)
(a) T : P → P given by T(p(x)) = p(x2).
(b) T : P → P given by T(p(x)) = p(x)2.
(c) T : P → R given by T(p(x)) = p(1); (here, of course, p(1) stands for the value of p at 1.)
(d) T : P → R given by T(p(x)) = p(1) + 1.
(e) The map T : M2,2 → M3,3 (Reminder: Mm,n stands for the vector space of all m × n matrices) given by T(X) = AXB, where A and B are fixed matrices of sizes 3 × 2 and 2 × 3 respectively.
(f) The map T : M2,2 → M2,2 given by T(X) = X + A, where A is a fixed 2 × 2 matrix. (♠ Caution: Be careful about the way you put down your answer.)
(g) T : M2,2 → M2,2 given by T(X) = X2.
(h) V is a vector space and v is a fixed vector in V; T : L(V) → V is given by T(X) = X(v); (here, of course, X ∈ L(V), i.e. X is an operator on V, and X(v) is the vector in V obtained by applying X to v.)
Exercises
1. Let R, S, and T be linear operators on a vector space V. Simplify the following expressions:
(a) R(S + T) − S(T + R) − (R − S)T.
(b) S(S−1T + TS−1) + (S−1T + TS−1)S + S−1(ST − TS) − (ST − TS)S−1. (S is assumed to be invertible.)
(c) [[R, S], T] + [[S, T], R] + [[T, R], S]; (for linear operators P, Q on V, their commutator [P, Q] is defined to be the linear operator PQ − QP.)
2. On the linear space P of all polynomials, define the operators M, D and U by putting M(p(x)) = xp(x), D(p(x)) = p′(x) (the derivative of p(x)), U(p(x)) = p(x + 1). Compute
(a) [M, D] (= MD − DM).
(b) UMU−1 − M; (notice that U−1(p(x)) = p(x − 1)).
3. Let S and T be two linear operators satisfying the relation STS = S. (In this situation
we may call T a generalized inverse of S.) Let P = ST, Q = TS and T0 = QTP.
Verify that (a) P2 = P, (b) Q2 = Q, (c) PS = S, (d) SQ = S, (e) ST0S = S, (f) T0 = TST and (g) T0ST0 = T0.
4. In each of the following cases, find S2, T2, ST and TS for S, T ∈ L(V).
(a) V = R2; S((x1, x2)) = (x2, x1), T((x1, x2)) = (1/2)(x1 + x2, x1 + x2).
(b) V = P1; S(p(x)) = p′(x), T(p(x)) = p(0).
(c) V = M2,2 (the space of all 2 × 2 matrices); S(X) = AX, T(X) = XB, where

A = [ 0 1 ]      B = [ 0 0 ]
    [ 0 0 ],         [ 0 1 ].
Problems
1*. Prove that, for linear operators S, T on a vector space V, if S, T and S + T are invertible, then S−1 + T−1 is also invertible.
2*. (This is a very hard problem.) Let S and T be linear operators on a vector space V. Prove that I − ST is invertible if and only if I − TS is invertible.
§2. Linear Transformations and Matrices
2.1. The main goal of the present section is to study the relation between linear mappings and matrices. We show that every m × n matrix A naturally gives rise to a linear mapping MA : Fn → Fm, and vice versa. Then we show that, given coordinate systems in vector spaces V and W, every linear mapping T : V → W can be represented by a matrix. The device set up here tells us that matrices can be regarded as linear mappings, and problems about linear mappings can often be reduced to problems about matrices.
To understand linear transformations better, let us look at the simplest situation, in which both the domain and the range are the one-dimensional space F1 ≡ F; here F is either R or C. Let T : F → F be linear. Then, for every x ∈ F, Tx = T(x · 1) = xT1. Certainly T1 ∈ F, i.e. T1 is just a scalar. Why don't we give it a name: let's call it a. Thus Tx = ax. On the other hand, it is easy to check that a transformation T from F to F of the form Tx = ax, where a is a fixed scalar, is linear. We have shown that linear transformations from F to F are exactly those of the form T(x) = ax. This gives a thorough description of such transformations. There is nothing more to say!
Once we succeed in tackling a special case, we should move on to a more interesting general situation, in which the domain of a given linear transformation is Fn and its range is in Fm. The problem now is to describe all such transformations. As you will see, the answer turns out to be similar to the one for our special case: the scalar a is replaced by an m × n matrix, say A, the vector x is replaced by an n × 1 column vector x, and the “outcome” Tx is the m × 1 column vector Ax. Thus linear transformations from Fn to Fm are exactly those T which can be put in the form Tx = Ax, where A is a fixed m × n matrix.
2.2. But the above answer seems to have a serious conflict with our previous convention: a vector in Fn used to be a row, i.e. something like x = (x1, x2, . . . , xn), instead of a column:

    [ x1 ]
x = [ x2 ]        (2.2.1)
    [ .. ]
    [ xn ]

The rationale of our previous convention is clear: the column in (2.2.1) is awkward to write. Worse: its outstanding look draws unwarranted attention! This prompts us to adopt the following rule: something in a row surrounded by the round brackets “(” and “)” is the same thing as the transpose of this row (which becomes a column) surrounded by the square brackets “[” and “]”, such as

                       [ cow ]
(cow, pig, dog, cat) = [ pig ]
                       [ dog ]
                       [ cat ].

In order to work under this rule, we have to be very careful about the brackets. This rule works well, but is too “brutal”. We try to avoid applying it directly as much as possible. Certainly we may regard the column in (2.2.1) as the transpose of a row and put x = [x1 x2 · · · xn]⊤, or x = [x1, x2, . . . , xn]⊤ with commas for clarity, and sometimes we shall do so in the future. But this still looks rather clumsy.
2.3. We proclaim:
Every matrix is associated with a God-given linear transformation.
In detail, given an m × n matrix A with entries in the field F, we define a linear transformation MA from Fn to Fm simply by putting
MAx = Ax (M)
for all x = [x1, x2, . . . , xn]⊤ ∈ Fn. The transformation MA defined in this way may be called multiplication by A. (That is why we choose the symbol MA to denote it.) The linearity of MA follows immediately from some well-known properties of matrix algebra: for all u and v in Fn and a, b ∈ F, we have
MA(au + bv) = A(au + bv) = aAu + bAv = aMAu + bMAv.
To give a quick example, suppose

A = [ 2 3 ]
    [ 4 5 ].

Then

        [ x1 ]   [ 2 3 ] [ x1 ]   [ 2x1 + 3x2 ]
MAx = MA[ x2 ] = [ 4 5 ] [ x2 ] = [ 4x1 + 5x2 ].

Hence MA is a linear operator on R2 sending (x1, x2) to (2x1 + 3x2, 4x1 + 5x2).
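Aside: the same example in a minimal Python sketch (NumPy's @ operator performs the matrix–vector product):

import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, 5.0]])

def M_A(x):
    # The transformation "multiplication by A".
    return A @ x

print(M_A(np.array([1.0, 1.0])))   # [5. 9.] = (2*1 + 3*1, 4*1 + 5*1)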
The converse of the above proclamation is also true:
Theorem 2.3.1. Every linear transformation T from Fn to Fm is of the form MA for some m × n matrix A; i.e., there exists an m × n matrix A such that Tx = Ax for all x ∈ Fn. Furthermore, the matrix A here is uniquely determined by T.

This theorem tells us that there is a one-to-one correspondence between the set L(Fn, Fm) of all linear transformations from Fn to Fm and the set Mm,n(F) of all m × n matrices with entries in F:

A ∈ Mm,n(F) ←→ T = MA ∈ L(Fn, Fm).
Before we embark on the proof of this theorem, let us take a look at the linear transformations Ta : Rn → R and Ca : R3 → R3 of Examples 1.6.2 and 1.6.3 in the previous section, defined by Ta(x) = a · x and Ca(x) = a × x respectively. According to the above theorem, there are matrices A and B of sizes 1 × n and 3 × 3 respectively such that Tax = Ax and Cax = Bx. What are A and B? Well, let us write

                                                   [ x1 ]
Tax = a1x1 + a2x2 + · · · + anxn = [a1 a2 · · · an] [ x2 ]
                                                   [ .. ]
                                                   [ xn ],

      [ a2x3 − a3x2 ]   [  0  −a3   a2 ] [ x1 ]
Cax = [ a3x1 − a1x3 ] = [  a3   0  −a1 ] [ x2 ]
      [ a1x2 − a2x1 ]   [ −a2   a1   0 ] [ x3 ].

Therefore A = [a1 a2 · · · an], while B is the skew-symmetric matrix given as

    [  0  −a3   a2 ]
B = [  a3   0  −a1 ]
    [ −a2   a1   0 ].

Notice that B⊤ = −B; (here B⊤ is the transpose of B).
2.4. Now we turn to the proof of Theorem 2.3.1 above. There are two parts to the conclusion of the theorem: first, there is a matrix A with the property that Tx = Ax, and second, such a matrix is completely determined by T. The first part concerns the existence of A and the second part its uniqueness. To prove the existence part, we have to find A (which seems to be hiding somewhere). To prove the uniqueness part, it is enough to find out in what way T determines A. Normally we prove the existence part first, because normally we think: if it didn't exist, what would be the point of proving (or even talking about) its uniqueness? However, logically they are independent entities and it doesn't matter which comes first. Here we prove the uniqueness part first; this strategy of proof is at odds with our normal thinking. There are two reasons for taking this strategy. First, the uniqueness part is easier. Second, the proof of the uniqueness part actually points out a way to find A. It helps us to figure out how to prove the existence part! (You may not be used to this unusual way of thinking. But people usually become smart when they begin to think in an unusual way.)
Proof of Theorem 2.3.1. Suppose that T x = Ax and we have to prove that A is
uniquely determined by T. Let A1, A2, . . . , An be the columns of A, so that

A = [A1 A2 · · · An].

It is easy to check that, for x = (x1, x2, . . . , xn) ≡ [x1 x2 · · · xn]⊤ ∈ Fn, T(x) is given by

                      [ x1 ]
Ax = [A1 A2 · · · An] [ x2 ] = A1x1 + A2x2 + · · · + Anxn.        (2.4.1)
                      [ .. ]
                      [ xn ]

Let us find the result of T acting on the standard basis vectors:
e1 = (1, 0,..., 0), e2 = (0, 1,..., 0),..., en = (0, 0,..., 1).
Putting x = e1 = (1, 0, 0,..., 0) in (2.4.1), i.e. x1 = 1, x2 = 0, x3 = 0 etc., we obtain T e1 = A1. In the same way we can obtain T e2 = A2 etc. Now T completely determines
Te1 = A1, Te2 = A2, . . . , Ten = An,

which in turn determine the matrix A = [A1 A2 · · · An].

Next we prove the “existence part”. Let T : Fn → Fm be a linear transformation. We have to find a matrix A such that Tx = Ax for all x ∈ Fn. Let A be the m × n matrix with Tek as its kth column (1 ≤ k ≤ n). In other words, A = [A1 A2 · · · An], where Ak = Tek for 1 ≤ k ≤ n. We have to check Tx = Ax in order to show that the A obtained in this way will do the job. Let us “recycle” the computation given in (2.4.1) and write
Ax = A1x1 + A2x2 + · · · + Anxn = x1A1 + x2A2 + · · · + xnAn

(x1A1 is the usual way of writing a scalar multiple of a column vector, while A1x1 is also correct when x1 is regarded as a 1 × 1 matrix), from which it follows that
Ax = x1Te1 + x2Te2 + · · · + xnTen
   = T(x1e1 + x2e2 + · · · + xnen) = Tx.

Here we have used the linearity of T and the following elementary manipulation:

x1e1 + x2e2 + · · · + xnen = x1(1, 0, . . . , 0) + x2(0, 1, . . . , 0) + · · · + xn(0, 0, . . . , 1)
                          = (x1, x2, . . . , xn) = x.

Hence T = MA. The proof of Theorem 2.3.1 is complete. From the proof here, we see that
Fact. The columns of an m × n matrix A are MAe1, MAe2, . . . , MAen, where e1, e2, . . . , en are the standard basis vectors.
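Aside: the Fact gives a practical recipe: to find the matrix of a linear map given by a formula, feed it the standard basis vectors and record the outputs as columns. A minimal Python sketch (the map T below is our own arbitrary example):

import numpy as np

def matrix_of(T, n):
    # Column k of the matrix is T applied to the standard basis vector e_k.
    cols = [T(np.eye(n)[:, k]) for k in range(n)]
    return np.column_stack(cols)

# Example: T(x1, x2) = (x1 + x2, x1 - x2, 2*x2), a linear map R^2 -> R^3.
T = lambda x: np.array([x[0] + x[1], x[0] - x[1], 2 * x[1]])
print(matrix_of(T, 2))   # rows: [1, 1], [1, -1], [0, 2]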
2.5. Now we use the fact stated at the end of the last subsection to determine the matrices of some linear operators on vector spaces of the form Fn.
Example 2.5.1. Permutation matrices
Let π be a permutation of the set [n] ≡ {1, 2, . . . , n}; in other words, π is a bijection of [n]. Consider the map Tπ : Fn → Fn given by
Tπ(x1, x2, . . . , xn) = (xπ(1), xπ(2), . . . , xπ(n)).
Then it is routine to check that Tπ is linear. Hence, by Theorem 2.3.1, there is a unique n × n matrix P such that Tπx = Px, called the permutation matrix associated with π. Now we want to give a specific description of P. According to the fact stated above, the first column of P is given by Tπe1. For x = e1, we have x1 = 1, x2 = 0, x3 = 0, etc. Its image is Tπx = (xπ(1), xπ(2), . . . , xπ(n)), where the kth entry xπ(k) is 1 precisely when π(k) = 1, or k = π−1(1). Thus, except for the π−1(1) entry, which is 1, all of the other entries of Tπe1 are 0. Hence Tπe1 = e_{π−1(1)}. In the same way we have Tπe2 = e_{π−1(2)}, Tπe3 = e_{π−1(3)}, etc. So we have

P = [e_{π−1(1)} e_{π−1(2)} · · · e_{π−1(n)}].

To give a specific example, let n = 3 and let π be given by π(1) = 2, π(2) = 3 and π(3) = 1. Then π−1(1) = 3, π−1(2) = 1, and π−1(3) = 2. Thus Tπ(x1, x2, x3) = (x2, x3, x1) and the permutation matrix for Tπ is
                                                      [ 0 1 0 ]
P = [e_{π−1(1)} e_{π−1(2)} e_{π−1(3)}] = [e3 e1 e2] = [ 0 0 1 ]
                                                      [ 1 0 0 ].

Notice that each row and each column has exactly one entry equal to 1, and the remaining entries are zeros. (Remark: Permutation matrices play an important role in the study of doubly stochastic matrices, which are useful for establishing some matrix inequalities. All doubly stochastic matrices form a convex set, and the permutation matrices turn out to be exactly the so-called “extreme points” of this convex set.)
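Aside: a minimal Python sketch of this construction (permutations encoded 0-based, so the π of the example becomes pi = [1, 2, 0]; the helper name perm_matrix is ours):

import numpy as np

def perm_matrix(pi):
    # Matrix of T_pi: row k picks out entry pi(k) of x,
    # so column j of P is the standard basis vector e_{pi^{-1}(j)}.
    n = len(pi)
    P = np.zeros((n, n))
    for k in range(n):
        P[k, pi[k]] = 1.0
    return P

pi = [1, 2, 0]                       # 0-based version of the example above
x = np.array([10.0, 20.0, 30.0])
P = perm_matrix(pi)
print(P @ x)                         # [20. 30. 10.] = (x2, x3, x1)
print(P)                             # matches the matrix displayed above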
Example 2.5.2. Rotations
For a fixed real number θ, let T ≡ Tθ be the operator on R2 sending an arbitrary vector v to the vector Tθv obtained by turning v through the angle θ in the anticlockwise direction:

It is not too hard to see that Tθ is a linear operator on R2. The question is: what is the 2 × 2 matrix inducing this linear operator? Let

A = [ a b ]
    [ c d ]

be the matrix inducing T ≡ Tθ: Tv = Av for all v ∈ R2. Letting e1 = (1, 0) (= [1, 0]⊤) and e2 = (0, 1), we have Te1 = (a, c) and Te2 = (b, d). On the other hand, from the following figure, we see that Te1 = (cos θ, sin θ) and Te2 = (−sin θ, cos θ):
We conclude that the matrix which induces Tθ is
Aθ ≡ [ cos θ  −sin θ ]
     [ sin θ   cos θ ]          (2.5.1)

It is a priori clear that the result of rotating a vector through an angle α, followed by a rotation through an angle β, is the same as a single rotation through the angle α + β. Putting this in mathematical symbols, we have TαTβ = Tα+β. The matrices inducing the operators in this identity satisfy the corresponding relation: Aα+β = AαAβ. From this relation and (2.5.1) above, it follows immediately that

cos(α + β) = cos α cos β − sin α sin β,
sin(α + β) = cos α sin β + sin α cos β,

which are well-known (but by no means obvious) identities in trigonometry.
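Aside: the relation Aα+β = AαAβ can be verified numerically for any sample angles. A minimal Python sketch (the angles 0.3 and 0.9 are our arbitrary choices):

import numpy as np

def rot(theta):
    # The rotation matrix A_theta of (2.5.1).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

alpha, beta = 0.3, 0.9
# A_{alpha+beta} = A_alpha A_beta encodes the angle-addition identities.
print(np.allclose(rot(alpha + beta), rot(alpha) @ rot(beta)))   # True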
2.6. In this section we discuss a book-keeping device for labelling vectors by columns of numbers and representing linear transformations by matrices, relative to some coordinate system, or more precisely, a basis. By means of such a device, a problem about vectors or linear transformations is converted into the corresponding problem about columns and matrices, which are in general easier to manipulate. Bases in linear algebra serve the same purpose as frames of reference in physics.
Definition 2.6.1. An ordered set of vectors b1, b2,..., bn in a vector space V is called a basis if each vector v in V can be written in a unique way as
v = v1b1 + v2b2 + ... + vnbn (2.6.1)
for some scalars v1, v2, . . . , vn. (Remark: Notice that there are two ingredients in this definition: the possibility to write (2.6.1) and the uniqueness of such an expression.)
With a fixed basis B = {b1, b2, . . . , bn} in a vector space V, each vector v in V determines (in a unique manner) the scalars v1, v2, . . . , vn via (2.6.1) above. These scalars are called the coordinates of v relative to the basis B. We will arrange them into a matrix with a single column, denoted by [v]_B, and call it the column representation or the coordinate vector of v relative to B:

[v]_B = (v1, v2, . . . , vn) ≡ [v1 v2 . . . vn]⊤.

The coordinates (or the column representation) of a vector depend on our choice of basis B at the outset. The subscript B in [v]_B emphasizes this dependence. For convenience we sometimes drop this subscript and write [v] if our choice of basis is understood.
Example 2.6.2. Standard examples of bases:
1. The standard basis for Fn. The vectors
e1 = (1, 0, 0, 0,..., 0), e2 = (0, 1, 0, 0,..., 0),..., en = (0, 0, 0, 0,..., 1)
form a basis of Fn, called the standard basis (or the natural basis, or the usual basis) of Fn. The kth vector ek in this basis has 1 in the kth entry and 0 elsewhere. With respect to this basis, the column representation of a vector v = (v1, v2, . . . , vn) in Fn is [v] = [v1, v2, . . . , vn]⊤; (convince yourself that this is the case.)
2. The standard basis for Pn. The monomials
1, x, x2, . . . , xn
form a basis of Pn. With respect to this basis, the column representing a polynomial p(x) = a0 + a1x + a2x2 + · · · + anxn in Pn is given by [p] = [a0, a1, a2, . . . , an]⊤.

Example 2.6.3. (a) Find the column representation of (1, 3) in R2 relative to the basis B = {(1, 1), (1, −1)}. (b) Find the column representation of x3 relative to the basis E = {1, x − 1, (x − 1)2, (x − 1)3} in P3.

Solution. (a) Suppose [(1, 3)]_B = [a, b]⊤. Then we have (1, 3) = a(1, 1) + b(1, −1), which gives 1 = a + b, 3 = a − b. Thus a = 2, b = −1. So the answer is [(1, 3)]_B = [2, −1]⊤.

(b) Let [x3]_E = (a0, a1, a2, a3). Then

x3 = a0 + a1(x − 1) + a2(x − 1)2 + a3(x − 1)3.

We have to find a0 to a3. There are several ways to do so. Here is one. Let y = x − 1. Then x = y + 1 and (y + 1)3 = a0 + a1y + a2y2 + a3y3. Now (y + 1)3 = 1 + 3y + 3y2 + y3. Thus 1 + 3y + 3y2 + y3 = a0 + a1y + a2y2 + a3y3. So, by comparing coefficients of powers of y, we have a0 = 1, a1 = 3, a2 = 3, a3 = 1. Thus [x3]_E = (1, 3, 3, 1), which is our answer.
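Aside: part (b) can also be done by solving a linear system: the columns of the matrix B below hold the standard-basis coefficients of the basis polynomials 1, x − 1, (x − 1)2, (x − 1)3, and the coordinates we seek solve Bc = (coefficients of x3). A minimal Python sketch:

import numpy as np

# Columns: coefficients (in the basis 1, x, x^2, x^3) of the
# basis polynomials 1, (x-1), (x-1)^2, (x-1)^3.
B = np.array([[1.0, -1.0,  1.0, -1.0],
              [0.0,  1.0, -2.0,  3.0],
              [0.0,  0.0,  1.0, -3.0],
              [0.0,  0.0,  0.0,  1.0]])

target = np.array([0.0, 0.0, 0.0, 1.0])   # x^3 in the standard basis
print(np.linalg.solve(B, target))          # [1. 3. 3. 1.], as found above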
Once we have a basis B = (b1, b2, . . . , bn) in a vector space V over the field F, we can define a linear mapping T from V to Fn (or simply write T : V → Fn) by putting Tv = [v]_B. The linearity of T means that the identity

[au + bv] = a[u] + b[v] (2.6.2)

holds for all vectors u, v in V and all scalars a, b in F; (for simplicity, we write [v] for [v]_B). To see this, write [u] = (u1, u2, . . . , un) and [v] = (v1, v2, . . . , vn). Then u = u1b1 + u2b2 + · · · + unbn and v = v1b1 + v2b2 + · · · + vnbn. Hence

au + bv = a(u1b1 + u2b2 + · · · + unbn) + b(v1b1 + v2b2 + · · · + vnbn)
        = (au1 + bv1)b1 + (au2 + bv2)b2 + · · · + (aun + bvn)bn,

and hence
[au + bv] = (au1 + bv1, au2 + bv2, . . . , aun + bvn)
= a(u1, u2, . . . un) + b(v1, v2, . . . vn) = a[u] + b[v].
Notice that T is invertible. Its inverse T−1 simply sends any (u1, u2, . . . , un) in Fn to u = u1b1 + u2b2 + · · · + unbn in V.
2.7. We have seen that, by introducing a basis in a vector space, we can “label” a vector in this space by a bunch of numbers arranged in a column, giving us the column representation of this vector. Next we explain how to “label” a linear mapping by a bunch of numbers arranged in a rectangular array; of course here I mean a matrix. Let us start with a linear transformation T from a vector space V to a vector space W. Suppose that 𝒱 = {v1, v2, . . . , vn} is a basis of V and 𝒲 = {w1, w2, . . . , wm} is a basis of W. We can use a matrix to represent T. This matrix is called the representing matrix of T, or the matrix representing T, or just the matrix of T, relative to the bases 𝒱 and 𝒲, and is denoted by [T]^𝒱_𝒲, or simply [T].

(The subscript and the superscript in [T]^𝒱_𝒲 emphasize the dependence of this matrix on these two bases; when their presence is understood, we simply write [T] for this matrix.) The matrix [T] is constructed column by column in the following way. To find its first column, we apply T to the first basis vector v1 in 𝒱 to get Tv1. As Tv1 is a vector in W, we can express it in a unique way as a linear combination of the basis vectors in 𝒲, say
Tv1 = t11w1 + t21w2 + · · · + tm1wm.

The coefficients of this linear combination, namely t11, t21, etc., will fill up the first column of [T]. To find the second column of [T], we apply T to v2 to get Tv2 in W and express it as a linear combination of vectors in 𝒲, say
Tv2 = t12w1 + t22w2 + · · · + tm2wm.

Then we fill up the second column with the coefficients of this linear combination. The other columns of [T] are obtained in a similar fashion. Thus we come up with

       [ t11  t12  · · ·  t1n ]
[T] =  [ t21  t22  · · ·  t2n ]
       [  .    .           .  ]
       [ tm1  tm2  · · ·  tmn ],

where the jth column of [T] (consisting of t1j, t2j, . . . , tmj) comes from

Tvj = t1jw1 + t2jw2 + · · · + tmjwm.

In other words, the jth column is just [Tvj]_𝒲, the column representing Tvj relative to the basis 𝒲 of W. In case V = W and 𝒱 = 𝒲, T is a linear operator on V and the matrix [T] ≡ [T]^𝒱_𝒱 is a square matrix; in this case we will write [T]_𝒱 instead of [T]^𝒱_𝒱.
Example 2.7.1. (a) Let T be the operator on V = P2 sending p(x) to p(x + 1). Find the matrix [T] of this operator relative to the standard basis B = {1, x, x2}. (b) Find the matrix of the differential operator D defined on P2 by D(p) = p′ (the derivative of p), relative to the standard basis B. (c) Consider the linear map M from P1 to P2 defined by the recipe M(p(x)) = xp(x). In P1 we take the standard basis B = {1, x}, but in P2 we take the basis C = {1, 1 + x, (1 + x)2}. Find the matrix representation relative to these bases.

Solution: (a) and (b) Since T(1) = 1, T(x) = x + 1 = 1 + x, T(x2) = (x + 1)2 = 1 + 2x + x2 and D(1) = 0, D(x) = 1, D(x2) = 2x, we have

        [ 1 1 1 ]           [ 0 1 0 ]
[T]_B = [ 0 1 2 ],  [D]_B = [ 0 0 2 ].
        [ 0 0 1 ]           [ 0 0 0 ]

(c) It is easy to get M(1) = x, M(x) = x2. But, in order to get the matrix [M]^B_C, we have to write M(1) = x and M(x) = x2 as linear combinations of 1, 1 + x, (1 + x)2; in other words, we have to find a0, a1, a2 and b0, b1, b2 in the following identities: x = a0 + a1(1 + x) + a2(1 + x)2, x2 = b0 + b1(1 + x) + b2(1 + x)2. Let s = 1 + x, so that x = s − 1. Then x = −1 + s = −1 + (1 + x) and x2 = (s − 1)2 = 1 − 2s + s2 = 1 − 2(1 + x) + (1 + x)2. The matrix [M] ≡ [M]^B_C can be read off from these identities:

      [ −1   1 ]
[M] = [  1  −2 ]
      [  0   1 ].
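Aside: once the matrix is found, applying the operator reduces to a matrix–vector product, as promised by (2.7.1) below. A minimal Python sketch for part (a):

import numpy as np

# [T]_B for T: p(x) -> p(x+1) on P_2, built column by column from
# T(1) = 1, T(x) = 1 + x, T(x^2) = 1 + 2x + x^2.
T = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])

p = np.array([2.0, -1.0, 1.0])   # p(x) = 2 - x + x^2
print(T @ p)                     # [2. 1. 1.], the coefficients of p(x+1) = 2 + x + x^2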
Relative to a basis of V, for S, T ∈ L(V), a, b ∈ F and v ∈ V, we have

[aS + bT] = a[S] + b[T], [ST] = [S][T], [Sv] = [S][v]. (2.7.1)

This tells us that representation by matrices captures the essence of operators. But we defer the proof of this fact to Chapter 3, where the summation symbol will be systematically used.
EXERCISE SET I.2.
Review Questions: What is the meaning of “the linear transformation induced by a matrix”? What is the basic fact concerning such induced transformations? How to find the matrix which induces a given linear transformation (in principle)? Do I understand each of the following concepts?
basis, column representation of vectors, matrix representation of operators
Can I describe the procedure of finding the column representation of vectors and the matrix representation of linear transformations to any first-year student?
Drills
1. Write down the matrices which induce the following operators on F3:
(a) T, sending (x1, x2, x3) to (x1 + x2 − x3, x1 − x2, x2 + x3).
(b) D, sending (x1, x2, x3) to (−x1, 2x2, x3).
(c) R, sending (x1, x2, x3) to (x1 cos θ + x3 sin θ, x2, −x1 sin θ + x3 cos θ).
(d) (with F = R) Q, sending (x1, x2, x3) to (1, 2, 3) × (x1, x2, x3).
2. Find the 2 × 2 real matrix A such that its induced operator MA projects each vector v in R2 orthogonally onto the line passing through the origin and (1, 1), as indicated in the left figure below:
3. Find the 3 × 3 matrix A such that MA is the 120° rotation in R3 about the axis through the origin and the point (1, 1, 1), indicated in the right figure above.
4. In each of the following cases, write down the matrix A of the operator T ≡ MA on R2 satisfying the given conditions:
(a) Te1 = (1, 1) and T2 = O.
(b) Te1 = (2, 2) and T2 = T.
(c) Te1 = (cos θ, sin θ) and T2 = I, where the given angle θ satisfies 0 < θ < π/2.
6. Give a basis for each of the following vector spaces:
(a) P3, (b) R2, (c) C3, (d) P1 × P2, (e) M2,2.
7. In each of the following cases, write down a basis for the kernel ker(T) and the range T(V) of the given linear transformation T:
(a) T : P2 → P2, Tp = p′, the derivative of p.
(b) T : P2 → P3, T(p(x)) = xp(x).
(c) T : F2 → F2, T = MA, where

A = [ 1 1 ]
    [ 1 1 ].

(d) T : R2 → P2, T([a, b]⊤) = a + bx2.
(e) T : M2,2 → M2,2, T(X) = BX, where

B = [ 1 0 ]
    [ 1 0 ].

(f) T : M2,2 → M2,2, T(X) = XB, where B is the same as the one in (e) above.
(g) T : M2,2 → M2,2, T(X) = BX − XB, where B is the same as the one in (e).
8. In each of the following cases, find the column representation of the vector v relative to the given basis B in V. (The justification of B being a basis is not required.)
(a) V = R2, v = (2, 3), B = {(1, 0), (1, 1)}.
(b) V = C2, v = (2 + 2i, 0), B = {b1, b2} with b1 = (1 + i, 1 − i), b2 = (1 − i, 1 + i).
(c) V = P1, v is the polynomial 2 + 3x, and B = {1 + x, 1 − x}.
(d) V = P2, v is 1 + x + x2, B = {1 + x, 1 + x2, x + x2}.
(e) V = M2,2, v is

[ 5 2 ]
[ 0 1 ]

and B consists of the four matrices

[ 1 0 ]   [ 1  0 ]   [ 0 1 ]   [ 0 0 ]
[ 0 1 ],  [ 0 −1 ],  [ 0 0 ],  [ 1 0 ].

(d) V, W, B and C are the same as those in (c), T(p(x)) = p(x2).
(e) V = W = M2,2, T(X) = AX − XA with

A = [ 1 0 ]
    [ 1 0 ],

and B and C are the standard basis of M2,2:

        [ 1 0 ]   [ 0 1 ]   [ 0 0 ]   [ 0 0 ]
B = C = [ 0 0 ],  [ 0 0 ],  [ 1 0 ],  [ 0 1 ].
Exercises
1. Use Theorem 2.3.1 to describe (a) the correspondence between the vector spaces L(Fn, F) and M1,n, and (b) the correspondence between the vector spaces L(F, Fn) and Mn,1.
2*. Show that a 2 × 2 matrix A induces a rotation of R2 if and only if it is of the form

A = [ a −b ]
    [ b  a ]    with det(A) ≡ a2 + b2 = 1.

Check that A−1 = A⊤.
3*. According to special relativity, a boost is a linear operator on R2 induced by a matrix of the form

[ a b ]
[ b a ]    with a2 − b2 = 1 and a > 0.

Show that the product of two boosts is a boost.
4*. Let v = (a, b) be a fixed nonzero vector in R2. Find the matrix A which induces the projection P onto the one-dimensional subspace spanned by v, as indicated in the left figure below:
5*. Let Lα be the line through the origin in the plane R2 such that the angle between Lα and the horizontal axis is α. (a) Find the 2 × 2 matrix which induces the mirror reflection Tα about Lα, indicated in the right figure above. (b) Show that the product TαTβ of two such reflections is a rotation Rθ. Find the angle θ of rotation in terms of α and β.
6*. Let n = (n1, n2, n3) be a unit vector in R3 (|n|^2 ≡ n1^2 + n2^2 + n3^2 = 1) and let H be the plane in R3 through the origin and perpendicular to n. Let Tn be the mirror reflection with respect to H, indicated in the following figure:
(a) Show that Tnv = v − 2(v · n)n for all v ∈ R3. (b) Write down the matrix which induces this reflection. (c) Suppose that m = (m1, m2, m3) is another unit vector and Tm is the reflection defined in the similar fashion. Show that the vector w ≡ n × m is invariant under Tm Tn, that is, Tm Tn w = w.
7*. Find the 3 × 3 matrix A which induces the 60° rotation T in R3 about the axis through the origin and the point (1, 1, 1). Compute A2. Explain why your answer for A2 is the expected one.
§3. Linear Equations
3.1. Let V and W be vector spaces over the field F of scalars and let T : V → W be a linear mapping. Let b be a vector in W. Consider the equation
T x = b (3.1.1)
We say that a vector v in V is a solution to this equation if T v = b.
Example 3.1.1. System of linear equations
Let A be an m × n matrix over F and let b be a vector in Fm, say

    [ a11  a12  · · ·  a1n ]        [ b1 ]
A = [ a21  a22  · · ·  a2n ],   b = [ b2 ]
    [  .    .           .  ]        [ .. ]
    [ am1  am2  · · ·  amn ]        [ bm ].

Let T = MA, that is, the linear map T : Fn → Fm given by Tx = Ax. Then equation (3.1.1), that is, Tx = b, is the following system of linear equations:

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
  · · · · · ·
am1x1 + am2x2 + · · · + amnxn = bm          (3.1.2)
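Aside: for T = MA, solving (3.1.1) numerically is a standard library call. A minimal Python sketch (the matrix and right-hand side below are our own arbitrary example):

import numpy as np

# Solving T x = b with T = M_A amounts to solving the system A x = b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)
print(x)                        # the solution vector
print(np.allclose(A @ x, b))    # True: x solves the system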