<<

CHAPTER IV

OPERATORS ON INNER PRODUCT SPACES

1. Complex Inner Product Spaces §

1.1. Let us recall the inner product (or the ) for the real n–dimensional n n Euclidean R : for vectors x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) in R , the inner product x, y (denoted by x y in some books) is defined to be x, y = x y + x y + + x y , 1 1 2 2 n n and the (or the ) x is given by

x = x, x = x2 + x2 + + x2 . 1 2 n For complex vectors, we cannot copy this definition directly. We need to use complex conjugation to modify this definition in such a way that x, x 0 so that the definition ≥ of magnitude x = x, x still makes sense. Recall that the conjugate of a complex number z = a + ib, where x and y are real, is given by z = a ib, and − z z = (a ib)(a + ib) = a2 + b2 = z 2. − | | The identity z z = z 2 turns out to be very useful and should be kept in mind. | | Recall that the addition and the multiplication of vectors in Cn are defined n as follows: for x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) in C , and a in C,

x + y = (x1 + y1, x2 + y2, . . . , xn + yn) and ax = (ax1, ax2, . . . , axn)

The inner product (or the scalar product) x, y of vectors x and y is defined by x, y = x y + x y + + x y (1.1.1) 1 1 2 2 n n Notice that x, x = x x +x x + +x x = x 2+ x 2+ + x 2 0, which is what 1 1 2 2 n n | 1| | 2| | n| ≥ we ask for. The norm of x is given by

x = x, x 1/2 = x 2 + x 2 + + x 2. | 1| | 2| | n| 1 Remark: In (1.1.1), it is not clear why we prefer to take complex conjugates of components of y instead of components of x. Actually this is more or less due to the tradition of mathematics, rather than our preference. (Physicists have a different tradition!)

The space Cn provides us with the typical example of complex inner product spaces, defined as follows:

Definition. By an inner product on a complex we mean a device of assigning to each pair of vectors x and y a denoted by x, y , such that the following conditions are satisfied: (C1) x, y 0, and x, x = 0 if and only if x = 0. ≥ (C2) y, x = x, y . (C3) The inner product is a “sesquelinear map”, i.e.

a x + a x , y = a x , y + a x , y 1 1 2 2 1 1 2 2 x, b y + b y = b x, y + b x, y . 1 1 2 2 1 1 2 2 (Actually the second identity of (C3) above is the consequence of the first, to gether with (C2).

Inner products for real vector spaces can be defined in the similar fashion. It is slightly simpler because there is no need to take complex conjugation. This is simply because the conjugate of a is just itself.

Besides Cn, another example of complex is given as follows. Consider a space of wellbehaved complex-valued functions over an , say [a, b]; F (here we do not specify the technical meaning of being wellbehaved). The inner product f, g of f, g is given by ∈ F 1 b f, g = f(t) g(t) dt, for f, g . b a ∈ F − a (On the right hand side, 1/(b a) is a normalization factor added for convenience in the − future.) The norm induced by this inner product is

1/2 1 b f f, f 1/2 = f(t) 2 dt for f . ≡ b a | | ∈ F − a In the future we will take to be the space of trigonometric and [a, b] is F any interval of length 2π, such as [0, 2π] and [ π, π]. − 2 1.2. Let V be a complex vector space V with an inner product , . We say that two vectors x and y in V are orthogonal or if their inner product is zero and we write x y in this case. Thus, by our definition here, ⊥ x y x, y = 0. ⊥ ⇐⇒ From the definition of you should recognize that, first, the zero vector 0 is orthogonal to every vector (indeed, for each vector x in V , 0, x = 0 + 0, x = 0, x + 0, x by (C3) and hence 0, x = 0); second, 0 is the only vector orthogonal to itself (this follows from (C1)) and hence 0 is the only vector orthogonal to every vector; third, x y implies y x (indeed, if x, y = 0, then y, x = x, y = 0 = 0). ⊥ ⊥ A of nonzero vectors is called an orthogonal system if each vector in is S S orthogonal to all other vectors in . If, furthermore, each vector in has length 1, S S then is called an orthonormal system. (Notice the difference of the endings of the S words “orthogonal” and “orthonormal”.) We have the following generalized Pythagoras theorem: If v , v , , v are an orthogonal system, then 1 2 n v + v + + v 2 = v 2 + v 2 + + v 2. (1.2.1) 1 2 n 1 2 n We prove this by induction on n. When n = 1, (5.2) becomes v 2 = v 2 and there is 1 1 nothing to prove. So let n 2 and assume that the theorem is true for n 1 vectors. Let ≥ − w = v + v + + v . Then, by our induction hypothesis, w 2 = n v 2. Thus 2 3 n k= 2 k (1.2.1) becomes v + w 2 = v 2 + w 2 which remains to be verified. Notice that 1 1 v , w = v , v + v , v + + v , v = 0. 1 1 2 1 3 1 n Hence v + w 2 = v + w, v + w 1 1 1 = v , v + v , w + w, v + w, w 1 1 1 1 = v , v + v , w + v , w + w, w 1 1 1 1 = v , v + w, w = v 2 + w 2. 1 1 1 Hence (1.2.1) is valid. Given an orthogonal system = e , e ,..., e in V , and a vector v which can E { 1 2 n} be written as a of vectors in , say B n v = v1e1 + v2e2 + + vnen vkek, ≡ k= 1 we look for an explicit for the coefficients vk in this linear combination. By the in the “first slot” of the inner product, we have n n v, ej = vkek, ej = vk ek, ej . k= 1 k= 1 3 Note that e , e are zeros except when k = j, which gives 1 in this case; (in short, k j e , e = δ ). So the above identity becomes v, e = v . Thus k j jk j j n v = v, ek ek = v, e1 e1 + v, e2 e2 + + v, en en, (1.2.2) k= 1 Since v, e e = v, e e = v, e , the generalized Pythagoras theorem gives k k | k| k | k| v 2 = v, e 2 + v, e 2 + + v, e 2, (1.2.3) | 1| | 2| | n| if v is in the of the orthonormal system = e , e ,..., e . The last E { 1 2 n} identity is a general fact about orthonormal system that should be kept in mind.

1.3. Next we consider a slightly more general problem: given a vector v in an inner product space V and a subspace W of V , spanned by a given orthogonal system S = w , w ,..., w of nonzero vectors ( w , w = 0 for k = j and w , w = 0, where k { 1 2 r} k j k k and j run between 1 and r), find the socalled orthogonal decomposition of v:

v = w + h, (1.3.1)

where w W and h W (that is, h is perpendicular to all vectors in W ). The vector w ∈ ⊥ here will be called the (orthogonal) projection of v onto W . Since w is in W and W is

spanned by w1, w2,..., wr, we can write

w = a w + a w + + a w . (1.3.2) 1 1 2 2 r r

We have to find a1, a2,..., ar. Identity (1.3.1) can be rewritten as

r v = w + h = akwk + h. k= 1 Take any vector from w1, w2,..., wr, say wj, and form the inner product of wj with each side of the above identity. By the linearity of the “first slot” of inner product, we have v, w = r a w , w + h, w . Note that w , w are zeros except when k = j. j k= 1 k k j j k j Hence r a w , w can be reduced to a w , w . On the other hand, h, w = 0 k= 1 k k j j j j j because h is perpendicular to W and w is in W . Thus we arrive at v, w = a w , w , j j j j j or a = v, w / w , w . Substitute this expression of a to (1.3.2), switching the index j j j j j j to k, to obtain:

r v, wk v, w1 v, w2 v, wr w = wk w1 + w2 + + wr, (1.3.3) k= 1 wk, wk ≡ w1, w1 w2, w2 wr, wr which is the required projection.

4 Now we consider two special cases: The first case is that S consists of a single (nonzero) vector, say u. Write down the orthogonal decomposition

v, u v = u + h where h u. u, u ⊥ 2 The generalized Pythagoras theorem gives v 2 = v, u / u, u u 2 + h 2. Using u, u = u 2, we rewrite the first term on the righthand side as v, u 2/ u 2. Then | | we show our generosity by dropping the second term h 2 on the right to obtain the v 2 v, u 2/ u 2. We can rearrange this into ≥ | | v, u v u , (1.3.4) | | ≤ which is the celebrated Cauchy–Schwarz’s inequality.

The second special case is that S consists of an orthonormal system, say S = e , e ,..., e . In this case { 1 2 n} n n 2 2 w = v, ek ek with w = v, ek . k= 1 k= 1 | | The orthogonal decomposition v = w + h tells us that v 2 = w 2 + h 2. Dropping h 2, we get v 2 w 2, or w 2 v 2. We have arrived at ≥ ≤ n 2 2 v, ek v . (1.3.5) k= 1 | | ≤ Notice that this inequality also holds for an infinite orthonormal system, say e ∞ . { k}k= 1 Indeed, for any n, applying this inequality to the finite system e n , we get (1.3.5) { k}k= 1 above. Letting n , we obtain → ∞

∞ 2 2 v, ek v k= 1 | | ≤ which is usually called Bessel’s inequality.

1.4. In the present section, we give some examples of orthonormal systems.

Example 1.4.1. In Cn, the standard consisting of vectors

e1 = (1, 0,..., 0, 0), e2 = (0, 1,..., 0, 0),... en = (0, 0,..., 0, 1)

clearly form an .

5 Example 1.4.2.* Fix a positive number n and let ω = e2πi/n, which is called a primitive nth . Consider the following vectors in Cn:

1 k 1 2(k 1) (n 1)(k 1) f = 1, ω − , ω − , . . . , ω − − ; 1 k n. k √n ≤ ≤ We write down the first three of them to see the general pattern:

f1 = (1, 1, 1,..., 1)/√n, 2 n 1 f2 = (1, ω, ω , . . . , ω − )/√n, 2 4 2(n 1) f3 = (1, ω , ω , . . . , ω − )/√n,

We claim that f (1 k n) form an orthonormal basis in Cn. First we check that k ≤ ≤ they are unit vectors:

2 1 2 k 1 2 2(k 1) 2 (n 1)(k 1) 2 f = 1 + ω − + ω − + + ω − − = 1 k n | | | | | | in view of ω = 1. Next we show that, for k = ℓ, fk, fℓ = 0. For definiteness, let us | | 1 assume 1 ℓ < k n. By using ω = ω− , we get ≤ ≤ k ℓ 2k 2ℓ k(n 1) ℓ(n 1) f , f = 1 + ω ω + ω ω + + ω − ω − /n k ℓ k ℓ 2k 2ℓ (k ℓ)(n 1) = 1 + ω − + ω − + + ω − − /n 2 n 1 = (1 + η + η + + η − )/n, k ℓ where η = ω − . Now

2 n 1 n (k ℓ)n n k ℓ (1 η)(1 + η + η + + η − ) = 1 η = 1 ω − = 1 (ω ) − = 1 1 = 0. − − − − − k ℓ 2 n 1 Since 0 < k ℓ < n, η ω − = 1, or 1 η = 0. Hence 1 + η + η + + η − = 0. Now − ≡ − f , f = 0 is clear. This example will be referred to in the next section when we discuss k ℓ the finite (in Example 2.7.1).

Example 1.4.3*. Consider the space of all periodic functions of period 2π. The inner product of two such functions f and g is defined to be

1 2π f, g = f(t) g(t) dt. 2π 0 We claim that the system eint ( < n < ), where n ranges over all integers, is −∞ ∞ orthonormal. First we check that each of them is of unit length:

1 2π 1 2π eint 2 == eint 2 dt = 1 dt = 1. 2π | | 2π 0 0 6 Next, for n = m, we have 2π 2π int imt 1 int imt 1 i(n m)t e , e = e e dt = e − dt 2π 0 2π 0 2π 1 i(n m)t 1 = e − = (1 1) = 0. n m 0 2π − − The orthogonal decomposition of a f in this space gives its :

2π int 1 int f(t) = c e , where c = f(t)e− dt. n n 2π

2π 2 1 2 cn f(t) dt,

Example 1.4.4*. Consider the space of all even functions of period 2π. The inner product of two such functions f and g is defined to be

1 π f, g = f(t) g(t) dt. π 0 Then the following functions

1, √2 cos t, √2 cos 2t, √2 cos 3t, . . . form an orthonormal system of this space. To show this, we need to check

2 π 1 if m = n cos mt cos nt dt = δ = (1.4.1) π mn 0 if m = n 0 which is left to the reader as an exercise.

Example 1.4.5*. In this example we introduce the socalled Chebyshev’s polynomi

als Tn(x) and Un(x), which have extensive applications in numerical analysis and some extremal problems arising from electrical engineering. First we observe, from Euler’s iden tity, cos nt + i sin nt = eint = (eit)n = (cos t + i sin t)n. (1.4.2)

7 We may try to use the binomial formula to expand the right hand side of (1.4.2) and if we are patient enough, we can see the following pattern

int e = Tn(x) + iUn 1(x) sin t with x = cos t, (1.4.3) −

where Tn and Un 1 are some polynomials of degrees n and n 1 respectively. − − However we can use induction to verify (1.4.3). When n = 1, we simply put T1(x) = x and U0(x) = 1. Assume the validity for n = k. Then

i(k+ 1)t ikt it e = e e = (Tk(x) + iUk 1(x) sin t)(cos t + i sin t) − 2 = Tk(x) cos t Uk 1(x) sin t + i(Tk(x) sin t + Uk 1(x) sin t cos t) − − − 2 = Tk(x)x + Uk 1(x)(1 x ) + i(Tk(x) + Uk 1(x)x) sin t. − − − i(k+ 1)t Thus we have e = Tk+ 1(x) + iUk(x) sin t, where

2 Tk+ 1(x) = Tk(x)x + (1 x )Uk 1(x) Uk(x) = Tk(x) + xUk 1(x). − − −

The last two identities tells us how to generate polynomials Tn(x) and Un(x) recursively. Comparing the real parts of both sides of (1.4.3), we get cos nt = Tn(x) = Tn(cos t). The orthogonality relation (1.4.1) in the last example can be rewritten as

2 π T (cos t) T (cos t) dt = δ π m n mn 0 Now apply the change of variable x = cos t. Notice that cos0 = 1, cos π = 1 and − dx = sin t dt, which gives dt = dx/ sin t = dx/√1 cos2 t = dx/√1 x2; (notice − − − − − − that, for 0 t π, we have sin t 0) Observe that, when t suns from 0 to π, cos t drops ≤ ≤ ≥ from 1 to 1. Thus we have − 2 1 dx Tm(x)Tn(x) = δmn. 2 π 1 √1 x − − This shows that if we define the inner product of two functions f and g by

1 dx f, g = f(x)g(x) , 2 1 √1 x − − then the Chebyshev’s polynomials Tn(x)(n = 1, 2, 3,...) form an orthonormal system.

1.5. Given a list of linearly independent vectors v1, v2,..., vn in an inner product space V , there is a procedure of constructing an orthonormal system e1, e2,..., en, called the Gram-Schmidt process, with the property that

span v , v ,..., v = span e , e ,..., e { 1 2 k} { 1 2 k} 8 for each k = 1, 2, . . . , n. To make things easier, let us describe how to construct an b1, b2,..., bn with the similar property. After this we normalize b’s to get e’s — a simple finishing touch. We construct b’s in n steps: the kth step is the one to obtain b , (1 k n). The k ≤ ≤ first step is the easiest one: just take v to be b . Now suppose that the (k 1)th step 1 1 − has been accomplished: we have obtained an orthogonal system b1, b2,..., bk 1 which { − } spans the same subspace as v1, v2,..., vk 1 does, say Wk 1. Consider the vectors { − } −

b1, b2,..., bk 1, vk, vk+ 1,..., vn. −

Let wk be the projection of vk onto the subspace Wk 1, which is given by −

vk, b1 vk, b2 vk, bk 1 wk = b1 + b2 + + − bk 1 b1, b1 b2, b2 bk 1, bk 1 − − − according to (1.3.3). Now let bk = vk wk. Then bk Wk 1, and hence b1, b2,..., bk − ⊥ − form an orthogonal system. Also, from the fact that wk is in Wk 1, we can see that − the set b , b ,..., b spans the same subspace as v , v ,..., v does; (this subspace { 1 2 k} { 1 2 k} should be denoted by Wk.) As we have mentioned before, once we get the orthogonal basis b1, b2,..., bn, the required orthonormal basis e1, e2,..., en can be obtained immediately by normalization: b b b e = 1 , e = 2 , , e = n . 1 b 2 b n b 1 2 n This process of GramSchmidt is more or less a way to turn a given bunch of vectors, one by one, progressively, to“straighten them up”. Each time, you turn a vector to make it orthogonal to all the previous vectors which have been “straightened up”. In this way

v1, v2,..., vn is gradually replaced by b1, b2,..., bn, one vector at each time.

Example 1.5.1. Apply Gram–Schmidt process to the basis consisting of v1 = 3 (1, 1, 1), v2 = (2, 0, 1) and v3 = (0, 0, 3) in C to obtain an orthonormal basis.

Solution. Let b1 = v1 = (1, 1, 1),

v , b 3 b = v 2 1 b = (2, 0, 1) (1, 1, 1) = (1, 1, 0), 2 2 − b , b 1 − 3 − 1 1 v , b v , b 3 0 b = v 3 1 b 3 2 b = (0, 0, 3) (1, 1, 1) (1, 1, 0) = ( 1, 1, 2). 3 3 − b , b 1 − b , b 2 − 3 − 3 − − − 1 1 2 2 Upon normalization, we obtain the following orthonormal basis:

1 1 1 1 1 1 1 2 e1 = , , , e2 = , , 0 , e3 = , , . √3 √3 √3 √2 −√2 −√6 −√6 √6 9 We can use the GramSchmidt process to prove that every finite dimensional vector space has an orthonormal basis. Indeed, if V is a finite dimensional space, either over R or over C, we can take any basis in V and apply GramSchmidt process to this basis to obtain an orthonormal basis of V .

EXERCISE SET IV.1.

Review Questions. What is the main difference between a complex and a real inner product space? What is the orthognal projection on to a subspace? How to compute this when an orthogonal basis of this subspace is given? What is GramSchmidt’s process? What is it good for?

Drills

1. In each of the following cases, find the inner product u, v and the norms u and v | | | | of vectors u and v in C3 (with the standard inner product.) (a) u = (1, i, 2), v = ( 2, i, 1). − (b) u = (i, 2, 2), v = (2, 2i, i). − (c) u = (1 + √3i, 1, 1 √3i), v = (1 √3i, 1 + √3i, 1). − − (d) u = (i cos α, sin α, cos α + i sin α), (cos β, i sin β, sin β + i cos β).

2. In each of the following cases, find the orthogonal projection of a vector u in an inner product space to the 1dimensional subspace spanned by v. (a) V = R2; u = ( 1 , 1 ) and v = (1, 0). √ 2 − √ 2 (b) V = R3; u = (2, 1, 1) and v = (1, 2, 3). (c) V = C2; u = (1 + i, 1 i) and v = (1, i). − (d) V = C3; u = (2, 1, 3) and v = (1, 1, i). (e) V is the space of continuous functions on [0, 1] with the inner product f, g = 1 f(x)g(x)dx; u is the function f(x) 1 and v is g(x) = eiπx.(Hint: Use the 0 ≡ identities ez = ez¯ and eaxdx = 1 eax + C, where a = 0.) a (2πi)x (4πi)x (f) Same V as in (e); u is the function h(x) = e and v is k(x) = e . 3. In each of the following cases, find the projection of a vector u in an inner product

space V onto the subspace spanned by v1, v2. 2 (a) V = R ; u = (your age, your weight), v1 = (1, 0) and v2 = (1, 1). 3 (b) V = R ; u = (3, 1, 1), v1 = (1, 1, 0) and v2 = (1, 1, 2). (Notice that v1 v2.) 3 − ⊥ (c) V = R ; u = (2, 3, 1), v1 = (1, 2, 0) and v2 = (4, 7, 0). (Hint: Determine the subspace spanned by v1 and v2 first.) (d) V = C3; u = (3i, 1, 1), v = (1, i, i) and v = (2i, 1, 1). (Notice that v v ) − 1 2 1 ⊥ 2 10 4. True or false (u and v are vectors in a complex inner product space V , M and N are subspaces of V and z is a complex number; all of them are arbitrary): (a) u, zv = zu, v . (b) zu, zv = z 2 u, v . | | (c) u, v v, u = u, v 2. | | (d) If the identity u, v = v, u holds, then u, v must be a real number. (e) If u v and u v , then u + u is orthogonal to v + v . 1 ⊥ 1 2 ⊥ 2 1 2 1 2 (f) If u v and u v, then u + u is orthogonal to v. 1 ⊥ 2 ⊥ 1 2 (g) If u v, then v u. ⊥ ⊥ (h) If u v and v w, then u w. ⊥ ⊥ ⊥ (i) If the orthogonal projections of u and v on the subspace M (of an inner product space V ) are the same, then u v is orthogonal to M. − (j) The GramSchmidt process is a process to construct an in ner product on a vector space such that a given basis in this space becomes an orthonormal basis. 5. In each of the following parts, apply the GramSchmidt orthogonalization process to the given linearly independent set of vectors (in the given order) in C4. (a) (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1). (b) (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 0). (c) (1, 1, 1, 1), (1, 0, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1). (d) (1, i, i, i), (1, i, i, 0), (1, i, 0, 0), (1, 0, 0, 0). (e) (0, 0, 2, 0), (1, 0, 4, 0), (5, 2, 0, 1).

Exercises

1. Let V be a complex inner product space and denote by VR the real vector space obtained from V by restricting scalars to R.

(a) Show that the recipe u, v R = Re u, v defines an inner product for the real space VR . (Rez and Imz stand for the real part and the imaginary part, respec tively, of a complex number z.)

(b) Show that the recipe u, v = Im u, v does not give an inner product for VR . I (c) Check the identity u, v = u, v R + i u, iv R . 2. Let = e , e ,..., e be an orthonormal basis of a complex inner product space V . E { 1 2 n} Show that, if [v] = (v1, v2, . . . , vn) and [w] = (w1, w2, . . . , wn) for v, w V , then E E ∈ v, w = v w + v w + + v w . 1 1 2 2 n n

3. Let u and v be two vectors in a complex inner product space.

11 (a) Show that u + v 2 = u 2 + v 2 + 2Re u, v . | | | | | | (b) From the above identity derive that 4Re u, v = u + v 2 u v 2. | | − | − | (c) Show that the imaginary part of u, v is Re u, iv . Use this fact and (c) to derive the following identity for complex inner product spaces:

1 u, v = ( u + v 2 u v 2 + i u + iv 2 i u iv 2). 4 | | − | − | | | − | − |

(A neat way to rewrite this is u, v = 1 3 ik u + ikv 2.) 4 k= 0 | | 4. Let v1 and v2 be linearly independent vectors in a real inner product space V . Show that the area A of the parallelogram stretched by v1 and v2 is equal to the of v , v v , v 1 1 1 2 . v , v v , v 2 1 2 2 (You also have to explain why the above cannot be negative so that we can take its square root.) Hint: Write v2 = w + h, where w is the projection of v2 onto v . Then A2 = v 2 h 2. 1 | 1| | |

12 2. Operators on Inner Product Spaces §

2.1. In this section we consider operators on a finite dimensional inner product space over either R or C. Because of an extra structure on the vector space, namely, the inner product, these operators have a new feature called adjoint, which behaves like the complex conjugation for complex numbers. For notational convenience, we limit our discussion to operators on inner product spaces, although all material from 2.1 to 2.3 are applicable § § to linear mappings between inner product spaces.

Let T be a linear operator on a (finite dimensional) inner product space V , either real or complex. The adjoint of T , denoted by T ∗, is the linear operator on the same space V such that the identity

T x, y = x,T ∗y (2.1.1)

holds for all vectors x and y in V . At the outset it is not clear whether such T ∗ exists, and if it does exist, it is not clear if it is uniquely determined by T . We have to establish the

existence and uniqueness of the operator T ∗ which satisfies (2.1.1) in order to justify this definition. Aside: The definition of adjoint here is unusual. Instead of telling us exactly

what T ∗ is, it singles out the most desirable property of T ∗. We may call it a “priority definition”, because this desirable property here has priority over anything else. “Priority definitions” are not rare in mathematics. To justify the above “priority definition” for the adjoint of an operator, we must prove the following statement:

(*) For every linear operator T on V , there is a unique linear operator S on V such that T x, y = x,Sy for all x, y V . ∈ Once this statement is proven, we can define T ∗ to be the unique operator S described in this statement. The “uniqueness” part is easier to prove. Assume that both S1 and S2 have the same property as S as described, namely T x, y = x,S y and T x, y = x,S y 1 2 for all x and y in V . Let R = S S . Then x,Ry = x,S y x,S y = 0 for arbitrary 1 − 2 1 − 2 x, y in V . Thus, for every y in V , Ry is orthogonal to all vectors in V and hence Ry = 0.

Therefore R = O, or S1 = S2. The proof of the existence part of ( ) is based on the following lemma, which is a ∗ “baby version” of famous theorem called the Riesz representation theorem.

Lemma 2.1.1. If φ is a linear of V (that is, φ is in V ′) then there exists a unique vector a in V such that φ(x) = x, a . Take an orthonormal basis = e , e ,..., e in V . (The last remark in the last section E { 1 2 n} gaurantees its existence.) Then, for each vector x in V , we have x = n x, e e (see k= 1 k k 13 identity (1.2.2) in the last section) and hence

n n n φ(x) = φ x, ek ek = x, ek φ(ek) = x, ek φ(ek) k= 1 k= 1 k= 1 n n = x, φ(ek)ek = x, φ(ek)ek . k= 1 k= 1 Hence φ(x) = x, a where a = n φ(e )e . The uniqueness of a is left for you to check k= 1 k k as an exercise. Now we return to the proof of the existence part of ( ). Take any y in V and consider ∗ the linear functional φy defined by putting φy (x) = T x, y . By the above lemma we know that there exists a unique vector determined by y, say Sy such that φy (x) = x,Sy . Thus we have T x, y = x,Sy for all x in V . The linearity of S is left for you to check as an exercise. Thus S is the required operator T ∗.

We consider the representation [T ] of an operator T with an orthonormal E basis = e1, e2,..., en in V . The first column of [T ] is filled with the coordinates of E { } E T e . Since is an orthonormal basis, we have (see (1.2.2) in the last section) 1 E T e = T e , e e + T e , e e + + T e , e e . 1 1 1 1 1 2 2 1 n n

Hence the first column of [T ] is filled with T e1, e1 , T e1, e2 , etc. The same method E allows us to figure out other columns. We arrive at:

T e1, e1 T e2, e1 T en, e1 T e , e T e , e T e , e 1 2 2 2 n 2 [T ] =  . . .  . E . . .        T e1, en T e2, en T en, en   

The (j, k)entry of [T ] , denoted by tjk, is given by E

t = T e , e . jk k j Reversing the order of j, k looks awkward, but things turn out that way and we cannot help it. Now the (k, j)–entry of [T ∗] , denoted by tkj∗ , is given as follows: E

t∗ T ∗e , e = e ,T e = T e , e = t . kj ≡ j k j ∗ k j k jk

Thus the matrix of T ∗ relative to is the conjugate of the matrix of T relative E to . We also call the of a matrix A the adjoint of A and denote it E 14 by A∗. Thus A∗ = A⊤. We have shown that “the matrix of the adjoint of T is the adjoint of the matrix of T ” relative to any orthonormal basis : E

[T ∗] = [T ]∗ (2.1.2) E E We give some quick examples of adjoints of matrices as follows:

2i 1 i ∗ 2i 2i z w ∗ z w = , = . 2i 1− + i 1− + i 1− i w z w z −

Example 2.1.1. Let = e , e ,..., e be an orthonormal basis of a inner product E { 1 2 n} space. By the forward shift relative to this basis we mean the operator S satisfying

Se1 = e2,Se2 = e3,...,Sen = 0.

What is its adjoint S∗ ? Well, the representing matrix of S relative to is E 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0  0 1 0 0 0 0   0 0 0 0 0 0  [S] =  0 0 1 0 0 0  with [S]∗ =  0 0 0 0 0 0  . E   E            0 0 0 1 0 0   0 0 0 0 0 1       0 0 0 0 1 0   0 0 0 0 0 0      From [S∗] = [S]∗ we know E E

S∗e1 = 0,S∗e2 = e1,S∗e3 = e2,...... ,S∗en = en 1. −

Naturally, S∗ is called the backward shift relative to . E Example 2.1.2. An operator D on V is diagonal relative to the orthonormal basis = e ,..., e if there are scalars λ , λ , . . . , λ such that De = λ e for all k = E { 1 n} 1 2 n k k k 1, 2, . . . , n. In this case the representing matrix of D relative to is a E

λ1 0 ... 0 λ1 0 ... 0 0 λ2 ... 0 0 λ2 ... 0  .   .  [D] = . with [D]∗ = . . E . .   E            0 0 . . . λn   0 0 ... λn      The representing matrix of the adjoint D∗ is also a diagonal matrix, obtained by replacing

each entry on the main diagonal by its . Therefore D∗ is also a diagonal

15 operator relative to the basis with D∗e = λ e for k = 1, 2, . . . , n. Notice that, in case E k k k the scalars λ1, λ2, . . . , λn are real numbers, D and D∗ have the same representation matrix relative to and hence D = D∗. In this case D is called a Hermitian operator or a E self–adjoint operator.

2.2. The following elementary properties about adjoints should be kept in mind:

(S + T )∗ = S∗ + T ∗, (αS)∗ = αS∗, (ST )∗ = T ∗S∗, (2.2.1) T ∗∗ = T,O∗ = O,I∗ = I, where S and T are operators on a (finite dimensional) inner product space, and α is an arbitrary scalar. One way to prove these identities is by definition. For example, to show

(ST )∗ = T ∗S∗, we only have to check the identity ST x, y = x,T ∗S∗y . This is an easy thing to do, provided you understand the definition of adjoint:

x,T ∗S∗y = x,T ∗(S∗y) = T x,S∗y = ST x, y . In the special case V = Cn or Rn with the standard inner product, every linear operator on V is induced by a matrix, i.e. all linear operators are of the form M for some n n A × matrix A. The verification of the following proposition is left to you as an exercise.

Proposition 2.2.1. If T is induced by A, then T ∗ is induced by A∗, i.e. MA∗ = MA∗ . (The adjoint of the induced operator is the induced operator of the adjoint.) We have seen some advantages of studying matrices by investigating the operators induced by them. We will discover that the above simple fact is very handy in treating matrix problems by this approach.

2.3. Let M be a subspace of an innr product space V over C or R. The of M, denoted by M ⊥, is the set of vectors in V perpendicular to all vectors in M. Thus v is in M ⊥ if v M, that is, v x for all vectors x in M. Using set– ⊥ ⊥ theoretical notation, we can write

M ⊥ = v V : v, x = 0 for all x M . { ∈ ∈ }

If x is in both M and M ⊥, then we have x, x = 0 and hence x = 0. Thus M M ⊥ = 0 . ∩ { } On the other hand, for any vector v in V , we have the orthogonal decomposition v = w+h with w M and h M ⊥; see 1.3 of the last section. This shows V = M + M ⊥. It ∈ ∈ § follows from Theorem 2.3.2 in Chapter II that

dim V = dim M + dim M ⊥. (2.3.1)

16 It is clear from the definition of orthogonal complement that M is contained in M ⊥⊥. On the other hand, the above identity tells us that M and M ⊥⊥ have the same . Hence M ⊥⊥ = M. For T (V ), where V is an inner product space over C or R, we have ∈ L

Theorem 2.3.1. The of T ∗ is the orthogonal complement of the range of T :

ker T ∗ = T (V )⊥.

Remark: Since T ∗∗ = T and W ⊥⊥ = W for a subspace W of V , we can deduce from this

theorem that ker T = T ∗(V )⊥, T (V ) = (ker T ∗)⊥, and T ∗(V ) = (ker T )⊥.

The proof of this important theorem is short and neat:

v T (V )⊥ v,T x = 0 for all x V ∈ ⇔ ∈ T ∗v, x = 0 for all x V ⇔ ∈ T ∗v = 0 ⇔ v ker T ∗. ⇔ ∈ The proof is complete.

We give two interesting applications of the above theorem. In the first application we use it to prove “row = column rank”. First we do this for a real, , say A of size n n. Consider the operator T on Rn induced by A, i.e. T = M . The × A column rank of A is the dimension of the subspace spanned by column vectors of A and this subspace is just T (V ). By the above theorem,

dim ker T ∗ = dim T (V )⊥ = n dim T (V ). (2.3.2) − The last identity follows (2.3.1). On the other hand,

dim ker T ∗ + dim T ∗(V ) = n. (2.3.3)

From (2.3.2) and (2.3.3) we obtain dim T (V ) = dim T ∗(V ). However T ∗ is the operator induced by the matrix A∗, which is the transpose A⊤ of A (because A is real). So the last identity tells us that the column ranks of A and A⊤ are the same. But the column rank of A⊤ is just the row rank of A! So the staement is proven for a real square matrix. What can we do if A is not real? In this case we work with Cn instead of Rn. The same argument allows us to conclude A and A∗ have the same column rank. But here we have a

small trouble: A∗ is the conjugate transpose of A, instead of the transpose A⊤. However,

17 observe that the column rank of a matrix does not change if we replace all entries by their complex conjugates. So, this “small trouble” is in fact not a trouble. What can we do if the given matrix is not a square matrix? In this case we “augment” this matrix with more rows or columns of zeros to convert it into a square matrix. The row rank and the column rank of the enlarged matrix clearly remain the same. Thus we have proved “row rank = column rank” in its full generality.

The next application is about the least square approximation. Let T be a linear operator on a finire dimensional inner product space V and let b be a vector not in its range T (V ). Consider the following “ill–posed problem”: solve T x = b. Since b is not in T (V ), this equation has no solution. The best we can do is to find some x so that the difference between T x and b is minimized. So now we are asking to find a vector x0 at which T x b is minimized. The minimization requirement tells us that y = T x − 0 0 is, among all vectors in the subspace T (V ), the one nearest to b. So y b must be 0 − perpendicular to the subspace T (V ). Theorem 2.3.1 tells us that y b is in the kernel of 0 − T ∗, that is T ∗(y b) = 0, or T ∗(T x b) = 0, that is 0 − 0 −

T ∗T x0 = T ∗b. (2.3.4)

The argument here can be reversed: if (2.3.4) holds, then y b T (V ). We have proved 0 − ⊥ that the least square solutions to T x = b are the same as the solutions to T ∗T x0 = T ∗b.

Example 2.3.2. Find the least square solution(s) to x 2x = 1, x + 2x = 3. 1 − 2 − 1 2 Solution. Write the system of equations as Ax = b with

1 2 x 1 A = , x = 1 , b = . 1− 2 x 3 − 2 It is easy to see that this is an ill–posed problem and hence we should look for the least

square solution(s) by solving A∗Ax = A∗b. Now

1 1 1 2 2 4 1 1 1 2 A A = = ,A b = = . ∗ 2− 2 1− 2 4− 8 ∗ 2− 2 3 −4 − − − −

Thus A∗Ax = A∗b becomes 2x 4x = 2, 4x + 8x = 4, giving us x 2x = 1. 1 − 2 − − 1 2 1 − 2 − Introducing the parameter t = x , we can write down the solutions as x = 2t 1, x = t. 2 1 − 2

Example 2.3.3. Find the least square solution(s) to ix + x = 2, x + ix = 2, 1 2 1 2 − x1 + x2 = 4.

18 Solution. Write the system of equations as Ax = b with

i 1 2 x A = 1 i , x = 1 , b = 2 .   x2  −  1 1 4     We look for the least square solution(s) by solving A∗Ax = A∗b. Now

i 1 2 i 1 1 3 1 i 1 1 2 2i A∗A = − 1 i = ,A∗b = − 2 = − . 1 i 1   1 3 1 i 1  −  6 + 2i − 1 1 − 4     Thus A∗Ax = A∗b becomes 3x + x = 2 2i, x + 3x = 6 + 2i, giving us x = i and 1 2 − 1 2 1 − x2 = 2 + i.

2.4. An operator H on a complex inner space V is called a self-adjoint operator, or a Hermitian operator, if H = H∗. In the same way, we call a square matrix A a selfadjoint matrix or a if A = A∗, i.e. if A is equal to its conjugate transpose. Thus, a 2 2 Hermitian matrix must have the form × c a + bi 4 2 + 3i such as , a bi d 2 3i 7 − − where a, b, c, d are real numbers. Notice that a complex number z is real if and only if z = z, which is the one

dimensional version of the identity T = T ∗. Hence the situation of Hermitian operators among other operators resembles that of real numbers among complex numbers.

Example 2.4.1. Verify that eigenvalues of Hermitian operators are real. Solution. Let T be a Hermitian operator on V and let λ be an eigenvalue for T . Then there is a nonzero vector v in V such that T v = λv. So T v, v = λv, v = λ v, v . On the other hand,

T v, v = v,T ∗v = v,T v = v, λv = λ v, v . Hence λ v, v = λ v, v . As v = 0, we have v, v = v 2 = 0 and hence v, v of the last identity can be canceled. Thus λ = λ. Therefore λ is real.

We consider a similar concept for the real case. An operator T on a real inner product space V satisfying T = T ∗ is called a symmetric operator. A real square matrix A is called a if A = A⊤. The representation matrix of a symmetric operator relative

19 to an orthonormal basis is symmetric. A real symmetric matrix is clearly a Hermitian and hence its eigenvalues are real. We can translate this statement into an assertion about symmetric operators on real inner product space:

Proposition 2.4.1. If T is a symmetric operator on a real, finite dimensional, inner product space, then T has a real eigenvalue λ and consequently ker(T λI) = 0. −

2.5. An operator T defined on a (finite dimensional) real inner product space V is an orthogonal operator if it preserves the inner product of V , i.e.

T x,T y = x, y (2.5.1)

We can rewrite the above identity as x,T ∗T y = x, y . This gives x, (T ∗T I)y = 0 − for all x and y in V . We deduce that T ∗T I = O, or T ∗T = I, i.e. T is invertible − and its inverse is T ∗. By reversing the above argument, we can show that, conversely, if 1 T − = T ∗, then T is orthogonal. We conclude: A linear operator T on a real inner product

space is orthogonal if and only if T ∗T = TT ∗ = I.

An is a real square matrix A satisfying AA⊤ = A⊤A = I. The representing matrix of an orthogonal operator relative to an orthonormal basis is an orthogonal matrix. (Verify this statement!) By letting y = x in identity (2.5.1), we have T x,T x = x, x , or T x 2 = x 2. Hence we have T x = x for all x V . In other words, an orthogonal operator preserves ∈ the norm. It turns out that the converse of this statement is also true:

Proposition 2.5.1. A norm-preserving linear operator is an orthogonal operator.

To prove this fact, we have to express the inner product of two vectors in terms of the norms of certain linear combinations of them, called the :

4 x, y = x + y 2 x y 2. (2.5.2) − − To prove (2.5.2), we begin with the elementary identity v 2 = v, v which holds for all vectors v in an inner product space. Applying this identity for v = x + y, we have

x+y 2 = x+y, x+y = x, x + x, y + y, x + y, y (2.5.3) = x 2 +2 x, y + y 2. Letting v = x y instead, we will get a similar result: − x y 2 = x 2 2 x, y + y 2. (2.5.4). − − 20 Now you can see that the polarization identity (2.5.2) is obtained by subtracting (2.5.4) from (2.5.3) and a simple rearrangement of sides. From the polarization identity (2.5.2) we can deduce Poposition 2.5.1 stated above. Indeed, if T is a linear operator on V satisfying T (x) = x for all x V , then ∈ 4 T x,T y = T x + T y 2 T x T y 2 − − = T (x + y) 2 T (x y) 2 − − = x + y 2 x + y 2 = 4 x, y . − Canceling 4, we get the required identity which characterizes orthogonal operators.

Let σ be a permutation of the set 1, 2, . . . , n . Then T on Rn defined by { } σ

Tσ(x1, x2, . . . , xn) = (xσ(1), xσ(2), . . . , xσ(n))

is an orthogonal operator, because the sum of squares of x1, x2, . . . , xn remains the same if their order is changed. For example, if σ sends 1, 2, 3, 4, 5 to 4, 1, 5, 2, 3 respectively, then Tσ(x1, x2, x3, x4, x5) = (x4, x1, x5, x2, x3). Another example of orthogonal operators 2 is the operator MA on R (with the standard inner product) induced by

cos θ sin θ A = , sin θ −cos θ where θ is a fixed real number. You can check directly that A is an orthogonal matrix.

From this you may conclude that MA is an orthogonal operator.

Example 2.5.2. By an orthogonal projection we mean a self–adjoint projection. 2 Thus, if P is an orthogonal projection, then P = P and P ∗ = P . Verify the following assertion: if P is an orthogonal projection, then I 2P is an orthogonal operator. − 2 Solution. Since P is an orthogonal projection, we have P = P and P ∗ = P . So

2 2 (I 2P )∗(I 2P ) = (I 2P )(I 2P) = I 2P 2P +4P = I 2P 2P +4P = I. − − − − − − − −

Similarly we have (I 2P )(I 2P )∗ = I. Hence I 2P is an orthogonal operator. − − −

2.6. Let A be a n n real matrix. Denote by = e1, e2,..., en the standard basis n × n E { } of R . Let T = MA be the operator on R induced by A, i.e. T = MA. Then, as we know very well by now, v T e is the jth column of A, for each j. If A is an orthogonal matrix, j ≡ j then T is an othogonal operator and hence v1, v2,..., vn, which are the images of vectors in the standard orthonormal basis under the operator T , also form an orthonormal basis.

21 (Notice that, since T preserves inner products, it sends an orthonormal basis to another.)

Conversely, suppose that the columns v1, v2,..., vn of A form an orthonormal basis of Rn, that is v , v = δ . (Recalled that the δ stands for 1 if j = k and j k jk jk 0 if j = k.) Then, for vectors x = (x , . . . , x ) = x e and y = (y , . . . , y ) = y e 1 n k k k 1 n j j j in Rn, we have T x = x T e = x v and similarly T y = y v , and hence k k k k k k j j j T x,T y = x y v , v = x y δ = x y = x, y . k j k j k j kj k k k,j k,j k This says that T is an orthogonal operator. Hence A is an orthogonal matrix. We conclude:

Proposition 2.6.1. A real n n matrix is an orthogonal matrix if and only if its × columns form an orthonormal basis of Rn.

For example, we observe that ( 1/3, 2/3, 2/3), (2/3, 1/3, 2/3), (2/3, 2/3, 1/3) form an − − − orthonormal basis of R3; (this can be cheked directly). Therefore the 3 3 matrix × 1/3 2/3 2/3 A = −2/3 1/3 2/3  2/3− 2/3 1/3  −   is orthogonal. The induced operator MA given by

1 M (x , x , x ) = ( x + 2x + 2x , 2x x + 2x , 2x + 2x x ) A 1 2 3 3 − 1 2 3 1 − 2 3 1 2 − 3

is an orthogonal operator on R3.

Example 2.6.2. Notice that the matrix

1 1 1 1/√2 1/√2 H1 = wih columns v1 = , v2 = √2 1 1 1/√2 1/√2 − − is an orthogonal matrix, since we can check that its columns v1, v2 form an orthonormal 2 basis in R . Now we describe a process to define the Hn. Let

a a A = 11 12 a a 21 22 be a 2 2 matrix and let B be an n n matrix. We define their product A B × × ⊗ to be the 2n 2n matrix given × a B a B A B = 11 12 . ⊗ a21B a22B 22 We have the following basic idntities about tensor products of matrices:

aA bB = ab(A B), (A B)∗ = A∗ B∗, (A B)(C D) = AC BD. (6.1) ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ A consequence of these identities is: if A and B are orthogonal (or unitary), then so is A B. For example ⊗ 1 1 1 1 1 1 1 1 1 1 H1 H1 1 1 1 1 1 H2 H1 H1 = = − − ≡ ⊗ 2 1 1 ⊗ 1 1 ≡ √2 H1 H1 2  1 1 1 1  − − − − −  1 1 1 1   − −  We can define Hn inductively by putting

1 Hn 1 Hn 1 Hn = H1 Hn 1 = − − ⊗ − √2 Hn 1 Hn 1 − − − which is a 2n 2n orthogonal matrix, called the Hadamard matrix. We remark that × tensoring is an important used in many areas, such as quantum information and quantum computation.

Notice that the transpose A⊤ of an orthogonal matrix A is also an orthogonal matrix. In fact, from AA⊤ = A⊤A = I we immediately get (A⊤)⊤A⊤ = A⊤(A⊤)⊤ = I. Since transposing a matrix changes its columns into rows, from Proposition 2.6.1 we deduce: a real n n matrix is an orthogonal matrix if and only if its rows form an orthonormal basis × in Rn.

2.7. Unitary operators are the complex version of orthogonal operators. A linear operator U on a complex inner product space V is a if

Ux,Uy = x, y for all x and y in V , i.e. U preserves the inner product of V . As in the real case, a linear operator on a complex inner product space is a unitary operator if and only if

U ∗U = UU ∗ = I, that is, U ∗ is the inverse of U. As before, unitary operators are norm preserving. The converse is also true but the proof is more difficult than the orthogonal case. Similarly, a complex square matrix A is called a if AA∗ = A∗A = I. By recycling previous arguments we can show that a n n complex matrix is unitary if × and only if its columns (or its rows) form an orthonormal basis. A quick example:

cos θ i sin θ 1 1 i 1 √3 i such as and i sin θ cos θ √2 i 1 2 i √3 23 is an unitary matrix for each real θ.

Example 2.7.1. Let ω = e2πi/n. The columns of the following matrix F is the orthonormal basis of Cn (see Example 1.4.1 in 1.4) § 1 1 1 1 1 1 2 3 4 n 1 1 ω ω ω ω ω − 2 4 6 8 2(n 1)  1 ω ω ω ω ω −  1 Fn =   √n          n 1 2(n 1) 3(n 1) 4(n 1) (n 1)(n 1)   1 ω − ω − ω − ω − ω − −    and hence F is a unitary matrix. The linear mapping associated with this matrix is called the finite Fourer transform. To speed up this transform by using some special methods is crucial for reducing the cost of communication network in recent years. The rediscovery of so–called FFT (Fast Fourier Transform) has great practical value in cutting cost substantially. Now the historian in mathematics can back FFT method as early as Gauss, who certainly did not have this sort of application im mind!

2.8. In the above subsection we have seen that unitary matrices come from unitary operators. Here we describe another source of such matrices: change of orthonormal basis.

In 2 of Chapter III we describe the connection between matrices [T ] and [T ] § E F representing an operator T on a vector space V relative to two different bases and in E F V :[T ] and [T ] are similar, i.e. there is an P such that E F

1 [T ] = P [T ] P − . (2.8.1) F E

Now we make the further assumptions that V is an inner product space and both and E are orthonormal bases. If we go over the argument in 2 in Chapter III again, we can F § check that the matrix P in (2.8.1) is a unitary matrix in the complex case and P is a orthogonal matrix in the real case. This leads to the following two definitions: two n n × complex matrices A and B are unitarily equivalent if there is a unitary matrix U such that UAU ∗ = B; two n n real matrices A and B are orthogonally equivalent if there × is a orthogonal matrix P such that P AP ⊥ = B. Using the terminology here, we have

1. Matrices representing the same operator on a finite dimensional complex inner product spaces relative to different orthonormal bases are unitarily equivalent; 2. Matrices representing the same operator on a finite dimensional real inner product spaces relative to different orthonormal bases are orthogonally equivalent.

24 EXERCISE SET IV.2.

Review Questions. Can I state the definitions and give examples of the following terms? adjoint of a linear operator (of a matrix), Hermitian operator (Hermitian matrix), unitary operator (unitary matrix), orthogonal operator (orthogonal matrix), sym operator (symmetric matrix). What numbers do they correspond in the onedimensional case?

Drills

1. In each of the following cases, find the adjoint A∗ of the given matrix A:

1 + 2i 2 + 3i 3 + 4i 0 i 0 i 1 i (a) A = 4 + 5i 5 + 6i 6 + 7i (b) A = 0 0 1 (c) A = i 1 i .  7 + 8i 8 + 9i 9 + 9i   0 0 0   i 1 i        2. Let A and B be n n matrices and let a be a complex number. Verify the following × identities. ( These identities will be used freely without giving explicit references. ♠ So you must get familiar with them. .) ♠

(A + B)∗ = A∗ + B∗, (aA)∗ = aA∗, (AB)∗ = B∗A∗, (A∗A)∗ = A∗A.

1 1 Also, if A is invertible, then so is A∗ and (A− )∗ = (A∗)− . 3. Find the missing entry (or entries) indicated by in each of the following unitary ∗ matrices:

1 1 1 1 1 3 4i 1 1 + i 1 i ∗ , ∗ , − , − , √2 1 1 √2 i 1 5 4i 2 1 + i ∗ ∗ 1 1 1 2 i 2 2 ∗ ∗ 1 ∗ 1 1 1 1 1 2 1 , 2 , − − ∗ . 3  ∗  3  ∗ ∗  2  1 1 1  2 1 2 2i ∗ ∗ ∗  1 1 1       − ∗  4. Find the least square solution(s) to each of the following inconsistent systems

(a) x1 + ix2 = 0, x1 + ix2 = 2. (b) x + x = 1, x x = 1, x + 2x = 5. 1 2 1 − 2 1 2 (c) x + x + x = 1, x x + x = 1, x x x = 1, x + x x = 1. 1 2 3 1 − 2 3 1 − 2 − 3 1 2 − 3 25 5. True or False: (a) The sum of two Hermitian matrices is Hermitian. (b) The product of two Hermitian matrices is Hermitian. (c) If a Hermitian matrix is invertible, then its inverse is also Hermitian. (d) The sum of two unitary matrices is unitary. (e) The product of two unitary matrices is unitary. (f) Unitary matrices are invertible and their inverses are also unitary. (g) An orthogonal matrix is a matrix orthogonal to a set of given matrices. 1 (h) If H is a Hermitian matrix and if U is a unitary matrix, then UHU − is a Hermitian matrix. 1 (i) If H is a Hermitian matrix and if P is an invertible matrix, then PHP − is a Hermitian matrix.

(j) If A is an arbitrary matrix, then A∗A is a Hermitian matrix.

6. Write down each of the following matrices explicitly: (a) the 8 8 Hadamard matrix H explicitly (for notation, see Example 2.6.2) × 3 (b) unitary matrices F2, F3, F4, F6, F8 in finite Fourier transform (for notation, see Example 2.7.1).

Exercises

1. Let R be a linear operator on a complex inner product space V such that R2 = I. Show that R is unitary if and only if R is Hermitian. 2. Let T be a linear operator on a finite dimensional complex inner product space V . Show that there exist unique Hermitian operators H and K on V such that T = H +iK.(Aside: This is the analogue of the identity z = x+iy (where x and y are the

real part and the imaginary part of z) for complex numbers. Notice that T ∗ = H iK, − which is analogous to z = x iy.) − 3. Recall that a projection is a linear operator E satisfying E2 = E. If furthermore, the space V on which E is a complex inner product space and if E is a Hermitian 2 operator (thus E = E = E∗), then E is called an orthogonal projection. Show that a projection E on an inner product space is an orthogonal projection if and only if its kernel is orthogonal to its range: ker E E(V ). ⊥ 4. Let V be a 2dimensional inner product space and let T be a linear operator on V . Show that T 2 = O if and only if there is an orthogonal system e, f in V such that { } 26 T x = x, f e for all x V . Hint: The rank of T is 0 or 1. (Aside: It is straightforward ∈ to check that if T is an operator having the form T x = x, f e with e f, then T 2 = O. ⊥ Indeed, for each x V , T 2x = T (T x) = T ( x, f e) = x, f T e = x, f e, f e = 0, ∈ due to the assumption that e, f = 0.) 5. Let T be a linear operator on a finite dimensional inner product space V .

(a) Show that, for all x V , T ∗T x, x 0. ∈ ≥ (b) Show that T is invertible if and only if T ∗T is invertible. 6. Show that (a) if P is an orthogonal matrix, then det(P ) is either 1 or 1, and (b) if − U is an unitary matrix, then det(U) = 1. | | 7. Let A be a 2 2 orthogonal matrix. Show that × (a) in case det(A) = 1, A is a matrix, i.e.

cos θ sin θ A = for some real number θ, sin θ − cos θ (b) in case det(A) = 1, − cos θ sin θ A = , and sin θ cos θ − (c) in case det(A) = 1, A2 = I;(Aside: A represents a reflection.) − 8. Show that a 2 2 unitary matrix U with det(U) = 1 can always be expressed as × z z 1 2 , z z − 2 1 where z and z are complex numbers satisfying z 2 + z 2 = 1. 1 2 | 1| | 2| 9. Show that, if H is a Hermitian operator on a finite dimensional complex inner product 1 space V , then H iI is invertible and U (H + iI)(H iI)− is a unitary operator − ≡ − on V .(Aside: U is called the of H.) 10. Let A, B, C be 2 2 real matrices. Check that × (a) (A B) C = A (B C) ⊗ ⊗ ⊗ ⊗ 1 (b) there is a 4 4 permutation matrix P such that P (B A)P − = A B. × ⊗ ⊗ 11*. Let T be a linear operator on a complex inner product space. Prove that (a) T is Hermitian if and only if T x, x is real for each x V . ∈ (b) T = O if and only if T x, x = 0 for each x V . ∈

27 3. Orthogonal Diagonalization §

3.1. Question: Which operators on a finite dimensional inner product space possess orthonormal bases consisting of eigenvectors? In other words, which operators can be represented by diagonal matrices relative to appropriate orthonormal bases? For short, which operators are orthogonally diagonalizable? This question does not specify whether the space is real or complex. We have to consider both situations. Moreover, we have to consider them separately, because they come up with different answers. In both situations we take the same approach: find a necessary condition first (an easy step) and then prove the sufficiency of this condition (the hard part). Before we proceed, let us make an advertisement for the forthcoming answers. There are three great things about it: first, it is thorough; second, it is neat and pleasant; and third, it is extremely important, for both theoretical and practical purposes! Without exaggeration, we can say that these answers are the best things we can learn in a subject called linear . Let us start with our investigation. First we consider the real case. Let T be a linear operator on a finite dimensional real inner product space V . Suppose that T does have an orthonormal basis consisting of e , e ,..., e which are eigenvectors of T , say E 1 2 n T ej = λjej for j = 1, 2, . . . , n. Here, of course, λ1, λ2, . . . , λn are real numbers. The matrix of T relative to is diagonal: E

λ1 λ2 [T ] [T ] =  .  . ≡ E ..    λn    The unspecified entries of the above matrix are filled with zeros. By what we have seen in

2.1 of the last section (to be more specific, identity (2.1.2)), the matrix [T ∗] of the adjoint § T ∗, also relative to , is just its transpose [T ]⊤. But [T ], as shown above, is diagonal and E hence [T ]⊤ = [T ]. So [T ∗] = [T ], from which it follows T ∗ = T , that is, T is a symmetric operator. The above conclusion is easy to get and short to say. Now a wonderful thing happens: the converse is also true!

Theorem 3.1.1. If T is a symmetric operator on a finite dimensional real inner product space V , then there is an orthonormal basis consisting of eigenvectors of T .

To prove this theorem, we have to find an orthonormal basis consisting of e , e ,..., e E 1 2 n which are eigenvectors of T . Here, n of course is dim V . Our proof proceeds by induction

28 on n. When n = 1, i.e. V is one dimensional, take any e1 and form the orthonormal basis consisting of the single vector e . This clearly will do for our purpose. E 1 Now we make the inductive hypothesis that the theorem is true for all symmetric operators on spaces of dimension m. Assume that the dimension of the space V on which T (the operator we are investigating) is defined is m+1: dim V = n = m+1. By Proposition 2.4.1 we know the the existence of a real eigenvalue for T , say λ so that ker(T λ I) = 0 . 1 − 1 { } Let us take any vector e in ker(T λ I) with e = 1. Let L be the one dimensional 1 − 1 1 subspace spanned by e1:

L = x V x = αe for some α R . { ∈ | 1 ∈ }

Let M = L⊥ = v V v, e = 0 . (Here e stands for the set consisting of single { ∈ | 1 } { 1} vector e .) Then L + M = V , dim M = dim V dim L = (m + 1) 1 = m. For each 1 − − y M L⊥, ∈ ≡ T y, e = y,T e = y, λ e = 0. 1 1 1 1 The first identity follows from the assumption T = T ∗, the second from e ker(T λ I) 1 ∈ − 1 and the last from y L and λe L. ⊥ 1 ∈ Denote by S the linear operator on M obtained by restricting T to M, that is, S is the operator defined on the subspace M by putting Sx = T x for x M. Notice that, the ∈ above argument shows that the range of S is in M and hence it is indeed a linear operator on M, not just a linear transformation from M to V . Also notice that the linearity of S is inherited from T .(Aside: In general, if M is an subspace of T , that is, M is a subspace of V with the property that x M implies T x M, then it is legitimate to ∈ ∈ consider the restriction of T to M.) As we have noticed, T being a symmetric operator can be described by the following condition:

T x, y = x,T y for all x, y V. ∈ If x and y are actually in M, then we can rewrite T x and T y as Sx and Sy respectively, and the above identity becomes Sx, y = x,Sy . This shows that S is a symmetric operator on M, which is m–dimensional. So we can apply the induction hypothesis to assert that M has an orthonormal basis consisting of eigenvectors of S, say e2, e3,..., em+ 1. Now you

can see that m + 1 vectors e1, e2,..., em+ 1 form an orthonormal basis of V consisting of eigenvectors of T . The proof os complete.

The “matrix version” of Theorem 3.1.1 is the following:

Theorem 3.1.2. If A is a real symmetric matrix, i.e. A = A⊤, then there is a real diagonal matrix D and an orthogonal matrix P such that A = PDP ⊤.(In short, a real symmetric matrix is orthogonally diagonalizable.)

29 The converse of Theorem 3.1.2 is true and very easy to prove (and hence not very exciting to us): if A = PDP ⊤ for some diagonal D and some orthogonal P , then we have A⊤ = (PDP ⊤)⊤ = P ⊤⊤DP ⊤ = PDP ⊤ = A; (recall that P ⊤⊤ = P ).

Now we start the proof of the above theorem. Assume that A is a n n real symmetric × matrix. Consider the “Godgiven” operator T M on Rn induced by A, i.e. T (x) = Ax ≡ A for x Rn. Then T is a symmetric operator on Rn (which is the real inner product space ∈ equipped with the standard inner product). By Theorem 3.1.1, there is an orthonormal basis v , v ,..., v in Rn such that T v Av = λ v for some scalars λ ,(k = 1 2 n k ≡ k k k k 1, 2, . . . , n). Let P = [v v v ], 1 2 n

that is, the matrix with v1, v2, etc. as its column vectors. Then P is an orthogonal matrix. We can check that AP = PD as follows, where D is the diagonal matrix with

λ1, λ2, . . . , λn as its diagonal entries.

AP = A[v v v ] = [Av Av Av ] = [λ v λ v λ v ], 1 2 n 1 2 n 1 1 2 2 n n

With the last written in the correct way, we have:

λ1 . [v1λ1 v2λ2 vnλn] = [v1 v2 vn] .. = PD.   λn  

Hence AP = PD, giving us A = AP P ⊤ = PDP ⊤.

3.2. Now we consider the similar question in the complex case:

Question: W hich linear operator on a finite dimensional complex inner product space has an orthonormal basis consisting of eigenvectors?

We proceed in the same way as the real case. However, the complex case is not just more complex, it is more tricky. Suppose that an operator T on a finite dimensional complex inner product space V does have an orthonormal basis = e , e ,..., e which E { 1 2 n} are eigenvalues of T , say T e = λ e for j = 1, , . . . , n. The matrix of T relative to is j j j E diagonal: λ1 λ2 [T ] [T ] =  .  . ≡ E ..    λn    30 By what we have seen in the last section (identity (2.1.2)) the matrix of the adjoint T ∗, also relative to , is given by E λ1 λ2 [T ∗] [T ∗] = [T ]∗ =  .  . E ≡ ..    λn    Hence both [T ][T ∗] and [T ∗][T ] are equal to the diagonal matrix with

λ 2(= λ λ = λ λ ), λ 2,..., λ 2 | 1| 1 1 1 1 | 2| | n| as the diagonal entries. Thus we have [TT ∗] = [T ][T ∗] = [T ∗][T ] = [T ∗T ]. That is, the operators TT ∗ and T ∗T have the same matrix representation relative to . Since a linear E operator is completely determined by its matrix representation, we must have TT ∗ = T ∗T . The discussion here leads to

Definition: A is a linear operator T on a complex inner product

space satisfying the identity TT ∗ = T ∗T . Similarly, a is a (complex)

square matrix A satisfying AA∗ = A∗A.

Normal operators include both Hermitian operators and unitary operators. Recall

that an operator H (on a complex inner product space) is Hermitian if H = H∗. If H is a Hermitian operator, then HH∗ and H∗H are equal, because both of them are equal to H2. Hence Hermitian operators are normal. Also recall that an operator U is unitary if

UU ∗ = U ∗U = I. This identity clearly indicates that U is normal. In the same fashion, Hermitian matrices and unitary matrices are normal matrices.

Example 3.2.1. Consider the matrix 1 i i 1 + i i A = with A = . −i 1 i i 1 + i − − − 3 2i Then we can check AA = A A = , showing that A is normal, but is neither ∗ ∗ 2i 3 hermitain nor unitary. −

3.3. The previous discussion establishes that, if an operator T possesses an orthonor mal basis consisting of its eigenvectors, then T is a normal operator. Now we witness a miracle: the converse is also true.

Theorem 3.3.1. A linear operator T on a (finite dimensional) complex inner product space V has a diagonal matrix representation relative to some orthonormal basis if and only if T is normal, that is, $TT^* = T^*T$.

Assume that T is normal. We have to find an orthonormal basis $\mathcal{E} = \{e_1, e_2, \ldots, e_n\}$ consisting of eigenvectors of T. Here n, of course, is the dimension of V. Our proof proceeds by induction on n. When n = 1, as in the real case, take any unit vector $e_1$ and form the orthonormal basis $\mathcal{E}$ consisting of the single vector $e_1$. This clearly will do for our purpose. Now we assume the existence of a diagonalizing orthonormal basis for normal operators on spaces of dimension m, and suppose that the dimension of the space V on which T (the operator under investigation) is defined is m + 1, that is, $n = \dim V = m + 1$. Let $\lambda_0$ be an eigenvalue of T and let $e_0$ be an eigenvector corresponding to $\lambda_0$ with norm one, that is, $\|e_0\| = 1$. Let M be the one dimensional subspace spanned by $e_0$:
$$M = \{x \in V \mid x = \alpha e_0 \text{ for some } \alpha \in \mathbb{C}\}.$$
For convenience, let us write $T_0$ for $T - \lambda_0 I$. Notice that "$x \in M$" implies "$T_0 x = 0$". We proceed with our proof step by step as follows. Firstly, notice that $T_0$ is also normal. Indeed,

$$T_0 T_0^* = (T - \lambda_0 I)(T^* - \overline{\lambda_0} I) = TT^* - \overline{\lambda_0}T - \lambda_0 T^* + \lambda_0\overline{\lambda_0} I = T^*T - \overline{\lambda_0}T - \lambda_0 T^* + \lambda_0\overline{\lambda_0} I = (T^* - \overline{\lambda_0} I)(T - \lambda_0 I) = T_0^* T_0.$$

Secondly, for $x \in M$, in addition to $T_0 x = 0$, we also have $T_0^* x = 0$. (Aside: Attention! This is the crucial step.) Indeed, we have

$$\|T_0^* x\|^2 = \langle T_0^* x, T_0^* x\rangle = \langle T_0 T_0^* x, x\rangle = \langle T_0^* T_0 x, x\rangle = \langle 0, x\rangle = 0.$$

Hence $T_0^* x = 0$. Thirdly, we claim that $M^\perp$ (the orthogonal complement of M) is invariant for both T and $T^*$, i.e. if $y \in M^\perp$, then both $Ty$ and $T^*y$ belong to $M^\perp$. To prove this, let us suppose $y \in M^\perp$. Notice that $T = T_0 + \lambda_0 I$. Hence, for each $x \in M$, we have
$$\langle Ty, x\rangle = \langle (T_0 + \lambda_0 I)y, x\rangle = \langle T_0 y + \lambda_0 y, x\rangle = \langle T_0 y, x\rangle + \lambda_0\langle y, x\rangle = \langle y, T_0^* x\rangle + 0 = \langle y, 0\rangle = 0,$$

and, in the same fashion, we can show $\langle T^*y, x\rangle = 0$ for all $x \in M$. Therefore both $Ty$ and $T^*y$ are in $M^\perp$.

Now the rest of the argument is very similar to that of Theorem 3.1.1. Denote by S the linear operator on $M^\perp$ obtained by restricting T to $M^\perp$, that is, S is the operator defined on the subspace $M^\perp$ by putting $Sx = Tx$ for $x \in M^\perp$. The third step above tells us that S is indeed a linear operator on $M^\perp$. We check that $S \in \mathcal{L}(M^\perp)$ is also normal. To this end, we need to find its adjoint $S^*$ and show that S and $S^*$ commute. So let us take $x, y \in M^\perp$. Then
$$\langle x, S^*y\rangle = \langle Sx, y\rangle = \langle Tx, y\rangle = \langle x, T^*y\rangle.$$
Since x is an arbitrary vector in $M^\perp$ and since both $S^*y$ and $T^*y$ are in $M^\perp$, we must have

$S^*y = T^*y$. In other words, $S^*$ is just the restriction of $T^*$ to $M^\perp$. Hence, for $y \in M^\perp$,
$$SS^*y = S(S^*y) = S(T^*y) = T(T^*y) = TT^*y.$$
In the same way, we can show that $S^*Sy = T^*Ty$. As T is normal, $TT^*y = T^*Ty$ and hence $SS^*y = S^*Sy$. As y is an arbitrary vector in $M^\perp$ (the domain of S), we have $SS^* = S^*S$, that is, S is normal.

We have shown that S is a normal operator on $M^\perp$. From the fact that M is one dimensional and V has dimension m + 1, we see that $M^\perp$ is m-dimensional. Therefore, by our induction hypothesis, $M^\perp$ has an orthonormal basis consisting of eigenvectors of S, say $e_1, e_2, \ldots, e_m$ (each of these is also an eigenvector of T, since S is the restriction of T to $M^\perp$). Now the m + 1 vectors $e_0, e_1, e_2, \ldots, e_m$ form an orthonormal basis of V consisting of eigenvectors of T.

3.4. The “matrix version” of Theorem 3.3.1 is the following:

Theorem 3.4.1. If A is a normal matrix, i.e. $AA^* = A^*A$, then there is a diagonal matrix D and a unitary matrix U such that $A = UDU^*$.

The converse of the above theorem is true and very easy to prove (and hence not very interesting). Indeed, if $A = UDU^*$ for some diagonal D and some unitary U, then
$$AA^* = UDU^*(UDU^*)^* = UDU^*\,U^{**}D^*U^* = UDU^*\,UD^*U^* = UDD^*U^* = UD^*DU^* = UD^*U^*\,U^{**}DU^* = (UDU^*)^*\,UDU^* = A^*A;$$
(recall that $U^{**} = U$ and $UU^* = U^*U = I$).

We can derive Theorem 3.4.1 from Theorem 3.3.1 in the same way as we derived Theorem 3.1.2 from Theorem 3.1.1, except that we work with the standard complex space $\mathbb{C}^n$ instead of the real space $\mathbb{R}^n$. This same argument will not be repeated here. Recall the following definition of unitary equivalence: we say that $n \times n$ (complex) matrices A and B are unitarily equivalent if and only if $A = U^*BU$ for some unitary matrix U. Now Theorem 3.4.1 can be restated as follows:

An $n \times n$ complex matrix is unitarily equivalent to a diagonal matrix if and only if it is a normal matrix.
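A hedged numerical sketch of this statement (my own addition, assuming SciPy is available): for a normal matrix, the complex Schur factorization returns an upper-triangular factor that is in fact diagonal, together with the unitary matrix of Theorem 3.4.1.

```python
import numpy as np
from scipy.linalg import schur

# The normal (but non-Hermitian, non-unitary) matrix of Example 3.2.1.
A = np.array([[1 + 1j, 1j],
              [-1j, 1 + 1j]])

# Complex Schur form: A = Z T Z* with Z unitary and T upper triangular.
# Because A is normal, T comes out diagonal.
T, Z = schur(A, output='complex')

print(np.allclose(T, np.diag(np.diag(T))))      # True: T is diagonal
print(np.allclose(Z @ Z.conj().T, np.eye(2)))   # True: Z is unitary
print(np.allclose(Z @ T @ Z.conj().T, A))       # True: A = Z T Z*
```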

Since Hermitian operators and unitary operators are normal operators, Theorem 3.3.1 is applicable to these types of operators. Thus, if T is a Hermitian operator (or a unitary operator) on a finite dimensional complex inner product space V, then there is an orthonormal basis $\mathcal{E}$ such that the representing matrix $[T]_{\mathcal{E}}$ relative to $\mathcal{E}$ is diagonal, say
$$[T]_{\mathcal{E}} = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}.$$
Notice that the diagonal elements of $[T]_{\mathcal{E}}$ are eigenvalues of T. In case T is Hermitian, the diagonal elements $\lambda_k$ are real. In case T is unitary, $|\lambda_k| = 1$ for all k.

We have shown that a Hermitian operator has an orthonormal basis consisting of eigenvectors with real eigenvalues. The matrix version of this statement is: a Hermitian matrix A is unitarily equivalent to a real diagonal matrix, that is, there is a real diagonal matrix D and a unitary matrix U such that $A = UDU^*$; (the converse is also true but not very interesting).

3.5. A $1 \times 1$ complex matrix is simply a complex number. A $1 \times 1$ unitary matrix is a complex number of unit modulus, that is, a complex number z with $|z| = 1$. A $1 \times 1$ Hermitian matrix is just a real number. A good way to think of Hermitian matrices or operators is to regard them as an extension of the real numbers. In the present subsection we study operators and matrices which can be considered as an extension of the positive numbers.

Definition. We say that a linear operator P on a complex inner product space V is positive if P is Hermitian and $\langle Px, x\rangle \geq 0$ for all x in V.

Notice that the eigenvalues of a positive operator are nonnegative real numbers. Indeed, if λ is an eigenvalue of a positive operator P, say $Pv = \lambda v$ for some vector v with $\|v\| = 1$, then we have $\langle Pv, v\rangle = \langle \lambda v, v\rangle = \lambda\|v\|^2 = \lambda$ and hence $\lambda \geq 0$.

Example 3.5.1. If P is a positive operator on V and if T is any operator on V,

then the operator $T^*PT$ is also positive. Indeed, for any vector x in V,
$$\langle T^*PTx, x\rangle = \langle PTx, Tx\rangle = \langle Py, y\rangle \geq 0, \quad\text{with } y = Tx.$$
In particular, for any operator T on an inner product space, the operators $T^*T$ and $TT^*$ are positive.

Example 3.5.2. If T is a Hermitian operator on V with nonnegative eigenvalues, then T is positive. Indeed, since T is Hermitian (and hence normal), it follows from Theorem 3.3.1 that there is an orthonormal basis $\mathcal{E} = \{e_1, e_2, \ldots, e_n\}$ of V consisting of eigenvectors of T, say $Te_k = \lambda_k e_k$ with $\lambda_k \geq 0$ ($1 \leq k \leq n$). Any vector v can be written as a linear combination of the basis vectors, say $v = \sum_k v_k e_k$; (here we briefly recall that $v_k = \langle v, e_k\rangle$, even though this will not be used here). Thus

$$\langle Tv, v\rangle = \Big\langle T\Big(\sum_k v_k e_k\Big), \sum_j v_j e_j\Big\rangle = \sum_{k,j} v_k\overline{v_j}\lambda_k\langle e_k, e_j\rangle = \sum_k \lambda_k |v_k|^2 \geq 0.$$
Hence T is positive.

Let P be a positive operator on a complex inner product space V. By Theorem 3.3.1 we see that there is an orthonormal basis $\mathcal{E} = \{e_1, e_2, \ldots, e_n\}$ of V consisting of eigenvectors of P, say $Pe_k = \lambda_k e_k$ with $\lambda_k \geq 0$ ($1 \leq k \leq n$). In other words, the matrix $[P]_{\mathcal{E}}$ representing P relative to $\mathcal{E}$ is a diagonal matrix with $\lambda_k \geq 0$ ($1 \leq k \leq n$) as its diagonal elements. Now let Q be the operator whose matrix representation $[Q]_{\mathcal{E}}$ is the diagonal matrix with $\sqrt{\lambda_k}$ ($1 \leq k \leq n$) as its diagonal elements. Thus
$$[P]_{\mathcal{E}} = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} \quad\text{and}\quad [Q]_{\mathcal{E}} = \begin{pmatrix} \sqrt{\lambda_1} & & \\ & \ddots & \\ & & \sqrt{\lambda_n} \end{pmatrix}.$$
Clearly $[Q]_{\mathcal{E}}^2 = [P]_{\mathcal{E}}$. Hence we have $Q^2 = P$. From Example 3.5.2 above we know that Q is positive. We have proved the existence part of the following theorem.

Theorem 3.5.1. If P is a positive operator, then there exists a unique positive operator Q such that $Q^2 = P$.

The proof of the uniqueness of Q is rather technical and hence is omitted here. The operator Q in the above theorem is called the square root of P and is denoted by $P^{1/2}$ or $\sqrt{P}$.
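The construction used above (diagonalize, take square roots of the eigenvalues, change the basis back) is easy to carry out numerically. Below is a sketch of mine in NumPy; the helper name positive_sqrt is hypothetical, not from the notes.

```python
import numpy as np

def positive_sqrt(P):
    """Square root of a positive Hermitian matrix, built as in the text:
    diagonalize, take square roots of the (nonnegative) eigenvalues,
    and undo the change of basis."""
    eigenvalues, E = np.linalg.eigh(P)             # columns of E: orthonormal eigenvectors
    eigenvalues = np.clip(eigenvalues, 0.0, None)  # guard against tiny negative round-off
    return E @ np.diag(np.sqrt(eigenvalues)) @ E.conj().T

# A positive matrix of the form T*T (see Example 3.5.1).
T = np.array([[1.0, 2.0], [0.0, 1.0]])
P = T.conj().T @ T
Q = positive_sqrt(P)

print(np.allclose(Q @ Q, P))                       # True: Q^2 = P
print(np.all(np.linalg.eigvalsh(Q) >= -1e-12))     # True: Q is positive
```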

Take any operator T on an inner product space V. According to Example 3.5.1 above, $T^*T$ is a positive operator and hence its square root is defined. We denote $\sqrt{T^*T}$ by $|T|$. Thus, $|T|$ is a positive operator satisfying
$$|T|^2 = T^*T. \tag{3.5.1}$$
In general, $|T|$ and $|T^*|$ are not the same.

Example 3.5.3. Prove that $|T| = |T^*|$ if and only if T is normal.

Solution. Suppose $|T| = |T^*|$. Then $|T|^2 = |T^*|^2$. Now $|T|^2 = T^*T$ and $|T^*|^2 = (T^*)^*T^* = TT^*$. Hence $T^*T = TT^*$, that is, T is normal. The steps can be reversed to show that, if T is normal, then $|T^*| = |T|$.

The eigenvalues of $|T|$, arranged in decreasing order, say $s_1 \geq s_2 \geq \cdots \geq s_n$, are called the singular values, or s-numbers, of T. They are important in many areas, but we do not plan to say more about this.

Take any complex $n \times n$ matrix $A = [a_{jk}]$ and consider the linear operator $M_A$ induced by A, defined on the complex space $\mathbb{C}^n$ with the standard inner product. If $M_A$ is a positive operator, we say that A is positive semi-definite. If, furthermore, $M_A$ is invertible, we say that A is positive definite. It can be easily checked that, if $v = (z_1, z_2, \ldots, z_n)$, then

$$\langle M_A v, v\rangle = v^*Av = \sum_{k,j} a_{jk} z_k\overline{z_j}.$$
Thus a Hermitian matrix $A = [a_{jk}]$ is positive semi-definite if and only if
$$\sum_{k,j} a_{jk} z_k\overline{z_j} \geq 0.$$
All of the above discussion about operators can be applied to matrices. For example, for any matrix A, $A^*A$ is positive semi-definite and hence $|A| = \sqrt{A^*A}$ exists.

Example 3.5.4. Take any set of vectors $v_1, v_2, \ldots, v_r$ in an inner product space and let $G = [g_{jk}]$ be the $r \times r$ matrix with $g_{jk} = \langle v_j, v_k\rangle$. Check that G is positive semi-definite.

Solution. For all complex numbers $z_1, z_2, \ldots, z_r$, we have
$$\sum_{j,k=1}^{r} g_{jk} z_j\overline{z_k} = \sum_{j,k=1}^{r} \langle v_j, v_k\rangle z_j\overline{z_k} = \sum_{j,k=1}^{r} \langle z_j v_j, z_k v_k\rangle = \Big\langle \sum_{j=1}^{r} z_j v_j, \sum_{k=1}^{r} z_k v_k\Big\rangle = \langle w, w\rangle \geq 0,$$
where $w = \sum_{j=1}^{r} z_j v_j$. A matrix of the form described here is called a Gram matrix.

3.6.* Let T be an operator on a complex inner product space and, as before, write $|T| = \sqrt{T^*T}$. Let us check that T and $|T|$ have the same kernel:
$$\ker T = \ker|T|. \tag{3.6.1}$$
Indeed, for any vector v, we have

$$\|Tv\|^2 = \langle Tv, Tv\rangle = \langle T^*Tv, v\rangle = \langle |T|^2 v, v\rangle = \langle |T|v, |T|v\rangle = \||T|v\|^2, \tag{3.6.2}$$
which tells us that $Tv = 0$ if and only if $|T|v = 0$.

Now assume that T is invertible. From (3.6.1) we know that $|T|$ is also invertible. Let $U = T|T|^{-1}$. Then $T = U|T|$,
$$U^*U = (|T|^{-1}T^*)(T|T|^{-1}) = |T|^{-1}|T|^2|T|^{-1} = I,$$
and
$$UU^* = (T|T|^{-1})(|T|^{-1}T^*) = T(|T|^2)^{-1}T^* = T(T^*T)^{-1}T^* = T(T^{-1}(T^*)^{-1})T^* = I,$$
showing that U is unitary. We have proved that T can be written as a product UP of a unitary operator U and a positive operator P. Now we check that U and P are uniquely determined by T. In fact, from $T = UP$ we have $P^2 = PU^*UP = (UP)^*(UP) = T^*T = |T|^2$. By the uniqueness of the positive square root, we have $P = |T|$, from which we also have $U = TP^{-1} = T|T|^{-1}$. The expression UP here is called the polar decomposition of T. In the one-dimensional case, we can identify T with a complex number z, and the polar representation $z = re^{i\theta}$ of z corresponds to the polar decomposition of T. There is a matrix version of the polar decomposition defined in the same manner. The polar decomposition of an $n \times n$ matrix A is $A = U|A|$, where $|A| = \sqrt{A^*A}$ and U is a unitary matrix.

Example 3.6.1. Find the polar decomposition of
$$A = \begin{pmatrix} 3 & 3i \\ i & 1 \end{pmatrix}.$$
Solution. Direct computation shows
$$A^*A = \begin{pmatrix} 10 & 8i \\ -8i & 10 \end{pmatrix}$$
with eigenvalues $\lambda_1 = 18$, $\lambda_2 = 2$ and corresponding eigenvectors $v_1 = (i, 1)$ and $v_2 = (1, i)$. Hence $|A| = \sqrt{A^*A}$ has the eigenvalues $\sqrt{\lambda_1} = 3\sqrt{2}$, $\sqrt{\lambda_2} = \sqrt{2}$ with the same set of eigenvectors. Let
$$D = \begin{pmatrix} 3\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix} \quad\text{and}\quad W = \frac{1}{\sqrt{2}}\begin{pmatrix} i & 1 \\ 1 & i \end{pmatrix}.$$
Then $(A^*A)W = WD^2$, $|A|W = WD$ and W is unitary. A direct computation shows
$$|A| = WDW^* = \sqrt{2}\begin{pmatrix} 2 & i \\ -i & 2 \end{pmatrix} \quad\text{and}\quad U = A|A|^{-1} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}.$$
Hence the required polar decomposition is $A = U|A|$, with U and $|A|$ as given above.

Now we briefly describe the polar decomposition for an operator T that is not necessarily invertible. Equation (3.6.2) tells us that $\|Tw\| = \||T|w\|$ for all w. Since $|T|$ is Hermitian, its range is the orthogonal complement of its kernel. We define an operator U by specifying its values for a vector v in the range or in the kernel of $|T|$. When v is in the kernel of $|T|$, we simply set $Uv = 0$. When v is in the range of $|T|$, say $v = |T|w$, we let $Uv = Tw$. Notice that
$$\|Uv\| = \|Tw\| = \||T|w\| = \|v\|.$$
This shows that, on the range of $|T|$, U is isometric. From the way U is defined, we have $T = U|T|$ and $\ker U = \ker T$. Here U in general is not unitary since it may not be invertible. However, it resembles a unitary operator, in view of the following identities which can be verified: $UU^*U = U$, $U^*UU^*= U^*$.
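The arithmetic of Example 3.6.1 can be reproduced numerically by following the text's recipe (compute $|A|$ from the eigendecomposition of $A^*A$, then $U = A|A|^{-1}$). This is a sketch of mine in NumPy, not part of the notes.

```python
import numpy as np

A = np.array([[3, 3j],
              [1j, 1]])

# |A| = sqrt(A*A) via the eigendecomposition of the Hermitian matrix A*A.
eigenvalues, W = np.linalg.eigh(A.conj().T @ A)
abs_A = W @ np.diag(np.sqrt(eigenvalues)) @ W.conj().T
U = A @ np.linalg.inv(abs_A)

print(np.allclose(U @ U.conj().T, np.eye(2)))    # True: U is unitary
print(np.allclose(U @ abs_A, A))                 # True: A = U|A|
print(np.allclose(abs_A, np.sqrt(2) * np.array([[2, 1j], [-1j, 2]])))  # matches the example
```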

Let $e_1, e_2, \ldots, e_n$ be an orthonormal basis consisting of eigenvectors of $|T|$ and let $s_1, s_2, \ldots, s_n$ be the corresponding eigenvalues of $|T|$, which are the singular values of T:
$$s_1 \geq s_2 \geq \cdots \geq s_n.$$
So we have $|T|e_k = s_k e_k$ ($1 \leq k \leq n$). Let r be the rank of T, which is also the rank of $|T|$. Thus we have $s_1 \geq s_2 \geq \cdots \geq s_r > 0$ and $s_{r+1} = \cdots = s_n = 0$. Since the vectors $e_1, e_2, \ldots, e_r$ are in the range of $|T|$, and the operator U is isometric on the range of $|T|$, the vectors
$$f_k = Ue_k \quad (1 \leq k \leq r)$$
form an orthonormal system. One can check that
$$Tx = \sum_{k=1}^{r} s_k\langle x, e_k\rangle f_k \tag{3.6.3}$$
for any vector x. The above identity is called the singular decomposition of T.

Example 3.6.2. Find the singular decomposition of
$$A = \begin{pmatrix} 1 & i \\ 1 & i \end{pmatrix}.$$
Solution. Direct computation shows
$$|A|^2 = A^*A = \begin{pmatrix} 2 & 2i \\ -2i & 2 \end{pmatrix}$$
with eigenvalues 4 and 0. Furthermore, $e = (i/\sqrt{2}, 1/\sqrt{2})$ is an eigenvector of $|A|^2$ corresponding to the eigenvalue 4 with $\|e\| = 1$. From $|A|^2 e = 4e$ we have $|A|e = 2e$. Furthermore, we have $Ae = U|A|e = U(2e) = 2Ue = 2f$. Thus $f = 2^{-1}Ae = (i/\sqrt{2}, i/\sqrt{2})$. Thus the singular decomposition of A is given by $Ax = 2\langle x, e\rangle f$ with $e = (i/\sqrt{2}, 1/\sqrt{2})$ and $f = (i/\sqrt{2}, i/\sqrt{2})$.
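Identity (3.6.3) for this example can be checked directly, and the singular values compared against a library SVD. The sketch below is my own addition (NumPy assumed).

```python
import numpy as np

A = np.array([[1, 1j],
              [1, 1j]])

e = np.array([1j, 1]) / np.sqrt(2)    # unit eigenvector of A*A for the eigenvalue 4
f = np.array([1j, 1j]) / np.sqrt(2)   # f = Ae / 2

# Check (3.6.3): Ax = 2 <x, e> f for a random vector x.
rng = np.random.default_rng(0)
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
lhs = A @ x
rhs = 2 * np.vdot(e, x) * f           # np.vdot conjugates its first argument, giving <x, e>
print(np.allclose(lhs, rhs))          # True

# The singular values of A are 2 and 0, as in the example.
print(np.round(np.linalg.svd(A, compute_uv=False), 6))   # [2., 0.]
```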

EXERCISE SET IV.3.

Review Questions. What is the orthogonal diagonalization problem? What is the neat and thorough answer to this problem in the real case? In the complex case? What is the matrix version of this problem and what is the corresponding answer? (Again, you have to consider the real case and the complex case separately.)

Drills

1. Show that each of the following pairs of matrices A and B are unitarily equivalent by finding a unitary matrix U such that $B = U^*AU$.
(a) $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, $B = \begin{pmatrix} d & c \\ b & a \end{pmatrix}$.  (b) $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, $B = \begin{pmatrix} a & -b \\ -c & d \end{pmatrix}$.
(c) $A = \begin{pmatrix} 0 & a \\ 0 & 0 \end{pmatrix}$, $B = \begin{pmatrix} 0 & |a| \\ 0 & 0 \end{pmatrix}$.  (d) $A = \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}$.
Hint for (c): try a diagonal U. Hint for (d): try the 45° rotation.

2. (a) Verify that $N = \begin{pmatrix} a & b \\ c & a \end{pmatrix}$ is normal if and only if $|b| = |c|$.
(b) Show that the matrix given in Exercise 4 of EXERCISE SET III.1 is normal.
3. True or false:
(a) If an $n \times n$ matrix A is both Hermitian and unitary, then A = I.
(b) The sum of two normal operators is normal.
(c) The product of two normal operators is normal.
(d) If a normal operator is invertible, then its inverse is also normal.
(e) The sum of two unitary operators is unitary.
(f) The product of two unitary operators is unitary.
(g) The sum of two Hermitian operators is Hermitian.
(h) The product of two Hermitian operators is Hermitian.

4. (Aside: It is clear that unitary equivalence implies similarity, but not vice versa. The present exercise helps you to compare these two concepts.)

(a) Show that, if A and B are unitarily equivalent $n \times n$ matrices, then $A^*A$ and $B^*B$ are also unitarily equivalent.
(b) Give an example of a pair of $2 \times 2$ matrices which are similar but not unitarily equivalent.
(c) Give an example of similar $2 \times 2$ matrices A and B such that $A^*A$ and $B^*B$ are not similar.

(d) Prove that if normal matrices A and B are similar, then they are unitarily equivalent.
5. For each of the following Hermitian matrices, find the eigenvalues and corresponding eigenvectors, and find an appropriate diagonalizing unitary matrix (or orthogonal matrix):
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},\quad B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},\quad C = \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix},\quad D = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix},$$
$$S = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},\quad T = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$
6. For each of the following matrices, find the eigenvalues and corresponding eigenvectors and find an appropriate diagonalizing unitary matrix (or orthogonal matrix):

$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},\quad B = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix},\quad W = \begin{pmatrix} \cos\theta & i\sin\theta \\ i\sin\theta & \cos\theta \end{pmatrix},$$
where θ is an arbitrary real number such that $\sin\theta > 0$.

Exercises

1. Find an orthogonal matrix P and a diagonal matrix D such that $D = PAP^\top$, where
$$A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$
(Hint: Notice that (1, 1, 1) is an eigenvector of A.)
2. Following the guidance given here, show that the nth term of the Fibonacci sequence $\{a_n\}$, in which each term is the sum of the preceding two, with the first few terms 1, 1, 2, 3, 5, 8, ..., is given by
$$a_n = \frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n - \frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n.$$

(a) Use the recursive relation $a_{n+2} = a_{n+1} + a_n$ to verify $X_{n+1} = AX_n$, where
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix},\quad X_n = \begin{pmatrix} a_{n+1} \\ a_n \end{pmatrix}, \quad\text{and}\quad X_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$
and derive $X_n = A^n X_0$. (b) Notice that A is a real symmetric matrix and hence we can write $A = PDP^{-1}$, where P is an invertible matrix and D is a diagonal matrix. Find the explicit expressions of D and P, which enable us to get $A^n$ and hence $X_n$.
3*. Let V be a finite dimensional complex inner product space. Prove that if N is a

normal operator on V and if $TN = NT$ for some $T \in \mathcal{L}(V)$, then $TN^* = N^*T$.
4*. Prove that if P is an invertible positive operator on an inner product space V, then the inequality
$$|\langle x, y\rangle|^2 \leq \langle Px, x\rangle\,\langle P^{-1}y, y\rangle$$
holds for all x, y in V.
5*. Prove that if H is a Hermitian operator on a finite dimensional inner product space V, then $e^{itH}$ is a unitary operator for all real t.
6*. Prove that an operator H on a finite dimensional inner product space V is a Hermitian operator if $e^{itH}$ is unitary for all real t.

Appendices for Chapter IV

Appendix A*: Positive Semidefiniteness, Gram Matrices

Recall that a matrix $A = [a_{jk}]_{1\leq j,k\leq r}$ is positive semidefinite if, for all complex numbers $z_1, z_2, \ldots, z_r$,
$$\sum_{j,k=1}^{r} a_{jk} z_j\overline{z_k} \geq 0.$$
A positive semidefinite matrix is necessarily Hermitian and its eigenvalues are nonnegative real numbers. It follows from the spectral theorem (for normal operators) that it is a sum of positive semidefinite matrices of rank one, which are necessarily of the form
$$\begin{pmatrix} v_1\overline{v_1} & v_1\overline{v_2} & \cdots & v_1\overline{v_r} \\ v_2\overline{v_1} & v_2\overline{v_2} & \cdots & v_2\overline{v_r} \\ \vdots & \vdots & & \vdots \\ v_r\overline{v_1} & v_r\overline{v_2} & \cdots & v_r\overline{v_r} \end{pmatrix}. \tag{A1}$$
An interesting consequence of this observation is that, if $A = [a_{jk}]$ and $B = [b_{jk}]$ are positive semidefinite matrices, then so is their Schur product $A \circ B = [a_{jk}b_{jk}]$ (certainly this is not the usual kind of matrix multiplication). Indeed, the above discussion tells us that it is enough to consider the case when B is the matrix given as (A1) above. In that case, we have
$$\sum_{j,k=1}^{r} a_{jk}b_{jk} z_j\overline{z_k} = \sum_{j,k=1}^{r} a_{jk} v_j\overline{v_k} z_j\overline{z_k} = \sum_{j,k=1}^{r} a_{jk} w_j\overline{w_k} \geq 0,$$
where $w_j = v_j z_j$.

Let $v_1, v_2, \ldots, v_r$ be a set of vectors in an inner product space V. By the Gram matrix associated with this set of vectors we mean the following $r \times r$ matrix
$$\Gamma = \begin{pmatrix} \langle v_1, v_1\rangle & \langle v_1, v_2\rangle & \cdots & \langle v_1, v_r\rangle \\ \langle v_2, v_1\rangle & \langle v_2, v_2\rangle & \cdots & \langle v_2, v_r\rangle \\ \vdots & \vdots & & \vdots \\ \langle v_r, v_1\rangle & \langle v_r, v_2\rangle & \cdots & \langle v_r, v_r\rangle \end{pmatrix}. \tag{A2}$$
Its determinant $G_r = \det\Gamma$ is called the Gramian. Notice that Γ is positive semidefinite. Indeed, since the (j, k)-entry of Γ is $\langle v_j, v_k\rangle$, for arbitrary complex numbers $z_1, z_2, \ldots, z_r$, we have
$$\sum_{j,k=1}^{r} \langle v_j, v_k\rangle z_j\overline{z_k} = \sum_{j,k=1}^{r} \langle z_j v_j, z_k v_k\rangle = \Big\langle \sum_j z_j v_j, \sum_k z_k v_k\Big\rangle = \Big\|\sum_j z_j v_j\Big\|^2 \geq 0.$$
Furthermore, this argument shows that Γ is positive definite if and only if the vectors $v_1, v_2, \ldots, v_r$ are linearly independent. In terms of the Gramian, the vectors $v_1, v_2, \ldots, v_r$ are linearly independent if and only if $G_r \equiv \det\Gamma > 0$. Conversely, given a positive definite matrix $A = [a_{jk}]_{1\leq j,k\leq r}$, there exist an inner product space V and a set of vectors $v_1, v_2, \ldots, v_r$ in V such that $a_{jk} = \langle v_j, v_k\rangle$ for all j, k; in other words, every positive definite matrix can be regarded as a Gram matrix. Indeed, given such a matrix A, we define an inner product on $V = \mathbb{C}^r$ by putting
$$\langle v, w\rangle = \sum_{j,k=1}^{r} a_{jk} v_j\overline{w_k}$$
for all $v = (v_1, \ldots, v_r)$ and $w = (w_1, \ldots, w_r)$ in $\mathbb{C}^r$. It is straightforward to check that this indeed defines an inner product and that $a_{jk} = \langle e_j, e_k\rangle$, where $e_k$ is the kth vector of the standard basis for $\mathbb{C}^r$.

Now we consider an interesting expression which resembles the Gramian det Γ, that is, the determinant of Γ given in (A2),

$$g = \begin{vmatrix} \langle v_1, v_1\rangle & \langle v_1, v_2\rangle & \cdots & \langle v_1, v_{r-1}\rangle & v_1 \\ \langle v_2, v_1\rangle & \langle v_2, v_2\rangle & \cdots & \langle v_2, v_{r-1}\rangle & v_2 \\ \vdots & \vdots & & \vdots & \vdots \\ \langle v_r, v_1\rangle & \langle v_r, v_2\rangle & \cdots & \langle v_r, v_{r-1}\rangle & v_r \end{vmatrix}. \tag{A3}$$
Notice that the entries of the last column of the above determinant are the vectors $v_1, v_2, \ldots, v_r$. If we take the cofactor expansion along the last column, we can write g as a linear combination of $v_1, v_2, \ldots, v_r$ with

$$G_{r-1} = \begin{vmatrix} \langle v_1, v_1\rangle & \langle v_1, v_2\rangle & \cdots & \langle v_1, v_{r-1}\rangle \\ \langle v_2, v_1\rangle & \langle v_2, v_2\rangle & \cdots & \langle v_2, v_{r-1}\rangle \\ \vdots & \vdots & & \vdots \\ \langle v_{r-1}, v_1\rangle & \langle v_{r-1}, v_2\rangle & \cdots & \langle v_{r-1}, v_{r-1}\rangle \end{vmatrix}$$
as the coefficient of $v_r$. Thus we may write

$$g = G_{r-1} v_r + w, \tag{A4}$$

with w in $S = \operatorname{span}\{v_1, \ldots, v_{r-1}\}$. Now, for each $v_k$ with $1 \leq k \leq r-1$, we have

$$\langle g, v_k\rangle = \begin{vmatrix} \langle v_1, v_1\rangle & \langle v_1, v_2\rangle & \cdots & \langle v_1, v_{r-1}\rangle & \langle v_1, v_k\rangle \\ \langle v_2, v_1\rangle & \langle v_2, v_2\rangle & \cdots & \langle v_2, v_{r-1}\rangle & \langle v_2, v_k\rangle \\ \vdots & \vdots & & \vdots & \vdots \\ \langle v_r, v_1\rangle & \langle v_r, v_2\rangle & \cdots & \langle v_r, v_{r-1}\rangle & \langle v_r, v_k\rangle \end{vmatrix} = 0,$$
because the last column is the same as the kth column. This shows $g \in S^\perp$. Assume that $v_1, v_2, \ldots, v_{r-1}$ are linearly independent so that $G_{r-1} \neq 0$. Rewrite (A4) as
$$v_r = h + p, \quad\text{where}\quad h = G_{r-1}^{-1}g \in S^\perp \quad\text{and}\quad p = -G_{r-1}^{-1}w \in S.$$

Thus p is the projection of $v_r$ onto the subspace spanned by $v_1, v_2, \ldots, v_{r-1}$.

Example. Find the projection of $v = (0, 0, 1)$ onto the subspace spanned by the vectors $v_1 = (1, 1, 1)$ and $v_2 = (1, 2, 2)$.

Solution. Form the vector

$$g = \begin{vmatrix} \langle v_1, v_1\rangle & \langle v_1, v_2\rangle & v_1 \\ \langle v_2, v_1\rangle & \langle v_2, v_2\rangle & v_2 \\ \langle v, v_1\rangle & \langle v, v_2\rangle & v \end{vmatrix} = \begin{vmatrix} 3 & 5 & v_1 \\ 5 & 9 & v_2 \\ 1 & 2 & v \end{vmatrix} = v_1 - v_2 + 2v.$$
So $v = \frac{1}{2}g - \frac{1}{2}v_1 + \frac{1}{2}v_2$. The required projection is $p = -\frac{1}{2}v_1 + \frac{1}{2}v_2 = \big(0, \tfrac{1}{2}, \tfrac{1}{2}\big)$.
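The same projection can be computed numerically. Rather than expanding the determinant, the sketch below (my own addition, using NumPy) solves the Gram system $\sum_j c_j\langle v_j, v_k\rangle = \langle v, v_k\rangle$, which is equivalent to the determinant manipulation above for this example.

```python
import numpy as np

v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([1.0, 2.0, 2.0])
v = np.array([0.0, 0.0, 1.0])

# Gram matrix of v1, v2 and the vector of inner products <v, v_k>.
G = np.array([[v1 @ v1, v1 @ v2],
              [v2 @ v1, v2 @ v2]])    # [[3, 5], [5, 9]]; the Gramian is det G = 2
b = np.array([v @ v1, v @ v2])        # [1, 2]

# Coefficients of the projection p = c1*v1 + c2*v2.
c = np.linalg.solve(G, b)
p = c[0] * v1 + c[1] * v2
print(p)                              # [0.  0.5 0.5], as found above
print(np.isclose((v - p) @ v1, 0.0), np.isclose((v - p) @ v2, 0.0))   # both True
```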

Appendix B*: Numerical Characters of Operators: Norm, Spectral Radius, Etc.

For a linear operator T on a finite dimensional inner product space V , the uniform norm, or simply the norm, of T is defined to be

$$\|T\| = \max\{\|Tx\| : x \in V,\ \|x\| = 1\}.$$
The following basic properties of the norm hold:
1. $\|T\| \geq 0$, and $\|T\| = 0$ if and only if $T = O$;
2. $\|S + T\| \leq \|S\| + \|T\|$;
3. $\|aT\| = |a|\,\|T\|$;
4. $\|ST\| \leq \|S\|\,\|T\|$;
5. $\|Tv\| \leq \|T\|\,\|v\|$;
6. $\|T^*\| = \|T\|$.
The last equality follows from the following observation:
$$\|T\| = \max\{|\langle Tx, y\rangle| : x, y \in V,\ \|x\| = \|y\| = 1\}.$$

A less trivial property is the following “C∗–identity”:

$$\|T^*T\| = \|T\|^2.$$
Indeed, $\|T^*T\| \leq \|T^*\|\,\|T\| = \|T\|^2$, and, on the other hand, from
$$\|Tx\|^2 = \langle Tx, Tx\rangle = \langle T^*Tx, x\rangle \leq \|T^*Tx\|\,\|x\| \leq \|T^*T\| \quad (\text{for } \|x\| = 1)$$
we have $\|T\|^2 = \max_{\|x\|=1}\|Tx\|^2 \leq \|T^*T\|$. Hence $\|T^*T\| = \|T\|^2$.

We say that a sequence of operators $\{T_n\}$ converges to T if $\lim_{n\to\infty}\|T_n - T\| = 0$. Also, we say that a series of operators $\sum_{n=0}^{\infty} T_n$ converges if the sequence of its partial sums $S_n = T_0 + T_1 + \cdots + T_n$ converges. For example, it can be checked that, if $\|T\| < 1$, then $I - T$ is invertible and the series $\sum_{n=0}^{\infty} T^n$ converges to $(I - T)^{-1}$. Recall that the spectrum $\sigma(T)$ of T is the set of all eigenvalues of T. The spectral radius of T is defined to be

$$r(T) = \max\{|\lambda| : \lambda \in \sigma(T)\}.$$
It is easy to see that $r(T) \leq \|T\|$. Indeed, we can choose an eigenvalue λ such that $|\lambda| = r(T)$, and, letting v be a unit vector such that $Tv = \lambda v$, we have
$$r(T) = |\lambda| = \|\lambda v\| = \|Tv\| \leq \|T\|.$$
Notice that $r(T) = 0$ if and only if 0 is the only eigenvalue of T, or, equivalently, T is a nilpotent operator. When S and T commute, that is, $ST = TS$, we have the inequalities $r(S+T) \leq r(S) + r(T)$ and $r(ST) \leq r(S)\,r(T)$. But in general, without this commutativity condition, these two inequalities are not true. We have the following important identity for the spectral radius:

$$r(T) = \lim_{n\to\infty}\|T^n\|^{1/n}.$$
The usual proof of this uses complex analysis. The set

$$W(T) = \{\langle Tx, x\rangle : \|x\| = 1\}$$
is called the numerical range of T. It is true but highly nontrivial that W(T) is always a convex set in the complex plane. The number $w(T) = \max\{|\lambda| : \lambda \in W(T)\}$ is called the numerical radius of T. For all operators T, we have

$$r(T) \leq w(T) \leq \|T\| \leq 2\,w(T).$$

Given operators S and T, their Hilbert–Schmidt inner product is defined to be
$$\langle S, T\rangle_{HS} = \sum_{j=1}^{n} \langle Se_j, Te_j\rangle,$$
where $\{e_j\}_{1\leq j\leq n}$ is an orthonormal basis of V. It can be checked that $\langle S, T\rangle_{HS}$ given here is independent of the choice of the orthonormal basis $\{e_j\}_{1\leq j\leq n}$ and hence it is well defined. The Hilbert–Schmidt norm of an operator T is defined to be

$$\|T\|_{HS} = \sqrt{\langle T, T\rangle_{HS}} = \Big(\sum_{k=1}^{n}\|Te_k\|^2\Big)^{1/2}.$$
Let $s_1 \geq s_2 \geq \cdots \geq s_n$ be the singular values of T, that is, the eigenvalues of $|T|$ arranged in decreasing order. Then $\|T\| = s_1$ and $\|T\|_{HS}^2 = \sum_{k=1}^{n} s_k^2$. For any positive number $p \geq 1$, one can define the p-norm $\|T\|_p$ of T by putting
$$\|T\|_p = \Big(\sum_{k=1}^{n} s_k^p\Big)^{1/p}.$$
Then $\|T\|_{HS} = \|T\|_2$, that is, the Hilbert–Schmidt norm is just the 2-norm. The following properties of the p-norm are highly nontrivial:
1. $\|S + T\|_p \leq \|S\|_p + \|T\|_p$;
2. $\|UT\|_p = \|TU\|_p = \|T\|_p$ if U is unitary (or orthogonal in the real case);
3. $\|ST\|_p \leq \|S\|\,\|T\|_p$ and $\|ST\|_p \leq \|S\|_p\,\|T\|$;
4. $\|T\| \leq \|T\|_p \leq \|T\|_1$;
5. $\|T^*\|_p = \|T\|_p$.
The norm $\|T\|_1$ is called the trace norm of T. Notice that $\|T\|_1 = \operatorname{tr}|T|$, the trace of $|T|$.

We can use an inner product space V and operators on V to model some quantum system. A pure state of this system is a unit vector in V. An observable is an operator on V. An eigenstate for an observable A is an eigenvector of A, and the corresponding eigenvalue is the observed value of A at that state, as measured in a lab. If v is not an eigenstate for A, then $\langle Av, v\rangle$ is the expected value (the word "expected" is in the probabilistic sense) of A at the state v. By a "mixed" state we mean a positive operator T with $\|T\|_1 \equiv \operatorname{tr} T = 1$. If $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of T with corresponding eigenvectors $v_1, v_2, \ldots, v_n$ which form an orthonormal basis, then

$$\langle A, T\rangle_{HS} = \sum_{k=1}^{n} \lambda_k\langle Av_k, v_k\rangle,$$
which is the expected value of the observable A at the mixed state T. A pure state v can be identified with the rank one positive operator T defined by $Tx = \langle x, v\rangle v$. Notice that $\|v\| = 1$ implies $\operatorname{tr} T = 1$. A mixed state is a convex combination of a set of pure states.
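Several of the quantities in this appendix are easy to compute numerically from the singular values. The sketch below (mine, assuming NumPy) checks $r(T) \leq \|T\|$, the ordering $\|T\| \leq \|T\|_2 \leq \|T\|_1$, and the $C^*$-identity on a random matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

s = np.linalg.svd(T, compute_uv=False)           # singular values s_1 >= ... >= s_n
op_norm = s[0]                                   # uniform norm ||T||
hs_norm = np.sqrt(np.sum(s**2))                  # Hilbert-Schmidt norm ||T||_2
trace_norm = np.sum(s)                           # trace norm ||T||_1
spec_radius = np.max(np.abs(np.linalg.eigvals(T)))

print(np.isclose(hs_norm, np.linalg.norm(T)))    # Frobenius norm equals ||T||_HS
print(spec_radius <= op_norm + 1e-12)            # r(T) <= ||T||
print(op_norm <= hs_norm <= trace_norm)          # ||T|| <= ||T||_2 <= ||T||_1
# C*-identity: ||T*T|| = ||T||^2
print(np.isclose(np.linalg.svd(T.conj().T @ T, compute_uv=False)[0], op_norm**2))
```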

Appendix C*: Linear Groups

By a linear group here we mean a group of linear operators (another word for these is linear transformations), not a group that is linear. Let V be a vector space with $\dim V = n$. Recall that $\mathcal{L}(V)$ is the set of all linear operators on V. We say that a subset G of $\mathcal{L}(V)$ is a linear group or simply a group if it satisfies the following conditions:
(LG1) The identity transformation I belongs to G.
(LG2) G is closed under multiplication, that is, if S and T are in G, then so is their product ST.

(LG3) If S is in G, then S is invertible and its inverse $S^{-1}$ is also in G.

For example, all invertible elements in $\mathcal{L}(V)$ form a group denoted by GL(V), called the general linear group. When the vector space V is $\mathbb{F}^n$, we may identify GL(V) with the group $GL(n; \mathbb{F})$ of all invertible $n \times n$ matrices over $\mathbb{F}$. A subgroup of $GL(n; \mathbb{F})$ is called a matrix group. For example, all orthogonal (real) $n \times n$ matrices form a group denoted by O(n), called the orthogonal group. Notice that, for $A \in O(n)$, we have $\det(AA^\top) = \det(I) = 1$, or $(\det A)^2 = 1$, and hence $\det A$ is either 1 or $-1$. The subgroup of O(n) consisting of orthogonal matrices of determinant 1, called the special orthogonal group, is denoted by SO(n). All unitary $n \times n$ matrices form a subgroup of $GL(n, \mathbb{C})$, denoted by U(n), called the unitary group. The special unitary group is defined to be
$$SU(n) = \{A \in U(n) : \det A = 1\} \equiv \{A \in GL(n, \mathbb{C}) : AA^* = A^*A = I,\ \det A = 1\}.$$
This group is important in several areas, including particle physics.

Let G be a group of $n \times n$ invertible matrices. An $n \times n$ matrix A is called a tangent vector of G at I if $A = \Phi'(0)$ for some smooth curve $\Phi(t)$ in G satisfying $\Phi(0) = I$. Denote by LG the set of all tangent vectors of G at I.

Now we derive some basic properties of LG. First, take arbitrary A and B in LG. We claim: A + B is also in LG. Indeed, by assumption, we have $\Phi'(0) = A$ and $\Psi'(0) = B$ for some parametric curves $\Phi(t)$ and $\Psi(t)$ in G with $\Phi(0) = \Psi(0) = I$. Then $\Theta(t) \equiv \Phi(t)\Psi(t)$ is also a parametric curve in G with $\Theta(0) = \Phi(0)\Psi(0) = I$. So $\Theta'(0)$ is in LG. The product rule gives
$$\Theta'(0) = \Phi(0)\Psi'(0) + \Phi'(0)\Psi(0) = I\cdot B + A\cdot I = A + B.$$

Hence A + B is in LG. Next we claim: if A is in LG, say $A = \Phi'(0)$ for a parametric curve $\Phi(t)$ with $\Phi(0) = I$, and if λ is a scalar, then λA is also in LG. Putting the last two claims together, we see that LG is a vector space. Indeed, consider the new parametric curve $\Psi(t) = \Phi(\lambda t)$, which also lies in G. Clearly $\Psi(0) = I$ and, by the chain rule, $\Psi'(t) = \lambda\Phi'(\lambda t)$; consequently $\lambda A = \lambda\Phi'(0) = \Psi'(0) \in LG$. Now we make the third claim: if $A \in LG$ and $B \in G$, then $BAB^{-1} \in LG$. Indeed, from $A \in LG$ we know that $A = \Phi'(0)$ for some curve $\Phi(t)$ in G satisfying $\Phi(0) = I$. Let $\Psi(t) = B\Phi(t)B^{-1}$, which is a parametric curve in G satisfying $\Psi(0) = I$ with $\Psi'(0) = B\Phi'(0)B^{-1} = BAB^{-1}$, and hence $BAB^{-1} \in LG$. The final claim is: if A, B are in LG, then so is $AB - BA$. Indeed, we have $\Psi'(0) = B$ for some parametric curve $\Psi(t)$ in G with $\Psi(0) = I$. By our third claim, we know that $C(t) = \Psi(t)^{-1}A\Psi(t)$ is a parametric curve in LG. Now our first and second claims say that LG is a linear space of $n \times n$ matrices. Hence the derivative $C'(t)$ of $C(t)$, a curve in LG, is also in LG. In particular, $C'(0)$ is in LG. Now

$$C' = (\Psi^{-1}A\Psi)' = (\Psi^{-1})'A\Psi + \Psi^{-1}A\Psi' = -\Psi^{-1}\Psi'\Psi^{-1}A\Psi + \Psi^{-1}A\Psi';$$
(here we have used $(\Psi^{-1})' = -\Psi^{-1}\Psi'\Psi^{-1}$). From $\Psi(0) = I$ and $\Psi'(0) = B$, we obtain $C'(0) = -BA + AB = AB - BA$. The expression $AB - BA$ is called the Lie bracket or Lie product of A and B, and is denoted by [A, B]. We call a set L of $n \times n$ matrices a real (matrix) Lie algebra if, for all A and B in L and for all real numbers λ and μ, both $\lambda A + \mu B$ and [A, B] are in L. We have arrived at the following fact: If G is a matrix group, then LG is a Lie algebra. Naturally we call LG the Lie algebra of G.

The description of tangent vectors of G at I is not easy to work with in concrete cases. The following criterion is handy: a matrix A is in LG if and only if $e^{tA} \in G$ for all real t. Using this criterion, we can easily find the Lie algebras of the matrix groups mentioned above. Writing $M_n(\mathbb{F})$ for the set of all $n \times n$ matrices with entries in $\mathbb{F}$, we have

$$\begin{aligned} L\,O(n) &= \{A \in M_n(\mathbb{R}) : A + A^\top = O\}; \\ L\,SO(n) &= \{A \in M_n(\mathbb{R}) : A + A^\top = O,\ \operatorname{tr} A = 0\}; \\ L\,U(n) &= \{A \in M_n(\mathbb{C}) : A + A^* = O\}; \\ L\,SU(n) &= \{A \in M_n(\mathbb{C}) : A + A^* = O,\ \operatorname{tr} A = 0\}. \end{aligned}$$
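The exponential criterion above is easy to test numerically. The sketch below (mine, assuming SciPy for the matrix exponential) checks that $e^{tA}$ lies in SO(3) when A is skew-symmetric, and that the Lie bracket of two skew-symmetric matrices is again skew-symmetric.

```python
import numpy as np
from scipy.linalg import expm

# A skew-symmetric matrix, i.e. an element of L O(3) = L SO(3).
A = np.array([[0.0, -1.0, 2.0],
              [1.0, 0.0, -0.5],
              [-2.0, 0.5, 0.0]])

for t in (0.1, 1.0, 3.0):
    R = expm(t * A)                               # the curve t -> e^{tA}
    assert np.allclose(R @ R.T, np.eye(3))        # e^{tA} is orthogonal
    assert np.isclose(np.linalg.det(R), 1.0)      # ... with determinant 1, so it is in SO(3)

# The Lie bracket [A, B] of two skew-symmetric matrices is skew-symmetric.
B = np.array([[0.0, 3.0, 0.0],
              [-3.0, 0.0, 1.0],
              [0.0, -1.0, 0.0]])
bracket = A @ B - B @ A
print(np.allclose(bracket, -bracket.T))           # True
```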

Appendix D*: Rotations

By a rotation we mean an element A in the matrix group SO(3); in other words,

A is a $3 \times 3$ real matrix with $AA^\top = A^\top A = I$ and $\det A = 1$. Since A is a $3 \times 3$ real matrix, its characteristic polynomial p(x) is a real polynomial of (odd) degree 3. Thus p(x) must have a real root, say r, and the other two roots are either both real or form a conjugate pair. Let v be an eigenvector corresponding to r, that is, $Av = rv$, $v \neq 0$. Now $\|v\| = \|Av\| = \|rv\| = |r|\,\|v\|$ and hence $|r| = 1$, so r is either 1 or $-1$. In case the other two eigenvalues form a conjugate pair, say λ and $\overline{\lambda}$, we have

$$1 = \det A = r\lambda\overline{\lambda} = r|\lambda|^2,$$
which implies $r > 0$ and hence $r = 1$. If the other two eigenvalues are also real, say $r_1, r_2 \in \mathbb{R}$, then we also have $|r_1| = 1$ and $|r_2| = 1$. Furthermore, $1 = \det A = r r_1 r_2$ and hence at least one of $r, r_1, r_2$ is positive, and so equal to 1. Thus we have shown that 1 is always an eigenvalue of a

rotation A. Let $v_1$ be a unit vector such that $Av_1 = v_1$. Applying $A^{-1}$ to both sides, we get $v_1 = A^{-1}v_1$, that is, $A^{-1}v_1 = v_1$. Let $S = \{v_1\}^\perp$, the orthogonal complement of $v_1$. Notice that, if $v \in S$, then
$$\langle Av, v_1\rangle = \langle v, A^\top v_1\rangle = \langle v, A^{-1}v_1\rangle = \langle v, v_1\rangle = 0,$$
and hence $Av \in S$. This shows that S is an invariant subspace of A. The restriction of A to S, say $A_S$, is necessarily an orthogonal operator with determinant 1. Thus, if vectors $v_2, v_3$ form an orthonormal basis of the 2-dimensional space S, the matrix representation of $A_S$ relative to this basis is necessarily of the form

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$
a rotation matrix with θ as its angle of rotation. Relative to the orthonormal basis $\mathcal{B} = \{v_1, v_2, v_3\}$, the representing matrix of A is given by
$$[A]_{\mathcal{B}} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}. \tag{C1}$$
Geometrically, the vector $v_1$ gives the direction of the axis of rotation and θ is the angle of rotation. The connection between A and $[A]_{\mathcal{B}}$ is a matter of change of basis. Let $V = [v_1\ v_2\ v_3]$, an orthogonal matrix. Then $A = V[A]_{\mathcal{B}}V^{-1}$. Taking traces on both sides, we get
$$\operatorname{tr} A = \operatorname{tr}\big(V[A]_{\mathcal{B}}V^{-1}\big) = \operatorname{tr}[A]_{\mathcal{B}} = 1 + 2\cos\theta. \tag{C2}$$
This gives us the recipe for finding the angle of rotation. Next we describe a way to find the axis of rotation. As we have seen, $Av_1 = v_1$ and $A^\top v_1 = A^{-1}v_1 = v_1$. Hence $(A - A^\top)v_1 = 0$. But $C = A - A^\top$ is a skew symmetric matrix, that is, $C^\top = -C$. We can put C in the following form

$$C = \begin{pmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{pmatrix};$$
(see the final part of §2.4 in Chapter I). Since $Cx = a \times x$, where $a = (a_1, a_2, a_3)$, we have $Ca = 0$. We can set $v_1 = \|a\|^{-1}a$.

Example. Find the angle and the axis of the "rotation sequence"

$$R = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\beta & -\sin\beta \\ 0 & \sin\beta & \cos\beta \end{pmatrix} \equiv \begin{pmatrix} \cos\alpha & -\sin\alpha\cos\beta & \sin\alpha\sin\beta \\ \sin\alpha & \cos\alpha\cos\beta & -\cos\alpha\sin\beta \\ 0 & \sin\beta & \cos\beta \end{pmatrix}.$$

Solution. Denote by θ the angle of the rotation R. Then
$$1 + 2\cos\theta = \operatorname{tr} R = \cos\alpha + \cos\alpha\cos\beta + \cos\beta = (1 + \cos\alpha)(1 + \cos\beta) - 1,$$
and hence $\cos\theta = \frac{1}{2}(1 + \cos\alpha)(1 + \cos\beta) - 1$, from which θ can be obtained. Form the skew symmetric matrix (for simplicity, we do not specify the lower left part of this matrix)

$$R - R^\top = \begin{pmatrix} 0 & -\sin\alpha(1 + \cos\beta) & \sin\alpha\sin\beta \\ * & 0 & -\sin\beta(1 + \cos\alpha) \\ * & * & 0 \end{pmatrix}.$$
The axis of rotation is parallel to the vector
$$v = \big(\sin\beta(1 + \cos\alpha),\ \sin\alpha\sin\beta,\ \sin\alpha(1 + \cos\beta)\big).$$

A brute force computation shows Rv = v. We remark that computing rotation sequences is useful in some practical problems, such as navigation.
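The angle-and-axis recipe above is straightforward to automate. The sketch below (my own, assuming NumPy; the helper name rotation_sequence and the sample angles are hypothetical) extracts θ from the trace and the axis from the skew-symmetric part $R - R^\top$.

```python
import numpy as np

def rotation_sequence(alpha, beta):
    """The rotation R of the example: a rotation by alpha about the z-axis
    followed composition-wise by a rotation by beta about the x-axis."""
    Rz = np.array([[np.cos(alpha), -np.sin(alpha), 0.0],
                   [np.sin(alpha), np.cos(alpha), 0.0],
                   [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(beta), -np.sin(beta)],
                   [0.0, np.sin(beta), np.cos(beta)]])
    return Rz @ Rx

alpha, beta = 0.7, 1.2
R = rotation_sequence(alpha, beta)

# Angle of rotation from (C2): tr R = 1 + 2 cos(theta).
theta = np.arccos((np.trace(R) - 1.0) / 2.0)
print(np.isclose(np.cos(theta), 0.5 * (1 + np.cos(alpha)) * (1 + np.cos(beta)) - 1))  # True

# Axis of rotation from the skew-symmetric part C = R - R^T: C is the
# cross-product matrix of a vector a, and that vector spans the axis.
C = R - R.T
a = np.array([C[2, 1], C[0, 2], C[1, 0]])
print(np.allclose(R @ a, a))   # True: R fixes the axis vector a
```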

Appendix E*: SU(2), Quaternions, and Rotations

Recall that SU(2) is the matrix group of all $2 \times 2$ unitary matrices of determinant equal to 1:
$$SU(2) = \{U \in U(2) : \det(U) = 1\}.$$

Let U be in SU(2). Write down U and UU ∗ explicitly as follows

$$U = \begin{pmatrix} z & w \\ u & v \end{pmatrix} \quad\text{and}\quad UU^* = \begin{pmatrix} z & w \\ u & v \end{pmatrix}\begin{pmatrix} \overline{z} & \overline{u} \\ \overline{w} & \overline{v} \end{pmatrix} = \begin{pmatrix} |z|^2 + |w|^2 & z\overline{u} + w\overline{v} \\ u\overline{z} + v\overline{w} & |u|^2 + |v|^2 \end{pmatrix}.$$
From $UU^* = I$ we get $|z|^2 + |w|^2 = 1$ and $u\overline{z} + v\overline{w} = 0$. Assume $w \neq 0$ and $z \neq 0$. Then we may write $u = \alpha\overline{w}$ and $v = \beta\overline{z}$ for some α and β. Now $u\overline{z} + v\overline{w} = 0$ gives $(\alpha + \beta)\overline{z}\,\overline{w} = 0$ and hence $\alpha + \beta = 0$. Thus

$$1 = \det(U) = zv - wu = z(\beta\overline{z}) - w(\alpha\overline{w}) = z(\beta\overline{z}) - w(-\beta\overline{w}) = \beta(|z|^2 + |w|^2) = \beta.$$
Therefore U is of the form

$$U = \begin{pmatrix} z & w \\ -\overline{w} & \overline{z} \end{pmatrix}, \quad\text{where } |z|^2 + |w|^2 \equiv z\overline{z} + w\overline{w} = 1. \tag{E.1}$$
In case $z = 0$ or $w = 0$, U has the same form (please check this). We conclude: a $2 \times 2$ matrix U is in SU(2) if and only if it can be expressed as in (E.1) above.

Writing $z = x_0 + ix_1$ and $w = x_2 + ix_3$ in (E.1), we have

$$U = \begin{pmatrix} z & w \\ -\overline{w} & \overline{z} \end{pmatrix} = \begin{pmatrix} x_0 + ix_1 & x_2 + ix_3 \\ -x_2 + ix_3 & x_0 - ix_1 \end{pmatrix} = x_0 1 + x_1 i + x_2 j + x_3 k, \tag{E.2}$$
where
$$1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad i = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix},\quad j = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},\quad k = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}. \tag{E.3}$$
The matrix U in (E.1) belongs to SU(2) if and only if

$$|z|^2 + |w|^2 \equiv x_0^2 + x_1^2 + x_2^2 + x_3^2 = 1.$$
An expression written as the RHS of (E.2), without the condition $x_0^2 + x_1^2 + x_2^2 + x_3^2 = 1$ imposed, is called a quaternion. Since the theory of quaternions was discovered by Hamilton, we denote the collection of all quaternions by $\mathbb{H}$. The algebra of quaternions is determined by the following identities among the basic units 1, i, j, k:

$$1q = q1 = q,\quad i^2 = j^2 = k^2 = -1,\quad ij = -ji = k,\quad jk = -kj = i,\quad ki = -ik = j, \tag{E.4}$$
where q is any quaternion. These identities can be checked by direct computation. We usually suppress the unit 1 of the quaternion algebra $\mathbb{H}$ and write $x_0$ for $x_0 1$. Let q be the quaternion given as (E.2), which is a $2 \times 2$ complex matrix. Its adjoint is given by

$$q^* = \begin{pmatrix} \overline{z} & -w \\ \overline{w} & z \end{pmatrix} = \begin{pmatrix} x_0 - ix_1 & -x_2 - ix_3 \\ x_2 - ix_3 & x_0 + ix_1 \end{pmatrix} = x_0 - x_1 i - x_2 j - x_3 k,$$
which is also called the conjugate of q. A direct computation shows

$$q^*q = qq^* = (|z|^2 + |w|^2)\,1 \equiv |z|^2 + |w|^2 = \det(q) = x_0^2 + x_1^2 + x_2^2 + x_3^2.$$

The square root of the last expression is called the norm of q and is denoted by $\|q\|$. Thus $q^*q = qq^* = \|q\|^2$. So q is in SU(2) if and only if $\|q\| = 1$:
$$SU(2) = \{q = x_0 + x_1 i + x_2 j + x_3 k \in \mathbb{H} : \|q\|^2 \equiv x_0^2 + x_1^2 + x_2^2 + x_3^2 = 1\}.$$

Regarding $\mathbb{H}$ as the 4-dimensional space with rectangular coordinates $x_0, x_1, x_2, x_3$, we may identify SU(2) with the 3-dimensional sphere $x_0^2 + x_1^2 + x_2^2 + x_3^2 = 1$, which will be

simply called the 3-sphere. Notice that, if we write $z = x_0 + x_1 i$ and $w = x_2 + x_3 i$, then $q = x_0 + x_1 i + x_2 j + x_3 k$ can be written as $q = z + wj$, in view of $ij = k$.

For a quaternion $q = x_0 + x_1 i + x_2 j + x_3 k$, we often write $q = x_0 + x$, where $x_0$ is called the scalar part and $x = x_1 i + x_2 j + x_3 k$ is called the vector part. From (E.4) we see how to multiply "pure vector" quaternions. It is easy to check that the product of two quaternions $q = x_0 + x$ and $r = y_0 + y$ is determined by
$$qr = (x_0 + x)(y_0 + y) = x_0 y_0 + x_0 y + y_0 x + xy, \quad\text{where}\quad xy = -x\cdot y + x\times y. \tag{E.5}$$

The "scalar plus vector" decomposition $q = x_0 + x$ of a quaternion is also convenient for finding its conjugate, as we can easily check that
$$q^* = (x_0 + x)^* = x_0 - x, \tag{E.6}$$
which resembles the identity $\overline{x + iy} = x - iy$ for complex numbers. From (E.6) we see that a quaternion q is a pure vector if and only if $q^* = -q$, that is, q is skew Hermitian. We identify a pure vector $x = x_1 i + x_2 j + x_3 k$ with the vector $x = (x_1, x_2, x_3)$ in $\mathbb{R}^3$. For each $q \in SU(2)$, define a linear transformation R(q) on $\mathbb{R}^3$ by putting
$$R(q)x = q^*xq.$$
(The definition of R(q) here comes from the adjoint representation of a matrix group, which is SU(2) in the present case, described in Appendix C above.) We can check that $y \equiv R(q)x$ is indeed in $\mathbb{R}^3$:

$$y^* = (R(q)x)^* = (q^*xq)^* = q^*x^*q = q^*(-x)q = -q^*xq = -y.$$

The most interesting thing about R(q) is that it is an isometry: x and $y \equiv R(q)x$ have the same length. Indeed,
$$\|y\|^2 = y^*y = (q^*xq)^*(q^*xq) = q^*x^*qq^*xq = q^*x^*xq = q^*\|x\|^2 q = \|x\|^2 q^*q = \|x\|^2.$$
Using a connectedness argument in topology, one can show that R(q) is actually a rotation (not a reflection) in 3-space. It turns out that every rotation in 3-space can be written in the form R(q), and we call this the spinor representation of the rotation. Also, we call SU(2) the spinor group. It is an essential mathematical device for describing electron spin and studying aircraft stability. It is also used to explain how a falling cat can turn its body 180° in midair in order to achieve a safe landing, without violating the basic physical law of conservation of angular momentum.
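The matrix realization (E.3) makes all of this concrete. The sketch below is my own (NumPy assumed; the names ONE, I, J, K, quat and R_of_q are hypothetical): it checks a few of the identities (E.4), builds R(q) for a unit quaternion, and confirms that R(q) is a rotation.

```python
import numpy as np

# The quaternion units of (E.3), realized as 2x2 complex matrices.
ONE = np.eye(2)
I = np.array([[1j, 0], [0, -1j]])
J = np.array([[0, 1], [-1, 0]], dtype=complex)
K = np.array([[0, 1j], [1j, 0]])

def quat(x0, x1, x2, x3):
    return x0 * ONE + x1 * I + x2 * J + x3 * K

# A few of the identities (E.4).
assert np.allclose(I @ J, K) and np.allclose(J @ K, I) and np.allclose(K @ I, J)
assert np.allclose(I @ I, -ONE)

# A unit quaternion q (norm 1, so q lies in SU(2)).
q = quat(0.5, 0.5, 0.5, 0.5)

def R_of_q(v):
    """Apply R(q)x = q* x q to v in R^3, identified with a pure-vector quaternion."""
    x = v[0] * I + v[1] * J + v[2] * K
    y = q.conj().T @ x @ q
    # A pure vector has the form [[i*x1, x2+i*x3], [-x2+i*x3, -i*x1]],
    # so its coordinates can be read off the first row of y.
    return np.array([y[0, 0].imag, y[0, 1].real, y[0, 1].imag])

M = np.column_stack([R_of_q(e) for e in np.eye(3)])   # matrix of R(q) in the standard basis
print(np.allclose(M @ M.T, np.eye(3)))                # True: R(q) is orthogonal
print(np.isclose(np.linalg.det(M), 1.0))              # True: R(q) is a rotation
```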
