
Chapter 6: Complex Matrices

We assume that the reader has some experience with matrices and linear algebra. We can easily extend the basic theory of matrices by allowing complex numbers as entries. However, we should pay more attention to features unique to complex matrices, especially the notion of the adjoint, which is the matrix version of the complex conjugate.

Among the first things we learned from linear algebra is the intimate relation between matrices and linear mappings. To describe this relation within our convention, we need to identify each vector in Cⁿ with a column, that is, with an n × 1 matrix. Thus a vector in Cⁿ, say x = (x1, x2, . . . , xn), will be considered the same as

    x = [ x1 ]
        [ x2 ]
        [ :  ]
        [ xn ]  ≡  [x1 x2 ... xn]⊤.

We are safeguarded from confusion by the different types of brackets. From now on, let us adopt the following rule: things in a row surrounded by the round brackets “(” and “)” are the same things arranged in a column surrounded by the square brackets “[” and “]”, e.g.

    (dog, cat) = [ dog ]
                 [ cat ].

We have the following

“Matrix Representation Theorem” A map T from Cⁿ to Cᵐ is linear if and only if there exists an m × n matrix A such that Tx = Ax for all x ∈ Cⁿ. Furthermore, the matrix A here is uniquely determined by T.

(Recall that a mapping T from Cⁿ to Cᵐ (we write T: Cⁿ → Cᵐ) is linear if the following identity holds for all vectors x, y in Cⁿ and all scalars α, β: T(αx + βy) = αTx + βTy.)
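To make the theorem concrete, here is a minimal numpy sketch (the map T below is a made-up example, not one from the text): the representing matrix A is assembled column by column from the values of T on the standard basis vectors.

```python
import numpy as np

# A sample linear map T: C^3 -> C^2 (a made-up example).
def T(x):
    return np.array([x[0] + 1j * x[1], 2 * x[2] - x[0]])

# The k-th column of the representing matrix A is T(e_k), where
# e_1, ..., e_n are the standard basis vectors of C^3.
A = np.column_stack([T(e) for e in np.eye(3, dtype=complex)])

x = np.array([1 + 2j, 3, -1j])
assert np.allclose(T(x), A @ x)   # T x = A x
```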

Given a complex matrix A, we define the adjoint of A, denoted by A∗, to be the conjugate transpose of A. In other words, A∗ is obtained by taking the complex conjugate of all entries of A, followed by taking the transpose: A∗ = Ā⊤. Thus

    A = [ a11  a12  ...  a1n ]            [ ā11  ā21  ...  ām1 ]
        [ a21  a22  ...  a2n ]   ⇒  A∗ =  [ ā12  ā22  ...  ām2 ]
        [  :    :         :  ]            [  :    :         :  ]
        [ am1  am2  ...  amn ]            [ ā1n  ā2n  ...  āmn ]

As we have mentioned, the adjoint is the matrix version of the complex conjugate.

Example 6.1. Regarding a vector v = (a1, a2, . . . , an) in Cⁿ as a matrix, we have

    v = [ a1 ]
        [ a2 ]
        [ :  ]
        [ an ],    v∗ = [ ā1  ā2  ...  ān ],

    vv∗ = [ a1ā1  a1ā2  ...  a1ān ]
          [ a2ā1  a2ā2  ...  a2ān ]
          [   :     :          :  ]
          [ anā1  anā2  ...  anān ],

and v∗v = |a1|² + |a2|² + ... + |an|² = ⟨v, v⟩.

For n × n matrices A and B, and for a scalar α, we have

    (A + B)∗ = A∗ + B∗,    (αA)∗ = ᾱA∗,    (AB)∗ = B∗A∗.

The last identity tells us that in general (AB)∗ = A∗B∗ is false.
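These rules are easy to confirm numerically; below is a small numpy check on random matrices (a sketch; `adj` is a hypothetical helper name).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
alpha = 2 - 1j

adj = lambda M: M.conj().T        # the adjoint M* (conjugate transpose)

assert np.allclose(adj(A + B), adj(A) + adj(B))
assert np.allclose(adj(alpha * A), np.conj(alpha) * adj(A))
assert np.allclose(adj(A @ B), adj(B) @ adj(A))   # note the reversed order
```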

The following identity is the most basic feature concerning the adjoint of a matrix: for every n × n matrix A and all vectors x, y in the complex vector space Cⁿ, we have

    ⟨Ax, y⟩ = ⟨x, A∗y⟩.

We check this identity only for 2 × 2 matrices. Suppose

    A = [ a11  a12 ],   x = [ x1 ],   y = [ y1 ].
        [ a21  a22 ]        [ x2 ]        [ y2 ]

Then

    Ax = [ a11x1 + a12x2 ]   and   A∗y = [ ā11y1 + ā21y2 ]
         [ a21x1 + a22x2 ]               [ ā12y1 + ā22y2 ].

So ⟨Ax, y⟩ = ā11x̄1y1 + ā12x̄2y1 + ā21x̄1y2 + ā22x̄2y2 and ⟨x, A∗y⟩ = x̄1ā11y1 + x̄1ā21y2 + x̄2ā12y1 + x̄2ā22y2. Comparing them, we see that ⟨Ax, y⟩ = ⟨x, A∗y⟩.
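For larger matrices one can spot-check the identity numerically. The sketch below assumes the inner product ⟨u, w⟩ = u∗w used above, which is exactly what numpy's np.vdot computes (it conjugates its first argument).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
x = rng.normal(size=4) + 1j * rng.normal(size=4)
y = rng.normal(size=4) + 1j * rng.normal(size=4)

lhs = np.vdot(A @ x, y)             # <Ax, y>, first argument conjugated
rhs = np.vdot(x, A.conj().T @ y)    # <x, A*y>
assert np.allclose(lhs, rhs)
```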

We say that an n × n matrix A is self-adjoint or Hermitian if A∗ = A. This identity can be regarded as the matrix version of z̄ = z. So being Hermitian is the matrix analogue of being real for numbers. We say that a matrix A is unitary if

A∗A = AA∗ = I, that is, the adjoint A∗ is equal to the inverse of A. The identity

A∗A = AA∗ = I is the matrix analogue of z̄z = 1, or |z| = 1. Thus, being unitary is a matrix analogue of being of unit modulus for complex numbers. Denote by U(n) the set of

all n × n unitary matrices. It is easy to check that U(n) forms a group under the usual matrix multiplication. For example, A, B ∈ U(n) implies A∗A = AA∗ = B∗B = BB∗ = I and hence (AB)(AB)∗ = ABB∗A∗ = AIA∗ = AA∗ = I, etc. The group U(n) is called the unitary group. It plays a basic role in the geometry of the complex vector space Cⁿ.
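Here is a quick numerical illustration of closure under multiplication; `random_unitary` is a hypothetical helper that takes the unitary Q factor of a QR factorization.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_unitary(n):
    # The Q factor of a QR factorization of a random complex matrix is unitary.
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return Q

A, B = random_unitary(3), random_unitary(3)
I = np.eye(3)
assert np.allclose(A.conj().T @ A, I)              # A* A = I
assert np.allclose((A @ B) @ (A @ B).conj().T, I)  # AB is again unitary
```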

Let A be an n × n matrix and denote by v1, v2, . . . , vn its column vectors. Thus we have A = [v1 v2 . . . vn] and hence

    A∗ = [ v1∗ ]
         [ v2∗ ]
         [  :  ]
         [ vn∗ ]

and

    A∗A = [ v1∗v1  v1∗v2  ...  v1∗vn ]   [ ⟨v1, v1⟩  ⟨v1, v2⟩  ...  ⟨v1, vn⟩ ]
          [ v2∗v1  v2∗v2  ...  v2∗vn ] = [ ⟨v2, v1⟩  ⟨v2, v2⟩  ...  ⟨v2, vn⟩ ]
          [   :      :          :   ]    [     :         :             :    ]
          [ vn∗v1  vn∗v2  ...  vn∗vn ]   [ ⟨vn, v1⟩  ⟨vn, v2⟩  ...  ⟨vn, vn⟩ ]

Thus A∗A = I tells us that ⟨vj, vk⟩ = δjk, meaning that the columns v1, v2, . . . , vn form an orthonormal basis of Cⁿ. We have shown that the columns of a unitary matrix form an orthonormal basis. It turns out that the converse is also true. We have arrived at the following characterization of unitary matrices:

An n × n matrix is unitary iff its columns form an orthonormal basis in Cⁿ. Here “iff” stands for “if and only if”, a shorthand invented by Paul Halmos. We also have the “real version” of the above statement: a real n × n matrix is orthogonal iff its columns form an orthonormal basis in Rⁿ. Now we give some examples of unitary matrices which are used in practice, for instance in communication theory (exactly how they are used is too lengthy to be explained here).

Example 6.2. The matrix

    H1 = 1/√2 [ 1   1 ]   with columns   v1 = [ 1/√2 ],   v2 = [  1/√2 ]
              [ 1  −1 ]                       [ 1/√2 ]         [ −1/√2 ]

is an orthogonal matrix, since we can check that its columns v1, v2 form an orthonormal basis in R². Now we describe a process to define the Hadamard matrices Hn. Let

    A = [ a11  a12 ]
        [ a21  a22 ]

be a 2 × 2 matrix and let B be an n × n matrix. We define their tensor product A ⊗ B to be the 2n × 2n matrix given by

    A ⊗ B = [ a11B  a12B ]
            [ a21B  a22B ].

We have the following basic identities about tensor products of matrices:

    (aA) ⊗ (bB) = ab(A ⊗ B),   (A ⊗ B)∗ = A∗ ⊗ B∗,   (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD). (6.1)

A consequence of these identities is: if A and B are unitary (or orthogonal), then so is A ⊗ B. For example,

    H2 ≡ H1 ⊗ H1 = 1/√2 [ 1   1 ] ⊗ 1/√2 [ 1   1 ] = 1/2 [ 1   1   1   1 ]
                        [ 1  −1 ]        [ 1  −1 ]       [ 1  −1   1  −1 ]
                                                         [ 1   1  −1  −1 ]
                                                         [ 1  −1  −1   1 ].

We can define Hn inductively by putting

    Hn = H1 ⊗ Hn−1 = 1/√2 [ Hn−1   Hn−1 ]
                          [ Hn−1  −Hn−1 ],

which is a 2ⁿ × 2ⁿ orthogonal matrix, called the Hadamard matrix. We remark that tensoring is an important operation used in many areas, such as quantum information and quantum computation.
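numpy's np.kron implements exactly this tensor (Kronecker) product, so the inductive construction can be sketched as follows (`hadamard` is a hypothetical helper name).

```python
import numpy as np

H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def hadamard(n):
    # H_n = H_1 tensor H_{n-1}; np.kron is the tensor (Kronecker) product.
    H = H1
    for _ in range(n - 1):
        H = np.kron(H1, H)
    return H

H2 = hadamard(2)
assert H2.shape == (4, 4)
assert np.allclose(H2.T @ H2, np.eye(4))   # orthogonal: orthonormal columns
```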

Example 6.3. Let ω = e^{2πi/n}. The columns of the following matrix form the orthonormal basis of Cⁿ described in Example 5.2 of the last chapter, and hence it is a unitary matrix:

    F = 1/√n [ 1   1        1          ...  1              ]
             [ 1   ω        ω²         ...  ω^{n−1}        ]
             [ 1   ω²       ω⁴         ...  ω^{2(n−1)}     ]
             [ :   :        :               :              ]
             [ 1   ω^{n−1}  ω^{2(n−1)} ...  ω^{(n−1)(n−1)} ]

The linear mapping associated with this matrix is called the finite Fourier transform. Speeding up this transform by special methods has, in recent years, become important for reducing the cost of communication networks. The rediscovery of the so-called FFT (Fast Fourier Transform) has great practical significance. Historians can now trace the FFT method back as early as Gauss.
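A numerical sketch of F and its unitarity follows; note that numpy's np.fft.fft uses the opposite sign in the exponent and omits the 1/√n factor, so the comparison at the end adjusts for both conventions.

```python
import numpy as np

n = 8
omega = np.exp(2j * np.pi / n)
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
F = omega ** (j * k) / np.sqrt(n)       # F[j, k] = omega^{jk} / sqrt(n)

assert np.allclose(F.conj().T @ F, np.eye(n))   # F is unitary

# np.fft.fft uses exponent -2*pi*i/n and no 1/sqrt(n), hence the adjustments.
x = np.random.default_rng(3).normal(size=n)
assert np.allclose(np.sqrt(n) * (F.conj() @ x), np.fft.fft(x))
```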

The material in the rest of the present chapter is optional.

We say that an n × n complex matrix A is orthogonally diagonalizable if there is an orthonormal basis ℰ = {e1, e2, . . . , en} consisting of eigenvectors of A, that is, for each k, Aek = λkek, where λk is the eigenvalue corresponding to the eigenvector ek. Now we use the basis vectors (considered as columns) in ℰ to form the unitary matrix U = [e1 e2 . . . en]. In the next step, we would like to make use of Aek = λkek, but as written it is dimensionally awkward: we need to consider the scalar λk as a 1 × 1 matrix, while the vector ek on its right-hand side is n × 1. To adjust this, we rewrite λkek as ekλk. Thus we have Aek = ekλk. Now the way is clear for the following matrix manipulation:

    AU = A[e1 e2 . . . en]
       = [Ae1 Ae2 . . . Aen]
       = [e1λ1 e2λ2 . . . enλn]
       = [e1 e2 . . . en]D = UD,

where D is the diagonal matrix given by

    D = [ λ1  0   0   ...  0  ]
        [ 0   λ2  0   ...  0  ]
        [ 0   0   λ3  ...  0  ]
        [ :   :   :        :  ]
        [ 0   0   0   ...  λn ]

Thus we have A = UDU⁻¹. The above steps can be reversed. So we have proved:

Fact. A is orthogonally diagonalizable if and only if A = UDU⁻¹ ≡ UDU∗ for some unitary U and diagonal D.

The identity A = UDU∗ gives A∗ = (U∗)∗D∗U∗ = UD∗U∗ = UD∗U⁻¹ and hence

    A∗A = UD∗U⁻¹UDU⁻¹ = UD∗DU⁻¹
        = UDD∗U⁻¹ = UDU⁻¹UD∗U⁻¹ = AA∗,

in view of

    D∗D = DD∗ = [ |λ1|²  0      0      ...  0     ]
                [ 0      |λ2|²  0      ...  0     ]
                [ 0      0      |λ3|²  ...  0     ]
                [ :      :      :           :     ]
                [ 0      0      0      ...  |λn|² ]

A matrix A is called a normal matrix if the identity A∗A = AA∗ holds. We have shown that orthogonally diagonalizable matrices are normal. An important fact in linear algebra says that the converse is also true. So we conclude:

Fact. A complex matrix is orthogonally diagonalizable if and only if it is normal.

We do not prove this theorem here because its proof would take much more space than we are willing to devote. Notice that both self-adjoint matrices and unitary matrices are normal, and hence they are orthogonally diagonalizable.
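For a concrete instance of the Fact, numpy's np.linalg.eigh orthogonally diagonalizes a Hermitian (hence normal) matrix; below is a sketch, with the random Hermitian A as a made-up example.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = M + M.conj().T                 # Hermitian, hence normal

lam, U = np.linalg.eigh(A)         # eigenvalues and orthonormal eigenvectors
D = np.diag(lam)

assert np.allclose(U.conj().T @ U, np.eye(4))   # U is unitary
assert np.allclose(A, U @ D @ U.conj().T)       # A = U D U*
```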

Denote by SU(2) the set of all 2 × 2 unitary matrices of determinant equal to 1:

    SU(2) = {U ∈ U(2) : det(U) = 1}.

Let U be in SU(2). We write down U and UU∗ explicitly as follows:

    U = [ z  w ],   UU∗ = [ z  w ] [ z̄  ū ] = [ |z|² + |w|²   zū + wv̄    ]
        [ u  v ]          [ u  v ] [ w̄  v̄ ]   [ uz̄ + vw̄      |u|² + |v|² ].

From UU∗ = I we get |z|² + |w|² = 1 and uz̄ + vw̄ = 0. Assume w ≠ 0 and z ≠ 0. Then we may write u = αw̄ and v = βz̄ for some α and β. Now uz̄ + vw̄ = 0 gives (α + β)z̄w̄ = 0 and hence α + β = 0. Thus

    1 = det(U) = zv − wu = z(βz̄) − w(αw̄)
      = z(βz̄) − w(−βw̄) = β(|z|² + |w|²) = β.

Therefore U is of the form

    U = [  z   w ],   where |z|² + |w|² ≡ zz̄ + ww̄ = 1. (6.2)
        [ −w̄  z̄ ]

In case z = 0 or w = 0, U has the same form (please check this). We conclude: a 2 × 2 matrix U is in SU(2) if and only if it can be expressed as in (6.2) above.
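A short numerical check of the characterization (6.2); `su2` is a hypothetical helper, and the particular z, w are chosen so that |z|² + |w|² = 1.

```python
import numpy as np

def su2(z, w):
    # The matrix in (6.2); requires |z|^2 + |w|^2 = 1 to land in SU(2).
    return np.array([[z, w], [-np.conj(w), np.conj(z)]])

z, w = (1 + 2j) / 3, 2 / 3        # |z|^2 + |w|^2 = 5/9 + 4/9 = 1
U = su2(z, w)
assert np.allclose(U @ U.conj().T, np.eye(2))   # unitary
assert np.isclose(np.linalg.det(U), 1)          # determinant 1
```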

Writing z = x0 + ix1 and w = x2 + ix3 in (6.2), we have

    U = [  z   w ] = [ x0 + ix1    x2 + ix3 ] = x0·1 + x1i + x2j + x3k, (6.3)
        [ −w̄  z̄ ]   [ −x2 + ix3   x0 − ix1 ]

where

    1 = [ 1  0 ],   i = [ i   0 ],   j = [  0  1 ],   k = [ 0  i ]. (6.4)
        [ 0  1 ]        [ 0  −i ]        [ −1  0 ]        [ i  0 ]

Matrix U in (6.2) belongs to SU(2) if and only if

    |z|² + |w|² ≡ x0² + x1² + x2² + x3² = 1.

An expression written as the RHS of (6.3), without the condition x0² + x1² + x2² + x3² = 1 imposed, is called a quaternion. Since the theory of quaternions was discovered by

Hamilton, we denote the collection of all quaternions by H. The algebra of quaternions is determined by the following identities among the basic units 1, i, j, k:

    1q = q1 = q,   i² = j² = k² = −1,   ij = −ji = k,   jk = −kj = i,   ki = −ik = j, (6.5)

where q is any quaternion. These identities can be checked by direct computation (a numerical check is sketched below). We usually suppress the unit 1 of the quaternion algebra H and write x0 for x0·1. Let q be the quaternion given as in (6.3), which is a 2 × 2 complex matrix. Its adjoint is given by

    q∗ = [ z̄  −w ] = [ x0 − ix1   −x2 − ix3 ] = x0 − x1i − x2j − x3k,
         [ w̄   z ]   [ x2 − ix3    x0 + ix1 ]

which is also called the conjugate of q. A direct computation shows

    q∗q = qq∗ = (|z|² + |w|²)1 ≡ |z|² + |w|²
        = det(q) = x0² + x1² + x2² + x3².

The square root of the last expression is called the norm of q and is denoted by |q|. Thus q∗q = qq∗ = |q|². So, q is in SU(2) if and only if |q| = 1:

    SU(2) = {q = x0 + x1i + x2j + x3k ∈ H : |q|² ≡ x0² + x1² + x2² + x3² = 1}.
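The identities (6.5) can indeed be verified by direct computation with the matrices from (6.4); a minimal numpy sketch:

```python
import numpy as np

one = np.eye(2, dtype=complex)
i = np.array([[1j, 0], [0, -1j]])
j = np.array([[0, 1], [-1, 0]], dtype=complex)
k = np.array([[0, 1j], [1j, 0]])

for u in (i, j, k):
    assert np.allclose(u @ u, -one)                        # i^2 = j^2 = k^2 = -1
assert np.allclose(i @ j, k) and np.allclose(j @ i, -k)    # ij = -ji = k
assert np.allclose(j @ k, i) and np.allclose(k @ j, -i)    # jk = -kj = i
assert np.allclose(k @ i, j) and np.allclose(i @ k, -j)    # ki = -ik = j
```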

Regarding H as the 4-dimensional space with rectangular coordinates x0, x1, x2, x3, we may identify SU(2) with the 3-dimensional sphere x0² + x1² + x2² + x3² = 1, which will be simply called the 3-sphere. Notice that, if we write z = x0 + x1i and w = x2 + x3i,

then q = x0 + x1i + x2j + x3k can be written as q = z + wj, in view of ij = k.

For a quaternion q = x0 + x1i + x2j + x3k, we often write q = x0 + x, where x0 is

called the scalar part and x = x1i + x2j + x3k is called the vector part. From (6.5) we see how to multiply “pure vector” quaternions. It is easy to check that the product of two quaternions q = x0 + x and r = y0 + y is determined by

    qr = (x0 + x)(y0 + y) = x0y0 + x0y + y0x + xy,   where   xy = −x · y + x × y. (6.6)
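The multiplication rule for pure vectors, xy = −x · y + x × y, can be checked against the 2 × 2 matrix realization (6.4); `vec` below is a hypothetical helper encoding a vector part as a matrix.

```python
import numpy as np

one = np.eye(2, dtype=complex)
i = np.array([[1j, 0], [0, -1j]])
j = np.array([[0, 1], [-1, 0]], dtype=complex)
k = np.array([[0, 1j], [1j, 0]])

def vec(x):
    # The pure-vector quaternion x1*i + x2*j + x3*k as a 2x2 matrix.
    return x[0] * i + x[1] * j + x[2] * k

x, y = np.array([1.0, 2.0, 3.0]), np.array([-1.0, 0.5, 2.0])
lhs = vec(x) @ vec(y)
rhs = -np.dot(x, y) * one + vec(np.cross(x, y))   # xy = -x.y + x cross y
assert np.allclose(lhs, rhs)
```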

The “scalar plus vector” decomposition q = x0 + x of a quaternion is also convenient for computing its conjugate, as we can easily check that

    q∗ = (x0 + x)∗ = x0 − x, (6.7)

which resembles the identity z̄ = x − iy for a complex number z = x + iy. From (6.7) we see that a quaternion q is a pure vector if and only if q∗ = −q, that is, q is skew-Hermitian.

We identify a pure vector x = x1i + x2j + x3k with the vector x = (x1, x2, x3) in R³. For each q ∈ SU(2), define a linear transformation R(q) in R³ by putting R(q)x = q∗xq. We can check that y ≡ R(q)x is indeed in R³:

    y∗ = (R(q)x)∗ = (q∗xq)∗ = q∗x∗q = q∗(−x)q = −q∗xq = −y,

so y is skew-Hermitian, i.e., a pure vector. The most interesting thing about R(q) is that it is an isometry: x and y ≡ R(q)x have the same length. Indeed,

    |y|² = y∗y = (q∗xq)∗(q∗xq)
         = q∗x∗qq∗xq = q∗x∗xq = q∗|x|²q = |x|²q∗q = |x|².

Using a connectedness argument from topology, one can show that R(q) is actually a rotation (not a reflection) in 3-space. It turns out that every rotation in 3-space can be written in the form R(q), and we call it the spinor representation of the rotation. Also, we call SU(2) the spinor group. It is an essential mathematical device for describing electron spin and studying aircraft stability. It is also used to explain how a cat can turn its body 180° in midair in order to achieve a safe landing, without violating the basic physical law of conservation of angular momentum.
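Finally, a sketch of the map x ↦ q∗xq in numpy: the unit quaternion q below is a made-up example (a rotation by 120° about the (1, 1, 1) axis), and `vec`/`unvec` are hypothetical helpers converting between vectors in R³ and pure-vector quaternion matrices.

```python
import numpy as np

one = np.eye(2, dtype=complex)
i = np.array([[1j, 0], [0, -1j]])
j = np.array([[0, 1], [-1, 0]], dtype=complex)
k = np.array([[0, 1j], [1j, 0]])

def vec(x):                       # pure-vector quaternion as a 2x2 matrix
    return x[0] * i + x[1] * j + x[2] * k

def unvec(m):                     # read (x1, x2, x3) back off the matrix
    return np.array([m[0, 0].imag, m[0, 1].real, m[0, 1].imag])

q = (one + i + j + k) / 2         # a unit quaternion: |q| = 1

def R(x):
    return unvec(q.conj().T @ vec(x) @ q)    # R(q)x = q* x q

x = np.array([1.0, -2.0, 0.5])
y = R(x)
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))   # isometry
```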
