
4 Elementary Matrix Theory

We will be using matrices:

1. For describing “change of basis” formulas.

2. For describing how to compute the result of any given linear operation acting on a "vector" (e.g., finding the components with respect to some basis of the force acting on an object having some velocity).

These are two completely different things. Do not confuse them even though the same computational apparatus (i.e., matrices) is used for both. For example, if you confuse "rotating a vector" with "using a basis constructed by rotating the original basis", you are likely to discover that your computations have everything spinning backwards. Throughout this set of notes, K, L, M and N are positive integers.

4.1 Basics

Our Basic Notation

A matrix A of size M × N is simply a rectangular array with M rows and N columns of "things",

$$ \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1N} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2N} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{M1} & a_{M2} & a_{M3} & \cdots & a_{MN} \end{bmatrix} . $$

As indicated, I will try to use "bold face, upper case letters" to denote matrices. We will use two notations for the (i, j)th entry of A (i.e., the "thing" in the i th row and j th column):

(i, j)th entry of A = a_ij = [A]_ij

Until further notice, assume the "thing" in each entry is a scalar (i.e., a real or complex number). Later, we'll use such "things" as functions, operators, and even other vectors and matrices.

9/22/2013 Chapter & Page: 4–2 Elementary Matrix Theory

The matrix A is a row matrix if and only if it consists of just one row, and B is a column matrix if and only if it consists of just one column. In such cases we will normally simplify the indices in the obvious way,

$$ \mathbf{A} = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_N \end{bmatrix} \quad\text{and}\quad \mathbf{B} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_N \end{bmatrix} . $$

Basic Algebra

Presumably, you are already acquainted with matrix equality, addition and multiplication. So all we'll do here is express those concepts using our [·]_ij notation:

Matrix Equality

Two matrices A and B are equal (and we write A = B) if and only if both of the following hold:

(a) A is the same size as B.

(b) Letting M × N be the size of A and B,

[A]_jk = [B]_jk for j = 1, …, M and k = 1, …, N.

Matrix Addition and Scalar Multiplication

Assuming A and B are both M × N matrices, and α and β are scalars, then αA + βB is the M × N matrix with entries

[αA + βB]_jk = α[A]_jk + β[B]_jk for j = 1, …, M and k = 1, …, N.

Matrix Multiplication

Assuming A is an L × M matrix and B is an M × N matrix, their product AB is the L × N matrix with entries

[AB]_jk = "j th row of A times k th column of B"

$$ [\mathbf{AB}]_{jk} = \begin{bmatrix} a_{j1} & a_{j2} & a_{j3} & \cdots & a_{jM} \end{bmatrix} \begin{bmatrix} b_{1k} \\ b_{2k} \\ b_{3k} \\ \vdots \\ b_{Mk} \end{bmatrix} = a_{j1}b_{1k} + a_{j2}b_{2k} + a_{j3}b_{3k} + \cdots + a_{jM}b_{Mk} = \sum_{m=1}^{M} a_{jm}\, b_{mk} = \sum_{m=1}^{M} [\mathbf{A}]_{jm}\, [\mathbf{B}]_{mk} \,. $$
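The entry formula above translates directly into code. Here is a minimal sketch in plain Python, with matrices stored as lists of rows (the helper name `matmul` is just a label chosen here, not a library function):

```python
def matmul(A, B):
    """Product of an L x M matrix A and an M x N matrix B:
    [AB]_jk = sum over m of A[j][m] * B[m][k]."""
    L, M, N = len(A), len(B), len(B[0])
    assert all(len(row) == M for row in A), "inner dimensions must match"
    return [[sum(A[j][m] * B[m][k] for m in range(M)) for k in range(N)]
            for j in range(L)]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(matmul(A, B))   # [[19, 22], [43, 50]]
```

The `assert` enforces exactly the size requirement in the definition: an L × M matrix can only multiply an M × N matrix.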

If A is a row matrix, then so is AB and the above formula reduces to

$$ [\mathbf{AB}]_{k} = \sum_{m=1}^{M} [\mathbf{A}]_{m}\, [\mathbf{B}]_{mk} \,. $$

If, instead, B is a column matrix, then so is AB and the above formula reduces to

$$ [\mathbf{AB}]_{j} = \sum_{m=1}^{M} [\mathbf{A}]_{jm}\, [\mathbf{B}]_{m} \,. $$

Do remember that matrix multiplication is not commutative in general, but is associative. That is, except for a few special cases, AB ≠ BA, but, as long as the sizes are appropriate, we can assume A(BC) = (AB)C. Also, you cannot assume that if the product AB is a matrix of just zeros, then either A or B must be all zeros.

?◮Exercise 4.1:
a: Give an example where AB ≠ BA.
b: Give an example of two nonzero matrices A and B for which the product AB is a matrix of just zeros. (A and B can have some zeros, just make sure that at least one entry in each is nonzero.)

Inner Products of Matrices

Using the above definitions for addition and scalar multiplication, the set of all M × N matrices forms a vector space of dimension MN. The default inner product for this vector space is the natural extension of the inner product for ℂᴺ,

$$ \langle\, \mathbf{A} \mid \mathbf{B} \,\rangle = \sum_{j=1}^{M} \sum_{k=1}^{N} a_{jk}^{*}\, b_{jk} = \sum_{j=1}^{M} \sum_{k=1}^{N} [\mathbf{A}]_{jk}^{*}\, [\mathbf{B}]_{jk} \,. $$

Note that, if A and B are both row matrices

$$ \mathbf{A} = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_N \end{bmatrix} \quad\text{and}\quad \mathbf{B} = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_N \end{bmatrix} , $$

or are both column matrices

$$ \mathbf{A} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_N \end{bmatrix} \quad\text{and}\quad \mathbf{B} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_N \end{bmatrix} , $$

then

$$ \langle\, \mathbf{A} \mid \mathbf{B} \,\rangle = \sum_{j} a_j^{*}\, b_j = \sum_{j} [\mathbf{A}]_{j}^{*}\, [\mathbf{B}]_{j} \,. $$
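This double-sum inner product is easy to compute directly. A small sketch (matrices as lists of rows; the name `inner` is chosen here, and `complex(...)` is used so that real entries are conjugated harmlessly):

```python
def inner(A, B):
    """Default inner product of two same-size matrices:
    <A|B> = sum over j,k of conj(A_jk) * B_jk."""
    assert len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))
    return sum(complex(a).conjugate() * b
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))

A = [[1j, 2], [0, 1]]
B = [[1j, 1], [3, 1]]
print(inner(A, B))   # conj(1j)*1j + 2*1 + 0*3 + 1*1 = (4+0j)
```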


Conjugates, Transposes and Adjoints

In the following, assume A is an M × N matrix and B is an N × L matrix.

Complex Conjugates

The complex conjugate of A, denoted by A∗, is simply the matrix obtained by taking the complex conjugate of each entry,

$$ \left[ \mathbf{A}^{*} \right]_{mn} = [\mathbf{A}]_{mn}^{*} \,. $$

!◮Example 4.1:

$$ \text{If } \mathbf{A} = \begin{bmatrix} 2+3i & 5 & 7-2i \\ 6i & 4+4i & 8 \end{bmatrix} \text{ then } \mathbf{A}^{*} = \begin{bmatrix} 2-3i & 5 & 7+2i \\ -6i & 4-4i & 8 \end{bmatrix} . $$

We will say A is real if and only if all the entries of A are real, and we will say A is imaginary if and only if all the entries of A are imaginary. The following are easily verified, if not obvious:

1. (AB)∗ = A∗B∗ .

2. (A∗)∗ = A .

3. A is real ⇐⇒ A∗ = A .

4. A is imaginary ⇐⇒ A∗ = −A .

Transposes

The transpose of the M × N matrix A, denoted in these notes¹ by Aᵀ, is the N × M matrix whose rows are the columns of A (or, equivalently, whose columns are the rows of A),

$$ \left[ \mathbf{A}^{T} \right]_{mn} = [\mathbf{A}]_{nm} \,. $$

!◮Example 4.2:

$$ \text{If } \mathbf{A} = \begin{bmatrix} 2+3i & 5 & 7-2i \\ 6i & 4+4i & 8 \end{bmatrix} \text{ then } \mathbf{A}^{T} = \begin{bmatrix} 2+3i & 6i \\ 5 & 4+4i \\ 7-2i & 8 \end{bmatrix} . $$

Also,

$$ \text{If } |a\rangle = \begin{bmatrix} 1+2i \\ 3i \\ 5 \end{bmatrix} \text{ then } |a\rangle^{T} = \begin{bmatrix} 1+2i & 3i & 5 \end{bmatrix} . $$

¹ But Arfken, Weber & Harris use Ã.


If you just think about what "transposing a transpose" means, it should be pretty obvious that

$$ \left( \mathbf{A}^{T} \right)^{T} = \mathbf{A} \,. $$

If you think a little bit about the role of rows and columns in matrix multiplication, then you may not be surprised by the fact that

$$ (\mathbf{AB})^{T} = \mathbf{B}^{T} \mathbf{A}^{T} \,. $$

This is easily proven using our [·]_jk notation:

$$ \left[ (\mathbf{AB})^{T} \right]_{jk} = [\mathbf{AB}]_{kj} = \sum_{m=1}^{M} [\mathbf{A}]_{km} [\mathbf{B}]_{mj} = \sum_{m=1}^{M} \left[ \mathbf{A}^{T} \right]_{mk} \left[ \mathbf{B}^{T} \right]_{jm} = \sum_{m=1}^{M} \left[ \mathbf{B}^{T} \right]_{jm} \left[ \mathbf{A}^{T} \right]_{mk} = \left[ \mathbf{B}^{T} \mathbf{A}^{T} \right]_{jk} \,, $$

showing that [(AB)ᵀ]_jk = [BᵀAᵀ]_jk for every (j, k). Hence, (AB)ᵀ = BᵀAᵀ as claimed above.

Two more terms you may recall:

A is symmetric ⇐⇒ Aᵀ = A ⇐⇒ [A]_jk = [A]_kj .

And

A is antisymmetric ⇐⇒ Aᵀ = −A ⇐⇒ [A]_jk = −[A]_kj .

Note that, to be symmetric or antisymmetric, the matrix must have the same number of rows as it has columns (i.e., it must be "square"). You may recall from your undergraduate days that there is a nice theory concerning the eigenvalues and eigenvectors of real symmetric matrices. We'll extend that theory later.

Adjoints

Combining the transpose with complex conjugation yields the "adjoint".² The adjoint of the M × N matrix A, denoted by either adj(A) or A†, is the N × M matrix

$$ \mathbf{A}^{\dagger} = \left( \mathbf{A}^{T} \right)^{*} = \left( \mathbf{A}^{*} \right)^{T} \,. $$

Using the [·]_jk notation,

$$ \left[ \mathbf{A}^{\dagger} \right]_{jk} = [\mathbf{A}]_{kj}^{*} \,. $$

² The adjoint discussed here is sometimes called the operator adjoint. You may also find reference to the "classical" adjoint, which is something totally different, and is of no interest to us at all.

!◮Example 4.3:

$$ \text{If } \mathbf{A} = \begin{bmatrix} 2+3i & 5 & 7-2i \\ 6i & 4+4i & 8 \end{bmatrix} \text{ then } \mathbf{A}^{\dagger} = \begin{bmatrix} 2-3i & -6i \\ 5 & 4-4i \\ 7+2i & 8 \end{bmatrix} . $$

Also,

$$ \text{If } |a\rangle = \begin{bmatrix} 1+2i \\ 3i \\ 5 \end{bmatrix} \text{ then } |a\rangle^{\dagger} = \begin{bmatrix} 1-2i & -3i & 5 \end{bmatrix} . $$

From what we've already discussed regarding complex conjugates and transposes, we immediately get the following facts:

1. (A†)† = A .

2. (AB)† = B†A† .

3. A† = AT ⇐⇒ A is real.

4. A† = −Aᵀ ⇐⇒ A is imaginary.

Any matrix that satisfies A† = A is said to be either self adjoint or Hermitian, depending on the mood of the speaker. Such matrices are the complex analogs of symmetric real matrices and will be of great interest later. Along the same lines, any matrix that satisfies A† = −A is said to be anti-Hermitian. I suppose you could also call them "anti-self adjoint" ("self anti-adjoint"?), though that is not commonly done.
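These definitions are easy to check numerically. A small sketch with matrices as lists of rows (the helper names `adjoint` and `is_hermitian` are chosen here, not standard):

```python
def adjoint(A):
    """A-dagger: conjugate transpose. Entry [j][k] is conj(A[k][j])."""
    return [[complex(A[m][n]).conjugate() for m in range(len(A))]
            for n in range(len(A[0]))]

def is_hermitian(A):
    """A square matrix is Hermitian iff it equals its own adjoint."""
    return adjoint(A) == [[complex(x) for x in row] for row in A]

H = [[2, 1 - 1j],
     [1 + 1j, 3]]   # real diagonal, conjugate-symmetric off-diagonal
print(is_hermitian(H))   # True
```

Note how the example matrix has a real diagonal: the condition [A]_jj = [A]_jj∗ forces every diagonal entry of a Hermitian matrix to be real.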

4.2 Extending the "Bra–Ket" Notation

Using an Orthonormal Basis

Assume we have an N-dimensional vector space V with orthonormal basis

B = { e_1, e_2, …, e_N } .

Remember, for any vector v in this space,

|v⟩ = |v⟩_B = column matrix of components of v w.r.t. B

$$ |v\rangle = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix} \quad\text{where}\quad \mathbf{v} = \sum_{j=1}^{N} v_j \mathbf{e}_j \,. $$

We now define ⟨v| to be the adjoint of |v⟩,

$$ \langle v| = \langle v|_{B} = |v\rangle^{\dagger} = \begin{bmatrix} v_1^{*} & v_2^{*} & \cdots & v_N^{*} \end{bmatrix} . $$

Observe that

$$ \left( \mathbf{A} |v\rangle \right)^{\dagger} = \langle v|\, \mathbf{A}^{\dagger} \,. $$

Also note that

$$ \langle v|\, |w\rangle = \begin{bmatrix} v_1^{*} & v_2^{*} & \cdots & v_N^{*} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} = \sum_{j=1}^{N} v_j^{*} w_j \,, $$

which, since we are assuming an orthonormal basis, reduces to

$$ \underbrace{\langle v|\, |w\rangle}_{\text{matrix product}} = \underbrace{\langle\, v \mid w \,\rangle}_{\text{inner product}} \,. $$

The "⟨·|" is the "bra" in the (Dirac) "bra–ket" notation.
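In components, the bra–ket product is just "conjugate the bra's components, then multiply and sum", exactly the row-times-column product above. A minimal sketch (the name `braket` is chosen here):

```python
def braket(v, w):
    """<v|w> relative to an orthonormal basis: conjugate the components
    of v, then take the ordinary row-times-column product with w."""
    assert len(v) == len(w)
    return sum(complex(a).conjugate() * b for a, b in zip(v, w))

v = [1 + 2j, 3j, 5]
print(braket(v, v))   # |1+2i|^2 + |3i|^2 + |5|^2 = 5 + 9 + 25 = (39+0j)
```

Conjugating the first argument is what makes ⟨v|v⟩ a nonnegative real number, as an inner product must be.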

Using an Arbitrary Basis∗

Now let

$$ B = \{ \mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_N \} $$

be any basis for our vector space V, and let

$$ D = \{ \mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_N \} $$

be the corresponding reciprocal basis. Recall from section 2.6 that

$$ \langle\, v \mid w \,\rangle = \sum_{k=1}^{N} \bar{v}_k^{*}\, w_k $$

where

$$ \begin{bmatrix} \bar{v}_1 \\ \bar{v}_2 \\ \vdots \\ \bar{v}_N \end{bmatrix} = |v\rangle_{D} \quad\text{and}\quad \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} = |w\rangle_{B} \,. $$

In terms of matrix operations, the above is

$$ \langle\, v \mid w \,\rangle = \sum_{k=1}^{N} \bar{v}_k^{*}\, w_k = \begin{bmatrix} \bar{v}_1^{*} & \bar{v}_2^{*} & \cdots & \bar{v}_N^{*} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} = \left( |v\rangle_{D} \right)^{\dagger} |w\rangle_{B} \,. $$

Because of this, the appropriate general definition of "bra v" is

$$ \langle v| = \langle v|_{B} = \left( |v\rangle_{D} \right)^{\dagger} $$

where D is the reciprocal basis corresponding to B. That way, we still have

$$ \underbrace{\langle v|\, |w\rangle}_{\text{matrix product}} = \underbrace{\langle\, v \mid w \,\rangle}_{\text{inner product}} \,. $$

∗ This material is optional. It requires the "reciprocal basis" from section 2.6.

4.3 Square Matrices

For the most part, the only matrices we'll have much to do with (other than row or column matrices) are square matrices.

The Zero, Identity and Inverse Matrices

A square matrix is any matrix having the same number of rows as columns. Two important N × N matrices are

The zero matrix, denoted by O or O_N, which is simply the N × N matrix whose entries are all zero,

$$ [\mathbf{O}]_{jk} = 0 \quad\text{for all } (j, k) \,. $$

The identity matrix, denoted by I or I_N or 1 (Arfken, Weber & Harris's notation), which is the matrix whose entries are all zero except for 1's on the major diagonal,

[I]_jk = δ_jk .

Recall that, if A is any N ×N matrix, then

A + O = A , AO = OA = O and AI = IA = A .

!◮Example 4.4: The 3×3 zero and identity matrices are

$$ \mathbf{O} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad \mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} . $$

The inverse of a (square) matrix A is the matrix, denoted by A⁻¹, such that

AA−1 = A−1A = I .

Not all square matrices have inverses. If A has an inverse, it is said to be invertible or nonsingular. If A does not have an inverse, we call it noninvertible or singular. It may later be worth noting that, while the definition of the inverse requires that

AA−1 = I and A−1A = I , it can easily be shown that, if B is any square matrix such that

AB = I or BA = I ,

then AB = I and BA = I, and thus, B = A⁻¹. Here are a few other things you should recall about inverses (assume A is an invertible N × N matrix):

1. AB = C ⇐⇒ B = A−1C .

2. BA = C ⇐⇒ B = CA−1 .

3. (A⁻¹)⁻¹ = A .

4. If B is also an invertible N ×N matrix, then

(AB)−1 = B−1A−1 .
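The last identity (with the factors reversed) can be spot-checked numerically. A sketch for the 2 × 2 case, using the familiar closed-form inverse of a 2 × 2 matrix (the helper names `inv2` and `mul2` are chosen here):

```python
def inv2(A):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] via the familiar
    formula (1/det) * [[d, -b], [-c, a]]; requires det != 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "matrix is singular"
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(A, B):
    """2 x 2 matrix product."""
    return [[sum(A[j][m] * B[m][k] for m in range(2)) for k in range(2)]
            for j in range(2)]

A = [[2, 1], [1, 1]]
B = [[1, 2], [0, 1]]
print(inv2(mul2(A, B)) == mul2(inv2(B), inv2(A)))   # True
```

One check on a pair of matrices is of course no proof, but it makes the order reversal in (AB)⁻¹ = B⁻¹A⁻¹ hard to misremember.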

?◮Exercise 4.2: Verify anything above that is not immediately obvious to you.

4.4 Determinants

Definition, Formulas and Cramer's Rule

Recall that each N × N matrix has an associated scalar called the determinant of that matrix. We will denote the determinant of

$$ \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix} $$

by

$$ \det(\mathbf{A}) \quad\text{or}\quad \det \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix} \quad\text{or}\quad \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{vmatrix} $$

as seems convenient. The determinant naturally arises when solving a system of N linear equations for N unknowns

$$ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\ &\;\;\vdots \\ a_{N1}x_1 + a_{N2}x_2 + \cdots + a_{NN}x_N &= b_N \end{aligned} \tag{4.1a} $$

where the a_jk's and b_k's are known values and the x_j's are the unknowns. Do observe that we can write this system more concisely in "matrix form",

$$ \mathbf{A}\mathbf{x} = \mathbf{b} \tag{4.1b} $$

where A is the N ×N matrix of coefficients, and x and b are the column matrices

$$ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{bmatrix} . $$

In solving this, you get (trust me)

$$ \det(\mathbf{A})\, x_k = \det(\mathbf{B}_k) \quad\text{for } k = 1, 2, \ldots, N \tag{4.2} $$

where B_k is the matrix obtained by replacing the k th column in A with b. We can then find each x_k by dividing through by det(A), provided det(A) ≠ 0. (Equation (4.2) is Cramer's rule for solving system (4.1a) or (4.1b), written the way it should be — but almost never is — written.)
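For concreteness, here is Cramer's rule (4.2) coded directly for a 2 × 2 system (a sketch for illustration only; as noted at the end of this chapter, this is not a sensible method for large systems):

```python
def cramer(A, b):
    """Solve A x = b for a 2 x 2 system by Cramer's rule:
    x_k = det(B_k) / det(A), with B_k = A with column k replaced by b."""
    (a11, a12), (a21, a22) = A
    det = lambda M: M[0][0] * M[1][1] - M[0][1] * M[1][0]
    d = det(A)
    assert d != 0, "det(A) = 0: no unique solution"
    x1 = det([[b[0], a12], [b[1], a22]]) / d   # column 1 replaced by b
    x2 = det([[a11, b[0]], [a21, b[1]]]) / d   # column 2 replaced by b
    return [x1, x2]

# 2x + y = 5 and x + 3y = 10 have the solution x = 1, y = 3:
print(cramer([[2, 1], [1, 3]], [5, 10]))   # [1.0, 3.0]
```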

?◮Exercise 4.3: What can you say about the possible values for the x_k's if det(A) = 0 in the above?

In theory, the definition of and formula for the determinant of any square matrix can be based on equation (4.2) as a solution to system (4.1a) or (4.1b). (Clever choices for the b vectors may help.) Instead, we will follow standard practice: I'll give you the formulas and tell you to trust me. (And if you remember how to compute determinants — as I rather expect — just skip ahead to Properties and Applications of Determinants.) For N = 1 and N = 2, we have the well-known formulas

$$ \det[a_{11}] = a_{11} \quad\text{and}\quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \,. $$

For N > 2, the formula for the determinant of an N × N matrix

$$ \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix} $$

can be given in terms of the determinants of (N − 1) × (N − 1) "submatrices" via either the "expansion by minors of the first row"

$$ \det \mathbf{A} = \sum_{k=1}^{N} (-1)^{k-1}\, a_{1k} \det(\mathbf{A}_{1k}) $$

or the "expansion by minors of the first column"

$$ \det \mathbf{A} = \sum_{j=1}^{N} (-1)^{j-1}\, a_{j1} \det(\mathbf{A}_{j1}) $$

where A_jk is the (N − 1) × (N − 1) matrix obtained from A by deleting the j th row and k th column.

Either of the above formulas gives you the determinant of A. If you are careful with your signs, you can find the determinant via an expansion using any row or column. If you are interested³, here is a "most general" formula for the determinant of the above N × N matrix A:

$$ \det(\mathbf{A}) = \sum_{i_1=1}^{N} \sum_{i_2=1}^{N} \cdots \sum_{i_N=1}^{N} \varepsilon(i_1, i_2, \ldots, i_N)\, a_{i_1 1}\, a_{i_2 2} \cdots a_{i_N N} $$

where ε is a function of all N-tuples of the integers from 1 to N satisfying all the following:

1. If any two of the entries in (i_1, i_2, …, i_N) are the same, then

ε(i_1, i_2, …, i_N) = 0 .

2. If i_1 = 1, i_2 = 2, i_3 = 3, etc., then

ε(i_1, i_2, i_3, …, i_N) = ε(1, 2, 3, …, N) = 1 .

3. Suppose (j_1, j_2, …, j_N) is obtained from (i_1, i_2, …, i_N) by simply switching just two adjacent entries in (i_1, i_2, …, i_N). In other words, for some integer K with 1 ≤ K < N,

$$ j_k = \begin{cases} i_k & \text{if } k \neq K \text{ and } k \neq K+1 \\ i_{K+1} & \text{if } k = K \\ i_K & \text{if } k = K+1 \,. \end{cases} $$

Then

$$ \varepsilon(j_1, j_2, j_3, \ldots, j_N) = -\varepsilon(i_1, i_2, i_3, \ldots, i_N) \,. $$

Properties and Applications of Determinants

Let me remind you of many of the properties and a few of the applications of determinants. In many ways, you can consider Cramer's rule (equation (4.2)) for solving system (4.1a) or, equivalently, matrix equation (4.1b) as a basic defining equation for the determinant. Cramer's rule tells us that system (4.1a) can be solved for every choice of the b_j's if and only if det(A) ≠ 0. But the ability to solve that system is equivalent to the ability to solve the matrix equation

$$ \mathbf{A}\mathbf{x} = \mathbf{b} $$

for every choice of b, and since this matrix equation is solvable for every choice of b if and only if A is invertible, we must have that:

An N × N matrix A is invertible ⇐⇒ det(A) ≠ 0 .

This is doubtlessly our favorite test for determining if A⁻¹ exists. It will also probably be the most important fact about determinants that we will use. Another property that can be derived from Cramer's rule is that, if A and B are both N × N matrices, then

$$ \det(\mathbf{AB}) = \det(\mathbf{A}) \det(\mathbf{B}) \,. \tag{4.3} $$

³ If you are not interested, skip to Properties and Applications of Determinants below.

(Just consider using Cramer’s rule to solve (AB)x = b , rewritten as A(Bx) = b .) Now it is real easy to use the formulas (or even Cramer’s rule) to verify that

$$ \det(\mathbf{I}) = \det \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} = 1 \,. $$

From this and the observation regarding determinants of products you can easily do the next exercise.

?◮Exercise 4.4: Assume A is invertible, and show that

$$ \det\!\left( \mathbf{A}^{-1} \right) = \frac{1}{\det(\mathbf{A})} \,. $$

A few other properties of the determinant are listed here. Assume A is an N × N matrix and α is some scalar.

1. det(αA) = αN det(A) .

2. det(A∗) = det(A)∗ .

3. det(Aᵀ) = det(A) .

4. det(A†) = det(A)∗ .

?◮Exercise 4.5: Convince yourself of the validity of each of the above statements. (For the first, det(αA) = αᴺ det(A), you might first show that det(αI) = αᴺ, and then consider rewriting αA as (αI)A and taking the determinant of that.)
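A numerical spot-check of properties 1–3 on one 2 × 2 complex matrix is no substitute for the convincing asked for in the exercise, but it can catch a misremembered sign or missing conjugate (the entry-wise helpers below are ad hoc, written just for this check):

```python
# Numerical spot-check (not a proof) of determinant properties 1-3
# for a single 2 x 2 complex matrix A and scalar alpha.
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[1 + 2j, 3], [1j, 4]]
N, alpha = 2, 5
dA = det2(A)                                                     # 4 + 5j

scaled = [[alpha * x for x in row] for row in A]                 # alpha * A
conj = [[complex(x).conjugate() for x in row] for row in A]      # A*
trans = [[A[m][n] for m in range(N)] for n in range(N)]          # A^T

print(det2(scaled) == alpha ** N * dA)   # True: det(alpha A) = alpha^N det(A)
print(det2(conj) == dA.conjugate())      # True: det(A*) = det(A)*
print(det2(trans) == dA)                 # True: det(A^T) = det(A)
```

Property 4 then follows from checking properties 2 and 3 together, since A† = (Aᵀ)∗.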

Also remind yourself of the computational facts regarding determinants listed on page 87 of Arfken, Weber & Harris. Finally, let me comment on the importance of Cramer's rule: It can be viewed as being very important, theoretically, for the way it relates determinants to the solving of linear systems. However, as a practical tool for solving these systems, it is quite unimportant. The number of computations required to find all the determinants is horrendous, especially if the system is large. Other methods, such as the Gauss elimination/row reduction learned in your undergraduate linear algebra course, are much faster and easier to use.