
Linear Algebra (G63.2110) Professor Mel Hausner

Notes on Determinants

The symmetric group Sn. It is not necessary to know this material for the discussion, but it might help.

Definition. The symmetric group Sn is the group of all 1-1 maps of {1, 2, . . . , n} into itself. Elements of Sn are called permutations. Any rearrangement {i1, i2, . . . , in} of {1, 2, . . . , n} is regarded as the element of Sn which maps k into ik.

If σ ∈ Sn then σ is onto as well as 1-1, since {1, 2, . . . , n} is finite. Multiplication of group elements in Sn is simply composition. Thus ρ = στ is defined by the condition ρ(i) = (στ)(i) = σ(τ(i)) for all i ∈ {1, 2, . . . , n}. The identity is called I, and as usual, we have σσ^(−1) = σ^(−1)σ = I.

It is convenient to regard Sn ⊆ Sn+1. If σ ∈ Sn, simply define σ(n + 1) = n + 1, and the newly defined σ is in Sn+1. More accurately, we have defined a 1-1 map from Sn into Sn+1 which clearly preserves multiplication.

Definition. (Cycle notation) Let i1, i2, . . . , ik be distinct elements of {1, 2, . . . , n}. Then the cycle (i1 i2 · · · ik) is the map σ defined by the conditions

1. σ(iν) = iν+1 for ν = 1, . . . , k − 1.
2. σ(ik) = i1.
3. σ(x) = x if x ∉ {i1, i2, . . . , ik}.

The numbers i1, i2, . . . , ik are called elements of the cycle. The cycle σ is said to have length (or order) k. It is an easy matter to see that σ^k = I. In fact, σ(i1) = i2, σ^2(i1) = i3, . . . , σ^(k−1)(i1) = ik.

For example, if n = 5, the cycle σ = (2 5 4) of length 3 satisfies σ(1) = 1, σ(2) = 5, σ(3) = 3, σ(4) = 2, σ(5) = 4. In short, 2 ↦ 5 ↦ 4 ↦ 2 (and 1 ↦ 1, 3 ↦ 3). The elements of σ are 2, 4, and 5.
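To make the cycle notation concrete, here is a small Python sketch (our own illustration, not part of the notes; the helper name cycle_to_map is ours) that builds the map defined by a cycle and reproduces the example above.

def cycle_to_map(cycle, n):
    """Return the permutation of {1, ..., n} defined by the cycle, as a dict."""
    sigma = {x: x for x in range(1, n + 1)}       # everything outside the cycle is fixed
    for pos, x in enumerate(cycle):
        sigma[x] = cycle[(pos + 1) % len(cycle)]  # i_nu -> i_{nu+1}, and i_k -> i_1
    return sigma

print(cycle_to_map((2, 5, 4), 5))   # {1: 1, 2: 5, 3: 3, 4: 2, 5: 4}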

Definition. A transposition (or interchange) is a cycle (i j) of length 2. Note that for transpositions τ, we have τ^2 = I.

Theorem. Any permutation in Sn is a product of transpositions.

We prove this by induction on n. It is clearly true for n = 2. Assume that it is true for n and let σ ∈ Sn+1. If σ(n + 1) = n + 1 we have already noted that we may regard σ ∈ Sn and the result is true. Otherwise, σ(n + 1) = i where 1 ≤ i ≤ n. Then τ = (n+1 i)σ maps n + 1 into n + 1, and so we may regard (n+1 i)σ as an element of Sn. By induction (n+1 i)σ = ρ is a product of transpositions. Multiplying by (n+1 i) and using (n+1 i)^2 = I, we find that σ = (n+1 i)ρ, a product of transpositions. Note that the count of transpositions goes up by 1 in the inductive step, so we can state that every permutation in Sn is a product of at most n − 1 transpositions. Note also that we don’t really need Sn ⊆ Sn+1. The induction could have been made on the largest element x such that σ(x) ≠ x, assuming σ ≠ I.
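The inductive argument translates directly into a procedure: repeatedly multiply on the left by a transposition that sends the largest not-yet-fixed element home. A minimal Python sketch (ours, not from the notes; permutations are stored as tuples p with p[i-1] = σ(i)):

def compose(p, q):
    """(pq)(i) = p(q(i)); p and q are tuples with p[i-1] = p(i)."""
    return tuple(p[q[i] - 1] for i in range(len(p)))

def to_transpositions(p):
    """Return transpositions t1, ..., tr (as pairs) with p = t1 ∘ t2 ∘ ... ∘ tr (leftmost applied last)."""
    p, n, trans = tuple(p), len(p), []
    for m in range(n, 1, -1):                  # make p fix n, then n-1, ...
        i = p[m - 1]                           # i = p(m)
        if i != m:
            trans.append((i, m))
            t = tuple(m if x == i else i if x == m else x for x in range(1, n + 1))
            p = compose(t, p)                  # (i m) p now fixes m
    return trans

print(to_transpositions((3, 1, 5, 2, 4)))      # [(4, 5), (2, 4), (2, 3), (1, 2)]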

We’ll need the following lemma in what follows (transpositions almost commute with permutations):

Lemma. Let σ ∈ Sn. Then if τ is a transposition in Sn, we have στ = τ′σ for some transposition τ′ in Sn.

Proof of Lemma. Suppose τ = (i j). Let σ(i) = i′ and σ(j) = j′, and consider the transformation ρ = στσ^(−1). A simple check shows that

ρ(i′) = j′; ρ(j′) = i′; ρ(k) = k if k ≠ i′, j′

Thus ρ is the transposition (i′ j′), and since ρ = στσ^(−1), we have ρσ = στ.

We now prove an important and non-trivial result:

Theorem. There is a transformation sign : Sn → {−1, 1} with the following properties:
1) If τ is a transposition, then sign(τ) = −1.
2) For any ρ, σ ∈ Sn, sign(ρσ) = sign(ρ) sign(σ).

Remark. Since the sign of a transposition is −1, and every permutation is a product of transpositions, it follows that the sign assignment is unique. The idea of this result is that if a permutation is written as a product of k transpositions, then the parity of k is unique.

We prove this by induction on n, using the natural embedding of Sn in Sn+1. It is clearly true for n = 1, where sign(I) = 1, and for n = 2, where sign(I) = 1 and sign(1 2) = −1. Now assume this result true for n, and we want to show how to extend the definition of sign to Sn+1. If σ ∈ Sn+1, we have σ(n + 1) = k for some k with 1 ≤ k ≤ n + 1. Thus if k = n + 1, σ ∈ Sn, and otherwise (k n+1)σ ∈ Sn. In this case σ ∈ (k n+1)Sn. Define τk = (k n+1) for 1 ≤ k ≤ n. For convenience, also define τn+1 = I. Thus Sn+1 is partitioned as

Sn+1 = τ1Sn ∪ τ2Sn ∪ ...∪ τnSn ∪ τn+1Sn

We can now define sign on Sn+1. If ρ ∈ Sn, sign(ρ) is already defined. Otherwise, ρ = τkσ, where σ ∈ Sn. In this case we naturally define sign(ρ) = −sign(σ).

Under this definition, it is clear that the sign of any transposition is −1. We must prove that the sign of a product is the product of the signs. Let ρ1 = τrσ1 and let ρ2 = τsσ2, where 1 ≤ r, s ≤ n + 1 and each σi is in Sn. Then

ρ1ρ2 = τrσ1τsσ2 = τrτ′σ1σ2

where, using the lemma, τ′ is a transposition. A case analysis on r and s now finishes the proof: the product τrτ′ is either I, a single τm, or a τm times a transposition lying in Sn, and in each case the inductive hypothesis in Sn gives sign(ρ1ρ2) = sign(ρ1) sign(ρ2).
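One standard way to compute the sign in practice (our own illustration, consistent with the theorem) is via the parity of the number of inversions; the little Python check below also verifies the multiplicative property on all of S3.

from itertools import permutations

def sign(p):
    """Sign of the permutation p, given as a tuple with p[i-1] = p(i)."""
    inversions = sum(1 for a in range(len(p))
                       for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inversions % 2 else 1

def compose(p, q):
    return tuple(p[q[i] - 1] for i in range(len(p)))

assert sign((2, 1, 3)) == -1                      # a transposition has sign -1
for p in permutations((1, 2, 3)):
    for q in permutations((1, 2, 3)):
        assert sign(compose(p, q)) == sign(p) * sign(q)
print("sign is multiplicative on S3")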

These notes present a somewhat more abstract version than the text. As usual, V is an n-dimensional vector space over a field F.

Definition. Let f be an F-valued function of k vector variables in V. We write f : V^k → F. f is said to be k-linear if it is linear in each of its variables. Thus, for each i (1 ≤ i ≤ k), and fixed vectors v1, . . . , vi−1, vi+1, . . . , vk, and arbitrary vectors v and w, and arbitrary constants a and b, we have

f(v1, . . . , vi−1, av + bw, vi+1, . . . , vk) = af(. . . , v, . . .) + bf(. . . , w, . . .), where the dots represent the fixed vectors v1, . . . , vi−1, vi+1, . . . , vk.

The following is a generalization of the corresponding result when k =1.

Theorem. Let e1, . . . , en be a basis for V, and let k ≥ 1 be given. Let ai1i2...ik ∈ F for each k-tuple (i1, . . . , ik) with 1 ≤ i1, . . . , ik ≤ n. Then there is one and only one k-linear function f : V^k → F such that

f(ei1 ,...,eik )=ai1i2...ik

The proof is straightforward. Let’s first illustrate it for k = 2 and arbitrary n. In this case, we have aij ∈ F for 1 ≤ i, j ≤ n. Now if v, w ∈ V , we have, uniquely,

v = Σi xiei,    w = Σj yjej    (sums over i, j = 1, . . . , n)

If f were bilinear in v and w with f(ei,ej)=aij, we would have

f(v, w) = f(Σi xiei, Σj yjej) = Σi,j xiyj f(ei, ej) = Σi,j aij xiyj

Conversely, if for given aij, this formula is taken as the definition of f(v, w), it is easy to see it is bilinear (since it is linear in each of its variables) and that f(ei, ej) = aij.
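In coordinates the bilinear case is just a double sum. Here is a small Python sketch (our own, with hypothetical random data) that evaluates f from a coefficient array a and spot-checks linearity in the first variable.

import random

n = 3
a = [[random.random() for _ in range(n)] for _ in range(n)]   # a[i][j] plays the role of a_ij = f(e_i, e_j)

def f(v, w):
    """f(v, w) = sum over i, j of a_ij * x_i * y_j."""
    return sum(a[i][j] * v[i] * w[j] for i in range(n) for j in range(n))

v  = [random.random() for _ in range(n)]
v2 = [random.random() for _ in range(n)]
w  = [random.random() for _ in range(n)]
lhs = f([2 * x + 3 * y for x, y in zip(v, v2)], w)            # f(2v + 3v', w)
rhs = 2 * f(v, w) + 3 * f(v2, w)                              # 2 f(v, w) + 3 f(v', w)
assert abs(lhs - rhs) < 1e-9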

The general case is similar. We are burdened because we can’t use dummy variables i, j, so we have to resort to the indices i1, i2, . . . , ik.

Note the basis form of f(v1, . . . , vk). If the vector vj has coordinates (xj1, xj2, . . . , xjn) for each j, then we have

f(v1, . . . , vk) = Σ ai1...ik x1i1 · · · xkik

where the sum is over all i1, . . . , ik from 1 to n. In addition to k-linearity, it is necessary to consider skew symmetry.

Definition. Let f be k-linear with f : V^k → F. f is said to be skew symmetric if f = 0 whenever two of the variables are equal.

In this case we also have, when i < j,

f(...,vi,...,vj,...)=−f(...,vj,...,vi,...)

This follows from k-linearity as follows. (We take two variables for simplicity.) Since f(v + w, v + w) = 0, we have by k-linearity

f(v, v) + f(v, w) + f(w, v) + f(w, w) = 0, which implies f(v, w) = −f(w, v), since f(v, v) = f(w, w) = 0 by skew symmetry. The condition f(v, w) = −f(w, v) is often taken as the definition of skew symmetry.

We have seen that a k-linear function f is given by a k-dimensional array ai1i2...ik. Since this can be interpreted as the value of f(ei1, . . . , eik), where e1, . . . , en are a basis, we can easily see what happens if f is skew symmetric. We would have to have

ai1i2...ik = 0 if the indices i1, . . . , ik are not distinct, and

ai1i2...ik changes sign if two of the indices are interchanged. It is easy to show that these conditions are sufficient for the corresponding k-linear function to be skew symmetric.

Here a little algebra is very helpful. If k numbers i1, . . . , ik are rearranged to j1, . . . , jk, the rearrangement is said to be an even permutation if it can be achieved using an even number of interchanges. It is odd if it can be achieved in an odd number of interchanges. An important result in algebra is that you can’t have a permutation which is both even and odd. The sign of a permutation is said to be 1 if it is even, and −1 if it is odd. Thus the skew symmetry condition on ai1i2...ik is that

ai1i2...ik = ε aj1j2...jk, where ε is the sign of the permutation from the i’s to the j’s.

Definition. A volume element K on an n dimensional vector space V is a skew symmetric, n-linear function of n variables.

Note that this is the first time we set the number k of vector variables equal to the dimension of the vector space. We may think of K(v1, . . . , vn) as the oriented volume of the n-dimensional parallelepiped formed by the v1, . . . , vn. In the plane, we have an idea that the direction from v1 to v2 is opposite from the direction from v2 to v1. (Think counterclockwise vs clockwise.) This is determined by the sign of K(v1, v2). Similarly in 3-space, think

right handed vs left handed system. And even on the lowly line, think left or right. Thus, n-linearity and skew symmetry have geometric meaning and they are partially discussed in the text.

Theorem. The volume elements on V form a one-dimensional space.

Proof. Take a basis e1, . . . , en. Define the volume element K0 by the condition K0(e1, . . . , en) = 1. This uniquely defines K0 since it implies

a1...n = 1, ai1...in = sign(i1 . . . in) when the ik are distinct, and ai1...in = 0 if the ik’s are not distinct.

It is now an easy matter to verify that if K is any volume element on V , and k = K(e1,...,en), then K = kK0.

We can now define the determinant of an n × n matrix A.

Definition. Let {e1, . . . , en} be the standard basis in F^n, and let K0 be the unique volume element of F^n with K0(e1, . . . , en) = 1. Then if A is any n × n matrix with columns A1, . . . , An, we define

det(A) = K0(A1, . . . , An) = |A1 . . . An|

Here, the absolute value notation is used as a shorthand for the determinant. We often write det(A) = |A|.

It follows directly from the definition that det I =1.

For example, let’s compute the determinant of the 2 × 2 matrix with rows (a, b) and (c, d), whose columns are ae1 + ce2 and be1 + de2. Using the bracket notation, the two columns, and bilinearity and skew symmetry, we have

|ae1 + ce2, be1 + de2| = ab|e1, e1| + ad|e1, e2| + bc|e2, e1| + cd|e2, e2|

= ad|e1, e2| − bc|e1, e2| = ad − bc
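As a quick numerical sanity check (ours, not in the notes), the library determinant agrees with ad − bc:

import numpy as np

a, b, c, d = 2.0, 7.0, 3.0, 5.0
A = np.array([[a, b], [c, d]])
assert np.isclose(np.linalg.det(A), a * d - b * c)   # both are 10 - 21 = -11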

We now derive an array of theorems. Initially, they concern the columns of a matrix, but as we shall soon see, det(A) = det(A^t), so they will apply as well to the rows. Note the parallel with the elementary column operations.

The Column Operations. (1) If two columns of a matrix are interchanged, the determinant changes its sign. For a proof, note that this is precisely the skew symmetry condition of a volume element. | ...Ai ...Aj ...| = −| ...Aj ...Ai ...|.

(2) If a column of a matrix is multiplied by a constant c, the determinant is multiplied by that constant. This follows from n-linearity: | . . . cAi . . . | = c| . . . Ai . . . |.

(3) If a multiple of a column of A is added to another column, the determinant is unchanged. Thus, | . . . Ai . . . Aj . . . | = | . . . Ai . . . (cAi + Aj) . . . |. This follows from n-linearity and skew symmetry.
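The three column operations are easy to check numerically; a small sketch (ours, using random matrices) with numpy:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

B = A.copy(); B[:, [0, 2]] = B[:, [2, 0]]       # (1) interchange two columns
C = A.copy(); C[:, 1] *= 5.0                    # (2) multiply a column by a constant
D = A.copy(); D[:, 3] += 2.5 * D[:, 0]          # (3) add a multiple of one column to another

assert np.isclose(np.linalg.det(B), -np.linalg.det(A))
assert np.isclose(np.linalg.det(C), 5.0 * np.linalg.det(A))
assert np.isclose(np.linalg.det(D), np.linalg.det(A))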

By definition, we have det(A) = |Ae1, Ae2, . . . , Aen|, where the ei are the standard basis vectors in F^n. We now go one step further with the following important theorem.

Theorem. If A is an n × n matrix, and v1,...,vn are n vectors, then

|Av1,Av2,...,Avn| = det(A)|v1,v2,...,vn|

Note the geometric content here. A maps a parallelepiped into another parallelepiped. Its volume gets magnified by det(A). So det(A), originally conceived as the volume of the image of the “unit cube” (defined to have volume 1), is now seen to be a magnifier of the volume of any parallelepiped.

Proof. Define the volume element K(v1, . . . , vn) = |Av1, Av2, . . . , Avn|. (It is clearly n-linear and skew symmetric.) Since the volume elements form a one-dimensional space, there is a constant c satisfying K(v1, . . . , vn) = c|v1, v2, . . . , vn|. Thus,

|Av1,Av2,...,Avn| = c|v1,v2,...,vn|

To find c, substitute vi = ei to obtain

|Ae1,Ae2,...,Aen| = c|e1,e2,...,en|

or det(A) = c. Thus, |Av1, Av2, . . . , Avn| = det(A)|v1, v2, . . . , vn|, which is the result.

We have an important corollary.

Corollary. If A and B are n × n matrices,

det(AB) = det(A) det(B)

Proof.

det(AB)=|ABe1,...,ABen| = det(A)|Be1,...,Ben| = det(A) det(B)

Corollary. If A has an inverse then det(A) ≠ 0 and det(A^(−1)) = 1/det(A).

To see this, we have AA^(−1) = I. Taking determinants, we have det(A) det(A^(−1)) = 1, which is the result.
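Both corollaries are easy to check numerically; a quick sketch (ours) with numpy on random matrices:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A))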

We now wish to show that det(A) = det(A^t). This will allow us to use row operations as well as column operations, as above, since the transpose converts columns into rows, and vice versa. We prove this using elementary row and column matrices. The three types are as follows:

(1) Row and column interchanges (illustrated for the interchange of rows 1 and 2). The corresponding elementary matrix is

E = Er = Ec =
[ 0 1 0 . . . ]
[ 1 0 0 . . . ]
[ 0 0 1 . . . ]
[ . . . . . . ]

In this case E = Er = Ec is symmetric, and we have det(E) = det(E^t) = −1, by skew symmetry.

(2) Multiplying a row or column by c (illustrated for row 1). The corresponding elementary matrices for row and column are the same:

E = Er = Ec =
[ c O ]
[ O I ]

where O denotes a block of zeros and I an identity block.

In this case E = Er = Ec is symmetric, and we have det(E) = det(E^t) = c, by n-linearity.

(3) Adding a constant times a row (here 1) to another row (here 2), with the similar operation for columns. We have

Er =
[ 1 0 O ]
[ c 1 O ]
[ O O I ]

and Ec =
[ 1 c O ]
[ 0 1 O ]
[ O O I ]

Using simple column operations, we easily see that det(Er) = det(Ec) = 1. Further, Er^t = Ec. So in this case, we have det(E) = det(E^t).

Thus we have the theorem:

Theorem. If P is a product of elementary row or column matrices, then det(P) = det(P^t).

To see this, note that P = E1E2 · · · Ek, so P^t = Ek^t · · · E2^t E1^t. The result follows by taking determinants and using the result for elementary matrices.

We can now prove the general theorem.

Theorem. If A is any n × n matrix, then det(A) = det(A^t).

Proof. By the section on row and column transformations, we know that by consecutive row and column transformations, it is possible to transform A into a matrix A′ whose entries are 0 off the main diagonal. So A′ is symmetric: (A′)^t = A′. Thus, in matrix form, A′ = PAQ, where P and Q are respectively products of elementary row and column matrices. Taking determinants, we get det(A′) = det(P) det(A) det(Q). Now take transposes and determinants: A′ = (A′)^t = Q^t A^t P^t, so det(A′) = det(Q^t) det(A^t) det(P^t) = det(Q) det(A^t) det(P). Thus, det(P) det(A) det(Q) = det(Q) det(A^t) det(P). Since P and Q are invertible, det(P) and det(Q) are both nonzero, and we can cancel the factor det(P) det(Q) from both sides to get the result.

Evaluation of determinants. In principle, the definition itself will let us evaluate a determinant. However, for large matrices, the number of computations is prohibitive. The expansion of an n × n determinant has n! terms. Thus, even a 5 by 5 determinant has over a hundred terms, each term being a product of five entries. The use of column and row transformations makes life easier. For the record, we give the full expansion, using the definition. If A is n × n, with entry aij in the i-th row and j-th column, we have

det(A) = Σ sign(j1j2 . . . jn) a1j1 a2j2 · · · anjn

where the sum is taken over all permutations j1j2 . . . jn of 1, 2, . . . , n. This is simplified if many of the terms are 0, and for this we use row and column operations.
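This formula transcribes directly into Python (our own illustration; as noted, it is hopeless for large n since it visits all n! permutations):

from itertools import permutations

def sign(p):
    """Sign of a permutation given as a tuple of 0-based indices, via inversion parity."""
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inv % 2 else 1

def det_by_permutations(A):
    """det(A) = sum over permutations (j1, ..., jn) of sign * a_{1 j1} * ... * a_{n jn}."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

print(det_by_permutations([[1, 2], [3, 4]]))                     # -2
print(det_by_permutations([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))    # 24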

We first take a useful step:

Theorem. If a matrix A is an upper diagonal matrix (that is, all entries below the main diagonal are 0), then det(A) is the product of the elements along the diagonal.

To see this, we have, using the definition,

det(A)=|a11e1,a12e1 + a22e2,a13e1 + a23e2 + a33e3,...|

In this expansion, the a12e1 part of the second column contributes 0, because e1 already appears in the first column, and so only the e2 term gives any contribution. Similarly, only the e3 term in the third entry gives a contribution, etc. The required expansion is thus simply

det(A) = |a11e1, a22e2, a33e3, . . . | = a11a22a33 · · · |e1, e2, e3, . . . | = a11a22 · · · ann
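As a quick check of this product formula (ours, not in the notes), using a random upper triangular matrix:

import numpy as np

rng = np.random.default_rng(2)
A = np.triu(rng.standard_normal((5, 5)))     # upper triangular: entries below the diagonal are 0
assert np.isclose(np.linalg.det(A), np.prod(np.diag(A)))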

By transposing, we see that the same result is true for lower diagonal matrices. We can now augment the previous result that invertible matrices have non-zero determinants, by proving the converse.

Theorem. If det(A) ≠ 0, then A is invertible.

For the proof, we know that by row and column operations, we can transform A into a diagonal matrix A′ with ones and zeros down the main diagonal. We thus have A′ = PAQ. Since P and Q are invertible, their determinants are not 0, and so det(A′) = det(P) det(A) det(Q) ≠ 0. This means that A′ has no zeros on the main diagonal, and so A′ = I. Thus I = PAQ, and A = P^(−1)Q^(−1) is invertible. Summarizing, A is invertible if and only if its determinant is not 0.

The following result reduces the determination of an n × n determinant to a determinant of lower order.

Theorem. Suppose an n × n matrix A is of the form

A =
[ a x ]
[ 0 B ]

where 0 is the (n − 1) × 1 zero vector and x is a 1 × (n − 1) row vector. Then det(A) = a det(B).

Proof. Apply row and column transformations to A, subject to the conditions that they are used only on the last n − 1 rows or columns, and they do not use any of type (2) (multiplying a column or row by c ≠ 0). If there are s interchanges of rows or columns, the resulting matrix will have determinant (−1)^s det(A). In this way, we can transform A to a diagonal matrix A′:

A′ =
[ a 0  0  . . . 0    ]
[ 0 b1 0  . . . 0    ]
[ 0 0  b2 . . . 0    ]
[ . . .  . . .  .    ]
[ 0 0  0  . . . bn−1 ]
=
[ a 0  ]
[ 0 B′ ]

Then (−1)^s det(A) = det(A′) = ab1b2 · · · bn−1. Since the row and column operations operated only on the matrix B, we also have det(B′) = (−1)^s det(B) = b1b2 · · · bn−1. Thus, (−1)^s det(A) = a(−1)^s det(B). Canceling (−1)^s, we have the result.

If a matrix has only one nonzero entry anywhere in the first column, this result can be used to calculate the determinant. A row transformation puts it into the first row, and the above theorem can be applied. The result is as follows:

Theorem: Suppose the first column of an n × n matrix A is aei, and the matrix obtained by eliminating the first column and the i-th row is the (n − 1) × (n − 1) matrix B. Then det(A) = (−1)^(1+i) a det(B).

For the proof, suppose the matrix is

A =
[ 0 r1 ]
[ . .  ]
[ a ri ]
[ . .  ]
[ 0 rn ]

where rj denotes the part of the j-th row to the right of the 0 or the a. Then, interchanging row i with row 1 and applying the previous theorem, we find

det(A) = −a det(r2, . . . , ri−1, r1, ri+1, . . . , rn) = −(−1)^(i−2) a det(r1, r2, . . . , ri−1, ri+1, . . . , rn) = (−1)^(1+i) a det(B)

where det(. . .) denotes the determinant of the (n − 1) × (n − 1) matrix with the indicated rows. The extra factor (−1)^(i−2) is needed because it takes i − 2 steps to pull r1 back to its rightful place before r2, one step at a time.

A similar argument proves the following generalization:

If the j-th column of an n × n matrix A is 0, except possibly for the i-th row whose entry is aij, and if Aij is the determinant of the (n − 1) × (n − 1) matrix obtained by eliminating the i-th row and j-th column of A, then |A| = (−1)^(i+j) aij Aij.

Definition. Let A be an n × n matrix, and Aij the determinant of the (n − 1) × (n − 1) matrix obtained by eliminating the i-th row and j-th column of A. The cofactor Cij is defined as (−1)^(i+j) Aij. The sign introduced as a factor of the determinant Aij is governed by the familiar checkerboard pattern:

[ + − + . . . ]
[ − + − . . . ]
[ + − + . . . ]
[ . . . . . . ]

Thus the previous result can be stated: If all the entries of a matrix in a column (or row) are 0, with the exception of aij, then the determinant of the matrix is aijCij, where Cij is the cofactor of aij.
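Cofactors give a simple recursive determinant routine (our own sketch, anticipating the expansion by rows stated next; here we always expand along the first row):

def det_cofactor(A):
    """Determinant by expansion along the first row: det(A) = sum over j of a_1j * C_1j."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]     # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det_cofactor(minor)   # (-1)^j is the checkerboard sign
    return total

print(det_cofactor([[2, 7], [3, 5]]))                        # -11
print(det_cofactor([[1, 0, 2], [0, 3, 0], [4, 0, 5]]))       # -9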

By linearity, since the j-th column of A is Aj = Σi aij ei, we have the following

Expansion by Rows or Columns. If A is an n × n matrix with entries aij and corresponding cofactors Cij, then (expanding along a column j)

det(A) = Σi aijCij

Similarly (expanding along a row i)

det(A) = Σj aijCij

This has an interesting matrix version. Define the adjoint matrix (the text calls it the classical adjoint) B = adj(A) by

bij = Cji where the C’s are the cofactors of the matrix A. (Note the interchange of row and column here.) Then the following is true:

A(adj(A)) = (adj(A))A = (det A)I (1)

and as a consequence, if det(A) ≠ 0, we have

A^(−1) = (1/det A) adj(A)

To prove (1), set B = adj(A), and D = AB. Then by definition,

dij = Σk aikbkj = Σk aikCjk

When i = j, this latter sum is simply the expansion by row i of the determinant of A. Thus, the diagonal elements of D are each det(A). When i ≠ j, the sum is 0. For in this case, replace the j-th row of A by its i-th row. The determinant of the resulting matrix is 0, since its i-th and j-th rows are the same. Thus, using the expansion along the j-th row of this changed matrix, we have 0 = Σk aikCjk, since the entries of its j-th row are the aik and its cofactors along row j are the same as those of A. A similar argument applies to the product BA, proving the result. In principle this gives a formula for the inverse:

Theorem. If A is invertible, then A^(−1) = (1/det(A)) adj(A).

For example, a familiar and useful formula: if A is the 2 × 2 matrix with rows (a, b) and (c, d), then

A^(−1) = 1/(ad − bc) ·
[ d  −b ]
[ −c  a ]


We finally note the well-known Cramer’s rule for solving linear equations.

Cramer’s Rule. Let A be an n × n invertible matrix, and b an n × 1 column vector. Then the equation Ax = b has the unique solution x = (x1, x2, . . . , xn)^t, with

xi = det(Ai)/det(A)

where Ai is the matrix A in which the i-th column is replaced by b, the column of constants, and as usual xi is the i-th component of x.

Proof. We know that the equation has a unique solution x. In vector form, the equation is

x1c1 + x2c2 + ···+ xncn = b where ci is the i-th column of A. Thus,

|c1,...,ci−1,b,ci+1,...,cn| = |c1,...,ci−1,x1c1 + x2c2 + ···+ xncn,ci+1,...,cn|

= xi|c1, c2, . . . , cn|, since for j ≠ i the terms xjcj produce determinants with a repeated column and hence contribute 0.

This gives the result.
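A minimal Python sketch of Cramer’s rule (ours; fine for small systems, though in practice one would use np.linalg.solve directly):

import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b via Cramer's rule: x_i = det(A_i) / det(A)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                          # replace the i-th column by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
b = [3.0, 5.0, 6.0]
print(cramer_solve(A, b))
print(np.linalg.solve(A, b))                  # same answer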
