Linear Algebra (XXVIII)

Yijia Chen

1. Quadratic Forms

Definition 1.1. Let A be an n × n symmetric matrix over K, i.e., A = (a_ij)_{i,j∈[n]} where a_ij = a_ji for all i, j ∈ [n]. Then for the variable vector x = (x_1, . . . , x_n)′,

\[
x'Ax = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\]

\[
= \sum_{i,j \in [n]} a_{ij}\, x_i x_j
= \sum_{i \in [n]} a_{ii}\, x_i^2 + 2 \sum_{1 \le i < j \le n} a_{ij}\, x_i x_j.
\]
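To see the expansion in action, here is a small numerical sanity check (a sketch in Python/NumPy; the matrix and vector are arbitrary choices, not from the notes): it evaluates x′Ax directly and via the diagonal-plus-off-diagonal sum above.

    import numpy as np

    # Arbitrary symmetric matrix A and vector x (for illustration only).
    A = np.array([[2., 1., 0.],
                  [1., 3., 4.],
                  [0., 4., 5.]])
    x = np.array([1., -2., 3.])

    n = len(x)
    direct = x @ A @ x  # x'Ax
    expanded = (sum(A[i, i] * x[i]**2 for i in range(n))
                + 2 * sum(A[i, j] * x[i] * x[j]
                          for i in range(n) for j in range(i + 1, n)))
    assert np.isclose(direct, expanded)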

Definition 1.2. The polynomial x′Ax in the variables x_1, . . . , x_n is called the quadratic form induced by the symmetric matrix A.

Definition 1.3. Let A and B be two n × n-matrices over K. If there exists an invertible n × n-matrix C such that B = C′AC, then B is congruent to A.

Lemma 1.4. Matrix congruence is an equivalence relation. More precisely, for all n × n-matrices A, B, and C,
(i) A is congruent to A,
(ii) if A is congruent to B, then B is congruent to A as well, and
(iii) if A is congruent to B and B is congruent to C, then A is congruent to C.

Lemma 1.5. Let A and B be two n × n-matrices. If B is obtained from A by one of the following elementary operations, then B is congruent to A.

(i) Switch the i-th and j-th rows, then switch the i-th and j-th columns, where 1 ≤ i < j ≤ n.
(ii) Multiply the i-th row by k, then multiply the i-th column by k, where i ∈ [n] and k ∈ K \ {0}.
(iii) Multiply the i-th row by k and add the result to the j-th row, then multiply the i-th column by k and add the result to the j-th column, where i, j ∈ [n] with i ≠ j and k ∈ K.

Proof: (i) Recall that switching the i-th and j-th rows of the matrix A is equivalent to multiplying A on its left by the permutation matrix P_ij. Similarly, the column switching corresponds to multiplying by the same P_ij on A's right. Thus

\[
B = P_{ij} A P_{ij} = P_{ij}' A P_{ij},
\]

where the second equality uses that P_ij is symmetric.

Furthermore, P_ij is invertible; in fact, P_ij P_ij = I_n. Thus B is congruent to A. (ii) and (iii) can be shown in exactly the same fashion. □
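Numerically, each paired operation of Lemma 1.5 amounts to replacing A by E′AE for an elementary matrix E, which is exactly how the proof works in all three cases. A minimal sketch (the 2 × 2 matrix is an arbitrary example):

    import numpy as np

    A = np.array([[0., 1.],
                  [1., 0.]])

    # Operation (i) with i = 1, j = 2: E is the permutation matrix P_12 (= P_12').
    E = np.array([[0., 1.],
                  [1., 0.]])
    B = E.T @ A @ E                  # switch the rows, then the columns
    assert np.allclose(B, A[[1, 0]][:, [1, 0]])

    # Operation (iii) with i = 1, j = 2, k = 1: add row 1 to row 2,
    # then column 1 to column 2.
    E = np.array([[1., 1.],
                  [0., 1.]])
    B = E.T @ A @ E
    assert B[1, 1] == 2 * A[0, 1]    # exactly the effect used in Lemma 1.7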

Theorem 1.6. Every symmetric matrix is congruent to a diagonal matrix.

We first prove a technical lemma.

Lemma 1.7. Let A be an n × n symmetric nonzero matrix. Then there exists an invertible C such that (C′AC)_11 ≠ 0. Recall that (C′AC)_11 denotes the entry in the first row and the first column of the matrix C′AC.

Proof: If (A)_11 ≠ 0, then we are done by taking C = I_n. Otherwise, if (A)_ii ≠ 0 for some 2 ≤ i ≤ n, then we switch the first and the i-th rows, and then switch the first and the i-th columns. Thereby we move (A)_ii to the first row and the first column, and the result follows from Lemma 1.5 (i).

Now we are left with the case that all diagonal entries of A are 0. Since A is symmetric and nonzero, there exist some 1 ≤ i < j ≤ n with (A)_ij ≠ 0. We add the i-th row to the j-th row, and then add the i-th column to the j-th column. Let B be the resulting matrix. Then

\[
(B)_{jj} = (A)_{jj} + (A)_{ij} + (A)_{ii} + (A)_{ji} = 2 \cdot (A)_{ij} \ne 0.
\]

B is congruent to A by Lemma 1.5 (iii). So we can apply the second case above, in addition using Lemma 1.5 (i). □

Proof of Theorem 1.6: Assume that the n × n-matrix A is symmetric; we need to show that A is congruent to a diagonal matrix. We proceed by induction on n. The case n = 1 is trivial, so let n ≥ 2. By Lemma 1.7 and Lemma 1.5 we can assume without loss of generality that

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\quad\text{with } a_{11} \ne 0.
\]

We multiply the first row by −a_21/a_11 and add it to the second row, then multiply the first column by −a_21/a_11 = −a_12/a_11 and add it to the second column. This yields a matrix of the form

\[
\begin{pmatrix}
a_{11} & 0 & \cdots & a_{1n} \\
0 & * & \cdots & * \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & * & \cdots & a_{nn}
\end{pmatrix}.
\]

Then we do the same to the third row and the third column, and so on. Eventually, we multiply the first row by −a_n1/a_11 and add it to the n-th row, then multiply the first column by −a_n1/a_11 = −a_1n/a_11 and add it to the n-th column. The resulting matrix has the form

\[
\begin{pmatrix} a_{11} & 0 \\ 0 & B \end{pmatrix},
\]

where B is (n − 1) × (n − 1) and symmetric.


By the induction hypothesis, there exists an invertible D such that D′BD is diagonal. Then

\[
\begin{pmatrix} 1 & 0 \\ 0 & D' \end{pmatrix}
\begin{pmatrix} a_{11} & 0 \\ 0 & B \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & D \end{pmatrix}
=
\begin{pmatrix} a_{11} & 0 \\ 0 & D'BD \end{pmatrix}
\]

is diagonal. Since D is invertible,

\[
\begin{pmatrix} 1 & 0 \\ 0 & D \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & D^{-1} \end{pmatrix}
= I_n.
\]

Thus

\[
\begin{pmatrix} 1 & 0 \\ 0 & D \end{pmatrix}
\]

is invertible as well. So by Lemma 1.4, A is congruent to the diagonal matrix

\[
\begin{pmatrix} a_{11} & 0 \\ 0 & D'BD \end{pmatrix}.
\]

□
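The proof is constructive and translates into a short algorithm. The following is a minimal sketch (Python/NumPy; the function name and the tolerance handling are my own, not from the notes): given a symmetric A, it returns an invertible C with C′AC diagonal, using the paired operations of Lemma 1.5 and the pivoting of Lemma 1.7.

    import numpy as np

    def congruent_diagonalize(A, tol=1e-12):
        """Return (C, D) with C invertible and D = C.T @ A @ C diagonal."""
        B = np.array(A, dtype=float)
        n = B.shape[0]
        C = np.eye(n)

        def apply(E):                 # one paired row/column operation (Lemma 1.5)
            nonlocal B, C
            B = E.T @ B @ E
            C = C @ E

        for k in range(n):
            if abs(B[k, k]) <= tol:
                # Lemma 1.7: produce a nonzero entry at position (k, k).
                i = next((i for i in range(k + 1, n) if abs(B[i, i]) > tol), -1)
                if i < 0:
                    pair = next(((i, j) for i in range(k, n)
                                 for j in range(i + 1, n)
                                 if abs(B[i, j]) > tol), None)
                    if pair is None:
                        break         # trailing block is zero: B is already diagonal
                    i, j = pair
                    E = np.eye(n); E[i, j] = 1.0   # add row/col i to row/col j
                    apply(E)                        # now B[j, j] = 2 * old B[i, j]
                    i = j
                E = np.eye(n); E[:, [k, i]] = E[:, [i, k]]  # swap into the pivot
                apply(E)
            E = np.eye(n)
            E[k, k + 1:] = -B[k, k + 1:] / B[k, k]  # clear row/column k
            apply(E)
        return C, B

    A = np.array([[0., 1., 2.],
                  [1., 0., 3.],
                  [2., 3., 0.]])
    C, D = congruent_diagonalize(A)
    assert np.allclose(C.T @ A @ C, D)
    assert np.allclose(D, np.diag(np.diag(D)))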

1.1. Sylvester's Law of Inertia. Let A be an n × n symmetric matrix over R. By Theorem 1.6 there exists an invertible n × n-matrix C over R such that C′AC is diagonal. Without loss of generality, we can assume that for some d_1, . . . , d_r ≠ 0

\[
C'AC = \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix}
\quad\text{with}\quad
S = \begin{pmatrix} d_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & d_r \end{pmatrix}.
\]

We can further assume that for some p ∈ [r]

d_1, . . . , d_p > 0 and d_{p+1}, . . . , d_r < 0. In terms of the quadratic form,

\[
y'(C'AC)y = d_1 y_1^2 + \cdots + d_r y_r^2.
\]

By further taking

\[
z_1 := \sqrt{d_1}\, y_1, \;\ldots,\; z_p := \sqrt{d_p}\, y_p,
\]
\[
z_{p+1} := \sqrt{-d_{p+1}}\, y_{p+1}, \;\ldots,\; z_r := \sqrt{-d_r}\, y_r,
\]

and z_{r+1} := y_{r+1}, . . . , z_n := y_n, we obtain a very simple quadratic form

\[
z_1^2 + \cdots + z_p^2 - z_{p+1}^2 - \cdots - z_r^2.
\]

The question arises whether the numbers r and p are uniquely determined by the original matrix A.
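Continuing numerically: once a diagonal congruent form is available (e.g., from the congruent_diagonalize sketch above), one more diagonal congruence with entries 1/√|d_i| (and 1 on the zero part) produces the ±1/0 normal form. A sketch with sample values:

    import numpy as np

    d = np.array([2., -0.5, -12.])    # sample diagonal entries d_1, ..., d_n
    scale = np.array([1.0 / np.sqrt(abs(t)) if abs(t) > 1e-12 else 1.0
                      for t in d])
    S = np.diag(scale)                # S is diagonal, so S' = S

    normal_form = S @ np.diag(d) @ S  # the congruence S' D S
    print(np.round(normal_form))      # diag(1, -1, -1): p = 1, r = 3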

Theorem 1.8 (Sylvester's Law of Inertia). Assume that an n × n symmetric matrix A over R is congruent to two diagonal matrices

\[
A_1 = \begin{pmatrix} I_{p_1} & 0 & 0 \\ 0 & -I_{r_1 - p_1} & 0 \\ 0 & 0 & 0 \end{pmatrix}
\quad\text{and}\quad
A_2 = \begin{pmatrix} I_{p_2} & 0 & 0 \\ 0 & -I_{r_2 - p_2} & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]

Then r_1 = r_2 = rank(A) and p_1 = p_2.

We give a proof that is very different from the one in the textbook and that might offer more insight. To that end, some preparations are in order.

Lemma 1.9. Let A be an m × n-matrix. Then for every invertible m × m-matrix B,

rank(A) = rank(BA).

Similarly, for every invertible n × n-matrix C, we have rank(A) = rank(AC).

Proof: By the Rank-Nullity Theorem,

\[
n = rank(A) + \dim N(A) \quad\text{and}\quad n = rank(BA) + \dim N(BA).
\]

Thus it suffices to show N(A) = N(BA), i.e., for every x ∈ K^n,

\[
Ax = 0 \iff BAx = 0.
\]

The direction from left to right is trivial. For the converse, assume that Ax = b ≠ 0 and Bb = 0. But this means that the column vectors of B are linearly dependent, contradicting that B is invertible, or equivalently, that column rank(B) = rank(B) = m. The second property can be proved similarly by considering the row vectors of C. □
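A quick numerical illustration of Lemma 1.9 (the matrices are arbitrary; B is taken as a shifted random matrix, which is invertible for this seed, as the first assert confirms):

    import numpy as np

    rng = np.random.default_rng(1)

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],    # row 2 = 2 * row 1, so rank(A) = 2
                  [0., 1., 1.]])
    B = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)
    assert np.linalg.matrix_rank(B) == 3          # B is invertible here

    assert np.linalg.matrix_rank(B @ A) == np.linalg.matrix_rank(A) == 2
    assert np.linalg.matrix_rank(A @ B) == 2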

Corollary 1.10. Any two congruent matrices have the same rank.

Proof: Assume A is congruent to B, i.e., there exists an invertible C with

\[
B = C'AC.
\]

Then Lemma 1.9 implies that

\[
rank(B) = rank(C'AC) = rank(AC) = rank(A). \qquad\Box
\]

The proofs of the next two lemmas are rather routine from the definitions.

Lemma 1.11. Let A be an n × n-matrix over K. Then the mapping x ↦ Ax

is an isomorphism from K^n to K^n if and only if A is invertible.

Lemma 1.12. Let ϕ : V → V be an isomorphism (i.e., an automorphism of V). Furthermore, let W be a subspace of V.
(i) ϕ(W) is a subspace of V.
(ii) If W is finite-dimensional, then dim(W) = dim ϕ(W).

Proof of Theorem 1.8: Corollary 1.10 implies

\[
r_1 = rank(A_1) = rank(A) = rank(A_2) = r_2.
\]

Next, we show that p_1 = p_2. Observe that by Lemma 1.4 the two matrices A_1 and A_2 are congruent. Hence, there is an invertible n × n-matrix C such that

\[
A_1 = C'A_2C. \qquad (1)
\]

For i ∈ [2] let

\[
W_i := \bigl\{ (x_1, \ldots, x_{p_i}, 0, \ldots, 0)' \bigm| x_1, \ldots, x_{p_i} \in \mathbb{R} \bigr\} \subseteq \mathbb{R}^n.
\]

In particular, for any x ∈ W_i \ {0} we have x′A_ix > 0, and furthermore dim(W_i) = p_i. Therefore, we aim to show

\[
\dim(W_1) = \dim(W_2).
\]

Consider two further subspaces

\[
N_i := \bigl\{ (0, \ldots, 0, x_{p_i+1}, \ldots, x_n)' \bigm| x_{p_i+1}, \ldots, x_n \in \mathbb{R} \bigr\} \subseteq \mathbb{R}^n.
\]

Then for every x ∈ N_1 we have

\[
x'A_1x \le 0, \qquad (2)
\]

and additionally dim(N_1) = n − p_1. Let ϕ : R^n → R^n be defined by ϕ(x) := C^{−1}x.

By Lemma 1.11, ϕ is an isomorphism, and then Lemma 1.12 implies that ϕ(W_2) is a subspace of R^n with

\[
\dim \varphi(W_2) = \dim(W_2) = p_2.
\]

Next we claim that

\[
N_1 \cap \varphi(W_2) = \{0\}.
\]

Otherwise, let x ∈ N_1 ∩ ϕ(W_2) with x ≠ 0. It follows that ϕ^{−1}(x) = Cx ∈ W_2 \ {0}. Hence

\[
0 < (Cx)'A_2(Cx) = x'C'A_2Cx = x'A_1x,
\]

where the second equality is by (1). This contradicts x ∈ N_1 by (2). From the claim,

\[
n = \dim(\mathbb{R}^n) \ge \dim(N_1 + \varphi(W_2)) = \dim(N_1) + \dim(\varphi(W_2)) = n - p_1 + p_2.
\]

So p_1 ≥ p_2.

By symmetry (using W_1 and N_2), p_2 ≥ p_1, and p_1 = p_2 follows. □
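The theorem can also be observed numerically: congruence by any invertible C preserves the number of positive and negative entries in any diagonal form. A sketch using eigenvalue signs (the spectral theorem provides an orthogonal, hence congruence, diagonalization, so the sign counts of the eigenvalues are exactly p and r − p):

    import numpy as np

    rng = np.random.default_rng(0)

    M = rng.standard_normal((4, 4))
    A = M + M.T                         # an arbitrary symmetric matrix

    def sign_counts(S, tol=1e-10):
        w = np.linalg.eigvalsh(S)       # eigenvalues of a symmetric matrix
        return int((w > tol).sum()), int((w < -tol).sum())

    C = rng.standard_normal((4, 4))     # almost surely invertible
    assert abs(np.linalg.det(C)) > 1e-10
    assert sign_counts(A) == sign_counts(C.T @ A @ C)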

Theorem 1.8 motivates the following definition.

Definition 1.13. Let A be an n × n symmetric matrix over R. Then A is congruent to a unique matrix of the form

\[
\begin{pmatrix} I_p & 0 & 0 \\ 0 & -I_{r-p} & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]

where r = rank(A). Here p is called the positive index of inertia of A, and r − p is the negative index of inertia of A.
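In the same numerical spirit, both indices of inertia can be read off as eigenvalue sign counts (a sketch; the helper name is mine):

    import numpy as np

    def inertia(A, tol=1e-10):
        """(positive index, negative index, nullity) of a symmetric matrix A."""
        w = np.linalg.eigvalsh(A)
        return (int((w > tol).sum()), int((w < -tol).sum()),
                int((np.abs(w) <= tol).sum()))

    A = np.array([[0., 1.],
                  [1., 0.]])            # eigenvalues 1 and -1
    assert inertia(A) == (1, 1, 0)      # p = 1, r - p = 1, rank(A) = 2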

1.2. Positive Definite Forms and Positive Definite Matrices. Let A be an n × n symmetric matrix over R. It induces the quadratic form

\[
f(x_1, \ldots, x_n) = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix} A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
\]

Definition 1.14.
(i) f is positive definite if f(x) > 0 for all x ∈ R^n \ {0}. In that case, A is also called positive definite.
(ii) f is negative definite if f(x) < 0 for all x ∈ R^n \ {0}. Then A is also called negative definite.
(iii) f is positive semidefinite if f(x) ≥ 0 for all x ∈ R^n. Then the matrix A is positive semidefinite.
(iv) f is negative semidefinite if f(x) ≤ 0 for all x ∈ R^n. Then A is negative semidefinite.

Theorem 1.15.
(i) f is positive definite if and only if the positive index of inertia of A is n.
(ii) f is negative definite if and only if the negative index of inertia of A is n.
(iii) f is positive semidefinite if and only if the positive index of inertia of A is rank(A).
(iv) f is negative semidefinite if and only if the negative index of inertia of A is rank(A).

Proof: We prove (i); the rest are similar. Assume first that the positive index of inertia of A is n. Hence, there is an invertible C with

\[
C'AC = I_n, \quad\text{i.e.,}\quad A = (C')^{-1} C^{-1} = (C^{-1})' C^{-1}.
\]

Let x ∈ R^n \ {0}. Then

\[
f(x) = x'Ax = (C^{-1}x)'(C^{-1}x).
\]

Let

\[
y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} := C^{-1}x.
\]

Since x ≠ 0, we have y ≠ 0; otherwise x = Cy = C·0 = 0, a contradiction. Then

\[
f(x) = y'y = \sum_{i \in [n]} y_i^2 > 0.
\]

We conclude that f is positive definite.

Conversely, let p be the positive index of inertia of A and assume that p < n. Then there exists an invertible C with

\[
C'AC = \begin{pmatrix} I_p & 0 & 0 \\ 0 & -I_{rank(A)-p} & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]

Let y := (y_1, . . . , y_n)′ with

y_1 = 0, . . . , y_p = 0, y_{p+1} = 1, . . . , y_n = 1.

Since p < n, we have y ≠ 0. It follows that

x := Cy ≠ 0, by Lemma 1.11 again. Note that

\[
f(x) = x'Ax = y'C'ACy
= y' \begin{pmatrix} I_p & 0 & 0 \\ 0 & -I_{rank(A)-p} & 0 \\ 0 & 0 & 0 \end{pmatrix} y
= -(rank(A) - p) \le 0.
\]

Therefore, f is not positive definite. □
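Finally, Theorem 1.15 (i) is easy to spot-check numerically (the matrix below is an arbitrary positive definite example; the eigenvalue sign count plays the role of the positive index of inertia, as in the earlier sketches):

    import numpy as np

    rng = np.random.default_rng(2)

    A = np.array([[2., -1.],
                  [-1., 2.]])           # eigenvalues 1 and 3, both positive

    # positive index of inertia = number of positive eigenvalues = n
    assert int((np.linalg.eigvalsh(A) > 1e-10).sum()) == A.shape[0]

    # and indeed f(x) = x'Ax > 0 for random nonzero x
    for _ in range(1000):
        x = rng.standard_normal(2)
        assert x @ A @ x > 0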
