
LINEAR ALGEBRA

by

Alexander Givental


Published by Sumizdat, 5426 Hillside Avenue, El Cerrito, California 94530, USA. http://www.sumizdat.org

University of California, Berkeley Cataloging-in-Publication Data Givental, Alexander / by Alexander Givental ; El Cerrito, Calif. : Sumizdat, 2009. iv, 200 p. : ill. ; 23 cm. Includes bibliographical references and index. ISBN 978-0-9779852-4-1 1. Linear algebra. I. Givental, Alexander.

Library of Congress Control Number:

© 2009 by Alexander Givental. All rights reserved. Copies or derivative products of the whole work or any part of it may not be reproduced without the written permission from Alexander Givental ([email protected]), except for brief excerpts in connection with reviews or scholarly analysis.

Credits
Editing: Alisa Givental.
Art advising: Irina Mukhacheva, http://irinartstudio.com
Cataloging-in-publication: Catherine Moreno, Technical Services Department, Library, UC Berkeley.
Layout, typesetting and graphics: using LaTeX and Xfig.
Printing and binding: Thomson-Shore, Inc., http://www.tshore.com
Member of the Green Press Initiative, 7300 West Joy Road, Dexter, Michigan 48130-9701, USA.
Offset printing on 30% recycled paper; cover: by 4-color process on Kivar-7.
ISBN 978-0-9779852-4-1

Contents

1  A Crash Course
   1  Vectors
   2  Quadratic Curves
   3  Complex Numbers
   4  Problems of Linear Algebra

2  Dramatis Personae
   1  Matrices
   2  Determinants
   3  Vector Spaces

3  Simple Problems
   1  Dimension and Rank
   2  Gaussian Elimination
   3  The Inertia Theorem

4  Eigenvalues
   1  The Spectral Theorem
   2  Jordan Canonical Forms

Hints
Answers
Bibliography
Index

Foreword

“Mathematics is a form of culture. Mathematical ignorance is a form of national culture.” “In antiquity, they knew few ways to learn, but modern pedagogy discovered 1001 ways to fail.” August Dumbel1

1From his eye-opening address to NCVM, the National Council of Visionaries of Mathematics.

Chapter 1

A Crash Course

1 Vectors

Operations and their properties

The following definition of vectors can be found in elementary geometry textbooks, see for instance [4]. A directed segment −→AB on the plane or in space is specified by an ordered pair of points: the tail A and the head B. Two directed segments −→AB and −→CD are said to represent the same vector if they are obtained from one another by translation. In other words, the lines AB and CD must be parallel, the lengths |AB| and |CD| must be equal, and the segments must point toward the same of the two possible directions (Figure 1).

Figure 1. Figure 2.

A trip from A to B followed by a trip from B to C results in a trip from A to C. This observation motivates the definition of the

vector sum w = v + u of two vectors v and u: if −→AB represents v and −→BC represents u, then −→AC represents their sum w (Figure 2).
The vector 3v = v + v + v has the same direction as v but is 3 times longer. Generalizing this example one arrives at the definition of the multiplication of a vector by a scalar: given a vector v and a real number α, the result of their multiplication is a vector, denoted αv, which has the same direction as v but is α times longer. The last phrase calls for comments, since it is literally true only for α > 1. If 0 < α < 1, being "α times longer" actually means "shorter." If α < 0, the direction of αv is in fact opposite to the direction of v. Finally, 0v = 0 is the zero vector, represented by directed segments −→AA of zero length.
Combining the operations of vector addition and multiplication by scalars, we can form expressions αu + βv + ... + γw. They are called linear combinations of the vectors u, v, ..., w with the coefficients α, β, ..., γ.

The pictures of a parallelogram and parallelepiped (Figures 3 and 4) prove that the addition of vectors is commutative and associative: for all vectors u, v, w,

u + v = v + u and (u + v)+ w = u + (v + w).

Figure 3. Figure 4.

From properties of proportional segments and similar triangles, the reader will easily derive the following two distributive laws: for all vectors u, v and scalars α, β,

(α + β)u = αu + βu and α(u + v) = αu + αv.

Coordinates

From a point O in space, draw three directed segments −→OA, −→OB, and −→OC not lying in the same plane, and denote by i, j, and k the vectors they represent. Then every vector u = −→OU can be uniquely written as a linear combination of i, j, k (Figure 5):

u = αi + βj + γk.

The coefficients form the array (α, β, γ) of coordinates of the vector u (and of the point U) with respect to the basis i, j, k (or the coordinate system OABC).

Figure 5

Multiplying u by a scalar λ or adding another vector u′ = α′i + β′j + γ′k, and using the above algebraic properties of the operations with vectors, we find:

λu = λαi+λβj+λγk, and u+u′ = (α+α′)i+(β +β′)j+(γ +γ′)k.

Thus, the geometric operations with vectors are expressed by componentwise operations with the arrays of their coordinates:

λ(α, β, γ) = (λα, λβ, λγ), (α, β, γ) + (α′, β′, γ′) = (α + α′, β + β′, γ + γ′).
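The componentwise rules above are easy to try out numerically. Here is a minimal Python sketch; the particular coordinates and the scalar are made-up illustration data, not anything from the text:

```python
# Componentwise realization of vector operations in a fixed basis i, j, k.
u = (1.0, 2.0, 3.0)      # coordinates (alpha, beta, gamma) of a vector u
v = (4.0, -1.0, 0.5)     # coordinates of another vector
lam = 2.0                # a scalar

scaled = tuple(lam * a for a in u)            # lambda * u, computed entry by entry
total = tuple(a + b for a, b in zip(u, v))    # u + v, computed entry by entry

print(scaled)   # (2.0, 4.0, 6.0)
print(total)    # (5.0, 1.0, 3.5)
```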

What is a vector?
No doubt, the idea of vectors is not new to the reader. However, some subtleties of the above introduction do not easily meet the eye, and we would like to say here a few words about them.
As many other mathematical notions, vectors come from physics, where they represent quantities, such as velocities and forces, which are characterized by their magnitude and direction. Yet, the popular slogan "Vectors are magnitude and direction" does not qualify for a mathematical definition of vectors, e.g. because it does not tell us how to operate with them.
The computer science definition of vectors as arrays of numbers, to be added "apples with apples, oranges with oranges," will meet the following objection by physicists. When a coordinate system rotates, the coordinates of the same force or velocity will change, but the numbers of apples and oranges won't. Thus forces and velocities are not arrays of numbers.
The geometric notion of a directed segment resolves this problem. Note, however, that calling directed segments vectors would constitute abuse of terminology. Indeed, strictly speaking, directed segments can be added only when the head of one of them coincides with the tail of the other.
So, what is a vector? In our formulations, we actually avoided answering this question directly, and said instead that two directed segments represent the same vector if . . . Such wording is due to pedagogical wisdom of the authors of elementary geometry textbooks, because a direct answer sounds quite abstract: A vector is the class of all directed segments obtained from each other by translation in space. Such a class is shown in Figure 6.

Figure 6

This picture has another interpretation: For every point in space (the tail of an arrow), it indicates a new position (the head). The geometric transformation in space defined this way is translation. This leads to another attractive point of view: a vector is a translation. Then the sum of two vectors is the composition of the translations.

The Dot Product
This operation encodes metric concepts of elementary Euclidean geometry, such as lengths and angles. Given two vectors u and v of lengths |u| and |v| and making the angle θ to each other, their dot product (also called inner product or scalar product) is a number defined by the formula:
⟨u, v⟩ = |u| |v| cos θ.
Of the following properties, the first three are easy (check them!):
(a) ⟨u, v⟩ = ⟨v, u⟩ (symmetricity);
(b) ⟨u, u⟩ = |u|² > 0 unless u = 0 (positivity);
(c) ⟨λu, v⟩ = λ⟨u, v⟩ = ⟨u, λv⟩ (homogeneity);
(d) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ (additivity with respect to the first factor).
To prove the last property, note that due to homogeneity, it suffices to check it assuming that w is a unit vector, i.e. |w| = 1. In this case, consider (Figure 7) a triangle ABC such that −→AB = u, −→BC = v, and therefore −→AC = u + v, and let −→OW = w. We can consider the line OW as the number line, with the points O and W representing the numbers 0 and 1 respectively, and denote by α, β, γ the numbers representing perpendicular projections to this line of the vertices A, B, C of the triangle. Then

⟨−→AB, w⟩ = β − α,  ⟨−→BC, w⟩ = γ − β,  and  ⟨−→AC, w⟩ = γ − α.
The required identity follows, because γ − α = (γ − β) + (β − α).

Figure 7

Combining the properties (c) and (d) with (a), we obtain the following identities, expressing bilinearity of the dot product (i.e. linearity with respect to each factor):

⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩,
⟨w, αu + βv⟩ = α⟨w, u⟩ + β⟨w, v⟩.
The following example illustrates the use of nice algebraic properties of the dot product in elementary geometry.

Figure 8

Example. Given a triangle ABC, let us denote by u and v the vectors represented by the directed segments −→AB and −→AC, and use properties of the inner product in order to compute the length |BC|. Notice that the segment −→BC represents v − u. We have:
|BC|² = ⟨v − u, v − u⟩ = ⟨v, v⟩ + ⟨u, u⟩ − 2⟨u, v⟩ = |AC|² + |AB|² − 2|AB| |AC| cos θ.
This is the famous Law of Cosines in trigonometry. When the vectors u and v are orthogonal, i.e. ⟨u, v⟩ = 0, the formula turns into the Pythagorean theorem:
|u ± v|² = |u|² + |v|².
When basis vectors i, j, k are pairwise orthogonal and unit, the coordinate system is called Cartesian.¹ We have:
⟨i, i⟩ = ⟨j, j⟩ = ⟨k, k⟩ = 1, and ⟨i, j⟩ = ⟨j, k⟩ = ⟨k, i⟩ = 0.
Thus, in Cartesian coordinates, the inner squares and the dot products of vectors r = xi + yj + zk and r′ = x′i + y′j + z′k are given by the formulas:
|r|² = x² + y² + z²,  ⟨r, r′⟩ = xx′ + yy′ + zz′.
¹After René Descartes (1596–1650).
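These Cartesian formulas make lengths and angles directly computable. Below is a small Python sketch with arbitrary test vectors; it also checks the Law of Cosines numerically:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def length(u):
    return math.sqrt(dot(u, u))

# Arbitrary test vectors: u represents AB, v represents AC.
u = (3.0, 0.0, 0.0)
v = (1.0, 2.0, 2.0)

cos_theta = dot(u, v) / (length(u) * length(v))

# Law of Cosines: |BC|^2 = |AC|^2 + |AB|^2 - 2 |AB| |AC| cos(theta),
# where BC is represented by v - u.
w = tuple(b - a for a, b in zip(u, v))
lhs = dot(w, w)
rhs = dot(v, v) + dot(u, u) - 2 * length(u) * length(v) * cos_theta
print(lhs, rhs)   # both 12.0 (up to rounding)
```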

EXERCISES

1. A mass m rests on an inclined plane making 30◦ with the horizontal plane. Find the forces of friction and reaction acting on the mass.  2. A ferry, capable of making 5 mph, shuttles across a river of width 0.6 mi with a strong current of 3 mph. How long does each round trip take?  3. Prove that for every closed broken line ABC...DE,

−−→AB + −−→BC + + −−→DE + −→EA = 0. · · · 4. Three medians of a triangle ABC intersect at one point M called the barycenter of the triangle. Let O be any point on the plane. Prove that 1 −−→OM = (−→OA + −−→OB + −−→OC). 3 5. Prove that −−→MA + −−→MB + −−→MC = 0 if and only if M is the barycenter of the triangle ABC. 6.⋆ Along three circles lying in the same plane, vertices of a triangle are moving clockwise with the equal constant angular velocities. Find how the barycenter of the triangle is moving.  7. Prove that if AA′ is a median in a triangle ABC, then 1 AA−−→′ = (−−→AB + −→AC). 2 8. Prove that from medians of a triangle, another triangle can be formed.  9. Sides of one triangle are parallel to the medians of another. Prove that the medians of the latter triangle are parallel to the sides of the former one. 10. From medians of a given triangle, a new triangle is formed, and from its medians, yet another triangle is formed. Prove that the third triangle is similar to the first one, and find the coefficient of similarity.  11. Midpoints of AB and CD, and of BC and DE are connected by two segments, whose midpoints are also connected. Prove that the resulting segment is parallel to AE and congruent to AE/4. 12. Prove that a point X lies on the segment AB if and only if for any origin O and some scalar 0 λ 1 the radius-vector −−→OX has the form: ≤ ≤ −−→OX = λ−→OA + (1 λ)−−→OB. − ⋆ 13. Given a triangle ABC, we construct a new triangle A′B′C′ in such a way that A′ is centrally symmetric to A with respect to the center B, B′ centrally symmetric to B with respect to C, and C′ centrally symmetric to C with respect to A, and then erase the original triangle. Reconstruct ABC from A′B′C′ by straightedge and compass.  8 Chapter 1. A CRASH COURSE 14. Prove the Cauchy – Schwarz inequality: u, v 2 u, u v, v . In h i ≤ h ih i which cases does the inequality turn into equality? Deduce the triangle inequality: u + v u + v .  | | ≤ | | | | 15. Compute the inner product −−→AB, −−→BC if ABC is a regular triangle h i inscribed into a unit circle.  16. Prove that if the sum of three unit vectors is equal to 0, then the angle between each pair of these vectors is equal to 120◦. 17. Express the inner product u, v in terms of the lengths u , v , u + v h i | | | | | | of the two vectors and of their sum.  18. (a) Prove that if four unit vectors lying in the same plane add up to 0, then they form two pairs of opposite vectors. (b) Does this remain true if the vectors do not have to lie in the same plane?  19.⋆ Let AB . . . E be a regular polygon with the center O. Prove that

−→OA + −−→OB + + −−→OE = 0.  · · · 20. Prove that if u + v and u v are perpendicular, then u = v . − | | | | 21. For arbitrary vectors u and v, verify the equality:

u + v 2 + u v 2 =2 u 2 +2 v 2, | | | − | | | | | and derive the theorem: The sum of the squares of the diagonals of a parallelogram is equal to the sum of the squares of the sides. 22. Prove that for every triangle ABC and every point X in space,

−−→XA −−→BC + −−→XB −→CA + −−→XC −−→AB = 0.  · · · 23.⋆ For four arbitrary points A, B, C, and D in space, prove that if the lines AC and BD are perpendicular, then AB2 + CD2 = BC2 + DA2, and vice versa.  24.⋆ Given a quadrilateral with perpendicular diagonals. Show that every quadrilateral, whose sides are respectively congruent to the sides of the given one, has perpendicular diagonals.  25.⋆ A regular triangle ABC is inscribed into a circle of radius R. Prove that for every point X of this circle, XA2 + XB2 + XC2 =6R2.  ⋆ 26. Let A1B1A2B2 ...AnBn be a 2n-gon inscribed into a circle. Prove that the length of the vector −−−→A B + −−−→A B + + −−−→A B does not exceed 1 1 2 2 · · · n n the diameter.  27.⋆ A polyhedron is filled with air under pressure. The pressure force to each face is the vector perpendicular to the face, proportional to the area of the face, and directed to the exterior of the polyhedron. Prove that the sum of these vectors is equal to 0.  2. Quadratic Curves 9 2 Quadratic Curves

Conic Sections
On the coordinate plane, consider points (x, y) satisfying an equation of the form
ax² + 2bxy + cy² + dx + ey + f = 0.
Generally speaking, such points form a curve. The set of solutions is called a quadratic curve, provided that not all of the coefficients a, b, c vanish.
Being a quadratic curve is a geometric property. Indeed, if the coordinate system is changed (say, rotated, stretched, or translated), the same curve will be described by a different equation, but the L.H.S. of the equation will remain a polynomial of degree 2. Our goal in this section is to describe all possible quadratic curves geometrically (i.e. disregarding their positions with respect to coordinate systems); or, in other words, to classify quadratic equations in two variables up to suitable changes of the variables.

A.Givental, 1999 "The Empty Curve" oil on canvas

Figure 9

Example: Dandelin's spheres. The equation x² + y² = z² describes in a Cartesian coordinate system a cone (a half of which is shown on Figure 10). Intersecting the cone by planes, we obtain examples of quadratic curves. Indeed, substituting the equation z = αx + βy of a secting plane into the equation of the cone, we get a quadratic equation x² + y² = (αx + βy)² (which actually describes the projection of the conic section to the horizontal plane).
The conic section on the picture is an ellipse. According to one of many equivalent definitions,² an ellipse consists of all points on the plane with a fixed sum of the distances to two given points (called foci of the ellipse). Our picture illustrates an elegant way³ to locate foci of a conic section. Place into the conic cup two balls (a small and a large one), and inflate the former and deflate the latter until they touch the plane (one from inside, the other from outside). Then the points F and G of tangency are the foci. Indeed, let A be an arbitrary point on the conic section. The segments AF and AG lie in the cutting plane and are therefore tangent to the balls at the points F and G respectively. On the generatrix OA, mark the points B and C where it crosses the circles of tangency of the cone with the balls. Then AB and AC are tangent at these points to the respective balls. Since all tangent segments from a given point to a given ball have the same length, we find that |AF| = |AB| and |AG| = |AC|. Therefore |AF| + |AG| = |BC|. But |BC| is the distance along the generatrix between two parallel horizontal circles on the cone, and is the same for all generatrices. Therefore the sum |AF| + |AG| stays fixed when the point A moves along our conic section.

Figure 10
Beside ellipses, we find among conic sections: hyperbolas (when a plane cuts through both halves of the cone), parabolas (cut by planes parallel to generatrices), and their degenerations (obtained when the cutting plane is replaced with the parallel one passing through the vertex O of the cone): just one point O, pairs of intersecting lines, and "double-lines." We will see that this list exhausts all possible quadratic curves, except two degenerate cases: pairs of parallel lines and (yes!) empty curves.

²According to a mock definition, "an ellipse is a circle inscribed into a square with unequal sides."
³Due to Germinal Pierre Dandelin (1794–1847).

Orthogonal Diagonalization (Toy Version)
Let (x, y) be Cartesian coordinates on a Euclidean plane, and let Q be a quadratic form on the plane, i.e. a homogeneous degree-2 polynomial:
Q(x, y) = ax² + 2bxy + cy².
Theorem. Every quadratic form in a suitably rotated coordinate system assumes the form: Q = AX² + CY².
Proof. Rotating the basis vectors i and j counter-clockwise through


the angle θ, we obtain (Figure 11):
I = (cos θ) i + (sin θ) j  and  J = −(sin θ) i + (cos θ) j.
Therefore
x i + y j = X I + Y J = (X cos θ − Y sin θ) i + (X sin θ + Y cos θ) j.
This shows that the old coordinates (x, y) are expressed in terms of the new coordinates (X, Y) by the formulas
x = X cos θ − Y sin θ,  y = X sin θ + Y cos θ.  (∗)
Substituting into ax² + 2bxy + cy², we rewrite the quadratic form in the new coordinates as AX² + 2BXY + CY², where A, B, C are certain expressions of a, b, c and θ. We want to show that, choosing the rotation angle θ appropriately, we can make 2B = 0. Indeed, making the substitution explicitly and ignoring the X²- and Y²-terms, we find Q in the form
··· + XY (−2a sin θ cos θ + 2b(cos² θ − sin² θ) + 2c sin θ cos θ) + ···
Thus 2B = (c − a) sin 2θ + 2b cos 2θ. When b = 0, our task is trivial, as we can take θ = 0. When b ≠ 0, we can divide by 2b to obtain
cot 2θ = (a − c)/2b.
Since cot assumes arbitrary real values, the theorem follows.
Example. For Q = x² + xy + y², we have cot 2θ = 0, and find 2θ = π/2 + πk (k = 0, ±1, ±2, ...), i.e. up to multiples of 2π, θ = ±π/4 or ±3π/4. (This is a general rule: together with a solution θ, the angle θ + π, as well as θ ± π/2, also work. Could you give an a priori explanation?) Taking θ = π/4, we compute
x = (X − Y)/√2,  y = (X + Y)/√2,
and finally find:
x² + y² + xy = X² + Y² + (X² − Y²)/2 = (3/2)X² + (1/2)Y².
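The recipe cot 2θ = (a − c)/2b is easy to test numerically. The following Python sketch does this for the example Q = x² + xy + y² (any coefficients with b ≠ 0 would do); it recomputes the coefficients in the rotated coordinates and verifies that the new cross term vanishes:

```python
import math

a, b, c = 1.0, 0.5, 1.0                    # Q = x^2 + xy + y^2, so 2b = 1
theta = 0.5 * math.atan2(2 * b, a - c)     # solves cot(2 theta) = (a - c)/(2b)

s, co = math.sin(theta), math.cos(theta)
A = a * co**2 + 2 * b * s * co + c * s**2              # coefficient of X^2
C = a * s**2 - 2 * b * s * co + c * co**2              # coefficient of Y^2
B2 = (c - a) * math.sin(2 * theta) + 2 * b * math.cos(2 * theta)   # new cross term 2B

print(round(A, 6), round(C, 6), round(B2, 6))          # expect 1.5 0.5 0.0
```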

Completing the Squares
In our study of quadratic curves, the plan is to simplify the equation of the curve as much as possible by changing the coordinate system. In doing so we may assume that the coordinate system has already been rotated to make the coefficient of the xy-term vanish. Therefore the equation at hand assumes the form
ax² + cy² + dx + ey + f = 0,
where a and c cannot both be zero. Our next step is based on completing squares: whenever one of these coefficients (say, a) is non-zero, we can remove the corresponding linear term (dx) this way:

ax² + dx = a(x² + (d/a)x) = a((x + d/2a)² − d²/4a²) = aX² − d²/4a.
Here X = x + d/2a, and this change represents translation of the origin of the coordinate system from the point (x, y) = (0, 0) to (x, y) = (−d/2a, 0).
Example. The equation x² + y² = 2ry can be rewritten by completing the square in y as x² + (y − r)² = r². Therefore, it describes the circle of radius r centered at the point (0, r) on the y-axis.
With the operations of completing the squares in one or both variables, renaming the variables if necessary, and dividing the whole equation by a non-zero number (which does not change the quadratic curve), we are well-armed to obtain the classification.

Classification of Quadratic Curves
Case I: a ≠ 0 ≠ c. The equation is reduced to aX² + cY² = F by completing squares in each of the variables.
Sub-case (i): F ≠ 0. Dividing the whole equation by F, we obtain the equation (a/F)X² + (c/F)Y² = 1. When both a/F and c/F are positive, the equation can be re-written as
X²/α² + Y²/β² = 1.
This is the equation of an ellipse with semiaxes α and β (Figure 12). When a/F and c/F have opposite signs, we get (possibly renaming the variables) the equation of a hyperbola (Figure 13)
X²/α² − Y²/β² = 1.
When a/F and c/F are both negative, the equation has no real solutions, so that the quadratic curve is empty (Figure 9).

Figure 12. Figure 13.

Sub-case (ii): F = 0. Then, when a and c have opposite signs (say, a = α² > 0 and c = −γ² < 0), the equation α²X² = γ²Y² describes a pair of intersecting lines Y = ±kX, where k = α/γ (Figure 14). When a and c are of the same sign, the equation aX² + cY² = 0 has only one real solution: (X, Y) = (0, 0). The quadratic curve is a "thick" point.⁴
⁴In fact this is the point of intersection of a pair of "imaginary" lines consisting of non-real solutions.
Case II: One of a, c is 0. We may assume without loss of generality that c = 0. Since a ≠ 0, we can still complete the square in x to obtain an equation of the form aX² + ey + F = 0.
Sub-case (i): e ≠ 0. Divide the whole equation by e and put Y = y + F/e to arrive at the equation Y = −aX²/e. This curve is a parabola Y = kX², where k = −a/e ≠ 0 (Figure 15).
Sub-case (ii): e = 0. The equation X² = −F/a describes: a pair of parallel lines X = ±k (where k = √(−F/a)), or the empty set (when F/a > 0), or a "double-line" X = 0 (when F = 0).

Figure 14. Figure 15.

We have proved the following:
Theorem. Every quadratic curve on a Euclidean plane is one of the following: an ellipse, hyperbola, parabola, a pair of intersecting, parallel, or coinciding lines, a "thick" point, or the empty set. In a suitable Cartesian coordinate system, the curve is described by one of the standard equations:
X²/α² ± Y²/β² = 1, −1, or 0;  Y = kX²;  X² = k.
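The two steps of the proof (rotate away the cross term, then complete squares) can be strung together into a rough numerical classifier. The Python sketch below is only an illustration of that procedure, with an arbitrary tolerance eps and without any claim to handle every borderline case robustly:

```python
import math

def classify(a, b, c, d, e, f, eps=1e-9):
    """Rough type of the curve a x^2 + 2b xy + c y^2 + d x + e y + f = 0."""
    # Step 1: rotate the coordinate system so that the cross term vanishes.
    theta = 0.5 * math.atan2(2 * b, a - c) if abs(b) > eps else 0.0
    s, co = math.sin(theta), math.cos(theta)
    A = a * co**2 + 2 * b * s * co + c * s**2
    C = a * s**2 - 2 * b * s * co + c * co**2
    D = d * co + e * s            # new coefficient of X
    E = -d * s + e * co           # new coefficient of Y
    # Step 2: complete squares and read off the type.
    if abs(A) > eps and abs(C) > eps:
        F = D * D / (4 * A) + E * E / (4 * C) - f    # A X^2 + C Y^2 = F
        if abs(F) < eps:
            return "point" if A * C > 0 else "pair of intersecting lines"
        if A * C < 0:
            return "hyperbola"
        return "ellipse" if A / F > 0 else "empty curve"
    if abs(A) < eps:              # make sure the surviving square is in X
        A, C, D, E = C, A, E, D
    F = D * D / (4 * A) - f       # the equation becomes A X^2 + E Y = F
    if abs(E) > eps:
        return "parabola"
    t = F / A                     # X^2 = t
    if t > eps:
        return "pair of parallel lines"
    return "double line" if abs(t) <= eps else "empty curve"

print(classify(1, 0, 1, 0, 0, -1))   # x^2 + y^2 = 1: ellipse (a circle)
print(classify(1, 2, 1, 0, 0, -1))   # x^2 + 4xy + y^2 = 1: hyperbola
print(classify(1, 0, 0, 0, -1, 0))   # x^2 - y = 0: parabola
```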

EXERCISES 28. Prove that with the exception of parabolas, each conic section has a center of symmetry.  29. Prove that a hyperbolic conic section consists of all points on the secting plane with a fixed difference of the distances to two points (called foci). Locate the foci by adjusting the construction of Dandelin’s spheres. 30. Locate foci of (a) ellipses and (b) hyperbolas given by the standard equations x2/α2 y2/β2 = 1, where α>β> 0.  ± 31. A line is called an axis of symmetry of a given function Q(x, y) if the function takes on the same values at every pair of points symmetric about this line. Prove that every quadratic form has two perpendicular axes of symmetry. (They are called principal axes.)  32. Prove that if a line passing through the origin is an axis of symmetry of a quadratic form Q = ax2 +2bxy + cy2, then the perpendicular line is also its axis of symmetry. 33. Can a quadratic form on the plane have > 2 axes of symmetry?  34. Find axes of symmetry of the following quadratic forms Q: (a) x2 + xy + y2, (b) x2 +2xy + y2, (c) x2 +4xy + y2. Which of them have level curves Q = const ellipses? hyperbolas?  2. Quadratic Curves 15 35. Transform the equation 23x2 +72xy +2y2 = 25 to one of the standard forms by rotating the coordinate system explicitly.   36. Prove that ellipses are obtained by stretching (or shrinking) the unit circle in two perpendicular directions with two different coefficients. 37. Prove that every quadratic form on the plane in a suitable (but non necessarily Cartesian) coordinate system assumes one of the forms:

X² + Y², X² − Y², −X² − Y², X², −Y², 0.
Sketch graphs of these functions.
38. Complete squares to find out which of the following curves are ellipses and which are hyperbolas:  x² + 4xy = 1, x² + 2xy + 4y² = 1, x² + 4xy + 4y² = 1, x² + 6xy + 4y² = 1.
39. Find the place of the quadratic curve x² − 4y² = 2x − 4y in the classification of quadratic curves. 
40.⋆ Prove that ax² + 2bxy + cy² = 1 is a hyperbola if and only if ac < b².

Figure 16

41. Examine Figure 16 (showing two cones centrally symmetric to each other about the center of the ellipse E), and prove that |AO| + |AO′| = |BB′|. Derive that the vertex O of the cone is a focus of the projection of a conic section along the axis of the cone to the perpendicular plane passing through its vertex.⁵
42.⋆ Prove that ellipses, parabolas and hyperbolas can be characterized as plane curves formed by all points with a fixed ratio e (called eccentricity) between the distances to a fixed point (a focus) and a fixed line (called the directrix), and that e > 1 for hyperbolas, e = 1 for parabolas, and 1 > e > 0 for ellipses (e.g. e = |AO|/|AC| in Figure 16). 
43.⋆ Prove that light rays emitted from one focus of an ellipse and reflected in it as in a mirror will focus at the other focus. Formulate and prove similar optical properties of hyperbolas and parabolas. 

⁵The proof based on Figure 16 is due to Anil Hirani.

3 Complex Numbers

Law and Order
Life is unfair: The quadratic equation x² − 1 = 0 has two solutions x = ±1, but a similar equation x² + 1 = 0 has no solutions at all. To restore justice one introduces a new number i, the imaginary unit, such that i² = −1, and thus x = ±i become two solutions to the equation. This is how complex numbers could have been invented.
More formally, complex numbers are introduced as ordered pairs (a, b) of real numbers, written in the form z = a + bi. The real numbers a and b are called respectively the real part and imaginary part of the complex number z, and are denoted a = Re z and b = Im z.
The sum of z = a + bi and w = c + di is defined as
z + w = (a + c) + (b + d)i.
The product is defined so as to comply with the relation i² = −1:
zw = ac + bd i² + adi + bci = (ac − bd) + (ad + bc)i.
The operations of addition and multiplication of complex numbers enjoy the same properties as those of real numbers do. In particular, the product is commutative and associative.
The complex number z̄ = a − bi is called complex conjugate to z = a + bi. The operation of complex conjugation respects sums and products: the conjugate of z + w is z̄ + w̄, and the conjugate of zw is z̄ w̄. This can be easily checked from definitions, but there is a more profound explanation. The equation x² + 1 = 0 has two roots, i and −i, and the choice of the one to be called i is totally ambiguous. The complex conjugation consists in systematic renaming i by −i and vice versa, and such renaming cannot affect properties of complex numbers.
Complex numbers satisfying z̄ = z are exactly the real numbers a + 0i. We will see that this point of view on real numbers as complex numbers invariant under complex conjugation is quite fruitful.
The product zz̄ = a² + b² (check this formula!) is real, and is positive unless z = 0 + 0i = 0. This shows that
1/z = z̄/(zz̄) = a/(a² + b²) − (b/(a² + b²)) i.
Hence the division by z is well-defined for any non-zero complex number z. In terminology of Abstract Algebra, complex numbers form therefore a field⁶ (just as real or rational numbers do). The field of complex numbers is denoted by C (while R stands for reals, and Q for rationals).
The non-negative real number |z| = √(zz̄) = √(a² + b²) is called the absolute value of z. The absolute value function is multiplicative:

|zw| = √(zw · z̄w̄) = √(zz̄ · ww̄) = |z| · |w|.
It actually coincides with the absolute value of real numbers when applied to complex numbers with zero imaginary part: |a + 0i| = |a|.

Figure 17

Geometry
We can identify complex numbers z = a + bi with points (a, b) on the real coordinate plane (Figure 17). This way, the number 0 is identified with the origin, and 1 and i become the unit basis vectors (1, 0) and (0, 1). The coordinate axes are called respectively the real and imaginary axes. Addition of complex numbers coincides with the operation of vector sum (Figure 18).
The absolute value function has the geometrical meaning of the distance from the origin: |z| = ⟨z, z⟩^(1/2). In particular, the triangle inequality |z + w| ≤ |z| + |w| holds true. Complex numbers of unit absolute value, |z| = 1, form the unit circle centered at the origin.
The operation of complex conjugation acts on the radius-vectors z as the reflection about the real axis.
⁶This requires that a set be equipped with commutative and associative operations (called addition and multiplication) satisfying the distributive law z(v + w) = zv + zw, possessing the zero and unit elements 0 and 1, additive opposites −z for every z, and multiplicative inverses 1/z for every z ≠ 0.
In order to describe a geometric meaning of complex multiplication, let us study the way multiplication by a given complex number z acts on all complex numbers w, i.e. consider the function w ↦ zw. For this, write the vector representing a non-zero complex number z in the polar (or trigonometric) form z = ru, where r = |z| is a positive real number, and u = z/|z| = cos θ + i sin θ has absolute value 1 (see Figure 19). Here θ = arg z, called the argument of the complex number z, is the angle that z as a vector makes with the positive direction of the real axis.
Clearly, multiplication by r acts on all vectors w by stretching them r times. Multiplication by u applied to w = x + yi yields a new complex number uw = X + Yi according to the rule:

X = Re [(cos θ + i sin θ)(x + yi)] = x cos θ − y sin θ,
Y = Im [(cos θ + i sin θ)(x + yi)] = x sin θ + y cos θ.

Comparing with the formula (∗) in Section 2, we conclude that the transformation w ↦ uw is the counter-clockwise rotation through the angle θ.

Figure 18. Figure 19.

Notice a difference though: In Section 2, we rotated the coordinate system, and the formulas (∗) expressed old coordinates of a vector via new coordinates of the same vector. This time, we transform vectors, while the coordinate system remains unchanged. The same formulas now express coordinates (X, Y) of a new vector in terms of the coordinates (x, y) of the old one.
Anyway, the conclusion is that multiplication by z is the composition of two operations: stretching |z| times, and rotating through the angle arg z.
In other words, the product operation of two complex numbers sums their arguments and multiplies absolute values:

|zw| = |z| · |w|,  arg zw = arg z + arg w modulo 2π.
For example, if z = r(cos θ + i sin θ), then zⁿ = rⁿ(cos nθ + i sin nθ).

The Fundamental Theorem of Algebra
A degree 2 polynomial z² + pz + q has two roots

z± = (−p ± √(p² − 4q)) / 2.
This quadratic formula works regardless of the sign of the discriminant p² − 4q, provided that we allow the roots to be complex, and take in account multiplicity. Namely, if p² − 4q = 0, then z² + pz + q = (z + p/2)², and therefore the single root z = −p/2 has multiplicity two. If p² − 4q < 0, the roots are complex conjugate with Re z± = −p/2, Im z± = ±√|p² − 4q|/2. The Fundamental Theorem of Algebra shows that not only justice has been restored, but that any degree n polynomial has n complex roots, possibly multiple.
Theorem. A degree n polynomial

P(z) = zⁿ + a1 zⁿ⁻¹ + ... + a_{n−1} z + an
with complex coefficients a1, ..., an factors as
P(z) = (z − z1)^{m1} ... (z − zr)^{mr}.
Here z1, ..., zr are complex roots of P, and m1, ..., mr their multiplicities, m1 + ··· + mr = n.
A proof of this theorem deserves a separate chapter (if not a book). Many proofs are known, based on various ideas of Algebra, Analysis or Topology. We refer to [6] for an exposition of the classical proof due to Euler, Lagrange and de Foncenex, which is almost entirely algebraic. Here we merely illustrate the theorem with several examples.
Examples. (a) To solve the quadratic equation z² = w, equate the absolute value ρ and argument φ of the given complex number w with those of z²:

|z|² = ρ,  2 arg z = φ + 2πk,  k = 0, ±1, ±2, ....
We find: |z| = √ρ, and arg z = φ/2 + πk. Increasing arg z by even multiples of π does not change z, and by odd multiples changes z to −z. Thus the equation has two solutions:

z = ±√ρ (cos(φ/2) + i sin(φ/2)).
(b) The equation z² + pz + q = 0 with coefficients p, q ∈ C has two complex solutions given by the quadratic formula (see above), because, according to Example (a), the square root of a complex number takes on two opposite values (distinct, unless both are equal to 0).
(c) The complex numbers 1, i, −1, −i are the roots of the polynomial
z⁴ − 1 = (z² − 1)(z² + 1) = (z − 1)(z + 1)(z − i)(z + i).
(d) There are n complex nth roots of unity (see Figure 20, where n = 5). Namely, if z = r(cos θ + i sin θ) satisfies zⁿ = 1, then rⁿ = 1 (and hence r = 1), and nθ = 2πk, k = 0, ±1, ±2, .... Therefore θ = 2πk/n, where only the remainder of k modulo n is relevant. Thus the n roots are:
z = cos(2πk/n) + i sin(2πk/n),  k = 0, 1, 2, ..., n − 1.
For instance, if n = 3, the roots are 1 and
cos(2π/3) ± i sin(2π/3) = −1/2 ± i√3/2.


Figure 20
As illustrated by the previous two examples, even if all coefficients a1, ..., an of a polynomial P are real, its roots don't have to be real. But then the non-real roots come in pairs of complex conjugate ones. To verify this, we can use the fact that being real means to stay invariant (i.e. unchanged) under complex conjugation. Namely, āi = ai for all i means that conjugating P(z̄) returns
zⁿ + ā1 zⁿ⁻¹ + ... + ān = P(z).
Therefore we have two factorizations of the same polynomial:
(z − z̄1)^{m1} ... (z − z̄r)^{mr} = (z − z1)^{m1} ... (z − zr)^{mr} = P(z).
They can differ only by orders of the factors. Thus, for each non-real root zi of P, the complex conjugate z̄i must be also a root, and of the same multiplicity. Expanding the product

(z − z1) ... (z − zn) = zⁿ − (z1 + ... + zn) zⁿ⁻¹ + ... + (−1)ⁿ z1 ... zn,
we can express coefficients a1, ..., an of the polynomial in terms of the roots z1, ..., zn (here multiple roots are repeated according to their multiplicities). In particular, the sum and product of the roots are

z1 + ... + zn = −a1,  z1 ... zn = (−1)ⁿ an.
These relations generalize Vieta's theorem z+ + z− = −p, z+ z− = q about the roots z± of quadratic equations z² + pz + q = 0.
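Python's built-in complex arithmetic follows exactly these rules, so the statements above are easy to probe numerically. The sketch below computes the nth roots of unity for the arbitrary choice n = 5 and checks Vieta's relations for the polynomial zⁿ − 1:

```python
import cmath

n = 5
roots = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]  # cos(2*pi*k/n) + i sin(2*pi*k/n)

# Each root satisfies z^n = 1 (up to rounding error).
print(max(abs(z**n - 1) for z in roots))   # ~1e-15

# Vieta for z^n - 1: sum of roots = -a1 = 0, product = (-1)^n * an = 1 (n = 5, an = -1).
total = sum(roots)
product = 1
for z in roots:
    product *= z
print(abs(total), product)                 # ~0 and ~1+0j
```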

EXERCISES 44. Can complex numbers be: real? real and imaginary? neither?  1 45. Compute: (a) (1 + i)/(3 2i); (b) (cos π/3+ i sin π/3)− .  − 46. Verify the commutative and distributive laws for multiplication of com- plex numbers. 1 47. Show that z− is real proportional toz ¯ and find the proportionality coefficient.  48. Find all z satisfying the equations: z 1 = z +1 = 2.  | − | | | 49. Sketch the solution set to the following system of inequalities: z 1 1, z 1, Re(iz) 0. | − |≤ | |≤ ≤ 50. Compute absolute values and arguments of (a) 1 i, (b) 1 i√3.  100 − − √3+i  51. Compute 2 . 52. Express cos 3θ and sin 3θ in terms of cos θ and sin θ. 

53. Express cos(θ1 + θ2) and sin(θ1 + θ2) in terms of cos θi and sin θi. 7 54. Prove B´ezout’s theorem : A number z0 is a root of a polynomial P in one variable z if and only if P is divisible by z z .  − 0 55. Find roots of degree 2 : z2 4z +5, z2 iz +1, z2 2(1 + i)z +2i, z2 2z + i√3.  − − − − 56. Find all roots of polynomials: z3 +8, z3 + i, z4 +4z2 +4, z4 2z2 +4, z6 +1.  − 57. Prove that every polynomial with real coefficients factors into the prod- uct of polynomials of degree 1 and 2 with real coefficients.  58. Prove that the sum of all 5th roots of unity is equal to 0.  59.⋆ Find general Vieta’s formulas 8 expressing all coefficients of a poly- nomial in terms of its roots.  7Named after Etienne´ B´ezout (1730–1783). 8Named after Fran¸cois Vi´ete (1540–1603) also known as Franciscus Vieta. 4. Problems of Linear Algebra 23 4 Problems of Linear Algebra

One of our goals in this book is to equip the reader with a unifying view of Linear Algebra, or at least of what is studied under this name in traditional university courses. Following this mission, we give here a preview of the subject and describe its main achievements in lay terms. To begin with a few words of praise: Linear Algebra is a very simple and useful subject, underlying most of other areas of math- ematics, as well as its applications to physics, engineering, and eco- nomics. What makes Linear Algebra useful and efficient is that it provides ultimate solutions to several important mathematical prob- lems. Furthermore, as should be expected of a truly fruitful mathe- matical theory, the problems it solves can be formulated in a rather elementary language and make sense even before any advanced ma- chinery is developed. Even better, the answers to these problems can also be described in elementary terms (in contrast with the jus- tification of those answers, which better be postponed until adequate tools are developed). Finally, those several problems we are talking about are similar in their nature; namely, they all have the form of problems of classification of very basic mathematical objects. Be- fore presenting explicitly the problems and the answers, we need to discuss the general idea of classification in mathematics.

Classifications in Mathematics

Classifications are intended to bring order into seemingly complex or chaotic matters. Yet, there is a major difference between, say, our classification of quadratic curves and Carl Linnaeus’ Systema Naturae. For two quadratic curves to be in the same class, it is not enough that they share a number of features. What is required is a transfor- mation of a prescribed type that would transform one of the curves into the other, and thus make them equivalent in this sense, i.e. up to such transformations. What types of transformations are allowed (e.g., changes to ar- bitrary new coordinate systems, or only to Cartesian ones) may be a matter of choice. With every choice, the classification of objects of a certain kind (i.e. quadratic curves in our example) up to trans- formations of the selected type becomes a well-posed mathematical problem. 24 Chapter 1. A CRASH COURSE A complete answer to a classification problem should consist of – a list of normal (or canonical) forms, i.e. representatives of the classes of equivalence, and – a classification theorem establishing that each object of the kind (quadratic curve in our example) is equivalent to exactly one of the normal forms, i.e. in other words, that (i) each object can be transformed into a normal form, and (ii) no two normal forms can be transformed into each other. Simply put, Linear Algebra deals with classifications of linear and/or quadratic equations, or systems of such equations. One might think that all that equations do is ask: Solve us! Unfortunately this attitude toward equations does not lead too far. It turns out that very few equations (and kinds of equations) can be explicitly solved, but all can be studied and many classified. The idea is to replace a given “hard” (possibly unsolvable) equa- tion with another one, the normal form, which should be chosen to be as “easy” as it is possible to find in the same equivalence class. Then the normal form should be studied (and hopefully “solved”) thus providing information about the original “hard” equation. What sort of information? Well, any sort that remains invariant under the equivalence transformations in question. For example, in classification of quadratic curves up to changes of Cartesian coordinate systems, all equivalent ellipses are indistin- guishable from each othergeometrically (in particular, they have the same semiaxes) and differ only by the choice of a Cartesian coordi- nate system. However, if arbitrary rescaling of coordinates is also allowed, then all ellipses become indistinguishable from circles (but still different from hyperbolas, parabolas, etc.) Whether a classification theorem really simplifies the matters, depends on the kind of objects in question, the chosen type of equiv- alence transformations, and the applications in mind. In practice, the problem often reduces to finding sufficiently simple normal forms and studying them in great detail. The subject of Linear Algebra fits well into the general philoso- phy just outlined. Below we formulate four classification problems and respective answers. Together with a number of variations and applications, which will be presented later in due course, they form what is usually considered the main course of Linear Algebra. 4. Problems of Linear Algebra 25 The Theorem Question. Given m linear functions in n variables,

y1 = a11x1 + ... + a1nxn ... , ym = am1x1 + ... + amnxn what is the simplest form to which they can be transformed by linear changes of the variables,

y1 = b11Y1 + ... + b1mYm x1 = c11X1 + ... + c1nXn ... , ... ? ym = bm1Y1 + ... + bmmYm xn = cn1X1 + ... + cnnXn

Theorem. Every system of m linear functions in n vari- ables can be transformed by suitable linear changes of de- pendent and independent variables to exactly one of the nor- mal forms:

Y1 = X1, ..., Yr = Xr, Yr+1 = 0, ..., Ym = 0, where 0 ≤ r ≤ m, n.
The number r featuring in the answer is called the rank of the given system of m linear functions.
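In matrix terms (the language of Chapter 2), the rank of a system of m linear functions in n variables is the rank of its m × n coefficient matrix, which numerical libraries compute directly. A quick Python sketch with a made-up 3 × 4 example, using numpy's matrix_rank as a black box:

```python
import numpy as np

# Coefficients of m = 3 linear functions in n = 4 variables (made-up example):
# y1 = x1 + 2x2 + 3x3 + 4x4,  y2 = 2x1 + 4x2 + 6x3 + 8x4,  y3 = x1 - x2 + x3 - x4
A = np.array([[1, 2, 3, 4],
              [2, 4, 6, 8],
              [1, -1, 1, -1]])

r = np.linalg.matrix_rank(A)
print(r)   # 2: the second function is twice the first, so only two are independent
```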

The Inertia Theorem Question. Given a quadratic form (i.e. a homogeneous quadratic function) in n variables,

Q = q11 x1² + 2 q12 x1 x2 + 2 q13 x1 x3 + ... + qnn xn²,
what is the simplest form to which it can be transformed by a linear change of the variables

x1 = c11X1 + ... + c1nXn ... ? xn = cn1X1 + ... + cnnXn

Theorem. Every quadratic form in n variables can be transformed by a suitable linear change of the variables to exactly one of the normal forms:
X1² + ... + Xp² − X_{p+1}² − ... − X_{p+q}², where 0 ≤ p + q ≤ n.

The numbers p and q of positive and negative squares in the normal form are called inertia indices of the quadratic form in question. If the quadratic form Q is known to be positive everywhere outside the origin, the Inertia Theorem tells us that in a suitable coordinate system Q assumes the form X1² + ... + Xn², i.e. its inertia indices are p = n, q = 0.
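For a concrete quadratic form, the inertia indices can be computed by diagonalizing its symmetric coefficient matrix and counting positive and negative eigenvalues; this uses the spectral theorem of Chapter 4 as a computational shortcut rather than the change of variables from the Inertia Theorem itself. A Python sketch for the made-up example Q = 2x1² + 2x1x2:

```python
import numpy as np

# Q(x) = x^T S x with a symmetric matrix S; here Q = 2*x1^2 + 2*x1*x2.
S = np.array([[2.0, 1.0],
              [1.0, 0.0]])

eigenvalues = np.linalg.eigvalsh(S)
p = int(np.sum(eigenvalues > 1e-9))    # number of positive squares
q = int(np.sum(eigenvalues < -1e-9))   # number of negative squares
print(p, q)                            # 1 1: this Q is indefinite
```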

The Orthogonal Diagonalization Theorem
Question. Given two homogeneous quadratic forms in n variables, Q(x1, ..., xn) and S(x1, ..., xn), of which the first one is known to be positive everywhere outside the origin, what is the simplest form to which they can be simultaneously transformed by a linear change of the variables?
Theorem. Every pair Q, S of quadratic forms in n variables, of which Q is positive everywhere outside the origin, can be transformed by a linear change of the variables to exactly one of the normal forms

Q = X1² + ... + Xn²,  S = λ1 X1² + ... + λn Xn²,  where λ1 ≥ ... ≥ λn.

The real numbers λ1,...,λn are called eigenvalues of the given pair of quadratic forms (and are often said to form their spectrum).

The Jordan Canonical Form Theorem
The fourth question deals with a system of n linear functions in n variables. Such an object is the special case of systems of m functions in n variables when m = n. According to the Rank Theorem, such a system of rank r ≤ n can be transformed to the form Y1 = X1, ..., Yr = Xr, Yr+1 = ··· = Yn = 0 by linear changes of dependent and independent variables. There are many cases however where relevant information about the system does not stay invariant when dependent and independent variables are changed separately. This happens whenever both groups of variables describe objects in the same space (rather than in two different ones).
An important class of examples comes from the theory of Ordinary Differential Equations (for short: ODE). Such equations (e.g. ẋ = ax) relate values of quantities with the rates of their change in time. Transforming the variables (e.g. by rescaling: x = cy) would make little sense if not accompanied with the simultaneous change of the rates (ẋ = cẏ). We will describe the fourth classification problem in the context of the ODE theory.
Question. Given a system of n linear homogeneous 1st order constant coefficient ODEs in n unknowns:

x˙ 1 = a11x1 + ... + a1nxn ... , x˙ n = an1x1 + ... + annxn what is the simplest form to which it can be transformed by a linear change of the unknowns:

x1 = c11X1 + ... + c1nXn ... ? xn = cn1X1 + ... + cnnXn

Due to the Fundamental Theorem of Algebra, there is an advantage in answering this question over complex numbers, i.e. assuming that the coefficients cij in the change of variables, as well as the coefficients aij of the given ODE system, are allowed to be complex numbers.
Example. Consider a single mth order linear ODE of the form:
(d/dt − λ)ᵐ y = 0, where λ ∈ C.
By setting
y = x1,  dy/dt − λy = x2,  (d/dt − λ)² y = x3,  ...,  (d/dt − λ)^{m−1} y = xm,
the equation can be written as the following system of m ODEs of the 1st order:

ẋ1 = λx1 + x2
ẋ2 = λx2 + x3
...
ẋ_{m−1} = λx_{m−1} + xm
ẋm = λxm

Let us call this system the Jordan cell of size m with the eigenvalue λ. Introduce a Jordan system of several Jordan cells of sizes

m1, ..., mr with the eigenvalues λ1, ..., λr. It can be similarly compressed into the system
(d/dt − λ1)^{m1} y1 = 0, ..., (d/dt − λr)^{mr} yr = 0
of r unlinked ODEs of the orders m1, ..., mr.
Theorem. Every constant coefficient system of n linear 1st order ODEs in n unknowns can be transformed by a complex linear change of the unknowns to exactly one (up to reordering of the cells) of the Jordan systems with m1 + ... + mr = n.
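The compression of a Jordan cell into the single equation (d/dt − λ)ᵐ y = 0 can be probed numerically: solutions of the cell are e^{λt} times polynomials of degree less than m. The Python sketch below integrates a size-2 cell with scipy (a tool assumed for this illustration, not one used in the book) and compares with that closed form; λ and the initial values are arbitrary:

```python
import numpy as np
from scipy.integrate import solve_ivp

lam = -0.5                      # arbitrary eigenvalue
x0 = np.array([1.0, 2.0])       # arbitrary initial values (x1(0), x2(0))

# Jordan cell of size 2: x1' = lam*x1 + x2, x2' = lam*x2
def rhs(t, x):
    return [lam * x[0] + x[1], lam * x[1]]

sol = solve_ivp(rhs, (0.0, 3.0), x0, dense_output=True, rtol=1e-9, atol=1e-12)

t = 3.0
numeric = sol.sol(t)
# Closed form: x2 = x2(0) e^{lam t},  x1 = (x1(0) + x2(0) t) e^{lam t}
exact = np.array([(x0[0] + x0[1] * t) * np.exp(lam * t), x0[1] * np.exp(lam * t)])
print(np.max(np.abs(numeric - exact)))   # tiny discrepancy, ~1e-8 or smaller
```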

Fools and Wizards In the rest of the text we will undertake a more systematic study of the four basic problems and prove the classification theorems stated here. The reader should be prepared however to meet the following three challenges of Chapter 2. Firstly, one will find there much more diverse material than what has just been described. This is because many mathematical ob- jects and classification problems about them can be reduced (speak- ing roughly or literally) to the four problems discussed above. The challenge is to learn how to recognize situations where results of Lin- ear Algebra can be helpful. Many of those objects will be introduced in the opening section of Chapter 2. Secondly, we will encounter one more fundamental result of Lin- ear Algebra, which is not a classification, but an important (and beautiful) formula. It answers the question which substitutions of the form x1 = c11X1 + ... + c1nXn ... xn = cn1X1 + ... + cnnXn are indeed changes of the variables and can therefore be inverted by expressing X1, ..., Xn linearly in terms of x1, ..., xn, and how to de- scribe such inversion explicitly. The answer is given in terms of 2 the , a remarkable function of n variables c11, ..., cnn, which will also be studied in Chapter 2. Thirdly, Linear Algebra has developed an adequate language, based on the abstract notion of . It allows one to represent relevant mathematical objects and results in ways much 4. Problems of Linear Algebra 29 less cumbersome and thus more efficient than those found in the pre- vious discussion. This language is introduced at the end of Chapter 2. The challenge here is to get accustomed to the abstract way of thinking. Let us describe now the principle by which our main four themes are grouped in Chapters 3 and 4. Note that Jordan canonical forms and the normal forms in the Orthogonal Diagonalization Theorem do not form discrete lists, but instead depend on continuous parameters — the eigenvalues. Based on experience with many mathematical classifications, it is consid- ered that the number of parameters on which equivalence classes in a given problem depend, is the right measure of complexity of the clas- sification problem. Thus, Chapter 3 deals with simple problems of Linear Algebra, i.e. those classification problems where equivalence classes do not depend on continuous parameters. Respectively, the non-simple problems are studied in Chapter 4. Finally, let us mention that the proverb: Fools ask questions that wizards cannot answer, fully applies in Linear Algebra. In addition to the four basic problems, there are many similarly looking ques- tions that one can ask: for instance, to classify triples of quadratic forms in n variables up to linear changes of the variables. In fact, in this problem, the number of parameters, on which equivalence classes depend, grows with n at about the same rate as the number of parameters on which the three given quadratic forms depend. We will have a chance to touch upon such problems of Linear Algebra in the last section, in connection with quivers. The modern attitude toward such problems is that they are unsolvable.

EXERCISES 60. Classify quadratic forms Q = ax2 in one variable with complex coeffi- cients (i.e. a C up to complex linear changes: x = cy, c C,c = 0.  ∈ ∈ 6 61. Using results of Section 2, derive the Inertia Theorem in dimension n = 2. 62. Show that classification of real quadratic curves up to arbitrary lin- ear inhomogeneous changes of coordinates consists of 8 equivalence classes. Show that if the coordinate systems are required to remain Cartesian, then there are infinitely many equivalence classes, which depend on 2 continuous parameters. 63. Is there any difference between classification of quadratic equations in two variables F (x, y) = 0 up to linear inhomogeneous changes of the vari- ables and multiplications of the equations by non-zero constants, and the 30 Chapter 1. A CRASH COURSE classification of quadratic curves, i.e. sets (x, y) F (x, y)=0 of solutions { | } to such equations, up to the same type of coordinate transformations?  64. Prove the Rank Theorem on the line, i.e. show that every linear func- tion y = ax can be transformed to exactly one of the normal forms: Y = X or Y = 0, by the changes: y = bY, x = cX, where b =0 = c. 6 6 65. For the pair of linear functions describing a rotation of the plane (see the formula ( ) in Section 2), find the normal form to which they can be ∗ transformed according to the Rank Theorem.  2 2 66. Show that X1 + ... + Xn is the only one of the normal forms of The Inertia Theorem which is positive everywhere outside the origin. 67.⋆ In the Inertia Theorem with n = 2, show that that there are six normal forms, and prove that they are pairwise non-equivalent. 

68. Sketch the surfaces Q(X1,X2,X3) = 0 for all normal forms in the Inertia Theorem with n = 3. 69. How many equivalence classes are there in the Inertia Theorem for quadratic forms in n variables?  70.⋆ Prove the Orthogonal Diagonalization Theorem for n = 2 using results of Section 2.  71. Classify ODEsx ˙ = ax up to the changes x = cy, c = 0.  λt m 1 6 72. Verify that y(t) = e c0 + tc1 + ... + cm 1t − , where ci C are − ∈ arbitrary constants, is the general solution to the ODE ( d λ)my =0.  dt m m m k m− k 73. Using the binomial formula (a + b) = k=0 k a b − , show that the Jordan cell of size m with the eigenvalue λ can be written as the m-th  order ODE P (m) m (m 1) m 1 m m 1 m y 1 λy − + + ( 1) − m 1 λ − y′ + ( 1) y = 0. − · · · − − − ⋆ m 74. Prove that  the binomial coefficient k “m choose k” is equal to the number of k-element subsets in a set of m elements.  ⋆ m m!  75. Prove that k = k!(m k)! . − 76. Rewrite the pendulum equationx ¨ = x as a system.   − 77.⋆ Identify the Jordan form of the systemx ˙ = x , x˙ = x .  1 2 2 − 1 78.⋆ Solve the following Jordan systems and show that they are not equiv- alent to each other: x˙ = λx y˙ = λy + y 1 1 and 1 1 2 .  x˙ 2 = λx2 y˙2 = λy2

79. How many arbitrary coefficients are there in a quadratic form in n variables?  80.⋆ Show that equivalence classes of triples of quadratic forms in n vari- ables must depend on at least n2/2 parameters.  Chapter 2

Dramatis Personae

1 Matrices

Matrices are rectangular arrays of numbers. An m × n-matrix

    ⎡ a11 ... a1n ⎤
A = ⎢ ... aij ... ⎥
    ⎣ am1 ... amn ⎦

has m rows and n columns. The matrix entry aij is positioned in row i and column j. We usually assume that the matrix entries are taken from either of the fields R or C of real or complex numbers respectively. These two choices are sufficient for all our major goals. However, all we use is basic properties of addition and multiplication of numbers. Thus everything we say in this section (and in the next section about determinants) works well when matrix entries are taken from any field (or even any ring with unity, e.g. the ring Z of all integers, where multiplication is commutative, but division by non-zero numbers is not always defined). Various choices of a field will be discussed at the beginning of Section 3 on vector spaces.
Matrices are found in Linear Algebra all over the place. Yet, the main point of this section is that matrices per se are not objects of Linear Algebra. Namely, the same matrix A can represent different mathematical objects, and will behave differently depending on what kind of object is meant by it.

Vectors
The standard convention is that vectors are represented by columns of their coordinates. Thus, λx and x + y are expressed in coordinates by the following operations with 3 × 1-matrices:

  ⎡ x1 ⎤   ⎡ λx1 ⎤      ⎡ x1 ⎤   ⎡ y1 ⎤   ⎡ x1 + y1 ⎤
λ ⎢ x2 ⎥ = ⎢ λx2 ⎥ ,    ⎢ x2 ⎥ + ⎢ y2 ⎥ = ⎢ x2 + y2 ⎥ .
  ⎣ x3 ⎦   ⎣ λx3 ⎦      ⎣ x3 ⎦   ⎣ y3 ⎦   ⎣ x3 + y3 ⎦

More generally, one refers to n × 1-matrices as coordinate vectors, which can be term-wise multiplied by scalars or added. Such vectors form the coordinate space Rⁿ (or Cⁿ, or Qⁿ, if the matrix entries and the scalars are taken to be complex or rational numbers).

Linear Functions
A linear function (or linear form) on the coordinate space has the form
a(x) = a1 x1 + ··· + an xn
and is determined by the row [a1, ..., an] of its coefficients. Linear combinations λa + μb of linear functions are linear functions. Their coefficients are expressed by linear combinations of 1 × n-matrices:
λ[a1, ..., an] + μ[b1, ..., bn] = [λa1 + μb1, ..., λan + μbn].
The operation of evaluation, i.e. taking the value a(x) of a linear function on a vector, leads to the simplest instance of matrix product, namely the product of a 1 × n row with an n × 1 column:

                     ⎡ x1 ⎤
a(x) = [a1, ..., an] ⎢ ... ⎥ = a1 x1 + ··· + an xn.
                     ⎣ xn ⎦

Note that a(λx + μy) = λ a(x) + μ a(y) for all vectors x, y and scalars λ, μ. This can be viewed as a manifestation of the distributive law for matrix product. As we will see later, it actually expresses the very property of linearity, namely, the property of linear functions to respect operations with vectors, i.e. to assign to linear combinations of vectors linear combinations of respective values with the same coefficients.

Linear Maps

An m-tuple of linear functions a1, ..., am in n variables defines a linear map A : Rⁿ → Rᵐ. Namely, to a vector x ∈ Rⁿ it associates the vector y = Ax ∈ Rᵐ whose m components are computed as
yi = ai(x) = ai1 x1 + ··· + ain xn,  i = 1, ..., m.

Whereas each yi is given by the product of a row with a column, the whole linear map can be described as the product of the m × n-matrix A with the column:

    ⎡ y1 ⎤   ⎡ a11 ... a1n ⎤ ⎡ x1 ⎤
y = ⎢ ... ⎥ = ⎢ ...     ... ⎥ ⎢ ... ⎥ = Ax.
    ⎣ ym ⎦   ⎣ am1 ... amn ⎦ ⎣ xn ⎦

The rows of the matrix A represent the m linear functions a1, ..., am. To interpret the columns, note that every vector x can be uniquely written as a linear combination x = x1 e1 + ··· + xn en of the standard basis vectors e1, ..., en:

⎡ x1 ⎤      ⎡ 1 ⎤            ⎡ 0 ⎤
⎢ ... ⎥ = x1 ⎢ ... ⎥ + ··· + xn ⎢ ... ⎥ .
⎣ xn ⎦      ⎣ 0 ⎦            ⎣ 1 ⎦

b ... b c ... c a .... a 11 1n 11 1n 11 1m ...... c . . . = ...... ij  ......   c ... c   a .... a  l1 ln l1 lm b ... b  m1 mn        By definition of composition, the entry cij located at the intersection of the ith row and jth column of C is the value of the linear function a on the image Be Rm of the standard basis vector e Rn. i j ∈ j ∈ 34 Chapter 2. DRAMATIS PERSONAE

Since ai and Bej are represented by the ith row and jth column respectively of the matrices A and B, we find:

b1j c = [a ,...,a ] . . . = a b + + a b . ij i1 im   i1 1j · · · im mj bmj   In other words, cij is the product of the ith row of A with the jth column of B. Based on this formula, it is not hard to verify that the matrix product is associative, i.e. (AB)C = A(BC), and satisfies the left and right distributive laws: P (λQ + µR) = λP Q + µP R and (λX + µY )Z = λXZ + µY Z, whenever the sizes of the matrices are right. However, one of the key points in mathematics is that there is no point in making such verifications. The operations with matri- ces encode in the coordinate form meaningful operations with linear maps, and the properties of matrices simply reflect those of the maps. For instance, matrix product is associative because composition of arbitrary (and not only linear) maps is associative. We will have a chance to discuss properties of linear maps in the more general con- text of abstract vector spaces, where coordinate expressions may not even be available.
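As a concrete check of these rules (the matrices, the random seed, and the names below are arbitrary choices for the illustration, not part of the text), the `@` product in NumPy computes exactly the entries cij = ai1 b1j + ... + aim bmj, realizes composition of the corresponding linear maps, and is associative:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(2, 3))   # a linear map R^3 -> R^2
    B = rng.integers(-3, 4, size=(3, 4))   # a linear map R^4 -> R^3
    x = rng.integers(-3, 4, size=(4, 1))   # a vector in R^4

    # Composition: applying B first and then A is multiplication by AB.
    assert np.array_equal(A @ (B @ x), (A @ B) @ x)

    # The entry c_ij is the product of the i-th row of A with the j-th column of B.
    C = A @ B
    i, j = 1, 2
    assert C[i, j] == A[i, :] @ B[:, j]

    # Associativity of the matrix product itself.
    D = rng.integers(-3, 4, size=(4, 2))
    assert np.array_equal((A @ B) @ D, A @ (B @ D))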

Changes of Coordinates

n Suppose that standard coordinates (x1,...,xn) in R are expressed ′ ′ in terms of new variables (x1,...,xn) by means of n linear functions: ′ x1 c11 ... c1n x1 . . . = ......      ′  xn cn1 ... cnn xn In matrix notation, this can be written asx= Cx′ where C is the of size n. We call this a linear change of coor- dinates, if conversely, x′ can be expressed by linear functions of x. In other words, there must exist a square matrix D such that the substitutions x′ = Dx and x = Cx′ are inverse to each other, i.e. x = CDx and x′ = DCx′ for all x and x′. It is immediate to see that this happens exactly when the matrices C and D satisfy CD = I = DC where I is the of size n: 1 0 . . . 0 0 1 . . . 0 I = .  ......  0 . . . 0 1     1. Matrices 35 When this happens, the square matrices C and D are called inverse to each other, and one writes: D = C−1, C = D−1. The rows of C express old coordinates xi as linear functions of new coordinates, ′ xj. The columns of C represent, in the old coordinate system, the vectors which in the new coordinate system serve as standard basis ′ vectors ej. The matrix C is often called the transition matrix between the coordinate systems. In spite of apparent simplicity of this notion, it is easy to get lost here. As we noticed in connection with rotations on the complex plane, a change of coordinates is easy to confuse with a linear map from the space Rn to itself. We will reserve for such maps the term linear transformation. Thus, the formula x′ = Cx describes a lin- ear transformation, that associates to a vector x Rn a new vector x′ written in the same coordinate system. The inverse∈ transfor- mation, when exists, is given by the formula x = C−1x′. Let us now return to changes of coordinates and examine how they affect linear functions and maps. Making the change of variables x = Cx′ in a linear function a results in the values a(x) of this function being expressed as a′(x′) in terms of new coordinates of the same vectors. Using the matrix product notation, we find: a′x′ = ax = a(Cx′) = (aC)x′. Thus, coordinates of vectors and coefficients of linear functions are trans- formed differently: x = Cx′, or x′ = C−1x, but a′ = aC, or a = a′C−1. Next, let y = Ax be a linear map from Rn to Rm, and let x = Cx′ and y = Dy′ be changes of coordinates in Rn and Rm respectively. Then in new coordinates the same linear map is given by the new formula y′ = A′x′. We compute A′ in terms of A, C and D: Dy′ = y = Ax = ACx′, i.e. y′ = D−1ACx′, and hence A′ = D−1AC. In particular, if x Ax is a linear transformation on Rn (i.e. — we remind — a linear7→ map from Rn to itself), and a change of co- ordinates x = Cx′ is made, then in new coordinates the same linear transformation is x′ A′x′, where 7→ A′ = C−1AC, i.e. in the the previous rule we need to take D = C. This is because the same change applies to both: the input vector x and its image Ax. The operation A C−1AC over a square matrix A is often called the similarity transformation7→ by the C. 36 Chapter 2. DRAMATIS PERSONAE Bilinear Forms The dot product of two coordinate vectors x, y Rn is defined by the formula ∈ x, y = x y + + x y . h i 1 1 · · · n n It is an example of a bilinear form, i.e. a function of an ordered pair (x, y) of vector variables, which is a linear function of each of them. In general, the vectors may come from different spaces, x Rm and y Rn. We can write a bilinear form as a linear function of∈x whose ∈ i coefficients are linear functions of yj:

m n m n B(x, y)= xi bijyj = xibijyj. Xi=1 Xj=1 Xi=1 Xj=1 Thus the bilinear form B is determined by the m n-matrix of its × coefficients bij. Slightly abusing notation, we will denote this matrix with the same letter B as the bilinear form. Here is the meaning of the matrix entries: bij = B(ei, fj), the value of the form on the pair of standard basis vectors, e Rm and f Rn. i ∈ j ∈ With a bilinear form B of x and y, one associates another bilinear form called transposed to B, which is a function of y and x (in this order!), denoted Bt, and defined by Bt(y, x)= B(x, y). Explicitly:

n m m n t t yibijxj = B (y, x)= B(x, y)= xjbjiyi. Xi=1 Xj=i Xj=1 Xi=i We conclude that bt = b . In other words, the n m-matrix of co- ij ji × efficients of Bt is obtained from the m n-matrix B by the operation of transposition: ×

b b b ... b 11 n1 11 1n ...... B = ... , Bt = .  ......   b ... b  m1 mn b b  1m nm      Note that transposition transforms rows into columns, thus messing up our convention to represent vectors by columns and linear func- tions by rows. Taking advantage of this, we can compute values of bilinear forms using matrix product and transposition:

B(x, y)= xtBy, Bt(y, x)= ytBtx. 1. Matrices 37 In particular, when x, y Rn, we have: x, y = xty. ∈ h i To find out how changes of variables x = Dx′, y = Cy′ transform the matrix of a bilinear form, take B′(x′, y′) = B(Dx′,Cy′), the ′ ′ value B(x, y) written in new coordinates. We have: bij = B (ei, fj)= t t B(Dei,Cfj) = (Dei) B(Cfj). Note that (Dei) is the transposed ith t column of D, i.e. the ith row of D , and B(Cfj) is the jth column of BC. Combining these results for all i and j, we find:

B′ = DtBC.

Were the same matrix B representing a linear map, the transformation law would have been different: B ↦ D−1BC.

Consider now the case when both vector inputs x, y of a bilinear form B lie in the same space Rn. Then the transposed form Bt can be evaluated on the same pair of vectors (x, y) (in this order!) and the result B(y, x) compared with B(x, y). The form B is called symmetric if Bt = B, i.e. B(y, x) = B(x, y) for all x, y. It is called anti-symmetric if Bt = −B, i.e. B(y, x) = −B(x, y) for all x, y. Every bilinear form (of two vectors from the same space) can be uniquely written as the sum of a symmetric and an anti-symmetric bilinear form:

B = (B + Bt)/2 + (B − Bt)/2.
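These facts about bilinear forms are easy to verify numerically; the following Python/NumPy sketch (with arbitrarily chosen matrices and change-of-variable matrices, ours and not the book's) checks the evaluation formula, the transformation law B' = DtBC, and the symmetric/anti-symmetric decomposition:

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.integers(-3, 4, size=(3, 3)).astype(float)   # coefficient matrix of a bilinear form on R^3 x R^3
    x = rng.integers(-3, 4, size=(3, 1)).astype(float)
    y = rng.integers(-3, 4, size=(3, 1)).astype(float)

    # B(x, y) = x^t B y, and the transposed form satisfies B^t(y, x) = y^t B^t x = B(x, y).
    assert np.isclose((x.T @ B @ y).item(), (y.T @ B.T @ x).item())

    # A change of variables x = Dx', y = Cy' turns the coefficient matrix into B' = D^t B C.
    C = np.array([[1., 1., 0.], [0., 1., 1.], [0., 0., 1.]])
    D = np.array([[1., 0., 0.], [2., 1., 0.], [0., 3., 1.]])
    xp = rng.integers(-3, 4, size=(3, 1)).astype(float)
    yp = rng.integers(-3, 4, size=(3, 1)).astype(float)
    assert np.isclose(((D @ xp).T @ B @ (C @ yp)).item(), (xp.T @ (D.T @ B @ C) @ yp).item())

    # Decomposition into symmetric and anti-symmetric parts: B = (B + B^t)/2 + (B - B^t)/2.
    S, A = (B + B.T) / 2, (B - B.T) / 2
    assert np.allclose(B, S + A) and np.allclose(S, S.T) and np.allclose(A, -A.T)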

Quadratic Forms To a bilinear form B on Rn, one associates a function on Rn by substituting the same vector x Rn for both inputs: ∈ n n B(x, x)= bijxixj. Xi=1 Xj=1 The result is a quadratic form, i.e. a degree-2 homogeneous poly- nomial in n variables (x1,...,xn). Since bij can be arbitrary, all quadratic forms are obtained this way. Moreover, if B is written as the sum S + A of symmetric and anti-symmetric forms, then A(x, x) = A(x, x) = 0, and B(x, x) = S(x, x), i.e. the the quadratic form depends− only on the symmetric part of B. Further abusing our notation, we will write S(x) for the values S(x, x) of the quadratic form corresponding to the bilinear form S. 38 Chapter 2. DRAMATIS PERSONAE The correspondence between symmetric bilinear forms and quadratic forms is not only “onto” but is also “one-to-one,” i.e. the values S(x, y) of a symmetric bilinear form can be reconstructed from the values of the corresponding quadratic form. Explicitly: S(x + y, x + y)= S(x, x)+ S(x, y)+ S(y, x)+ S(y, y). Due to the symmetry S(x, y)= S(y, x), we have: 2S(x, y)= S(x + y) S(x) S(y). − − The transformation law for coefficients of a quadratic form under the changes of variables x = Cx′ is S CtSC. Here S = St denotes the symmetric matrix of coefficients7→ of the quadratic form

S(x) = Σ_{i=1}^{n} Σ_{j=1}^{n} sij xi xj,    sij = sji for all i, j.
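The polarization identity 2S(x, y) = S(x + y) − S(x) − S(y) and the transformation law S ↦ CtSC can likewise be checked on small examples; a Python/NumPy sketch (our illustration, with arbitrary data):

    import numpy as np

    rng = np.random.default_rng(2)
    S = rng.integers(-3, 4, size=(3, 3)).astype(float)
    S = (S + S.T) / 2                      # symmetric coefficient matrix of a quadratic form

    def Q(v):
        # the quadratic form S(x) = x^t S x
        return (v.T @ S @ v).item()

    x = rng.integers(-3, 4, size=(3, 1)).astype(float)
    y = rng.integers(-3, 4, size=(3, 1)).astype(float)

    # Polarization: 2 S(x, y) = S(x + y) - S(x) - S(y).
    assert np.isclose(2 * (x.T @ S @ y).item(), Q(x + y) - Q(x) - Q(y))

    # Under the change of variables x = Cx', the coefficient matrix becomes C^t S C.
    C = np.array([[1., 2., 0.], [0., 1., 0.], [0., 1., 1.]])
    xp = rng.integers(-3, 4, size=(3, 1)).astype(float)
    assert np.isclose(Q(C @ xp), (xp.T @ (C.T @ S @ C) @ xp).item())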

EXERCISES

81. Are the functions 3x, x3, x+1, 0, sin x, (1+x)2 − (1−x)2, tan arctan x, arctan tan x linear?

82. Check the linearity property of functions a(x) = aixi, i.e. that a(λx + µy)= λa(x)+ µa(y). P 83. Prove that every function on Rn that possesses the linearity property has the form aixi.  84. Let v Rm, and let a : Rn R be a linear function. Define a linear ∈ P → map E : Rn Rm by E(x)= a(x)v, and compute the matrix of E.  → 85. Write down the matrix of rotation through the angle θ in R2.  86. Let 1 2 1 23 1 2 A = , B = , C = 2 1 . 1 1 1 3 4 − − " 0 1 #     − Compute those of the products ABC,BAC,BCA,CBA,CAB,ACB which are defined.  n 87. Let j=1 aij xj = bi, i =1,...,m, be a system of m linear equations in n unknowns (x1,...,xn). Show that it can be written in the matrix form Ax = b,P where A is a linear map from Rn to Rm.

88. A square matrix A is called upper triangular if aij = 0 for all i < j and lower triangular if aij = 0 for all i > j. Prove that products of upper triangular matrices are upper triangular and products of lower triangular matrices are lower triangular. 1. Matrices 39 89. For an identity matrix I, prove AI = A and IB = B for all allowed sizes of A and B. 90. For a square matrix A, define its powers Ak for k > 0 as A A (k 1 1 · · · times), for k =0 as I, and for k< 0 as A− A− (k times), assuming A · · · invertible. Prove that A that AkAl = Ak+l for all k,l. 19 cos 19◦ sin 19◦ 91.⋆ Compute − .  sin 19◦ cos 19◦   92. Compute powers Ak, k =0, 1, 2,... , of the square matrix A all of whose entries are zeroes, except that ai,i+1 = 1 for all i.  93. For which sizes of matrices A and B, both products AB and BA: (a) are defined, (b) have the same size?  94.⋆ Give examples of matrices A and B which do not commute, i.e. AB = 6 BA, even though both products are defined and have the same size.  95. When does (A + B)2 = A2 +2AB + B2?  96. A square matrix A is called diagonal if a = 0 for all i = j. Which ij 6 diagonal matrices are invertible?  97. Prove that an inverse of a given matrix is unique when exists.  98. Let A, B be invertible n n-matrices. Prove that AB is also invertible, 1 1 1 × and (AB)− = B− A− .  99.⋆ If AB is invertible, does it imply that A, B, and BA are invertible?  100.⋆ Give an example of matrices A and B such that AB = I, but BA = I. 6 101. Let A : Rn Rm be an invertible linear map. (Thus, m = n, but we → consider Rn and Rm as two different copies of the coordinate space.) Prove that after suitable changes of coordinates in Rn and Rm, the matrix of this transformation becomes the identity matrix I.  102. Is the function xy: linear? bilinear? quadratic?  103. Find the coefficient matrix of the dot product.  104. Prove that all anti-symmetric bilinear forms in R2 are proportional to each other.  2 105. Represent the bilinear form B =2x1(y1 + y2) in R as the sum S + A of symmetric and anti-symmetric ones.  106. Find the symmetric bilinear forms corresponding to the quadratic forms (x + + x )2 and x x .  1 · · · n i

Definition Let A be a square matrix of size n:

    [ a11 ... a1n ]
A = [     ...     ]
    [ an1 ... ann ]

Its determinant is a scalar det A defined by the formula

det A = Σσ ε(σ) a1σ(1) a2σ(2) ... anσ(n).

Here σ is a permutation of the indices 1, 2, ..., n. A permutation σ can be considered as an invertible function i ↦ σ(i) from the set of n elements {1, ..., n} to itself. We use the functional notation σ(i) in order to specify the i-th term in the permutation

σ = ( 1    ...  n    )
    ( σ(1) ... σ(n) ).

Thus, each elementary product in the determinant formula contains exactly one matrix entry from each row, and these entries are chosen from n different columns. The sum is taken over all n! ways of making such choices. The coefficient ε(σ) in front of the elementary product equals 1 or −1 and is called the sign of the permutation σ. We will explain the general rule of the signs after a few examples. In these examples, we begin using one more conventional notation for determinants. According to it, a square array of matrix entries placed between two vertical bars denotes the determinant of the matrix. Thus,

[ a b ]            | a b |
[ c d ]            | c d |

the first denotes a matrix, while the second denotes a number equal to the determinant of that matrix.

Examples. (1) For n = 1, the determinant |a11| = a11.

(2) For n = 2, we have:

| a11 a12 |
| a21 a22 |  =  a11a22 − a12a21.

(3) For n = 3, we have 3! = 6 summands:

| a11 a12 a13 |
| a21 a22 a23 |  =  a11a22a33 − a12a21a33 + a12a23a31 − a13a22a31 + a13a21a32 − a11a23a32,
| a31 a32 a33 |

corresponding to the permutations

( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )  ( 1 2 3 )
( 1 2 3 ), ( 2 1 3 ), ( 2 3 1 ), ( 3 2 1 ), ( 3 1 2 ), ( 1 3 2 ).

The rule of signs for n = 3 is schematically shown on Figure 21.

Figure 21 Parity of Permutations The general rule of signs depends on properties of permutations. We say that σ inverses a pair of indices i < j if σ(i) > σ(j). The total number l(σ) of pairs i < j that σ inverses is called the length of the permutation σ. Permutations are called even or odd depending on their lengths being respectively even or odd. We put ε(σ) := ε(σ) = ( 1)l(σ) (i.e. ε(σ) = 1 for even and 1 for odd permutations. − − Examples. (1) If l(σ) = 0, then σ(1) < σ(2) < < σ(n), and hence σ = id, the identity permutation. In particular,· · · ε(id) = 1. (2) Consider a transposition τ, i.e. a permutation that swaps two indices, say i < j, leaving all other indices in their respective places. Then τ(j) < τ(i), i.e. τ inverses the pair of indices i < j. Besides, for every index k such that iσ(i + 1)  − In particular, σ and σ′ have opposite parities. ′ 1 ... i i + 1 ... n Proof. Indeed, σ = σ(1) ... σ(i + 1) σ(i) ... σ(n) , i.e. σ′ is obtained from σ by the extra swap of σ(i) and σ(i + 1). This swap does not affect monotonicity of any pairs of entries, except the monotonicity of σ(i),σ(i + 1), which is reversed.  2. Determinants 43 Several corollaries follow immediately. Corollary 1. Every permutation σ, of length l> 0, can be represented as a composition τ (i1) ...τ (il) of l transpositions of length 1. Indeed, locating a pair of nearby indices i < i + 1 inverted by σ and composing σ with τ (i) as in Lemma, we obtain a permutation σ′ of length l 1. Continuing the same way with σ′, we after l steps arrive at− the permutation of length 0, which is the identity permutation (Example (1)). Reversing the order of transpositions, we obtain the required result. Corollary 2. If a permutation is represented as compo- sition of transpositions of length 1, then the parity of the number of these transpositions coincides with the parity of the permutation. Indeed, according to the lemma, precomposing with each trans- position of length 1 reverses the parity of the permutation. Since the identity permutation is even, the number of factors τ (i) in a compo- sition must be even for even permutations and odd for odd. Corollary 3. The sign of permutations is multiplicative with respect to compositions: ε(σσ′)= ε(σ)ε(σ′). Indeed, representing each σ and σ′ as a composition of transpo- sitions of length 1 and concatenating them, we obtain such a repre- sentation for the composition σσ′. The result follows from Corollary 2, since the number (and hence parity) of factors behaves additively under concatenation. Corollary 2′. If a permutation is represented as a com- position of arbitrary transpositions, then the parity of the number of these transpositions coincides with the parity of the permutation. Indeed, according to Example (2), every transposition is odd, and hence ε(τ τ )= ε(τ ) ε(τ ) = ( 1)N . 1 · · · N 1 · · · N − Here are some illustrations of the above properties in connection with the definition of determinants. Examples. (4) The transposition (21) is odd. That is why the term a12a21 occurs in 2 2-determinants with the negative sign. × 123 123 123 123 123 123 (5) The permutations 123 , 213 , 231 , 321 , 312 , 132 have lengths l = 0, 1, 2, 3, 2, 1 and respectively signs ε = 1, 1, 1, 1, 1, 1 (thus explaining Figure 21). Notice that each next permutat−  −ion here− is obtained from the previous one by an extra flip. 44 Chapter 2. DRAMATIS PERSONAE 1234 (6) The permutation 4321 inverses all the 6 pairs of indices and has therefore length l = 6. 
Thus the elementary product a14a23a32a41 occurs with the sign ε = ( 1)6 = +1 in the definition of 4 4- determinants. − × (7) Permutations σ and σ−1 inverse to each other have the same parity since their composition σσ−1 = σ−1σ = id is even. This shows that the definition of determinants can be rewritten “by columns:”

det A = Σσ ε(σ) aσ(1)1 ... aσ(n)n.

Indeed, each summand in this formula is equal to the summand in the original definition corresponding to the permutation σ−1, and vice versa. Namely, reordering the factors aσ(1)1...aσ(n)n, so that σ(1), ..., σ(n) increase monotonically, yields a1σ−1(1)...anσ−1(n).
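The definition of the determinant, together with the sign of a permutation as (−1) raised to the number of inverted pairs, can be implemented verbatim; the Python sketch below (ours, practical only for small n since it sums n! terms, with an arbitrary 3x3 matrix) compares the permutation expansion with NumPy's built-in determinant:

    import numpy as np
    from itertools import permutations

    def sign(sigma):
        # epsilon(sigma) = (-1)^(number of pairs i < j with sigma(i) > sigma(j))
        inversions = sum(1 for i in range(len(sigma))
                           for j in range(i + 1, len(sigma))
                           if sigma[i] > sigma[j])
        return -1 if inversions % 2 else 1

    def det_by_definition(A):
        n = len(A)
        return sum(sign(sigma) * np.prod([A[i][sigma[i]] for i in range(n)])
                   for sigma in permutations(range(n)))

    A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 5.]])
    print(det_by_definition(A), np.linalg.det(A))   # both print 23.0 (up to rounding)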

Properties of determinants (i) Transposed matrices have equal determinants:

det At = det A.

This follows from the last Example. Below, we will think of an n n matrix as an array A = [a1,..., an] of its n columns of size n (vectors× from Cn if you wish) and formulate all further properties of determinants in terms of columns. The same properties hold true for rows, since the transposition of A changes columns into rows without changing the determinant. (ii) Interchanging any two columns changes the sign of the determinant:

det[..., a , ..., a , ...]= det[..., a , ..., a , ...]. j i − i j Indeed, the operation replaces each permutation in the definition of determinants by its composition with the transposition of the in- dices i and j. Thus changes the parity of the permutation, and thus reverses the sign of each summand. Rephrasing this property, one says that the determinant, consid- ered as a function of n vectors a1,..., an is totally anti-symmetric, i.e. changes the sign under every odd permutation of the vectors, and stays invariant under even. It implies that a matrix with two equal columns has zero determinant. It also allows one to formulate further 2. Determinants 45 column properties of determinants referring to the 1st column only, since the properties of all columns are alike. (iii) Multiplication of a column by a number multiplies the determinant by this number:

det[λa1, a2, ..., an]= λ det[a1, a2, ..., an].

Indeed, this operation simply multiplies each of the n! elementary products by the factor of λ. This property shows that a matrix with a zero column has zero determinant. (iv) The determinant function is additive with respect to each column:

′ ′′ ′ ′′ det[a1 + a1, a2, ..., an] = det[a1, a2, ..., an] + det[a1, a2, ..., an].

Indeed, each elementary product contains exactly one factor picked from the 1-st column and thus splits into the sum of two ele- ′ ′′ mentary products aσ(1)1aσ(2)2...aσ(n)n and aσ(1)1aσ(2)2...aσ(n)n. Sum- ming up over all permutations yields the sum of two determinants on the right hand side of the formula. The properties (iv) and (iii) together mean that the determinant function is linear with respect to each column separately. Together with the property (ii), they show that adding a multiple of one column to another one does not change the determinant of the matrix. Indeed,

a + λa , a , ... = a , a , ... + λ a , a , ... = a , a , ... , | 1 2 2 | | 1 2 | | 2 2 | | 1 2 | since the second summand has two equal columns. The determinant function shears all the above properties with the identically zero function. The following property shows that these functions do not coincide. (v) det I = 1. Indeed, since all off-diagonal entries of the identity matrix are zeroes, the only elementary product in the definition of det A that survives is a11...ann = 1. The same argument shows that the determinant of any diagonal matrix equals the product of the diagonal entries. It is not hard to generalize the argument in order to see that the determinant of any 46 Chapter 2. DRAMATIS PERSONAE upper or lower is equal to the product of the diago- nal entries. One can also deduce this from the following property valid for block triangular matrices. A B Consider an n n-matrix subdivided into four blocks × C D A,B,C,D of sizes m m, m  l, l mand l l respectively (where of course m + l = n).× We will× call× such a matrix× block triangular if C or B is the zero matrix 0. We claim that

A B det = det A det D. 0 D   Indeed, consider a permutation σ of 1, ..., n which sends at least one of the indices 1, ..., m to the{ other} part of the set, m+1, ..., m+l . Then{ σ must} send at least one of m+1, ..., m+l back{ to 1, ..., m} . This means that every elementary{ product in our} n n-determinant{ } which contains a factor from B must also contain a× factor from C, and hence vanish, if C = 0. Thus only the permu- tations σ which permute 1, ..., m separately from m + 1, ..., m + l contribute to the determinant{ in} question. Elementary{ products} corresponding to such permutations factor into elementary prod- ucts from det A and det D and eventually add up to the product det A det D. Of course, the same holds true if B = 0 instead of C = 0. We will use the factorization formula in the 1st proof of the fol- lowing fundamental property of determinants.

Multiplicativity Theorem. The determinant is multiplicative with respect to matrix products: for arbitrary n n-matrices A and B, × det(AB) = (det A)(det B).

We give two proofs: one ad hoc, the other more conceptual. A 0 Proof I. Consider the auxiliary 2n 2n matrix with × I B − the determinant equal to the product (det A)(det B) according to the factorization formula. We begin to change the matrix by adding to the last n columns linear combinations of the first n columns with such coefficients that the submatrix B is eventually replaced by zero 2. Determinants 47 submatrix. Thus, in order to kill the entry bkj we must add the bkj-multiple of the k-th column to the n + j-th column. Accord- ing to the properties of determinants (see (iv)) these operations do not change the determinant but transform the matrix to the form A C . We ask the reader to check that the entry c of the I 0 ij  −  submatrix C in the upper right corner equals ai1b1j + ... + ainbnj so that C = AB is the matrix product! Now, interchanging the i-th and n + i-th columns, i = 1, ..., n, we change the determinant by the C A factor of ( 1)n and transform the matrix to the form . − 0 I − The factorization formula applies again and yields det C det( I). We conclude that det C = det A det B since det( I) = ( 1)n −compen- sates for the previous factor ( 1)n. − − − Proof II. We will first show that the properties (i – v) com- pletely characterize det[v1,..., vn] as a function of n columns vi of size n. Indeed, consider a function f, which to n columns v1,..., vn, associates a number f(v1,..., vn). Suppose that f is linear with respect to each column. Let ei denotes the ith column of the identity n matrix. Since v1 = i=1 vi1ei, we have: P n f(v1, v2,..., vn)= vi1f(ei, v2,..., vn). Xi=1 n Using linearity with respect to the 2nd column v2 = j=1 vj2ej, we similarly obtain: P n n f(v1, v2,... vn)= vi1vj2f(ei, ej, v3,..., vn). Xi=1 Xj=1 Proceeding the same way with all columns, we get:

f(v ,..., v )= v v f(e ,..., e ). 1 n i11 · · · inn i1 in i1X,...,in

Thus, f is determined by its values f(ei1 ,..., ein ) on strings of n basis vectors. Let us assume now that f is totally anti-symmetric. Then, if any two of the indices i1,...,in coincide, we have: f(ei1 ,..., ein ) = 0. 48 Chapter 2. DRAMATIS PERSONAE All other coefficients correspond to permutations σ = 1 ... n i1 ... in of the indices (1,...,n), and hence satisfy:  

f(ei1 ,..., ein )= ε(σ)f(e1,..., en). Therefore, we find:

f(v1,..., vn)= vσ(1)1 ...vσ(n)nε(σ)f(e1,..., en), σ X = f(e1,..., en) det[v1,..., vn]. Thus, we have established: Proposition 1. Every totally anti-symmetric function of n coordinate vectors of size n which is linear in each of them is proportional to the determinant function. Next, given an n n matrix C, put × f(v1,..., vn) := det[Cv1,...,Cvn].

Obviously, the function f is a totally anti-symmetric in all vi (since det is). Multiplication by C is linear: C(λu + µv)= λCu + µCv for all u, v and λ, µ.

. Therefore, f is linear with respect to each vi (as composition of two linear operations). By the previous result, f is proportional to det. Since Cei are columns of C, we conclude that the coefficient of proportionality f(e1,..., en) = det C. Thus, we have found the following interpretation of det C. Proposition 2. det C is the factor by which the determi- nant function of n vectors vi is multiplied when the vectors are replaced with Cvi. Now our theorem follows from the fact that when C = AB, the substitution v Cv is the composition v Av ABv of consec- utive substitutions7→ defined by A and B. Under7→ the7→ action of A, the function det is multiplied by the factor det A, then under the action of B by another factor det B. But the resulting factor (det A)(det B) must be equal to det C.  Corollary. If A is invertible, then det A is invertible. Indeed, (det A)(det A−1) = det I = 1, and hence det A−1 is recip- rocal to det A. The converse statement: that matrices with invertible determinants are invertible, is also true due to the explicit formula for the inverse matrix, described in the next section. 2. Determinants 49 Remark. Of course, a real or complex number det A is invertible whenever det A = 0. Yet over the integers Z this is not the case: the only invertible6 integers are 1. The above formulation, and several similar formulations that± follow, which refer to invertibility of determinants, are preferable as they are more general.
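The multiplicative properties above, as well as the earlier block-triangular factorization, are easy to test on random matrices; a short Python/NumPy sketch (an illustration with arbitrary data, not part of the text):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))

    # det(AB) = det(A) det(B), and det(A^{-1}) = 1/det(A).
    assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
    assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A))

    # Block-triangular factorization: det [[A, B], [0, D]] = det(A) det(D).
    D = rng.standard_normal((3, 3))
    Bl = rng.standard_normal((4, 3))
    M = np.block([[A, Bl], [np.zeros((3, 4)), D]])
    assert np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(D))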

The Cofactor Theorem In the determinant formula for an n n-matrix A each elementary product a ... begins with one of× the entries a , ..., a of the ± 1σ(1) 11 1n first row. The sum of all terms containing a11 in the 1-st place is the product of a11 with the determinant of the (n 1) (n 1)- matrix obtained from A by crossing out the 1-st row− and× the− 1-st column. Similarly, the sum of all terms containing a12 in the 1-st place looks like the product of a12 with the determinant obtained by crossing out the 1-st row and the 2-nd column of A. In fact it differs by the factor of 1 from this product, since switching the columns 1 and 2 changes− signs of all terms in the determinant formula and interchanges the roles of a11 and a12. Proceeding in this way with a13, ..., a1n we arrive at the cofactor expansion formula for det A which can be stated as follows.

Figure 22        Figure 23

The determinant of the (n 1) (n 1)-matrix obtained from A by crossing out the row i and column− × j −is called the (ij)- of A (Figure 22). Denote it by Mij. The (ij)-cofactor Aij of the matrix A is the number that differs from the minor M by a factor 1: ij ± A = ( 1)i+jM . ij − ij The chess-board of the signs ( 1)i+j is shown on Figure 23. With these notations, the cofactor expansion− formula reads:

det A = a11A11 + a12A12 + ... + a1nA1n.

Example.

| a11 a12 a13 |
| a21 a22 a23 |  =  a11 | a22 a23 |  −  a12 | a21 a23 |  +  a13 | a21 a22 | .
| a31 a32 a33 |         | a32 a33 |        | a31 a33 |        | a31 a32 |
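The expansion along the first row translates directly into a recursive computation; the following Python sketch (ours, with an arbitrarily chosen matrix; far slower than elimination for large n) does exactly that and compares the result with NumPy's determinant:

    import numpy as np

    def det_cofactor(A):
        # Expansion along the first row: det A = a11*A11 + a12*A12 + ... + a1n*A1n,
        # where A1j = (-1)^(1+j) * M1j and M1j is the minor with row 1 and column j removed.
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
            total += (-1) ** j * A[0, j] * det_cofactor(minor)
        return total

    A = np.array([[1., 2., 3.], [3., 1., 2.], [2., 3., 1.]])
    print(det_cofactor(A), np.linalg.det(A))   # both print 18.0 (up to rounding)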

Using the properties (i) and (ii) of determinants we can adjust the cofactor expansion to the i-th row or j-th column: det A = ai1Ai1 + ... + ainAin = a1jA1j + ... + anjAnj, i, j = 1, ..., n. These formulas reduce evaluation of n n-determinants to that of (n 1) (n 1)-determinants and can× be useful in recursive com- putations.− × − Furthermore, we claim that applying the cofactor formula to the entries of the i-th row but picking the cofactors of another row we get the zero sum: a A + ... + a A =0 if i = j. i1 j1 in jn 6 Indeed, construct a new matrix A˜ replacing the j-th row by a copy of the i-th row. This forgery does not change the cofactors Aj1, ..., Ajn (since the j-th row is crossed out anyway) and yields the cofactor expansion ai1Aj1 + ... + ainAjn for det A˜. But A˜ has two identical rows and hence det A˜ = 0. The same arguments applied to the columns yield the dual statement: a A + ... + a A =0 if i = j. 1i 1j ni nj 6 All the above formulas can be summarized in a single matrix identity. Introduce the n n-matrix adj(A), called adjoint to A, by placing × the cofactor Aij on the intersection of j-th row and i-th column. In other words, each aij is replaced with the corresponding cofactor Aij, and then the resulting matrix is transposed:

    [ a11 ... a1n ]   [ A11 ... An1 ]
adj [ ... aij ...  ] = [ ... Aji ...  ] .
    [ an1 ... ann ]   [ A1n ... Ann ]

Theorem. A adj(A) = (det A) I = adj(A) A.

Corollary. If det A is invertible then A is invertible, and

A−1 = (1/det A) adj(A).

Example. If ad − bc ≠ 0, then

[ a b ]−1                  [  d  −b ]
[ c d ]     = 1/(ad − bc)  [ −c   a ] .

Cramer's Rule

This is an application of the Cofactor Theorem to systems of linear equations. Consider a system

a x + + a x = b 11 1 · · · 1n n 1 · · · a x + + a x = b n1 1 · · · nn n n of n linear equations with n unknowns (x1,...,xn). It can be written in the matrix form Ax = b, t where A is the n n-matrix of the coefficients aij, b = [b1,...,bn] is the column of the× right hand sides, and x is the column of unknowns. In the following Corollary, ai denote columns of A. Corollary. If det A is invertible then the system of lin- ear equations Ax = b has a unique solution given by the formulas:

det[b, a2, ..., an] det[a1, ..., an−1, b] x1 = , ... ,xn = . det[a1, ..., an] det[a1, ..., an]

Indeed, when det A ≠ 0, the matrix A is invertible. Multiplying the matrix equation Ax = b by A−1 on the left, we find: x = A−1b. Thus the solution is unique, and xi = (det A)−1(A1ib1 + ... + Anibn) according to the cofactor formula for the inverse matrix. But the sum b1A1i + ... + bnAni is the cofactor expansion for det[a1, ..., ai−1, b, ai+1, ..., an] with respect to the i-th column.

Example. Suppose that a11a22 ≠ a12a21. Then the system

a11x1 + a12x2 = b1
a21x1 + a22x2 = b2

has a unique solution

     | b1  a12 |          | a11  b1 |
     | b2  a22 |          | a21  b2 |
x1 = -----------,   x2 = ----------- .
     | a11 a12 |          | a11 a12 |
     | a21 a22 |          | a21 a22 |
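Cramer's rule is equally easy to state as code: each xi is the ratio of two determinants, the numerator having the i-th column replaced by b. The sketch below (our illustration; the 2x2 system is arbitrary and not taken from the text) compares it with a direct solver:

    import numpy as np

    def cramer_solve(A, b):
        # x_i = det[a_1, ..., a_{i-1}, b, a_{i+1}, ..., a_n] / det[a_1, ..., a_n]
        n = A.shape[0]
        d = np.linalg.det(A)
        x = np.empty(n)
        for i in range(n):
            Ai = A.copy()
            Ai[:, i] = b          # replace the i-th column by the right-hand side
            x[i] = np.linalg.det(Ai) / d
        return x

    A = np.array([[3., 1.], [1., 2.]])
    b = np.array([9., 8.])
    print(cramer_solve(A, b))      # [2. 3.]
    print(np.linalg.solve(A, b))   # the same solution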

52 Chapter 2. DRAMATIS PERSONAE Three Cool Formulas We collect here some useful generalizations of previous results. A. We don’t know of any reasonable generalization of determi- nants to the situation when matrix entries do not commute. However a b the following generalization of the formula det = ad bc is c d −   instrumental in some non-commutative applications.1 A B In the , assume that D−1 exists. C D   A B Then det = det(A BD−1C) det D. C D −   A B I 0 A BD−1C B Proof: = . C D D−1C I − 0 D   −    B. Lagrange’s formula2 below generalizes cofactor expansions. By a multi-index I of length I = k we mean an increasing | | sequence i1 < < ik of k indices from the set 1,...,n . Given and n n-matrix· · · A and two multi-indices I, J of{ the same} length k, we define× the (IJ)-minor of A as the determinant of the k k- × matrix formed by the entries aiαjβ of A located at the intersections of the rows i1,...,ik with columns j1,...,jk (see Figure 24). Also, denote by I¯ the multi-index complementary to I, i.e. formed by those n k indices from 1,...,n which are not contained in I. − { } For each multi-index I = (i1,...,ik), the following cofac- tor expansion with respect to rows i1,...,ik holds true:

i1+···+i +j1+···+j det A = ( 1) k k M M¯ ¯, − IJ IJ J:X|J|=k where the sum is taken over all multi-indices J = (j1,...,jk) of length k. Similarly, one can similarly write Lagrange’s cofactor expansion formula with respect to given k columns.

Example. Let a1, a2, a3, a4 and b1, b2, b3, b4 be 8 vectors on the a1 a2 a3 a4 plane. Then = a1 a2 b3 b4 a1 a3 b2 b4 b1 b2 b3 b4 | || | − | || |

+ a a b b a a b b + a a b b a a b b . | 1 4|| 2 3| − | 2 3|| 1 4| | 2 4|| 1 3| − | 3 4|| 1 2| 1 Notably in the definition of Berezinian in super-mathematics [7]. 2After Joseph-Louis Lagrange (1736–1813). 2. Determinants 53 In the proof of Lagrange’s formula, it suffices to assume that it is written with respect to the first k rows, i.e. that I = (1,...,k). In- deed, interchanging them with the rows i < < i takes 1 · · · k (i1 1) + (i2 2) + + (ik k) transpositions, which is accounted for− by the sign− ( 1)i·1 ·+ ····+ik in− the formula. − Next, multiplying out MIJ MI¯J¯, we find k!(n k)! elementary products of the form: −

a1,jα1 ak,jα ak+1,¯j an,¯j , ± · · · k β1 · · · βn−k

− where α = 1 ... k and β = 1 ... n k are permu- α1 ... αk β1 ... βn−k ¯ ¯ tations, andjαµ J, jβν  J. It is clear that the total sum over multi-indices I contains∈ each∈ elementary product from det A, and does it exactly once. Thus, to finish the proof, we need to compare the signs.

Figure 24

The sign in the above formula is equal to ε(α)ε(β), the prod- uct of the signs± of the permutations α and β. The sign of this elementary product in the definition of det A is equal to the sign 1 ... k k + 1 ... n of the permutation ¯ ¯ on the set „ jα1 ... jαk jβ1 . . . jβn−k « J J¯ = 1,...,n . Reordering separately the first k and last n k indices∪ in{ the increasing} order changes the sign of the permutation− by ε(α)ε(β). Therefore the signs of all summands of det A which occur in MIJ MI¯J¯ are coherent. It remains to find the total sign with which MIJ MI¯J¯ occurs in det A, by computing the sign of the permu- 1 ... k k + 1 ... n tation σ := ¯ ¯ , where j1 < jk and j1 ... jk j1 . . . jn−k · · · ¯j < < ¯j  .  1 · · · n−k Starting with the identity permutation (1, 2 ...,j1,...,j2,...,n), it takes j1 1 transpositions of nearby indices to move j1 to the 1st place. Then− it takes j 2 such transpositions to move j to the 2nd 2 − 2 54 Chapter 2. DRAMATIS PERSONAE place. Continuing this way, we find that

ε(σ) = ( 1)(j1−1)+···+(jk−k) = ( 1)1+···+k+j1+···+jk . − − This agrees with Lagrange’s formula, since I = 1,...,k . . { } C. Let A and B be k n and n k matrices (think of k

det AB = (det AI )(det BI). XI Note that when k = n, this turns into the multiplicative property of determinants: det(AB) = (det A)(det B). Our second proof of it can be generalized to establish the formula of Binet–Cauchy. Namely, let a1,..., an denote columns of A. Then the jth column of C = AB is the linear combination: c = a b + + a b . Using linearity j 1 1j · · · n nj in each cj, we find:

det[c ,..., c ]= det[a ,..., a ]b b . 1 k i1 ik i11 · · · ikk 1≤i1X,...,ik≤k

If any two of the indices iα coincide, det[ai1 ,..., aik ] = 0. Thus the sum is effectively taken over all permutations 1 ... k on the „ i1 ... ik « 4 set i1,...,ik . Reordering the columns ai1 ,..., aik in the increas- ing order{ of the} indices (an paying the “fees” 1 according to parities of permutations) we obtain the sum over all± multi-indices of length k: ′ ′ det[ai ,..., ai ] ε(σ)bi11 bi k. 1 k · · · k i′ <···n, det AB = 0.

3After Jacques Binet (1786–1856) and Augustin Louis Cauchy (1789–1857). 4Remember that in a set, elements are unordered! 2. Determinants 55 This is because no multi-indices of length k>n can be formed from 1,...,n . In the oppositely extreme case k = 1, Binet– { } t Cauchy’s formula turns into the expression u v = uivi for the dot product of coordinate vectors. A “Pythagorean” interpretation of the following identity will come to light in the nextP chapter, in connection with volumes of parallelepipeds.

t 2 Corollary 2. det AA = I (det AI ) . P EXERCISES 110. Prove that the following determinant is equal to 0:

0 0 0 a b 0 0 0 c d  0 0 0 e f . pq rst

vwxyz

111. Compute determinants:

cos x sin x cosh x sinh x cos x sin y , , .  sin x −cos x sinh x cosh x sin x cos y

112. Compute determinants:

0 1 1 0 1 1 1 i 1+ i 1 0 1 , 1 2 3 , i 1 0 .  1 1 0 1 3 6 1− i 0 1

113. List all the 24 permutations of 1, 2, 3, 4 , find the length and the sign { } of each of them.  114. Find the length of the following permutation:

1 2 ... k k + 1 k + 2 . . . 2k .  „ 1 3 . . . 2k − 1 2 4 . . . 2k « 115. Find the maximal possible length of permutations of 1, ..., n .  { } 1 ... n 116. Find the length of a permutation given the length l „ i1 ... in « of the permutation 1 ... n .  „ in ... i1 « 117. Prove that inverse permutations have the same length.  118. Compare parities of permutations of the letters a,g,h,i,l,m,o,r,t in the words logarithm and .  56 Chapter 2. DRAMATIS PERSONAE 12345 119. Represent the permutation as composition of a 45132   minimal number of transpositions. 

120. Do products a13a24a53a41a35 and a21a13a34a55a42 occur in the defin- ing formula for determinants of size 5? 

121. Find the signs of the elementary products a23a31a42a56a14a65 and a32a43a14a51a66a25 in the definition of determinants of size 6 by computing the numbers of inverted pairs of indices.  122. Compute the determinants

246 427 327 13247 13347 , 1014 543 443 .  28469 28569 342 721 621

123. The numbers 195, 247, and 403 are divisible by 13. Prove that the 1 9 5 following determinant is also divisible by 13: 2 4 7 .  4 0 3

124. Professor Dumbel writes his office and home phone numbers as a 7 1- × matrix O and 1 7-matrix H respectively. Help him compute det(OH).  × 125. How does a determinant change if all its n columns are rewritten in the opposite order?  1 x x2 ... xn 2 n 1 a1 a1 ... a1 ⋆ 2 n 126. Solve the equation 1 a2 a2 ... a2 = 0, where all a1, ..., an

... 1 a a2 ... an n n n are given distinct numbers. 

127. Prove that an anti-symmetric matrix of size n has zero determinant if n is odd.  128. How do similarity transformations of a given matrix affect its deter- minant?  129. Prove that the adjoint matrix of an upper (lower) triangular matrix is upper (lower) triangular. 130. Which triangular matrices are invertible? 131. Compute the determinants: ( is a wild card): ∗ a a b ∗ ∗ ∗ n ∗ ∗ . . . 0 c d  (a) ∗ ∗ , (b) ∗ ∗ . a2 0 . . . e f 0 0 a∗ 0 . . . 0 g h 0 0 1

2. Determinants 57 132. Compute determinants using cofactor expansions:

1 2 2 1 2 1 0 0 − 0 1 0 2 1 2 1 0  (a) , (b) − − . 2 0 1 1 0 1 2 1 0 2 0 1 0− 0 1− 2

133. Compute inverses of matrices using the Cofactor Theorem:

1 2 3 1 1 1 (a) 3 1 2 , (b) 0 1 1 .  " 2 3 1 # " 0 0 1 #

134. Solve the systems of linear equations Ax = b where A is one of the matrices of the previous exercise, and b = [1, 0, 1]t.  135. Compute 1 1 1 0 0 − 0− 1 1 0 .  0 0− 1 1  0 0 0− 1     136. Express det(adj(A)) of the adjoint matrix via det A.  137. Which integer matrices have integer inverses?  138. Solve systems of equations using Cramer’s rule:

2x x x = 4 x +2x +4x = 31 1 − 2 − 3 1 2 3 (a) 3x1 +4x2 2x3 = 11 , (b) 5x1 + x2 +2x3 = 29 .  3x 2x +4− x = 11 3x x + x = 10 1 − 2 3 1 − 2 3 139.⋆ Compute determinants:

a 0 0 0 0 b 0 x x ... x 1 2 n 0 a 0 0 b 0 x 1 0 . . . 0 1 0 0 a b 0 0 (a) x2 0 1 . . . 0 , (b)  . 0 0 c d 0 0 .. . . . 0 c 0 0 d 0 xn 0 . . . 0 1 c 0 0 0 0 d

⋆ 140. Let Pij , 1 i < j 4, denote the 2 2-minor of a 2 4-matrix formed by the columns≤ i and≤ j. Prove the following× Pl¨ucker identity× 5

P P P P + P P =0.  12 34 − 13 24 14 23

5After Julius Pl¨ucker (1801–1868). 58 Chapter 2. DRAMATIS PERSONAE 141. The of two vectors x, y R3 is defined by ∈ x x x x x x x y := 2 3 , 3 1 , 1 2 . × y2 y3 y3 y1 y1 y2  

Prove that the length x y = x 2 y 2 x , y 2.  | × | | | | | − h i ⋆ 1 ∆n 142. Prove that an + p = , 1 ∆n 1 an 1 + − − 1 + · · · 1 a1 + a0 a0 1 0 . . . 0 1 a1 1 . . . 0 −  where ∆n = ...... 0 . . . 1 a 1 n 1 0 . . . −0 −1 a − n

λ 1 0 . . . 0 0 −λ 1 . . . 0 ⋆ −  143. Compute: ...... 0 . . . 0 λ 1 − an an 1 ... a2 λ + a1 −

11 1 . . . 1 2 3 n 1 1 1 . . . 1 3 4 n+1 144.⋆ Compute: 1 . . . .   2 2 2 .. . . . nn +1 2n 2 1 n 1 n 1 . . . n −1 − − − ⋆ 6 145. Prove Vandermonde’s  identity 

2 n 1 1 x1 x1 ... x1 − 2 n 1 1 x2 x2 ... x2 −  = (xj xi)...... − 2 n 1 1 i

12 3 ... n 3 3 3 ⋆ 1 2 3 ... n   146. Compute: ...... 1 22n 1 32n 1 ... n2n 1 − − −

6After Alexandre-The´ophile Vandermonde (1735–1796). 3. Vector Spaces 59 3 Vector Spaces

In four words, the subject of Linear Algebra can be described as geometry of vector spaces.

Axioms By definition, a vector space is a set, equipped with operations of addition and multiplication by scalars which are required to satisfy certain axioms: The set will be denoted here by , and its elements referred to as vectors. Here are the axioms. V (i) The sum of two vectors u and v is a vector (denoted u + v); the result of multiplication of a vector v by a scalar λ is a vector (denoted λv). (ii) Addition of vectors is commutative and associative:

u + v = v + u, (u + v)+ w = u + (v + w) for all u, v, w . ∈V (iii) There exists the zero vector (denoted by 0) such that

v + 0 = 0 + v = v for every v . ∈V (iv) For every vector u there exists the opposite vector, denoted by u, such that − u + u = 0. − (v) Multiplication by scalars is distributive: For all vectors u, v and scalars λ, µ we have

(λ + µ)(u + v)= λu + λv + µu + µv.

(vi) Multiplication by scalars is associative in the following sense: For every vector u and all scalars λ, µ we have:

(λµ)u = λ(µu).

(vii) Multiplication by scalars 0 and 1 acts on every vector u as

0u = 0, 1u = u. 60 Chapter 2. DRAMATIS PERSONAE We have to add to this definition the following comment about scalars. Taking one of the sets R or C of real or complex numbers on the role of scalars one obtains the definition of real vector spaces or complex vector spaces. In fact these two choices will suffice for all our major goals. The reader may assume that we use the symbol K to cover both cases K = R and K = C in one go. On the other hand, any field K would qualify on the role of scalars, and this way one arrives at the notion of K-vector spaces. By a field one means a set K equipped with two operations: addition and multiplication. Both are assumed to be commutative and associative, and satisfying the distributive law: a(b + c)= ab + ac. Besides, it is required that there exist elements 0 and 1 such that a +0= a and 1a = a for all a K. Then, it is required that every a K has the opposite a such∈ that a + a = 0, and every non-zero a∈ K has its inverse a−1 such that a−−1a = 1. To the examples of fields∈ C and R, we can add (omitting many other available examples): the field Q of rational numbers; the field C of all algebraic numbers (i.e. roots of polynomials in oneA variable ⊂ with rational coefficients); the field Zp of integers modulo a given prime number p. For instance, the set Z2 = 0, 1 of remainders modulo 2 with the usual of remainders(0+0{ } = 0 = 1+1, 0+1 = 1 = 1+0, 0 0 = 1 0 = 0 1 = 0, 1 1 = 1) can be taken on the role of scalars. This· gives· rise· to the · definition of Z2-vector spaces useful in computer science and logic. To reiterate: it is essential that division by all non-zero scalars is defined. Therefore the set Z of all integers and the set F[x] of all polynomials in one indeterminate x with coefficients in a field F are not fields, and do not qualify on the role of scalars in the definition of vector spaces, because the division is not always possible. However the field Q of all rational numbers and the field F(x) of all rational functions with coefficients in a field F are O.K.
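The requirement that every non-zero scalar be invertible is exactly what makes Zp (p prime) a field; a quick brute-force check in Python (our illustration; the modulus 7 is arbitrary) finds an inverse for every non-zero remainder, and shows the failure for a composite modulus:

    p = 7   # any prime modulus

    for a in range(1, p):
        # search for the multiplicative inverse of a modulo p
        inv = next(b for b in range(1, p) if (a * b) % p == 1)
        print(f"{a}^-1 = {inv} (mod {p})")

    # With a composite modulus, e.g. 6, the element 2 has no inverse:
    assert all((2 * b) % 6 != 1 for b in range(6))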

Examples

The above definition of vector spaces is doubly abstract: not only it neglects to specify the set of vectors, but it does not even tell us anything explicit about theV nature of the operations of addition of vectors and multiplication of vectors by scalars. To find various ex- amples of vector spaces we should figure out which operations would be good candidates to satisfy the axioms (i–vii). It turns out that in the majority of useful examples, the operations are pointwise addition of functions and multiplication of functions by scalars. 3. Vector Spaces 61 Example 1. Let S be any set, and be the set of all functions on S with values in K. We will denote thisV set by KS. The sum and multiplication by scalars are defined on KS as pointwise operations with functions. Namely, given two functions f, g and a scalar λ, the values of the sum f + g and the product λf at a point s S are ∈ (f + g)(s)= f(s)+ g(s), (λf)(s)= λ(f(s)). It is immediate to check that = KS equipped with these operations satisfies the axioms (i–vii). ThusV KS is a K-vector space. Example 1a. Let S be the set of n elements 1, 2, ..., n. Then the space KS is the coordinate space Kn (e.g. RS = Rn and CS = Cn). Namely, each function on the set 1,...,n is specified by the column t { } x = (x1, ..., xn) of its values, and the usual operations with such columns coincide with pointwise operations with the functions. Example 1b. Let S be the set of all ordered pairs (i, j), where i = 1, ..., m, j = 1, ..., n. Then the vector space KS is the space of m n-matrices A whose entries aij lie in K. The operations of addition× of matrices and their multiplication by scalars coincide with pointwise operations with the functions. Example 2. Let be a K-vector space. Consider the set S of all functions on a givenV set S with values in . Elements of V can be added and multiplied by scalars. RespectivelyV the vector-valuedV functions can be added and multiplied by scalars in the pointwise fashion. Thus, S is an example of a K-vector space. V Example 3. A non-empty subset in a vector space is called a (or simply subspaceW ) if linear combinationsV λu + µv of vectors from with arbitrary coefficients lie in . In particular, for every u W , u = ( 1)u , and 0 = 0uW . A subspace of a vector∈ space W − satisfies− the axioms∈ W of a vector∈ space W on its own (since the operations are the same as in ). Thus every subspace of a K-vector space is an example of a K-vectorV space. Example 3a. For instance, all upper triangular n n-matrices (lower triangular, block triangular, block diagonal, diagonal× matrices — the reader can continue this line of examples) form subspaces in the space of all n n-matrices, and therefore provide examples of vector spaces. × Example 3b. The set of all polynomials (say, in one variable),7 form a subspace in the space RR of all real-valued functions on the 7As well as sets of all continuous, differentiable, 5 times continuously differen- tiable, infinitely differentiable, Riemann-integrable, measurable, etc. functions, introduced in Mathematical Analysis. 62 Chapter 2. DRAMATIS PERSONAE number line and therefore provide examples of real vector spaces. More generally, polynomials with coefficients in K (as well as such polynomials of degree not exceeding 7) form examples of K-vector spaces. Example 3c. Linear forms or quadratic forms in Kn form sub- n spaces in the space KK of all K-valued functions on Kn, and thus provide examples of K-vector spaces. Example 3d. 
All bilinear forms of two vectors v Km, w Kn form a subspace in the space of all functions on the∈ Cartesian∈ product8 Km Kn with values in K. Hence they form a K-vector space. ×

Morphisms The modern ideology requires objects of mathematical study to be organized into categories. This means that in addition to specifying objects of interest, one should also specify morphisms, i.e. maps between them. The category of vector spaces is obtained by taking vector spaces for objects, and linear maps for morphisms. By definition, a function A : from a vector space to a vector space is called a linearV→W map if it respects the operationsV with vectors,W i.e. if it maps linear combinations of vectors to linear combinations of their images with the same coefficients:

A(λu + µv)= λAu + µAv for all u, v and λ, µ K. ∈V ∈ With a linear map A : , one associates two subspaces, one in , called the null spaceV→W, or the of A and denoted Ker A, andV the other in , called the range of A and denoted A( ): W V Ker A := v : Av = 0 , { ∈V } A( ) := w : w = Av for some v . V { ∈W ∈ V} A linear map is injective, i.e. maps different vectors to different ones, exactly when its kernel is trivial. Indeed, if Ker A = 0 , then it contains non-zero vectors mapped to the same point 06 in{ } as 0 from . This makes the map A non-injective. Vice versa,W if A is non-injective,V i.e. if Av = Av′ for some v = v′, then u = v v′ = 0 lies in Ker A. This makes the kernel nontrivial.6 − 6

8Cartesian product of two sets A and B is defined as the set of all ordered pairs (a,b) of elements a ∈ A and b ∈ B. 3. Vector Spaces 63 When the range of a map is the whole target space, A( ) = , the map is called surjective. If a linear map A : V is Wbi- jective, i.e. both injective and surjective, it establishesV →Wa one-to-one correspondence between and in a way that respects vector oper- ations. Then one says thatV A establishesW an isomorphism between the vector spaces and . Two vector spaces are called isomor- phic (written =V ) if thereW exists an isomorphism between them. V ∼ W Example 4. Let = be the space K[x] of all polynomials in one indeterminate x VwithW coefficients from K. The differentiation d/dx : K[x] K[x] is a linear map defined by → d (a + a x + + a xn)= a + 2a x + + na xn−1. dx 0 1 · · · n 1 1 · · · n

It is surjective, with the kernel consisting of constant polynomials.9 Example 5. Linear combinations λA + µB of linear maps A, B : are linear. Therefore all linear maps from to form a subspaceV →W in the space of all vector-valued functionsV W . The vector space of linear maps from to is usuallyV denoted →W by10 Hom( , ). V W V W Example 5a. The space of linear functions (or linear forms) on , i.e. linear maps K, is called the space dual to , and is denotedV by ∗. V → V V The following formal construction indicates that every vector space can be identified with a subspace in a space of functions with pointwise operations of addition and multiplication by scalars. Example 5b. Given a vector v and a linear function f ∗, the value f(v) K is defined. We∈V can consider it not as a function∈V f of v, but as∈ a function of f defined by v. This way, to a vector ∗ v we associate the function Ev : K defined by evaluating all V → linear functions K on the vector v. The function Ev is linear, V → since (λf + µg)(v) = λf(v)+ µg(v). The linear function Ev is an element of the second ( ∗)∗. The formula f(λv + µw)= λf(v)+ µf(w), expressing linearityV of linear functions, shows that Ev depends linearly on v. Thus the evaluation map E : v Ev is a linear map ( ∗)∗. One can show11 that E is injective7→ V → V 9 When K contains Q (e.g. K = R or C), but not when K contains Zp. 10From homomorphism, a word roughly synonymous to the terms linear map and morphism. 11Although we are not going to do this, as it would drag us into the boredom of general set theory and transfinite induction. 64 Chapter 2. DRAMATIS PERSONAE and thus provides an isomorphism between and its range E( ) ( ∗)∗. V V ⊂ V The previous result and examples suggest that vector spaces need not be described abstractly and raises the suspicion that the ax- iomatic definition is misleading as it obscures the actual nature of vectors as functions subject to the pointwise algebraic operations. Here are however some examples where vectors do not come natu- rally as functions. Example 6a. Geometric vectors (see Section 1 of Chapter 1), and forces and velocities in physics, are not introduced as functions. Example 6b. Rational functions are defined as ratios P/Q of polynomials P and Q. Morally they are functions, but technically they are not. More precisely, the domain of P/Q is the non-empty set of points x where Q(x) = 0, but it varies with the function. All rational functions do form a6 vector space, but there is not a single point x at which all of them defined. The addition operation is not defined as pointwise, but is introduced instead by means of the formula: P/Q + P ′/Q′ = (PQ′ + QP ′)/QQ′.
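The addition rule of Example 6b can be implemented literally on pairs (P, Q) of coefficient arrays; a small sketch using NumPy's polynomial helpers numpy.polyadd and numpy.polymul (the pair representation and the sample fractions are our own choices for the illustration):

    import numpy as np

    def add_rational(PQ1, PQ2):
        # (P/Q) + (P'/Q') = (P Q' + Q P') / (Q Q'); polynomials are coefficient
        # lists, highest degree first, as used by numpy.polyadd / numpy.polymul.
        P, Q = PQ1
        P2, Q2 = PQ2
        return np.polyadd(np.polymul(P, Q2), np.polymul(Q, P2)), np.polymul(Q, Q2)

    # 1/x + 1/(x + 1) = (2x + 1) / (x^2 + x)
    num, den = add_rational(([1], [1, 0]), ([1], [1, 1]))
    print(num, den)   # [2 1] [1 1 0]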

Figure 25        Figure 26

Example 6c. Vector spaces frequently occur in number theory and field theory in connection with field extensions, i.e. inclusions K F of one field as a subfield into another. As examples, we already⊂ have: Q C, Q R C, and K K(x). Given a field extension K ⊂A⊂F, it is often⊂ useful⊂ to temporarily⊂ forget that elements of F can⊂ be multiplied, but still remember that they can be multiplied by elements of K. This way, F becomes a K-vector space (e.g. C is a real plane). Vectors here, i.e. elements of F, are “numbers” rather then “functions.” More importantly, various abstract constructions of new vector spaces from given ones are used regularly, and it would be very awk- ward to express the resulting vector space as a space of functions even when the given spaces are expressed this way. Here are two such constructions. 3. Vector Spaces 65 Direct Sums and Quotients Example 7. Given two vector spaces and , their direct sum (Figure 25) is defined as the setV of allW ordered pairs (v, w), whereV⊕Wv , w , equipped with the component-wise operations: ∈V ∈W λ(v, w) = (λv, λw), (v, w) + (v′, w′) = (v + v′, w + w′). Of course, one can similarly define the direct sum of several vector spaces. E.g. Kn = K K (n times). ⊕···⊕ Example 7a. Given a linear map A : , its graph is defined as a subspace in the direct sum V: →W V⊕W Graph A = (v, w) : w = Av . { ∈V⊕W } Example 8. The quotient space of a vector space by a sub- space is defined as follows. Two vectors v and v′ (FigureV 26) are called Wequivalent modulo , if v v′ . This way, all vectors from become partitionedW into equivalence− ∈ W classes. These equiva- lenceV classes form the quotient vector space / . V W More precisely, denote by π : / the canonical projec- tion, which assigns to a vector vV→Vits equivalenceW class modulo . This class can be symbolically written as v + , a notation empha-W sizing that the class consists of all vectors obtainedW from v by adding arbitrary vectors from . Alternatively, one may think of v + as a “plane” obtained fromW as translation by the vector v. WhenW v , we have v + = W. When v / , v + is a not a linear subspace∈W in . We willW callW it an affine∈W subspaceWparallel to . V W The set / of all affine subspaces in parallel to is equipped with algebraicV W operations of addition andV multiplicationW by scalars in such a way that the canonical projection π : / becomes a linear map. In fact this condition leaves no choices,V→V sinceW it requires that for every u, v and λ, µ K, ∈V ∈ λπ(u)+ µπ(v)= π(λu + µv). In other words, the linear combination of given equivalence classes must coincide with the equivalence class containing the linear com- bination λu + µv of arbitrary representatives u, v of these classes. It is important here that picking different representatives u′ and v′ will result in a new linear combination λu′ + µv′ which is however equiv- alent to the previous one. Indeed, the difference λ(u u′)+µ(v v′) lies in since u u′ and v v′ do. Thus linear− combinations− in / areW well-defined.− − V W 66 Chapter 2. DRAMATIS PERSONAE Example 8a. Projecting 3D images to a 2-dimensional screen is described in geometry by the canonical projection π from the 3D space to the plane / of the screen along the line of the eye sight (FigureV 27). V W W


Figure 27 Example 8b. The direct sum contains and as sub- spaces consisting of the pairs (v, 0)V⊕W and (0, w) respectively.V W The quo- tient of by is canonically identified with , because each pair (v, w) isV⊕W equivalentW modulo to (v, 0). Likewise,V ( ) / = . W V⊕W V W Example 8c. Let = R[x] be the space of polynomials with real coefficients, and theV subspace of polynomials divisible by x2 + 1. Then the quotientW space / can be identified with the plane C of complex numbers, and theV W projection π : R[x] C with the map P P (i) of evaluating a polynomial P at x = i. Indeed,→ polynomials P 7→and P ′ are equivalent modulo if and only if P P ′ is divisible by x2 + 1, in which case P (i) = WP ′(i). Vice versa,− if P (i) = P ′(i), then P ( i) = P ′( i) (since the polynomials are real), and hence P P ′ is− divisible− by (x i)(x + i)= x2 + 1. − − Example 8d. For every linear map A : ′, there is a canon- V→V ical isomorphism A˜ : / Ker A A( ) between the quotient by the kernel of A, and itsV range.→ Namely,V Au = Av if and only if u v Ker A, i.e. whenever u is equivalent to v modulo the ker- nel.− Thus,∈ one can think of every linear map as the projection of the source space onto the range along the null space. This is a manifes- tation of a general homomorphism theorem in algebra, which in the context of vector spaces can be formally stated this way: Theorem. Every linear map A : ′ is uniquely repre- sented as the composition A = iAπ˜ V→Vof the canonical projec- tion π : / Ker A with the isomorphism A˜ : / Ker A V →V V → 3. Vector Spaces 67 A( ) followed by the inclusion i : A( ) ′: V V ⊂V A ′ π V −→ V i ↓∼ ∪ . / Ker A = A( ) V −→A˜ V
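Example 8c above can also be probed numerically: the equivalence class of a real polynomial modulo x2 + 1 is represented by its remainder of degree < 2, and evaluation at x = i respects the equivalence. A sketch using NumPy's polynomial division (our illustration; the sample polynomials are arbitrary):

    import numpy as np

    def mod_x2_plus_1(P):
        # remainder of P upon division by x^2 + 1 (coefficients, highest degree first)
        _, r = np.polydiv(P, [1, 0, 1])
        return r

    P  = np.array([1., 0., 0., 2., 3.])                           # x^4 + 2x + 3
    P2 = np.polyadd(P, np.polymul([1., 0., 1.], [2., -1., 5.]))   # P plus a multiple of x^2 + 1

    # Equivalent polynomials have the same remainder and the same value at x = i.
    assert np.allclose(mod_x2_plus_1(P), mod_x2_plus_1(P2))
    assert np.isclose(np.polyval(P, 1j), np.polyval(P2, 1j))
    print(np.polyval(P, 1j))   # (4+2j): indeed i^4 + 2i + 3 = 4 + 2i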

Quaternions The assumption that scalars commute under multiplication is reason- able because they usually do, but it is not strictly necessary. Here is an important example: vector spaces over over a skew-filed, namely the skew-filed H of quaternions. By definition, H = C2 = R4 consists of quaternions

q = z + wj = (a + bi) + (c + di)j = a + bi + cj + dk,

where z = a + bi and w = c + di are complex numbers, and k = ij. The multiplication is determined by the requirements: i² = j² = −1, ij = −ji, associativity, and bilinearity over R. Namely:

q′q = (z′ + w′j)(z + wj) = z′z + w′jwj + z′wj + w′jz = (z′z − w′w̄) + (z′w + w′z̄)j,

where we use that j(x + yi) = (x − yi)j for all real x, y. Equivalently, in purely real notation, 1, i, j, k form the standard basis of H = R⁴, and the product is specified by the multiplication table:

i² = j² = k² = −1,   ij = k = −ji,   jk = i = −kj,   ki = j = −ik.

The quaternion

q* = z̄ − wj = a − bi − cj − dk

is called adjoint, or conjugate, to q. We have:

q*q = (z̄ − wj)(z + wj) = z̄z + ww̄ + (z̄w − wz̄)j = |z|² + |w|² = a² + b² + c² + d² = |q|².

By |q| we denote here the norm, or absolute value, of the quaternion q. The norm coincides with the Euclidean length √(a² + b² + c² + d²) of the vector q ∈ R⁴. Note that (q*)* = q, and hence qq* = |q*|² = |q|² = q*q. It follows that if q ≠ 0, then

q⁻¹ = q*/|q|² = |q|⁻² (a − bi − cj − dk)

is the quaternion inverse to q, i.e. q⁻¹q = 1 = qq⁻¹. We conclude that although H is not a field, since multiplication is not commutative, it shares with fields the property that all non-zero elements of H are invertible.

In the definition of an H-vector space, one should be cautious only about axiom (vi): associativity of multiplication by scalars. Namely, there are two versions of this axiom:

(qq′)u = q(q′u) and v(qq′) = (vq)q′.

Using the first or the second one obtains the category of left or right quaternionic vector space. The actual difference is not in the side of the vector on which the scalars are written — this difference is purely typographical — but in the order of the factors: in the left case v is first multiplied by q′, and in the right by q. Some people prefer to deal with left, other with right quaternionic spaces, but in fact it is a matter of religion and hence does not matter. Namely, a left vector space can be converted into a right one (and vice versa) by defining the new multiplication by q as the old multiplication by q∗. The trick is that conjugation q q∗ is not exactly an automorphism of H because it changes the order7→ of ∗ ∗ ∗ factors: (q1q2) = q2q1. We prefer to work with right quaternionic vector spaces, and the following example explains why. Define the coordinate quaternionic space Hn as the set of all n-columns of quaternions, added component-wise, and multiplied by scalars q H on the right: ∈ ′ ′ q1 q1 q1 + q1 q1 q1q . . . . . + = , q = .  .   .   .   .   .  q q′ q + q′ q q q  n   n   n n   n   n            Clearly, Hn is a right quaternionic vector space. n Let A be an m n matrix with entries aij H, and x H . Then the usual matrix× product y = Ax yields a vector∈ y Hm∈: ∈ y = a x + + a x , i = 1,...,m. i i1 1 · · · in n 3. Vector Spaces 69 The x y = Ax defines an H-linear map A : Hn Hm. Namely, A is7→ not only additive, but it also com- mutes with multiplication→ by quaternionic scalars: (Ax)q = A(xq). The point is that the matrix entries aij are applied to the compo- nents xj of the vectors on the left, while the scalar q is applied on the right, and hence the order of these operations is irrelevant. For left vector spaces, linear maps y = Ax would correspond to matrix multiplication formulas y = x a + +x a that look a bit weird. i 1 i1 · · · n in EXERCISES 147. Give an example of a “vector space” that satisfies all axioms except the last one: 1u = u for all u.  148. Prove that the axiom: 0u = 0 for all u, in the definition of vector spaces is redundant, i.e. can be derived from remaining axioms.  149. Derive from axioms of vector spaces that ( 1)u = u for all u.  ⋆ − − 150. Prove that every non-zero element of Zp is invertible.  151. Verify that KS and S are vector spaces. V n n m 152. How many vectors are there in Zp-vector space Zp ? Hom(Zp , Zp )?  153. Show that in K[x], polynomials of degree n do not form a subspace, but polynomials of degree n do.  ≤ 154. Prove that intersection of subspaces is a subspace.

155. How many vectors are there in the Zp-vector space of strictly upper triangular n n-matrices?  × 156. Show that the map f b f(x) dx defined by integration of (say) 7→ a polynomial functions is a linear form R[x] R. R → d 157. Find the kernel and the range of the differentiation map D = dx : K[x] K[x], when K = Z .  → p n 158. For = K , verify that the map E : ∗∗ defined in Example 5b V V→V is an isomorphism.

159. Let be any subset of . Define ⊥ ∗ as the set of all W ⊂V V W ⊂ V those linear functions which vanish on . Prove that ⊥ is a subspace of W W ∗. (It is called the annihilator of .) V W 160. Let be a subspace. Establish a canonical isomorphism between W⊂V the dual space ( / )∗ and the annihilator ⊥ ∗.  V W W ⊂V 161. Let B be a bilinear form on , i.e. a function (v, w) B(v, w) V×V 7→ of pairs of vectors linear in each of them. Prove that the left kernel and right kernel of B, defined as LKerB = v B(v, x) = 0 for all x { | ∈ V} and RKerB = w B(x, w) = 0 for all x , are subspaces of , which { | ∈ V} V coincide when B is symmetric or anti-symmetric. 70 Chapter 2. DRAMATIS PERSONAE

162. Establish canonical isomorphisms between the spaces Hom( , ∗), V W Hom( , ∗), and the space ( , ) of all bilinear forms on .  W V B V W V × W 163. Describe all affine subspaces in R3.  164. Prove that the intersection of two affine subspaces, parallel to given linear ones, if non-empty, is an affine subspace parallel to the intersection of the given linear subspaces.  165. Prove that for a linear map A : , Ker A = (Graph A) . V → W ∩V 166. Show that ( )∗ = ∗ ∗.  V ⊕ W V ⊕ W 167. Composing a linear map A : with a linear form f ∗ t V → W ∈ W we obtain a linear form A f ∗. Prove that this defines a linear map t t ∈ V A : ∗ ∗. (The map A is called dual or adjoint to A.) Show that W → V (AB)t = BtAt. 168. Given the kernel and the range of a linear map A : , find the V → W kernel and the range of the dual map At.  169. Show that numbers of the form a + b√2, where a,b Q, form a ∈ subfield in R, and a Q-vector space. 2 1 170. For q = i cos θ + j sin θ, compute q , q− .  171. Prove multiplicativity of the norm of quaternions: q q = q q .  | 1 2| | 1| | 2| 172. Prove that in H, (q1q2)∗ = q2∗q1∗. 173. Prove that if f : Hn H is a quaternionic linear function, then → qf, where q H, is also a quaternionic linear function, but fq generally ∈ speaking not. 174. On the set Hom(Hn, Hm) of all H-linear maps A : Hn Hm, define → the structure of a right H-vector space.  175. Let x + yj C2 be complex coordinates on H. Show that multiplica- ∈ tion by q = z + wj H on the left is, generally speaking, not C-linear, but ∈ on the right is, and find its 2 2-matrix.  × 176.⋆ Find all quaternionic square roots of 1, i.e. solve the equation − q2 = 1 for q H.   − ∈ Chapter 3

Simple Problems

1 Dimension and Rank

Bases Let be a K-vector space. A subset V (finite or infinite) is calledV a basis of if every vector of can⊂ be V uniquely written as a (finite!) linear combinationV of vectorsV from V . Example 1. Monomials xk, k = 0, 1, 2,... , form a basis in the space K[x] since every polynomial is uniquely written as a linear combination of monomials. n t Example 2. In K , every vector (x1,...,xn) is uniquely written as the linear combination x1e1 + +xnen of unit coordinate vectors t · · · t e1 = (1, 0,..., 0) , ..., en = (0,..., 0, 1) . Thus, vectors e1,..., en form a basis. It is called the standard basis of Kn. The notion of basis has two aspects which can be considered separately. Let V be any set of vectors. Linear combinations λ v + + ⊂V 1 1 · · · λkvk, where the vectors vi are taken from the subset V , and λi are arbitrary scalars, form a subspace in . (Indeed, sums and scalar multiples of linear combinations of vectorsV from V are also linear combinations of vectors from V .) This subspace is often denoted as Span V . One says that the set V spans the subspace Span V , or that Span V is spanned by V . A set V of vectors is called linearly independent, if no vector from Span V can be represented as a linear combination of vectors from V in more than one way. To familiarize ourselves with this notion, let us give several reformulations of the definition. Here is

71 72 Chapter 3. SIMPLE PROBLEMS one: no two distinct linear combinations of vectors from V are equal to each other. Yet another one: if two linear combinations of vectors from V are equal: α v + + α v = β v + + β v , then their 1 1 · · · k k 1 1 · k k coefficients must be the same: α1 = β1,...,αk = βk. Subtracting one linear combination from the other, we arrive at one more refor- mulation: if γ v + + γ v = 0 for some vectors v V , then 1 1 · · · k k i ∈ necessarily γ1 = = γk = 0. In other words, V is linearly inde- pendent, if the vector· · · 0 can be represented as a linear combination of vectors from V only in the trivial fashion: 0 = 0v1 + + 0vk. Equivalently, every nontrivial linear combination of vectors· · · from V is not equal to zero: γ1v1 + + γkvk = 0 whenever at least one of γ = 0. · · · 6 i 6 Of course, a set V is called linearly dependent if it is not linearly independent. Yet,⊂V it is useful to have an affirmative reformu- lation: V is linearly dependent if and only if some nontrivial linear combination of vectors from V vanishes, i.e. γ v + + γ v = 0, 1 1 · · · k k where at least one of the coefficients (say, γk) is non-zero. Dividing by this coefficients and moving all other terms to the other side of the equality, we obtain one more reformulation: a set V is linearly dependent if one of its vectors can be represented as a linear combina- −1 −1 1 tion of the others: vk = γk γ1v1 γk γk−1vk−1. Obviously, every set containing the vector− 0 is−···− linearly dependent; every set con- taining two proportional vectors is linearly dependent; adding new vectors to a linearly dependent set leaves it linearly dependent. Thus, a basis of is a linearly independent set of vectors that spans the whole space.V

Dimension

In this course, we will be primarily concerned with finite dimen- sional vector spaces, i.e. spaces which can be spanned by finitely many vectors. If such a set of vectors is linearly dependent, then one of its vectors is a linear combination of the others. Removing this vector from the set, we obtain a smaller set that still spans . Continuing this way, we arrive at a finite linearly independent setV that spans . Thus, a finite dimensional vector space has a basis, consisting ofV finitely many elements. The number of elements in a basis does not depend (as we will see shortly) on the choice of the basis. This number is called the dimension of the vector space and is denoted dim . V V 1It is essential here that division by all non-zero scalars is well-defined. 1. Dimension and Rank 73

Let v1,..., vn be a basis of . Then every vector x is uniquely written as x = x v + V + x v . We call (x ,...,x∈ V ) 1 1 · · · n n 1 n coordinates of the vector x with respect to the basis v1,..., vn. For y = y v + + y v and λ K, we have: 1 1 · · · n n ∈V ∈ x + y = (x + y )v + + (x + y )v , 1 1 1 · · · n n n λx = (λx )v + + (λ x )v . 1 1 · · · n n n This means that operations of addition of vectors and multiplication by scalars are performed coordinate-wise. In other words, the map: Kn : (x ,...,x )t x v + + x v →V 1 n 7→ 1 1 · · · n n defines an isomorphism of the coordinate space Kn onto the vector space with a basis v ,..., v . V { 1 n} Lemma. A set of n + 1 vectors in Kn is linearly dependent. Proof. Any two vectors in K1 are proportional and therefore linearly dependent. We intend to prove the lemma by deducing from this that any 3 vectors in K2 are linearly dependent, then deducing from this that any 4 vectors in K3 are linearly dependent, and so on. Thus we only need to prove that if every set of n vectors in Kn−1 is linearly dependent then every set of n + 1 vectors in Kn is linearly dependent too.2 To this end, consider n + 1 column vectors v1, ..., vn+1 of size n each. If the last entry in each column is 0, then v1, ..., vn+1 are effectively n 1-columns. Hence some nontrivial linear combination − of v1, ..., vn is equal to 0 (by the induction hypothesis), and thus the whole set is linearly dependent. Now consider the case when at least one column has the last entry non-zero. Reordering the vectors we may assume that it is the column vn+1. Subtracting the column vn+1 with suitable coefficients α1, ..., αn from v1, ..., vn we can form n new columns u = v α v , ..., u = v α v so 1 1 − 1 n+1 n n − n n+1 that all of them have the last entries equal to zero. Thus u1, ..., un are effectively n 1-vectors and are therefore linearly dependent: − β1u1 + ... + βnun = 0 for some β1, ..., βn not all equal to 0. Thus β v + ... + β v (α β + ... + α β )v = 0. 1 1 n n − 1 1 n n n+1 Here at least one of βi = 0, and hence v1, ..., vn+1 are linearly de- pendent.  6

2This way of reasoning is called mathematical induction. Put abstractly, it establishes a sequence Pn of propositions in two stages called respectively the base and step of induction: (i) P1 is true; (ii) for all n = 2, 3, 4,... , if Pn−1 is true (the induction hypothesis) then Pn is true. 74 Chapter 3. SIMPLE PROBLEMS Corollaries. (1) Any set of m>n vectors in Kn is lin- early dependent. (2) Kn and Km are not isomorphic unless n = m. (3) Every finite dimensional vector space is isomorphic to exactly one of the spaces Kn. (4) In a finite dimensional vector space, all bases have the same number of elements. In particular, dimension is well-defined. (5) Two finite dimensional vector spaces are isomorphic if and only if their dimensions are equal. Indeed, (1) is obvious because adding new vectors to a linearly dependent set leaves it linearly dependent. Since the standard basis in Km consists of m linearly independent vectors, Km cannot be isomorphic to Kn if m>n. This implies (2) and hence (3), because two spaces isomorphic to a third one are isomorphic to each other. Now (4) follows, since the choice of a basis of n elements establishes an isomorphism of the space with Kn. Rephrasing (3) in terms of dimensions yields (5). Example 3. Let be a vector space of dimension n, and let v ,..., v be a basis ofV it. Then every vector x can be uniquely 1 n ∈V written as x = x1v1 + + xnvn. Here x1,...,xn can be considered as linear functions from· · · to K. Namely, the function x takes the V i value 1 on the vector vi and the value 0 on all vj with j = i. Every linear function f : K takes on a vector x the value6 f(x) = x f(v )+ + x fV(v →). Therefore f is the linear combination of 1 1 · · · n n x1,...,xn with the coefficients f(v1),...,f(vn), i.e. x1,...,xn span the dual space ∗. In fact, they are linearly independent, and thus form a basis of V∗. Indeed, if a linear combination γ x + + γ x V 1 1 · · · n n coincides with the identically zero function, then its values γi on the ∗ vectors vi must be all zeroes. We conclude that the dual space has the same dimension n as and is isomorphic to it. TheV V ∗ basis x1,...,xn is called the dual basis of with respect to the basis v ,..., v of . V 1 n V Remark. Corollaries 3, 5, and Example 3 suggest that in a sense there is “only one” K-vector space in each dimension n = 0, 1, 2,... , namely Kn. The role of this fact, which is literally true if the unique- ness is understood up to isomorphism, should not be overestimated. An isomorphism Kn is determined by the choice of a basis in , and is therefore not→ Vunique. For example, the space of polyno- Vmials of degree < n in one indeterminate x has dimension n and is isomorphic to Kn. However, different isomorphisms may be useful 1. Dimension and Rank 75 for different purposes. In one would use the basis 1,x,x2,...,xn−1. In Calculus 1,x,x2/2,...,xn−1/(n 1)! may be more common. In the theory of interpolation the basis− of Lagrange polynomials is used:

L_i(x) = ∏_{j≠i} (x − x_j) / ∏_{j≠i} (x_i − x_j).

Here x_1, ..., x_n are given distinct points on the number line, and L_i(x_j) = 0 for j ≠ i and L_i(x_i) = 1. The theory of orthogonal polynomials leads to many other important bases, e.g. those formed by Chebyshev polynomials³ T_k, or Hermite polynomials H_k:

T_k(x) = cos(k cos⁻¹(x)),   H_k(x) = e^{x²} (d^k/dx^k) e^{−x²}.

There is no preferred basis in an n-dimensional vector space V (and hence no preferred isomorphism between V and Kⁿ).
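As a small illustration of the Lagrange basis, here is a sketch in Python; the interpolation nodes and the test polynomial are arbitrary choices made for this example.

from math import prod

def lagrange_basis(points):
    """Lagrange basis polynomials L_i for the given distinct nodes,
    returned as plain Python functions (a sketch, illustration only)."""
    def make(i):
        xi = points[i]
        others = [xj for j, xj in enumerate(points) if j != i]
        den = prod(xi - xj for xj in others)
        return lambda x: prod(x - xj for xj in others) / den
    return [make(i) for i in range(len(points))]

pts = [0.0, 1.0, 2.0]                       # hypothetical nodes x_1, x_2, x_3
basis = lagrange_basis(pts)
print([[round(L(x), 12) for x in pts] for L in basis])
# [[1, 0, 0], [0, 1, 0], [0, 0, 1]]: L_i(x_j) = 1 when i = j, and 0 otherwise.

# Any polynomial of degree < 3 is recovered from its values at the nodes:
f = lambda x: 2*x**2 - x + 1
interp = lambda x: sum(f(xi) * L(x) for xi, L in zip(pts, basis))
print(f(1.5), interp(1.5))                  # both 4.0 (up to rounding)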

Figure 28   Figure 29

The lack of the preferred isomorphism becomes really important when continuous families of vector spaces get involved. For instance, consider, on the plane R², all subspaces of dimension 1 (Figure 28). When subspaces rotate, one can pick a basis vector in each of them, which would vary continuously, but when the angle of rotation approaches π, the direction of the vector disagrees with the initial direction of the same line. In fact it is impossible to choose bases in all the lines in a continuous fashion. The reason is shown on Figure 29: the surface formed by the continuous family of 1-dimensional subspaces in R² has the topology of a Möbius band (rather than a cylinder). The Möbius band is a first example of a nontrivial vector bundle. Vector bundles are studied in Homotopic Topology. It turns out that among all k-dimensional vector bundles (i.e. continuous families of k-dimensional vector spaces) the most complicated are the bundles formed by all k-dimensional subspaces in the coordinate space of dimension n ≫ k.

³After Pafnuty Chebyshev (1821–1894).

Corollary 6. A finite dimensional space V is canonically isomorphic to its second dual V**.

Here "canonically" means that there is a preferred isomorphism. Namely, the isomorphism is established by the map E : V → V** from Example 5b of Chapter 2, Section 3. Recall that to a vector v ∈ V, it assigns the linear function E_v : V* → K, defined by evaluation of linear functions on the vector v. The kernel of this map is trivial (e.g. because one can point out a linear function that takes a non-zero value on a given non-zero vector). The range E(V) must be a subspace in V** isomorphic to V. But dim V** = dim V* = dim V. Thus the range must be the whole space V**.

Rank

In the previous subsection, we constructed a basis of a finite dimen- sional vector space by starting from a finite set that spans it and removing unnecessary vectors. Alternatively, one can construct a basis by starting from any linearly independent set and adding, one by one, new vectors linearly independent from the previous ones. Since the number of such vectors cannot exceed the dimension of the space, the process will stop when the vectors span the whole space and form therefore a basis. Thus we have proved that in a finite dimensional vector space, every linearly independent set of vectors can be completed to a basis.4 We are going to use this in the proof of the Rank Theorem. The rank of a linear map A : is defined as the dimension of its range: rk A := dim A( ). V→W V n m Example. Consider the map Er : K K given by the block I 0 → matrix E = r , of size m n, where the left upper block is r 0 0 × the identity matrix I of size r r, and the other three blocks are zero r × matrices of appropriate sizes. In standard coordinates (x1,...,xn) n m in K and (y1,...,ym) in K , the map Er is given by the formulas y1 = x1,...,yr = xr,yr+1 = 0,...,ym = 0. The range of Er is the subspace of dimension r in Km given by the m r equations − yr+1 = = ym. Thus rk Er = r. The kernel of Er is the subspace · · · n of dimension n r in K given by r equations x1 = = xr = 0. The map can be− viewed geometrically as the projection· · · along the kernel onto the range.

4Using the so called transfinite induction one can prove the same for infinite dimensional vector spaces as well. 1. Dimension and Rank 77 The Rank Theorem. A linear map A : of rank r between two vector spaces of dimensionsVn →Wand m is given by the matrix E in suitable bases of the spaces and . r V W Proof. Let f ,..., f be any basis in the range A( ) . Com- 1 r V ⊂W plete it to a basis of by choosing vectors fr+1,..., fm as explained above. Pick vectors We ,..., e such that Ae = f . (They exist 1 r ∈ V i i because fi lie in the range of A.) Take vectors er+1, er+2,... to form a basis in the kernel of A. We claim that e1,..., er, er+1 . . . form a basis in (and in particular the total number of these vectors is equal V to n). The theorem follows from this, since Aei = fi for i = 1,...r, and Aei = 0 for i = r + 1,...,n, and hence the matrix of A in these bases coincides with Er.


Figure 30

To justify the claim, we will show that every vector x is ∈ V uniquely written as a linear combination of ei. Indeed, we have: Ax = α1f1 + + αrfr since Ax lies in the range of A. Then A(x α e · · · α e )= 0, and hence x α e α e lies in − 1 1 −···− r r − 1 1 −···− r r the kernel of A. Therefore x = α1e1 + + αrer + αr+1er+1 + , i.e. the vectors v span . On the other· · · hand, if in the last equal-· · · i V ity we have x = 0, then Ax = α1f1 + + αrfr = 0 and hence α = = α = 0, since f are linearly independent· · · in . Finally, 1 · · · r i W 0 = αr+1er+1 + αr+2er+2 + . . . implies that αr+1 = αr+2 = = 0 since e , e ,... are linearly independent in .  · · · r+1 r+2 V Let A be an m n matrix. It defines a linear map Kn Km. The rank of this map is× the dimension of the subspace in Km→spanned by columns of A. It is called the rank of the matrix A. Applying the Rank Theorem to this linear map, we obtain the following result. Corollary. For every m n-matrix A of rank r there exist invertible matrices D and× C of sizes m m and n n −1 × × respectively such that D AC = Er. 78 Chapter 3. SIMPLE PROBLEMS The Rank Theorem has the following reformulation. Let A : and A′ : ′ ′ be two linear maps. They are called equivalentV → W V →W ∼ ∼ if there exist isomorphisms C : ′ = V and D : ′ = W such that DA′ = AC. One expresses theV → last equality byW saying→ that the following square is commutative:

             A′
     V′ ──────────→ W′
   C │ ≅          ≅ │ D
     ↓              ↓
     V  ──────────→ W
             A

The Rank Theorem′. Linear maps between finite dimensional spaces are equivalent if and only if they have the same rank.

Indeed, when A′ = D⁻¹AC, the ranges of A and A′ must have the same dimension (since C and D are isomorphisms). Conversely, when rk A = r = rk A′, each of A and A′ is equivalent to E_r : Kⁿ → Kᵐ by the Rank Theorem.

Below we discuss further corollaries and applications of the Rank Theorem.

Adjoint Maps Given a linear map A : , one defines its adjoint map, acting between dual spacesV in →W the opposite direction: At : ∗ ∗. a W →V Namely, to a linear function K, the adjoint map At assigns the a W → composition A K, i.e.: V →W → (Ata)(x)= a(Ax) for all x and a ∗. ∈V ∈W In coordinates, suppose that A : Kn Km is given by m linear functions in n variables: → y = a x + + a x 1 11 1 · · · 1n n · · · y = a x + + a x . m m1 1 · · · mn n m ∗ These equalities show that the elements yi of the dual basis in (K ) t are mapped by A to the linear combinations ai1x1 + + ainxn of n ∗ · · · the elements xj of the dual basis in (K ) . Therefore columns of the matrix representing the map At in these bases are rows of A. Thus, matrices of adjoint maps with respect to dual bases are transposed to each other. 1. Dimension and Rank 79 Corollary 1. Adjoint linear maps have the same rank.

Indeed, when a map A : has the matrix Er in suitable tV→W t bases of and , the map A has the matrix Er in respectively dual bases of V ∗ andW ∗. Thus rk At = rk Et = r. W V r Remark. Here is a more geometric way to understand this fact. According to the homomorphism theorem, the range of A : is a subspace in canonically isomorphic to / Ker A. The rangeV→W of At is exactly theW dual space ( / Ker A)∗ consideredV as the subspace in ∗ which consists of all thoseV linear functions on that vanish on theV Ker A. Since dual spaces have the same dimension,V we conclude once again that rk A = rk At.

The range of the linear map A : Kn Km is spanned by columns of the matrix A. Therefore the rank of→A is equal to the maximal number of linearly independent columns of A.

Corollary 2. The maximal number of linearly indepen- dent rows of a matrix is equal to the maximal number of linearly independent columns of it.

Ranks and Determinants

Corollary 3. The rank of a matrix is equal to the maximal size k for which there exists a k k-submatrix with a non- zero determinant. × Proof. Suppose that a given m n-matrix has rank r. Then there exists a set of r linearly independent× columns.of it. These columns form an m r-matrix of rank r. Therefore there exists a set of r linearly independent× rows of it. These rows form an r r matrix M of rank r. By Corollary of the Rank Theorem, this matrix× −1 can be written as the product M = DErC where D and C are invertible r r-matrices, and Er = Ir is the identity matrix of size r. Since invertible× matrices have non-zero determinants, we conclude that det M = (det D)/(det C) = 0. 6 On the other hand, let M ′ be a k k-submatrix of the given matrix, such that k > r. Then columns of×M ′ are linearly dependent. Therefore one of them can be represented as a linear combination of the others. Since determinant don’t change when from one of the columns, a linear combination of other columns is subtracted, we conclude that det M ′ = 0. 80 Chapter 3. SIMPLE PROBLEMS Systems of Linear Equations — Theory Let Ax = b be a system of m linear equations in n unknowns x with the coefficient matrix A and the right hand side b. Let r = rk A. Corollary 4. (1) The solution set to the homogeneous system Ax = 0 is a linear subspace in Kn of dimension n r (namely, the kernel of A : Kn Km). − → (2) The system Ax = b is consistent (i.e. has at least one solution) only when b lies in a certain subspace of dimension r (namely, in the range of A). (3) When it does, the solution set to the system Ax = b is an affine subspace in Kn of dimension n r parallel to the kernel of A. − Indeed, this is obviously true in the special case A = Er, and is therefore true in general due to the Rank Theorem. A subspace (affine or linear) of dimension n r in a space of di- mension n is said to have codimension r. Thus,− rephrasing Corol- lary 4, we can say that the solution space to a system Ax = b is either empty (when the column b does not lie in a subspace spanned by columns of A), or is an affine subspace of codimension r paral- lel to the solution spaces of the corresponding homogeneous system Ax = 0. One calls the rank r of the matrix A also the rank of the system Ax = b.
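Corollary 4 is easy to test numerically. The following sketch (Python with numpy; the matrix and right-hand side are made up for the illustration) exhibits a kernel of dimension n − r and checks consistency by comparing ranks.

import numpy as np

# A made-up 3x4 system Ax = b, used only to illustrate Corollary 4.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 1.],
              [1., 3., 1., 2.]])        # third row = sum of the first two
b = np.array([1., 2., 3.])

r = np.linalg.matrix_rank(A)            # here r = 2
# A basis of Ker A can be read off the singular value decomposition:
_, _, Vt = np.linalg.svd(A)
kernel = Vt[r:].T                       # its n - r = 2 columns span Ker A
print(r, kernel.shape[1], np.allclose(A @ kernel, 0))

# Ax = b is consistent exactly when b lies in the column space of A,
# i.e. when appending b to A does not increase the rank:
print(np.linalg.matrix_rank(np.column_stack([A, b])) == r)   # True here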


Figure 31 Figure 32

Consider now the case when the number of equations is equal to the number of unknowns. Corollary 5. When det A = 0, the linear system Ax = b of n linear equations with n unknowns6 has a unique solution for every b, and when det A = 0, solutions are non-unique for some (but not all) b and do not exist for all others. 1. Dimension and Rank 81 Indeed, when det A = 0, the matrix A is invertible, and x = A−1b is the unique solution.6 When det A = 0, the rank r of the system is smaller than n. Then the range of A has positive codimension, and the kernel has positive dimension, both equal to n r (Figure 31). − Dimension Counting In 3-space, two distinct planes intersect in a line, and line meets a plane at a point. How do these geometric statements generalize to higher dimensions? It follows from the Rank Theorem, that dimensions of the range and kernel of a linear map add up to the dimension of the domain space. We will use this fact to answer the above question. Corollary 6. If linear subspaces of dimensions k and l span together a subspace of dimension n, then their inter- section is a linear subspace of dimension k + l n. − Proof. Let , be linear subspaces of dimensions k and l in a vector spaceU V ⊂W, and = be their intersection (Figure 32). Denote by +W T theU∩V subspace of dimension n spanned by vectors of andU V.⊂W Define a linear map A : , where = (u, vU) u V, v is the direct sum,U⊕V→W by A(u, v)= u v. TheU⊕V range{ of A| coincides∈U ∈ with V} + . The kernel of A consists− of all those pairs (u, v), where u U Vand v , for which u = v. ∈ U ∈ V Therefore Ker A = (t, t) t ∼= . Thus dim( + )+dim T = dim( ) = dim { + dim| ∈. T We } concludeT that dimU TV= k + l n. U⊕V U V − EXERCISES 177. Prove that columns of an invertible n n-matrix form a basis in Kn, × and vice versa: every basis in Kn is thus obtained. 178. Find the dimension of the subspace in K4 given by two equations: x1 + x2 + x3 = 0 and x2 + x3 + x4 = 0.  179. Find the dimension of the subspace in RR spanned by functions cos(x + θ1),..., cos(x + θn), where θ1,...,θn are given distinct angles.  180.⋆ In the space of polynomials of degree

181. In R[x], find coordinates of the Chebyshev polynomials T4 in the basis of monomials xk, k =0, 1, 2,...  182. Professor Dumbel writes his office and home phone numbers as a 7 1- × matrix O and 1 7-matrix H respectively. Help him compute rk(OH).  × 82 Chapter 3. SIMPLE PROBLEMS 183. Prove that rk A does not change if to a row of A a linear combination of other rows of A is added. 184. Prove that rk(A + B) rk A + rk B.  ≤ 185. Following the proof of the Rank Theorem, find bases in the domain and the target spaces in which the following linear map A : K3 K3 → y =2x x x 1 1 − 2 − 3 y = x +2x x 2 − 1 2 − 3 y = x x +2x 3 − 1 − 2 3 has the matrix E . For which b K3 the system Ax = b is consistent?  2 ∈ 186. Given a linear map A : , its right inverse (respectively, V → W left inverse) is defined as a linear map B : such that AB = W →V id (respectively, BA = id ), where id and id denote the identity W V V W transformations on and . Prove that a right (left) inverse to A exists if V W and only if rk A = dim (rk A = dim ), and that neither is unique unless W V dim V = dim W . 187. Suppose that a system Ax = b of m linear equations in 2009 un- knowns has a unique solution for b = (1, 0,..., 0)t. Does this imply that: 1 t (a) Ker A = 0 , (b) rk A = 2009, (c) m 2009, (d) A− exists, (e) A A { } ≥ is invertible, (f) det(AAt) = 0, (g) rows of A are linearly independent, (h) 6 columns of A are linearly independent?  188. Given A : Kn Km, consider two adjoint systems: Ax = f and → Ata = b. Prove that one of them (say, Ax = f) is consistent for a given right hand side vector (f Km) if and only if this vector is annihilated ∈ m (i.e. a(f) = 0) by all linear functions (a (K )∗) satisfying the adjoint ∈ homogeneous system (Ata = 0).  189. Prove that two affine planes lying in a vector space are contained in an affine subspace of dimension 5. ≤ 190. The solution set of a single non-trivial linear equation a(x) = b is called a hyperplane (affine if b = 0 and linear if b = 0). Show that a 6 hyperplane is an (affine or linear) subspace of codimension 1. 191. Find possible codimensions of intersections of k linear hyperplanes.  192. Prove that every subspace in Kn can be described as: (a) the range of a linear map; (b) the kernel of a linear map. 193. Classify linear subspaces in Kn up to linear transformations of Kn.  194.⋆ Classify pairs of subspaces in Kn up to linear transformations.  195. Let K be a finite field of q elements. Compute the number of: (a) vectors in a K-vector space of dimension n, (b) bases in Kn, (c) n r- × matrices of rank r, (d) subspaces of dimension r in Kn.  196.⋆ Prove that if a field has q elements, then q is a power of a prime.  2. Gaussian Elimination 83 2 Gaussian Elimination

Evaluating the determinant of a 20 × 20-matrix directly from the definition of determinants requires 19 multiplications for each of the 20! > 2·10¹⁸ elementary products. On a typical PC that makes 1 gigaflops (i.e. 10⁹ FLoating point Operations Per Second), this would take about 4·10¹⁰ seconds, which is a little longer than 1000 years. Algorithms based on Gaussian elimination allow your PC to evaluate much larger determinants in tiny fractions of a second.

Row Reduction

Usually, to solve a system of linear algebraic equations with coefficients given numerically, we use one of the equations to express the 1st unknown via the other unknowns and eliminate it from the remaining equations, then express the 2nd unknown from one of the remaining equations, etc., and finally arrive at an equivalent algebraic system which is easy to solve starting from the last equation and working backward. This computational procedure, called Gaussian elimination, can be conveniently organized as a sequence of operations with rows of the coefficient matrix of the system. Namely, we use three elementary row operations:
• transposition of two rows;
• division of a row by a non-zero scalar;
• subtraction of a multiple of one row from another one.

Example 1. Solving the system

x2 + 2x3 = 3
2x1 + 4x2 = −2
3x1 + 5x2 + x3 = 0

by Gaussian elimination, we pull the 2nd equation up (since the 1st equation does not contain x1), divide it by 2 (in order to express x1 via x2), and subtract it 3 times from the 3rd equation in order to get rid of x1 therein. Then we use the 1st equation (which has become the 2nd one in our pile) in order to eliminate x2 from the 3rd equation. The coefficient matrix of the system is subject to the elementary row transformations:

[ 0 1 2 | 3 ]     [ 2 4 0 | −2 ]     [ 1 2 0 | −1 ]
[ 2 4 0 | −2 ]  ↦  [ 0 1 2 | 3 ]  ↦  [ 0 1 2 | 3 ]
[ 3 5 1 | 0 ]     [ 3 5 1 | 0 ]     [ 3 5 1 | 0 ]

    [ 1 2 0 | −1 ]     [ 1 2 0 | −1 ]     [ 1 2 0 | −1 ]
↦   [ 0 1 2 | 3 ]  ↦  [ 0 1 2 | 3 ]  ↦  [ 0 1 2 | 3 ] .
    [ 0 −1 1 | 3 ]     [ 0 0 3 | 6 ]     [ 0 0 1 | 2 ]

The final "triangular" shape of the coefficient matrix is an example of the row echelon form. If read from bottom to top, it represents the system x3 = 2, x2 + 2x3 = 3, x1 + 2x2 = −1, which is ready to be solved by back substitution: x3 = 2, x2 = 3 − 2x3 = 3 − 4 = −1, x1 = −1 − 2x2 = −1 + 2 = 1. The process of back substitution, expressed in the matrix form, consists of a sequence of elementary row operations of the third type:

[ 1 2 0 | −1 ]     [ 1 2 0 | −1 ]     [ 1 0 0 | 1 ]
[ 0 1 2 | 3 ]  ↦  [ 0 1 0 | −1 ]  ↦  [ 0 1 0 | −1 ] .
[ 0 0 1 | 2 ]     [ 0 0 1 | 2 ]     [ 0 0 1 | 2 ]

The last matrix is an example of the reduced row echelon form and represents the system x1 = 1, x2 = −1, x3 = 2, which is "already solved".

In general, Gaussian elimination is an algorithm of reducing an augmented matrix to a row echelon form by means of elementary row operations. By an augmented matrix we mean simply a matrix subdivided into two blocks [A | B]. The augmented matrix of a linear system Ax = b in n unknowns is [a1, ..., an | b], where a1, ..., an are the columns of A, but we will also make use of augmented matrices with B consisting of several columns. Operating with a row [a1, ..., an | b1, ...] of an augmented matrix, we will refer to the leftmost non-zero entry among the aj as the leading entry⁵ of the row. We say that the augmented matrix [A | B] is in the row echelon form of rank r if the m × n-matrix A satisfies the following conditions:
• each of the first r rows has the leading entry equal to 1;
• leading entries of the rows 1, 2, ..., r are situated respectively in the columns with indices j1, ..., jr satisfying j1 < j2 < ... < jr;
• all rows of A with indices i > r are zero.

Notice that a matrix in a row echelon form has zero entries everywhere below and to the left of each leading entry. A row echelon form is called reduced (Figure 33) if all the entries in the columns j1, ..., jr above the leading entries are also equal to zero.

⁵Also called leading coefficient, or pivot.

If the matrix A of a linear system is in the row echelon form and indeed has one or several zero rows on the bottom, then the system contains equations of the form 0x1 + ... + 0xn = b. If at least one of such b is non-zero, the system is inconsistent (i.e. has no solutions). If all of them are zeroes, the system is consistent and ready to be solved by back substitution.

Figure 33

Example 2. The following augmented matrix is in the row ech- elon form of rank 2: 1 2 3 0 0 0 1 | 2 .  0 0 0 | 0  |   It corresponds to the system x1 + 2x2 + 3x3 = 0, x3 = 2, 0=0. The system is consistent: x3 = 2, x2 = t, x1 = 3x3 2x2 = 6 2t satisfy the system for any value of the parameter− t−. − − We see that presence of leading entries in the columns j1, ..., jr of the row echelon form allows one to express the unknowns xj1 , ..., xjr in terms of the unknowns x with j = j , ..., j , while the values j 6 1 r t1, ..., tn−r of the unknowns xj, j = j1, ..., jr remain completely am- biguous. In this case, solutions to6 the linear system depend on the n r parameters t , ..., t . − 1 n−r The algorithm of row reduction of an augmented matrix [A B] to the row echelon form can be described by the following instructions.| Let n be the number of columns of A. Then the algorithm consists of n steps. At the step l = 1, ..., n, we assume that the matrix formed by the columns of A with the indices j = 1, ..., l 1 is already in the row echelon form of some rank s < l, with the leading− entries located 86 Chapter 3. SIMPLE PROBLEMS in some columns j1 < ... < js < l. The l-th step begins with locating the first non-zero entry in the column l below the row s. If none is found, the l-th step is over, since the columns 1, ..., l are already in the row echelon form of rank s. Otherwise the first non-zero entry is located in a row i(>s), and the following operations are performed: (i) transposing the rows i and s + 1 of the augmented matrix, (ii) dividing the whole row s + 1 of the augmented matrix by the leading entry, which is now a (= 0), s+1,l 6 (iii) annihilating all the entries in the column l below the leading entry of the s + 1-st row by subtracting suitable multiples of the s + 1-st row of the augmented matrix from all rows with indices i>s + 1. After that, the l-th step is over since the columns 1, ..., l are now in the row echelon form of rank s + 1. When an augmented matrix [A B] has been reduced to a row | echelon form with the leading entries a1,j1 = ... = ar,jr = 1, the back substitution algorithm, which reduces it further to a reduces row echelon form, consists of r steps which we number by l = r, r 1, ..., 1 and perform in this order. On the l-th step, we subtract from− each of the rows i = 1, ..., l 1 of the augmented matrix, the l-th row − multiplied by ai,jl , and thus annihilate all the entries of the column jl above the leading entry.
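The two procedures just described, row reduction followed by back substitution, fit in a few lines of code. The sketch below (Python with numpy, written only to mirror the description above and with no attention to numerical stability) reproduces the answer of Example 1.

import numpy as np

def row_echelon(aug, ncols=None, tol=1e-12):
    """Reduce an augmented matrix [A | B] to row echelon form by the three
    elementary row operations; pivots are searched only in the first
    `ncols` columns (the columns of A). Illustration only."""
    M = np.array(aug, dtype=float)
    m, n = M.shape
    if ncols is None:
        ncols = n
    s = 0                      # number of leading entries found so far
    pivots = []                # their columns j1 < j2 < ...
    for l in range(ncols):
        rows = [i for i in range(s, m) if abs(M[i, l]) > tol]
        if not rows:
            continue           # no leading entry in this column
        i = rows[0]
        M[[s, i]] = M[[i, s]]  # transpose rows i and s
        M[s] /= M[s, l]        # divide by the leading entry
        for k in range(s + 1, m):
            M[k] -= M[k, l] * M[s]   # annihilate the entries below it
        pivots.append(l)
        s += 1
    return M, pivots

def back_substitute(M, pivots):
    """Pass from a row echelon form to the reduced row echelon form."""
    M = M.copy()
    for s in reversed(range(len(pivots))):
        j = pivots[s]
        for i in range(s):
            M[i] -= M[i, j] * M[s]   # annihilate the entries above the pivot
    return M

# The system of Example 1: x2 + 2x3 = 3, 2x1 + 4x2 = -2, 3x1 + 5x2 + x3 = 0.
aug = [[0, 1, 2, 3], [2, 4, 0, -2], [3, 5, 1, 0]]
E, piv = row_echelon(aug, ncols=3)
print(back_substitute(E, piv))   # last column gives x1 = 1, x2 = -1, x3 = 2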

Applications Row reduction algorithms allow one to compute efficiently determi- nants and inverses of square matrices given numerically, and to find a basis in the null space, column space and row space of a given rectangular matrix (i.e., speaking geometrically, in the kernel of the matrix, its range, and the range of the transposed matrix). Proposition 1. Suppose that an m n-matrix A has been reduced by elementary row operations× to a row echelon form ′ A of rank r with the leading entries a1,j1 = ... = ar,jr = 1, j1 < ... < jr. Then (1) rk A = rk A′ = r, (2) rows 1,...,r of A′ form a basis in the row space of A, (3) the columns of A with indices j1, ..., jr form a basis in the column space of A. Proof. Elementary row operations do not change the space spanned by rows of the matrix. The non-zero rows of a row ech- 2. Gaussian Elimination 87 elon matrix are linearly independent and thus form a basis in the row space. In particular, rk A = rk A′ = r. The row operations change columns a1, ..., an of the matrix A, but preserve linear dependencies among them: α1a1 + ... + αnan = 0 ′ ′ ′ ′ if and only if α1a1 + ... + αnan = 0. The r columns aj1 , ..., ajr of the matrix A′ in the row echelon form which contain the leading entries are linearly independent. Therefore columns aj1 , ..., ajr of the matrix A are linearly independent too and hence form a basis in the column space of A.  Example 3. The following row reduction 1 2 3 1 1 2 3 1 1 2 3 1 245− 1 0 0 1− 3 0 0 1 −3  368 0  7→  0 0 −1 3  7→  000− 0  −       shows that the matrix has rank 2, rows (1, 2, 3, 1), (0, 0, 1, 3) form a basis in the row space, and columns (1, 2, 3)−t, (3, 5, 8)t a− basis in the column space. Suppose that the augmented matrix [A b] of the system Ax = b has been transformed to a reduced row echelon| form [A′ b′] with the | leading entries positioned in the columns j1 < j2 < ... < jr. These columns are the unit coordinate vectors e1, ..., er, and the system is ′ ′ ′ consistent only if b is their linear combination, b = b1e1 + ... + ′ brer. Assuming that this is the case we can assign arbitrary values t , ..., t to the unknowns x , j = j , ..., j , and express x , ..., x 1 n−r j 6 1 r j1 jr as linear inhomogeneous functions of t1, ..., tn−r. The general solution to the system will have the form x = v0 + t1v1 + ... + tn−rvn−r of a linear combination of some n-dimensional vectors v0, v1, ..., vn−r. We claim that the vectors v1, ..., vn−r form a basis in the null space of the matrix A. Indeed, substituting t = 0 we conclude that v satisfies the equation Av = b. Therefore x v = t v +...+ 0 0 − 0 1 1 tn−rvn−r form the general solution to the homogeneous system Ax = 0, i.e. the null space of A. In addition, we see that the solution set to the inhomogeneous system is the affine subspace in Rn obtained from the null space by the translation through the vector v0. Example 4. Consider the system Ax = 0 with the matrix A from Example 3. Transform the matrix to the reduced row echelon form: 1 2 3 1 120 8 ... 0 0 1 −3 0 0 1 3 . 7→  000− 0  7→  000− 0      88 Chapter 3. SIMPLE PROBLEMS The general solution to the system assumes the form x 2t 8t 2 8 1 − 1 − 2 − − x2 t1 1 0 = = t1 + t2 .  x3   3t2   0   3  x t 0 1  4   2              The columns ( 2, 1, 0, 0)t and ( 8, 0, 3, 1)t form therefore a basis in the null space− of the matrix A. − Proposition 2. 
Suppose that in the process of row reduction of an n × n matrix A to a row echelon form A′, row transpositions occurred k times, and the operations of division by leading entries α1, ..., αr were performed. If rk A′ < n, then det A = 0; if rk A′ = n, then det A = (−1)^k α1 ⋯ αn.

Example. Let us apply this to the matrix A of Example 1 and, at the same time, compute its inverse, by row reducing the augmented matrix [A | I3]:

[ 0 1 2 | 1 0 0 ]     [ 1 2 0 | 0 1/2 0 ]
[ 2 4 0 | 0 1 0 ]  ↦  [ 0 1 2 | 1  0  0 ]  ↦
[ 3 5 1 | 0 0 1 ]     [ 3 5 1 | 0  0  1 ]

    [ 1 2 0 | 0  1/2 0 ]     [ 1 2 0 | 0   1/2   0  ]
↦   [ 0 1 2 | 1   0  0 ]  ↦  [ 0 1 2 | 1    0    0  ] .
    [ 0 −1 1 | 0 −3/2 1 ]     [ 0 0 1 | 1/3 −1/2 1/3 ]

Here one transposition of rows and divisions by 2 and by 3 were applied. Thus det A = (−1) · 2 · 3 = −6, and the matrix is invertible. Back substitution eventually yields the inverse matrix:

[ 1 2 0 | 0   1/2   0  ]     [ 1 2 0 | 0   1/2   0  ]     [ 1 0 0 | −2/3 −3/2 4/3 ]
[ 0 1 2 | 1    0    0  ]  ↦  [ 0 1 0 | 1/3  1  −2/3 ]  ↦  [ 0 1 0 | 1/3   1  −2/3 ] .
[ 0 0 1 | 1/3 −1/2 1/3 ]     [ 0 0 1 | 1/3 −1/2 1/3 ]     [ 0 0 1 | 1/3 −1/2  1/3 ]

Remark. Gaussian elimination algorithms are unlikely to work well for matrices depending on parameters. To see why, try row reduction in order to solve a linear system of the form (λI − A)x = 0 depending on the parameter λ, or (even better!) apply Gaussian elimination to the system a11x1 + a12x2 = b1, a21x1 + a22x2 = b2 depending on the 6 parameters a11, a12, a21, a22, b1, b2.

LP U Decomposition Gaussian elimination is an algorithm. The fact that it always works is a theorem. In this and next subsections, we reformulate this theorem (or rather an important special case of it) first in the language of matrix algebra, and then in geometric terms. Recall that a square matrix A is called upper triangular (re- spectively lower triangular) if aij = 0 for all i > j (respectively i < j). We call P a permutation matrix if it is obtained from the identity matrix In by a permutation of columns. Such P is indeed the matrix of a linear transformation in Kn defined as the permutation of coordinate axes. Theorem. Every invertible matrix M can be factored as the product M = LP U of a lower triangular matrix L, a permutation matrix P , and an upper triangular matrix U. The proof of this theorem is based on interpretation of elemen- tary row operations in terms of matrix multiplication. Consider the following m m-matrices: × Tij (i = j), a transposition matrix, obtained by transposing the •i-th and6 j-th columns of the identity matrix; Di(d) (d = 0), a diagonal matrix, all of whose diagonal entries are• equal to 16 except the i-th one, which is equal to 1/d; Lij(α) (i > j), a lower triangular matrix, all of whose diagonal entries• are equal to 1, and all off-diagonal equal to 0 except the entry in i-th row and j-th column, which is equal to α. − 90 Chapter 3. SIMPLE PROBLEMS Here are examples T , D (3), and L ( 2) of size m = 4: 13 2 24 − 00 10 1 0 00 10 00 01 00 0 1 0 0 01 00 , 3 , .  10 00   0 0 10   00 10  00 01 02 01    0 0 01          Elementary row operations on a given m n-matrix can be described × as multiplication on the left by Tij, Di(d), or Lij(α), which results respectively in transposing the i-th and j-th rows, dividing the i-th row by d, and subtracting the i-th row times α from the j-th row. Note that inverses of elementary row operations are also elementary row operations. Thus, Gaussian elimination allows one to represent any matrix A as the product of a row echelon matrix with matrices of elementary row operations. In order to prove the LP U decompo- sition theorem, we will combine this idea with a modification of the Gaussian elimination algorithm. Let M be an invertible n n-matrix. We apply the row reduction process to it, temporarily refraining× from using permutations and divisions of rows and using only the row operations equivalent to left multiplication by Lij(α). On the i-th step of the algorithm, if the i-th row does not contain a non-zero entry where expected, we don’t swap it with the next row. Instead, we locate in this row the leading (i.e. leftmost non-zero) entry, which must exist since the matrix is invertible. When it is found in a column j, we subtract multiples of the row i from rows i + 1,...,n with such coefficients that all entries of these rows in the column j become annihilated. Example 7. Let us illustrate the modified row reduction with the matrix taken from Example 1, and at the same time represent the process as matrix factorization. On the first step, we subtract the 1st row from the 2nd and 3rd 4 and 5 times respectively, and 3 on the second step subtract the 2nd row times 2 from the 3rd. The lower triangular factors shown are inverses of the matrices L12(4), 3 L13(5) and L23( 2 ). The leading entries are boldfaced:

0 1 2 1 0 0 1 0 0 0 1 2 2 4 0 = 4 1 0 0 1 0 2 0 8  3 5 1   0 0 1   5 0 1   3 0 −9  −         1 0 0 1 0 0 1 0 0 0 1 2 = 4 1 0 0 1 0 0 1 0 2 0 8 .      3   −  0 0 1 5 0 1 0 2 1 0 0 3         2. Gaussian Elimination 91 Products of lower triangular matrices are lower triangular. As a result, applying the modified row reduction we obtain a factorization of the form M = LM ′, where L is a lower triangular matrix with all diagonal entries equal to 1. Note that on the ith step of the algorithm, when a leading entry in the row i is searched, the leading entries of the previous rows can be in any columns j1,...,ji−1. The entries of the i-row situated in these columns have been annihilated at the previous steps. Therefore the leading entry of the i-th row will be found in a new column. This shows that the columns j1,...,jn of the leading entries found in the rows 1,...,n are all distinct and thus form a permutation of 1,...,n . We now write M ′ = P U, { } where P is the matrix of the permutation 1,...,n . The operation j1,...,jn ′ −1 ′ ′ M U = P M permutes rows of M and places all leading entries of the7→ resulting matrix U on the diagonal. Here is how this works in our example: 0 1 2 0 1 0 2 0 8 2 0 8 = 1 0 0 0 1 −2 .  0 0 −3   0 0 1   0 0 3  Since leading entries are leftmost non-zero  entries in thei r rows, the matrix U turns out to be upper triangular. This completes the proof. Remarks. (1) As it is seen from the proof, the matrix L in the LP U factorization can be required to have all diagonal entries equal to 1. Triangular matrices with this property are called unipotent. Alternatively, one can require that U is unipotent. (2) When the permutation matrix P = In, one obtains the LU decomposition, M = LU, of some invertible matrices. Which ones? One can work backward and consider products LDU of a lower and upper triangular unipotent matrices L and U with an invertible di- agonal matrix D in between (the so called LDU decomposition). A choice of such matrices depends on a number of arbitrary parame- ters: n(n 1)/2 for each L and U, and n non-zero parameters for D, i.e. totally−n2. This is equal to the total number of matrix elements, suggesting that a typical n n-matrix admits an LDU factorization. × (3) This heuristic claim can be made precise. As illustrated by the above example of LP U decomposition, when P = In, certain entries of the resulting factor U come out equal to 0.6 This is be- cause some entries of the matrix M ′ on the right of leading ones are annihilated at previous steps of row reduction. As a result, such de- compositions involve fewer than n2 arbitrary parameters, and hence cover a positive codimension locus in the matrix space. Thus, the bulk of the space is covered by factorizations LP U with P = In. 92 Chapter 3. SIMPLE PROBLEMS Flags and Bruhat cells A sequence of nested subspaces is said to form a V1 ⊂V2 ⊂···⊂Vn flag in the space = n. When dim k = k for all k = 1,...,n, the flag is called completeV V. V Given a basis f1,..., fn in , one can associate to it the standard coordinate flag (Figure 34)V Span(f ) Span(f , f ) Span(f ,..., f )= . 1 ⊂ 1 2 ⊂···⊂ 1 n V Let U : be an invertible linear transformation preserving the standardV→V coordinate flag. Then the matrix of U in the given basis is upper triangular. Indeed, since U( ) , the vector Uf is a Vk ⊂ Vk k linear combination of f1,..., fk, i.e. Ufk = i≤k uikfi. Since this is true for all k, the matrix [uik] is upper triangular. 
Reversing this argument, we find that if the matrix of U is upper triangular, then U preserves the flag.
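Before going further with flags, here is a computational sketch of the modified elimination used in the proof of the LPU theorem (Python with numpy, for illustration only and without any care for numerical robustness); it is checked on the matrix of Examples 1 and 7.

import numpy as np

def lpu(M, tol=1e-12):
    """Sketch of the modified elimination above: factor an invertible
    matrix as M = L @ P @ U, with L unipotent lower triangular, P a
    permutation matrix, and U upper triangular."""
    A = np.array(M, dtype=float)        # will become M' = P @ U
    n = A.shape[0]
    L = np.eye(n)
    lead = []                           # column of the leading entry of each row
    for i in range(n):
        j = next(c for c in range(n) if abs(A[i, c]) > tol)
        lead.append(j)
        for k in range(i + 1, n):
            alpha = A[k, j] / A[i, j]
            A[k] -= alpha * A[i]        # subtract a multiple of row i from row k
            L[k, i] = alpha             # record the inverse operation in L
    P = np.zeros((n, n))
    for i, j in enumerate(lead):
        P[i, j] = 1.0                   # P[i, lead[i]] = 1, so that P @ U = M'
    U = P.T @ A                         # permute the rows of M': the leading
    return L, P, U                      # entries land on the diagonal of U

M = np.array([[0., 1., 2.],             # the matrix of Examples 1 and 7
              [2., 4., 0.],
              [3., 5., 1.]])
L, P, U = lpu(M)
print(np.allclose(L @ P @ U, M))        # True
print(L); print(P); print(U)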


Figure 34 Figure 35

Conversely, given a complete flag = Kn, one can V1 ⊂···⊂Vn pick a basis f1 in the line 1, then complete it to a basis f1, f2 in the plane , and so on, untilV a basis f ,..., f in the whole space V2 1 n Vn is obtained, such that k = Span(f1,..., fk) for each k. This shows that every completeV flag in Kn can be obtained from any other by an invertible linear transformation. Indeed, the lin- ear transformation, defined in terms of the standard basis e1,..., en of Kn by x e x f , transforms the standard coordinate flag i i 7→ i i Span(e1) Span(e1,..., en) into the given flag 1 n. ⊂···⊂P P V ⊂···⊂V Example 8. Let P : Kn Kn act by a permutation σ = σ → 1 ... n of coordinate axes. It transforms the standard coordinate i1 ... in flag into a coordinate flag  F : Span(e ) Span(e , e ) Span(e ,..., e )= Kn, σ i1 ⊂ i1 i2 ⊂···⊂ i1 in 2. Gaussian Elimination 93 called so because all the spaces are spanned by vectors of the stan- dard basis. There are n! such flags, one for each permutation. For 1 ... n instance, Fid is the standard coordinate flag. When σ = n ... 1 , the flag opposite to the standard one is obtained:  Span(e ) Span(e , e ) Span(e ,..., e ). n ⊂ n n−1 ⊂···⊂ n 1 Transformation of Kn defined by lower triangular matrices are ex- actly those that preserve this flag. Theorem. Every complete flag in Kn can be transformed into exactly one of n! coordinate flags by invertible linear transformations preserving one of them (e.g. the flag op- posite to the standard one). n Proof. Let F be a given complete flag in K , Fid the standard coordinate flag, M a linear transformation such that F = M(Fid), and M = LPσU its LP U decomposition. Since U(Fid) = Fid, and Pσ(Fid) = Fσ, we find that F = L(Fσ). Therefore the given flag −1 F is transformed into a coordinate flag Fσ by L , which is lower triangular and thus preserves the flag opposite to Fid. It remains to show that the same flag F cannot be transformed this way into two different coordinate flags Fσ. Let j, dim j = j, denote the spaces of the flag F, and = Span(eV,..., e V ) the Wn−i n i+1 spaces of the flag opposite to the standard one, codim n−i = i. Invertible transformations preserving the spaces canW change Wn−i their intersections with j, but cannot change the dimensions of these intersections. Thus it sufficesV to show that, in the case of the flag Fσ, the permutation σ is uniquely determined by these dimensions. Note that ei n−i+1 but ei / n−i (Figure 35), i.e. the quo- tient (1-dimensional!)∈W space ∈W/ is spanned by the image Wn−i+1 Wn−i of ei. Suppose that in the flag Fσ, the vector ei first occurs in the subspace j, i.e. σ(j)= i. Consider the increasing sequence of spaces V = Kn, their intersections with , and the pro- V1 ⊂···⊂Vn Wn−i+1 jections of these intersections to the quotient space n−i+1/ n−i. Examining the ranges of these maps and their dimensions,W weW find the sequence of j 1 zeroes followed by n j ones. Thus j = σ−1(i) is determined by− the flag.  − Remark. The theorem solves the following classification problem: In a vector space equipped with a fixed complete flag 1 n, classify all complete flags up to invertible linear transforW mations⊂···⊂W pre- serving the fixed flag. According to the theorem, there are n! equiv- alence classes determined by dimensions of intersections of spaces of 94 Chapter 3. SIMPLE PROBLEMS the flags with the spaces of the fixed flag. The equivalence classes are known as Bruhat cells. 
This formulation also shows that the Gaussian elimination algorithm can be understood as a solution to a simple geometric classification problem. One can give a purely geometric proof of the above theorem (and hence a new proof of the LPU decomposition) by refining the argument in the proof of the Rank Theorem. This would bring the Rank Theorem, Gaussian elimination, and Bruhat cells under the same roof.

EXERCISES 197. Solve systems of linear equations: 

x1 + x2 3x3 = 1 2x1 + x2 + x3 =2 2x1 x2 x3 =4 − − − − 2x1 + x2 2x3 =1 x1 +3x2 + x3 =5 3x1 +4x2 2x3 = 11 − − x1 + x2 + x3 =3 x1 + x2 +5x3 = 7 3x1 2x2 +4x3 = 11 − − x1 +2x2 3x3 =1 2x1 +3x2 3x3 = 14 − −

x1 2x2 +3x3 4x4 =4 x1 2x2 + x3 + x4 =1 − − − x2 x3 + x4 = 3 x1 2x2 + x3 x4 = 1 − − − − − x1 +3x2 3x4 =1 x1 2x2 + x3 +5x4 =5 − − 7x2 +3x3 + x4 = 3 − −

2x1 +3x2 x3 +5x4 =0 3x1 +4x2 5x3 +7x4 =0 3x x +2− x 7x =0 2x 3x +3− x 2x =0 1 − 2 3 − 4 1 − 2 3 − 4 4x1 + x2 3x3 +6x4 =0 4x1 + 11x2 13x3 + 16x4 =0 x 2x +4− x 7x =0 7x 2x +−x +3x =0 1 − 2 3 − 4 1 − 2 3 4

x1 + x2 + x3 + x4 + x5 =7 3x +2x + x + x 3x = 2 1 2 3 4 − 5 − . x2 +2x3 +2x4 +6x6 = 23 5x +4x +3x +3x x = 12 1 2 3 4 − 5 198.⋆ Find those λ for which the system is consistent: 

2x x + x + x =1 1 − 2 3 4 x1 +2x2 x3 +4x4 =2 . x +7x − 4x + 11x = λ 1 2 − 3 4

199. For each of the following matrices, find the rank and bases in the null, column, and row spaces: 

0 4 10 1 14 2 6 8 2 4 8 18 7 6 104 21 9 17 (a) (b)  10 18 40 17   7 6 341  1 7 17 3 35 30 15 20 5         2. Gaussian Elimination 95 2 1 1 1 100 1 4 1 3 1 1 2 13 1 010 2 5 1 1 4 1 3 12− 0 (c)  001 3 6  (d)   (e) . 1 1 1 5  1− 34 2  1 2 3 14 32   −    1 2 3 4  4 31 1  4 5 6 32 77     −     1 1 1 1        200. For each of the following matrices, compute the determinant and the inverse matrix, and an LUP decomposition: 

1 1 1 1 2 100 2 2 3 − 1 1 1 1 3 200 (a) 1 1 0 (b) − − (c) . −  1 1 1 1   1 134  " 1 2 1 # − − − 1 1 1 1 2 1 2 3  − −   −      1 1 1 201. Prove that D− (d)= D (d− ) and L− (α)= L ( α). i i ij ij − 202. Prove that the inverse of a permutation matrix P is P t. 203. Prove that every invertible matrix M has an LUP decomposition M = LUP where L is lower triangular, U upper triangular, and P is a permutation matrix, and compute such factorizations for the matrices from the previous exercise.  204. Prove that every invertible matrix M has an PLU decomposition M = PLU.  205. Prove that every invertible matrix has factorizations of the form UPL, PUL, and ULP , where L, U, and P stand for lower triangular, upper triangular, and permutation matrices respectively.  206. List all coordinate complete flags in K3. 207. For each permutation matrix P of size 4 4, describe all upper tri- × angular matrices U which can occur as a result of the modified Gaussian algorithm from the proof of the LPU decomposition theorem. For each permutation, find the maximal number of non-zero entries of U. 208.⋆ Compute the dimension of each Bruhat cell, i.e. the number of parameters on which flags in the equivalence class of Fσ depend.  209.⋆ When K is a finite field of q elements, find the number of all complete flags in Kn.  210.⋆ Prove that the number of Bruhat cells of dimension l is equal to the coefficient at ql in the product (called q-factorial)  2 2 n 1 [n] ! := (1 + q)(1 + q + q ) (1 + q + q + + q − ). q · · · · · · 96 Chapter 3. SIMPLE PROBLEMS 3. The Inertia Theorem 97 3 The Inertia Theorem

We study here the classification of quadratic forms and some generalizations of this problem. The answer actually depends on properties of the field of scalars K. Our first goal will be to examine the case K = R. We begin however with a key argument that remains valid in general.

Orthogonal Bases

We will assume here that the field K does not contain Z_2, i.e. that 1 + 1 ≠ 0 in K. Then, for any K-vector space, there is a one-to-one correspondence between symmetric bilinear forms and quadratic forms:

Q(x) = Q(x, x),   Q(x, y) = (1/2)[Q(x + y) − Q(x) − Q(y)].
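As a quick sanity check, this correspondence can be verified numerically; the snippet below (an illustration only, with K = R and an arbitrarily chosen coefficient matrix) recovers the symmetric bilinear form from the quadratic form by the polarization formula above.

import numpy as np

Q = np.array([[1., 2.],
              [2., 5.]])                                 # symmetric coefficient matrix
q = lambda x: x @ Q @ x                                  # the quadratic form Q(x)
bilinear = lambda x, y: (q(x + y) - q(x) - q(y)) / 2     # polarization
x, y = np.array([1., 3.]), np.array([-2., 1.])
print(bilinear(x, y), x @ Q @ y)                         # both print 3.0, the value Q(x, y)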

Let dim V = n, {f_1, ..., f_n} be a basis of V, and (x_1, ..., x_n) corresponding coordinates. In these coordinates, the quadratic form Q is written as:

Q(x) = Σ_{i=1}^n Σ_{j=1}^n q_{ij} x_i x_j.

The coefficients q_{ij} here are the values Q(f_i, f_j) of the corresponding symmetric bilinear form:

Q(x, y) = Σ_{i=1}^n Σ_{j=1}^n x_i q_{ij} y_j.

These coefficients form a matrix, also denoted Q, which is symmetric: q_{ij} = q_{ji} for all i, j = 1, ..., n. The basis {f_1, ..., f_n} is called Q-orthogonal if Q(f_i, f_j) = 0 for all i ≠ j, i.e. if the coefficient matrix is diagonal.

Lemma. Every quadratic form in K^n has an orthogonal basis.

Proof. We use induction on n. For n = 1 the requirement is empty. Let us construct a Q-orthogonal basis in K^n assuming that every quadratic form in K^{n−1} has an orthogonal basis. If Q is the identically zero quadratic form, then the corresponding symmetric bilinear form is identically zero too, and so any basis will be Q-orthogonal. If the quadratic form Q is not identically zero, then there exists a vector f_1 such that Q(f_1) ≠ 0. Let V be the subspace in K^n consisting of all vectors Q-orthogonal to f_1: V = {x ∈ K^n | Q(f_1, x) = 0}. This subspace does not contain f_1 and is given by 1 linear equation. Thus dim V = n − 1. Let {f_2, ..., f_n} be a basis in V orthogonal with respect to the symmetric bilinear form obtained by restricting Q to this subspace. Such a basis exists by the induction hypothesis. Therefore Q(f_i, f_j) = 0 for all 1 < i < j. Besides, Q(f_1, f_i) = 0 for all i > 1, since f_i ∈ V. Thus {f_1, f_2, ..., f_n} is a Q-orthogonal basis of K^n.

Corollary. For every symmetric n×n-matrix Q with entries from K there exists an invertible matrix C such that C^t Q C is diagonal.

The diagonal entries here are the values Q(f1),...,Q(fn).
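The inductive construction of the Lemma can be turned into a simple elimination procedure: apply each elementary operation simultaneously to the rows and to the columns of the coefficient matrix. The sketch below is only a rough numerical illustration over R (function names are ad hoc, and it assumes that every pivot encountered is non-zero; otherwise the preliminary change of basis from the Lemma would be needed first).

import numpy as np

def congruence_diagonalize(Q):
    """Return (C, D) with D = C^t Q C diagonal, assuming non-zero pivots."""
    n = Q.shape[0]
    D = Q.astype(float).copy()
    C = np.eye(n)
    for k in range(n):
        if D[k, k] == 0:
            if np.allclose(D[k, k + 1:], 0):
                continue                  # nothing left to eliminate in this row/column
            raise ValueError("zero pivot: needs the preliminary change of basis from the Lemma")
        for j in range(k + 1, n):
            m = D[k, j] / D[k, k]
            D[j, :] -= m * D[k, :]        # row operation ...
            D[:, j] -= m * D[:, k]        # ... and the same column operation
            C[:, j] -= m * C[:, k]        # accumulate the column operations into C
    return C, D

Q = np.array([[1., 2, 0], [2, 3, 1], [0, 1, 4]])
C, D = congruence_diagonalize(Q)
print(np.round(D, 10))                    # diagonal matrix, here diag(1, -1, 5)
print(np.allclose(C.T @ Q @ C, D))        # True: C is the matrix promised by the Corollary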

Inertia Indices

Consider the case K = R. Given a quadratic form Q in R^n, we pick a Q-orthogonal basis {f_1, ..., f_n} and then rescale those of the basis vectors for which Q(f_i) ≠ 0: f_i ↦ f̃_i = |Q(f_i)|^{-1/2} f_i. After such rescaling, the non-zero coefficients Q(f̃_i) of the quadratic form become ±1. Reordering the basis so that terms with positive coefficients come first, and negative next, we transform Q to the normal form:

Q = X_1^2 + ⋯ + X_p^2 − X_{p+1}^2 − ⋯ − X_{p+q}^2,   p + q ≤ n.

Note that by restricting Q to the subspace X_{p+1} = ⋯ = X_n = 0 of dimension p we obtain a quadratic form on this subspace which is positive (or positive definite), i.e. takes on positive values everywhere outside the origin.

Proposition. The numbers p and q of positive and negative squares in the normal form are equal to the maximal dimensions of the subspaces in R^n where the quadratic form Q (respectively, −Q) is positive.

Proof. The quadratic form Q = X_1^2 + ... + X_p^2 − X_{p+1}^2 − ... − X_{p+q}^2 is non-positive everywhere on the subspace W of dimension n − p given by the equations X_1 = ... = X_p = 0. Let us show that the existence of a subspace V of dimension p + 1 where the quadratic form is positive leads to a contradiction. Indeed, the subspaces V and W would intersect in a subspace of dimension at least (p + 1) + (n − p) − n = 1, containing therefore non-zero vectors x with Q(x) > 0 and Q(x) ≤ 0. Thus, Q is positive on some subspace of dimension p and cannot be positive on any subspace of dimension > p. Likewise, −Q is positive on some subspace of dimension q and cannot be positive on any subspace of dimension > q.

The maximal dimensions of positive subspaces of Q and −Q are called respectively the positive and negative inertia indices of the quadratic form in question. By definition, inertia indices of a quadratic form do not depend on the choice of a coordinate system. Our Proposition implies that the normal forms with different pairs of values of p and q are pairwise non-equivalent. This establishes the Inertia Theorem (as stated in Section 4 of Chapter 1).

Theorem. Every quadratic form in R^n by a linear change of coordinates can be transformed to exactly one of the normal forms:

X_1^2 + ... + X_p^2 − X_{p+1}^2 − ... − X_{p+q}^2,   where 0 ≤ p + q ≤ n.
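Numerically, the inertia indices p and q of a real symmetric matrix are most easily read off from the signs of its eigenvalues. This relies on the orthogonal diagonalization of symmetric matrices discussed in Chapter 4, so the sketch below is offered only as an illustration; the tolerance and the example matrix are arbitrary choices.

import numpy as np

def inertia_indices(Q, tol=1e-12):
    """Return (p, q) for a real symmetric matrix Q."""
    eigenvalues = np.linalg.eigvalsh(Q)           # real, since Q is symmetric
    p = int(np.sum(eigenvalues > tol))
    q = int(np.sum(eigenvalues < -tol))
    return p, q

Q = np.array([[0., 1], [1, 0]])    # the form 2*x1*x2
print(inertia_indices(Q))           # (1, 1): equivalent to X_1^2 - X_2^2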

The matrix formulation of the Inertia Theorem reads: Every real symmetric matrix Q can be transformed to exactly one of the diagonal forms

[ I_p    0    0 ]
[  0   −I_q   0 ]
[  0     0    0 ]

by transformations of the form Q ↦ C^t Q C defined by invertible real matrices C.

Complex Quadratic Forms

Consider the case K = C.

Theorem. Every quadratic form in C^n can be transformed by linear changes of coordinates to exactly one of the normal forms:

z_1^2 + ⋯ + z_r^2,   where 0 ≤ r ≤ n.

Proof. Given a quadratic form Q, pick a Q-orthogonal basis in Cn, order it in such a way that vectors f ,..., f with Q(f ) = 0 come 1 r i 6 first, and then rescale these vectors by f Q(f )f . i 7→ i i p 100 Chapter 3. SIMPLE PROBLEMS In particular, we have proved that every complex symmetric ma- I 0 trix Q can be transformed to exactly one of the forms r by 0 0   the transformations of the form Q CtQC defined by invertible complex matrices C. As it follows7→ from the Rank Theorem, here r = rk Q, the rank of the coefficient matrix of the quadratic form. This guarantees that the normal forms with different values of r are pairwise non-equivalent, and thus completes the proof.  To establish the geometrical meaning of r, consider a more general situation. Given a quadratic form Q on a K-vector space , its kernel is defined as the subspace of consisting of all vectorsV which are Q-orthogonal to all vectors fromV : V Ker Q := z Q(z, v) = 0 for all v . { ∈ V| ∈V } Note that the values Q(x, y) do not change when a vector from the kernel is added to either of x and y. As a result, the symmetric bilinear form Q descends to the quotient space / Ker Q. V The rank of a quadratic form Q on Kn is defined as the codimen- sion of Ker Q. For example, the quadratic form z2+ +z2 on Kn cor- 1 · · · r responds to the symmetric bilinear form x1y1 + +xryr, and has the kernel of codimension r defined by the equations· · · z = = z = 0. 1 · · · r

Conics

The set of all solutions to one polynomial equation in K^n:

F (x1,...,xn) = 0 is called a hypersurface. When the polynomial F does not depend on one of the variables (say, xn), the equation F (x1,...,xn−1) = 0 defines a hypersurface in Kn−1. Then solution set in Kn is called a cylinder, since it is the Cartesian product of the hypersurface in n−1 K and the line of arbitrary values of xn. Hypersurfaces defined by polynomial equations of degree 2 are often referred to as conics — a name reminiscent of conic sections, which are “hypersurfaces” in K2. The following application of the Inertia Theorem allows one to classify all conics in Rn up to equiv- alence defined by compositions of translations with invertible linear transformations. 3. The Inertia Theorem 101 Theorem. Every conic in Rn is equivalent to either the cylinder over a conic in Rn−1, or to one of the conics: x2 + + x2 x2 x2 = 1, 0 p n, 1 · · · p − p+1 −···− n ≤ ≤ x2 + + x2 = x2 + + x2 , 0 p n/2, 1 · · · p p+1 · · · n ≤ ≤ x = x2 + + x2 x2 x2 , 0 p (n 1)/2, n 1 · · · p − p+1 −···− n−1 ≤ ≤ − known as hyperboloids, cones, and paraboloids respectively. For n = 3, all types of “hyperboloids” (of which the first type contains spheres and ellipsoids) are shown in Figures 36, and cones and paraboloids in Figure 37.

Figure 36. Hyperboloids in R^3: p = 3, 2, 1, 0.

Figure 37. Cones (p = 0, 1) and paraboloids (p = 0, 1) in R^3.

Proof. Given a degree 2 polynomial F = Q(x)+ a(x)+ c, where Q is a non-zero quadratic form, a a linear form, and c a constant. we can apply a linear change of coordinates to transform Q to the form x2 x2, where r n, and then use the completion ± 1 ±···± r ≤ of squares in the variables x1,...,xr to make the remaining linear form independent of x1,...,xr. When r = n, the resulting equations 2 2 x1 xn = C (where C is a new constant) define hyperboloids (when± ±···±C = 0), or cones (when C = 0). When r

z2 + + z2 = 1, z2 + + z2 = 0, z = z2 + + z2 . 1 · · · n 1 · · · n n 1 · · · n−1 Example. Let Q be a non-degenerate quadratic form with real coefficients in 3 variables. According to the previous (real) classifi- cation theorem, the conic Q(x1,x2,x3) = 1 can be transformed by a real change of coordinates into one of the 4 normal forms shown on Figure 36. The same real change of coordinates identifies the set of complex solutions to the equation Q(z1, z2, z3) = 1 with that of the normal form: z2 z2 + z2 = 1. However, z becomes z after the ± 1 ± 2 ± 3 − change z √ 1z, which identifies the set of complex solutions with 7→ − 3 2 2 2 the complex sphere in C , given by the equation z1 + z2 + z3 = 1. Thus, various complex conics equivalent to the complex sphere and given by equations with real coefficients, “expose” themselves in R3 by various real forms: real spheres or ellipsoids, hyperboloids of one or two sheets (as shown on figure 36), or even remain invisible (when the set of real points is empty). Remark. The same holds true in general: various hyperboloids (as well as cones or paraboloids) of the real classification theorem are real forms of complex conics defined by the same equations. They become equivalent when complex changes of coordinates are allowed. In this sense, the three normal forms of the last theorem represent hyperboloids, cones and paraboloids of the previous one.

Hermitian and Anti-Hermitian Forms

We introduce here a variant of the notion of a quadratic form which makes sense in complex vector spaces. It has no direct analogues over arbitrary fields, but it plays a central role in geometry and mathematical physics.

Let V be a C-vector space. A function⁶ P : V × V → C is called a sesquilinear form if it is linear in the second argument, i.e.

P (z, λx+µy)= λP (z, x)+µP (z, y) for all λ, µ C and x, y, z , ∈ ∈V 6We leave it to the reader to examine the possibility when the function P is defined on the product V × W of two different spaces. 3. The Inertia Theorem 103 and semilinear (or antilinear) in the first one: P (λx+µy, z)= λP¯ (x, z)+¯µP (y, z) for all λ, µ C and x, y, z . ∈ ∈V When P is sesquilinear, the form P ∗ defined by P ∗(x, y) := P (y, x) is also sesquilinear, and is called Hermitian adjoint7 to P . When P ∗ = P , i.e. P (y, x)= P (x, y) for all x, y , ∈V the form P is called Hermitian symmetric. When P ∗ = P , i.e. − P (y, x)= P (x, y) for all x, y , − ∈V the form P is called Hermitian anti-symmetric. When P is Her- mitian symmetric, iP is Hermitian anti-symmetric (and vice versa) since ¯i = i. Every sesquilinear form is uniquely written as the sum of an Hermitian− symmetric and Hermitian anti-symmetric form: 1 1 P = H + Q, H = (P + P ∗), Q = (P P ∗). 2 2 − In coordinates, when x = xiei and y = yjej, we have m n P P P (x, y)= x¯ipijyj, where pij = P (ei, ej). Xi=1 Xj=1 The coefficients pij of a sesquilinear form can be arbitrary complex numbers, and form a square matrix which we also denote by P . The Hermitian adjoint form has the coefficient matrix, denoted P ∗, whose ∗ entries are pij = P (ej, ei)= pji. Thus two complex square matrices, P and P ∗, are Hermitian adjoint if they are obtained from each other by transposition and entry-wise complex conjugation: P ∗ = P¯t. A complex square matrix P is called Hermitian if P ∗ = P , and anti-Hermitian if P ∗ = P . − To any Hermitian symmetric sesquilinear form on a C-vector space , there corresponds an Hermitian quadratic form, or sim- ply HermitianV form, z H(z, z). Following our abuse-of-notation convention, we denote this7→ form by the same letter H, i.e. put H(z) := H(z, z) for all z . ∈V The values H(z) of an Hermitian form are real: H(z, z)= H(z, z).

7After French mathematician Charles Hermite (1822–1901). 104 Chapter 3. SIMPLE PROBLEMS

In coordinates z = Σ z_i e_i, an Hermitian form is written as

H(z) = Σ_{i=1}^n Σ_{j=1}^n z̄_i h_{ij} z_j,

where h_{ij} = h̄_{ji}. For example, the Hermitian form

|z|^2 = |z_1|^2 + ⋯ + |z_n|^2,

corresponds to the sesquilinear form

⟨x, y⟩ = x̄_1 y_1 + ⋯ + x̄_n y_n,

which plays the role of the Hermitian dot product in C^n. Note that |z|^2 > 0 unless z = 0. An Hermitian form on a vector space V is called positive definite (or simply positive) if all of its values outside the origin are positive. Since H(z, w) + H(w, z) = 2 Re H(z, w), we have:

Re H(z, w) = (1/2)[H(z + w) − H(z) − H(w)],
Im H(z, w) = Re H(iz, w) = (1/2)[H(iz + w) − H(iz) − H(w)],

i.e. a Hermitian symmetric sesquilinear form is uniquely determined by the corresponding Hermitian quadratic form.

Theorem. Every Hermitian form H in C^n can be transformed by a C-linear change of coordinates to exactly one of the normal forms

|z_1|^2 + ⋯ + |z_p|^2 − |z_{p+1}|^2 − ⋯ − |z_{p+q}|^2,   0 ≤ p + q ≤ n.

Ip 0 0 iIp 0 0 0 Iq 0 respectively 0 iIq 0 .  0− 0 0    0− 0 0         by transformations of the form P C∗PC defined by in- vertible complex matrices C. 7→ It follows that p + q is equal to the rank of the coefficient matrix of the (anti-)Hermitian form. 106 Chapter 3. SIMPLE PROBLEMS Sylvester’s Rule Let H he a Hermitian n n-matrix. Denote by ∆ = 1, ∆ = h , × 0 1 11 ∆2 = h11h22 h12h21, ..., ∆n = det H the minors formed by the intersection of− the first k rows and columns of H, k = 1, 2,...,n (Figure 38). They are called leading minors of the matrix H. Note that det H = det Ht = det H¯ = det H is real, and the same is true for each ∆k, since it is the determinant of an Hermitian k k- matrix. The following result is due to the English mathemati×cian James Sylvester (1814–1897).

Figure 38. The nested leading minors Δ_1, Δ_2, ..., Δ_n of the matrix H.

Theorem. Suppose that an Hermitian n n-matrix H has non-zero leading minors. Then the negative× inertia index of the corresponding Hermitian form is equal to the number of sign changes in the sequence ∆0, ∆1,..., ∆n. Remark. The hypothesis that det H = 0 means that the Her- mitian form is non-degenerate, or equivalently,6 that its kernel is trivial. In other words, for each non-zero vector x there exists y such that H(x, y) = 0. Respectively, the assumption that all leading minors are non-zero6 means that restrictions of the Hermitian forms to all spaces of the standard coordinate flag

Span(e ) Span(e , e ) Span(e ,..., e ) . . . 1 ⊂ 1 2 ⊂···⊂ 1 k ⊂ are non-degenerate. The proof of the theorem consists in classify- ing such Hermitian forms up to linear changes of coordinates that preserve the flag. Proof. As before, we inductively construct an H-orthogonal basis f1,..., fn and normalize the vectors so that H(fi) = 1, re- quiring{ however} that each f Span(e ,..., e ). When such± vectors k ∈ 1 k f1,..., fk−1 are already found, the vector fk, H-orthogonal to them, can be found (by the Rank Theorem) in the k-dimensional space of 3. The Inertia Theorem 107 the flag, and can be assumed to satisfy H(fk) = 1, since the Her- mitian form on this space is non-degenerate. Thus,± an Hermitian form non-degenerate on each space of the standard coordi- nate flag can be transformed to one (and in fact exactly one) n 2 2 of the 2 normal forms z1 zn by a linear change of coordinates preserving±| the| ±···±| flag. | In matrix form, this means that there exists an invertible upper triangular matrix C such that D = C∗HC is diagonal with all di- agonal entries equal to 1. Note that transformations of the form H C∗HC may change± the determinant but preserve its sign: 7→ det(C∗HC) = (det C∗)(det H)(det C) = det H det C 2. | | When C is upper triangular, the same holds true for all leading minors, i.e. each ∆ has the same sign as the leading k k-minor k × of the diagonal matrix D with the diagonal entries d1,...,dn equal 1. The latter minors form the sequence 1, d , d d ,...,d ...d ,... , ± 1 1 2 1 k where the sign is changed each time as dk = 1. Thus the total number of sign changes is equal to the number− of negative squares in the normal form. 
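Here is a small numerical illustration of Sylvester's rule (not from the book; it assumes, as in the theorem, that all leading minors are non-zero): count the sign changes along 1, Δ_1, ..., Δ_n and compare the result with the number of negative eigenvalues.

import numpy as np

def negative_index_by_sylvester(H):
    """Number of sign changes in 1, Delta_1, ..., Delta_n (all assumed non-zero)."""
    n = H.shape[0]
    minors = [1.0] + [np.linalg.det(H[:k, :k]).real for k in range(1, n + 1)]
    return sum(1 for a, b in zip(minors, minors[1:]) if a * b < 0)

H = np.array([[1.0, 2.0], [2.0, 1.0]])          # leading minors: 1, 1, -3
print(negative_index_by_sylvester(H))            # 1 sign change
print(int(np.sum(np.linalg.eigvalsh(H) < 0)))    # 1 negative eigenvalue, in agreement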

When the form H is positive definite, its restriction to any subspace is positive definite and hence non-degenerate automatically. We obtain the following corollaries.

Corollary 1. Any positive definite Hermitian form in Cn 2 2 can be transformed into z1 + + zn by a linear change of coordinates preserving| a| given· · · complete| | flag. Corollary 2. A Hermitian form in Cn is positive definite if and only if all of its leading minors are positive. Note that the standard basis of Cn is orthonormal with respect to the Hermitian dot product x, y = x¯ y , i.e. e , e = 0 for h i i i h i ji i = i, and ei, ei = 1. 6 h i P Corollary 3. Every Hermitian form in Cn has an or- thonormal basis f ,..., f such that f Span(e ,..., e ). { 1 n} k ∈ 1 k Remarks. (1) The process of replacing a given basis e1,..., en with a new basis, orthonormal with respect to a given positiv{ e defi-} nite Hermitian form and such that each fk is a linear combination of e1,..., ek, is called Gram–Schmidt orthogonalization. (2) Results of this subsection hold true for quadratic forms in Rn. Namely, our reasoning can be easily adjusted to this case. Note also that every real symmetric matrix is Hermitian. 108 Chapter 3. SIMPLE PROBLEMS Finite Fields

Consider the case K = Zp, the field of integers modulo a prime number p = 2. It consists of p elements corresponding to possible remainders6 0, 1, 2,...,p 1 of integers divided by p. It is indeed a field due to some facts− of elementary number theory. Namely, when an integer a is not divisible by p, it follows from the Euclidean algorithm, that the greatest common divisor of a and p, which is 1, can be represented as their linear combination: 1 = ma+np. Modulo p, this means that m is inverse to a. Thus every non-zero element of Zp is invertible. As we have seen, in classification of quadratic forms, it is impor- tant to know which scalars are complete squares. Examples. (1) Let Q = ax2 and Q′ = a′x2 be two non-zero quadratic forms on K1 (i.e. a, a′ K 0 ). Rescaling x to cx, where c can be any element from K∈ 0−, { transforms} Q into ac2x2. Thus, the quadratic forms in K1 are equivalent− { } if and only if a′ = ac2 for some non-zero c. i.e. if the ratio a′/a is a complete square in K 0 . − { } (2) When K = C, every element is a complete square, so there is only one equivalence class of non-zero quadratic forms in C1. When K = R, there are two such classes according to the sign of the coef- ficient (because complete squares are exactly the positive reals). (3) When K = Zp, there are p 1 non-zero quadratic forms which are divided into two equivalence classes.− One class can be represented by the normal form x2 and consists of those quadratic forms whose coefficient is a complete square, a = c2 = 0. There are (p 1)/2 such forms, i.e. a half of all non-zero ones,6 since each complete− square a = c2 has exactly two different square roots: c and c. Let ε = 0 be any non-square. Then, when c2 runs all squares, εc−2 runs all6 the (p 1)/2 non-squares. Thus Q = εx2 can be taken for the normal form− in the other equivalence class. (4) In Z13, there are 12 non-zero elements represented by the integers 1,..., 6, their squares are 1, 4, 4, 3, 1, 3 respectively, and the± non-squares± are 2, 5, 6. Any− of them− − (e.g. 2) can be ± ± ± 1 taken for ε. Thus, every non-zero quadratic form on Z13 is equivalent to either x2 or 2x2. This example suggests that there may be no 2 choice of the normal form εx good for all Zp at once. n Theorem. Every non-zero quadratic form on Zp , p = 2, is equivalent to exactly one of the forms 6 x2 + x2 + + x2, εx2 + x2 + + x2, 1 r n. 1 2 · · · r 1 2 · · · r ≤ ≤ 3. The Inertia Theorem 109 Proof. First note that both normal forms have rank r. Since the rank of the coefficient matrix Q of a quadratic form does not change under the transformations CtQC defined by invertible matrices C, it suffices to prove that a quadratic form of a fixed rank r > 0 is equivalent to exactly one of the two normal forms. Next, the symmetric bilinear form corresponding to the quadratic form Q of rank r has kernel Ker(Q) (non-trivial when r 1. Let us begin with the case r = 2. Lemma. Given non-zero a, b Z , there exist (x,y) Z2 ∈ p ∈ p such that ax2 + by2 = 1. Indeed, when each of x and y runs all p possible values (including 0) each of ax2 and 1 by2 takes on (p 1)/2 + 1 different values. Since the total number− exceeds p, we must− have ax2 = 1 by2 for some x and y.  −

Thus, given a non-degenerate quadratic form P = ax^2 + by^2 in Z_p^2, there exists f ∈ Z_p^2 such that P(f) = 1. Taking a second vector P-orthogonal to f, we obtain a new basis in which P takes on the form a′x^2 + b′y^2 with a′ = 1 and b′ ≠ 0.

We can apply this trick r − 1 times to the quadratic form Q = a_1 x_1^2 + ⋯ + a_r x_r^2, using two of the variables at a time, and end up with the form where a_r = a_{r−1} = ⋯ = a_2 = 1. Finally, rescaling x_1 as in Example 3, we can make a_1 equal either 1 or ε.

Remark. Readers comfortable with arbitrary finite fields can easily check that our proof and the theorem remain true over any finite field K ⊃ Z_p, p ≠ 2, with any non-square in K taken in the role of ε.
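The counting argument of the Lemma is easy to test by brute force. The snippet below (illustrative only; it assumes p is an odd prime and a, b are non-zero mod p) finds a solution of ax^2 + by^2 = 1 in Z_p exactly as in the proof, by intersecting the value sets of ax^2 and 1 − by^2.

def solve_ax2_by2_eq_1(a, b, p):
    squares = {x * x % p for x in range(p)}       # the (p-1)/2 non-zero squares, plus 0
    lhs = {a * s % p for s in squares}            # all values of a*x^2
    rhs = {(1 - b * s) % p for s in squares}      # all values of 1 - b*y^2
    common = lhs & rhs                            # non-empty, since |lhs| + |rhs| = p + 1 > p
    v = common.pop()
    x = next(x for x in range(p) if a * x * x % p == v)
    y = next(y for y in range(p) if (1 - b * y * y) % p == v)
    return x, y                                    # then a*x^2 + b*y^2 = v + (1 - v) = 1 mod p

print(solve_ax2_by2_eq_1(2, 5, 13))   # some (x, y) with 2x^2 + 5y^2 = 1 mod 13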

The Case of K = Z2. This is a peculiar world where 2 = 0, 1 = 1, and where therefore the usual one-to-one correspondence between− quadratic and symmetric bilinear forms is broken, and the distinction between symmetric and anti-symmetric forms lost. n Yet, consider a symmetric bilinear form Q on Z2 : n n Q(x, y)= xiqijyj, where qij = qji = 0 or 1 for all i, j. Xi=1 Xj=1 The corresponding quadratic form Q(x) := Q(x, x) still exists, but satisfies Q(x + y) = Q(x) + 2Q(x, y)+ Q(y) = Q(x)+ Q(y) and n hence defines a linear function Z2 Z2. This linear function can be identically zero, i.e. Q(x, x) = 0 for→ all x, in which case the bilinear form Q is called even. This happens exactly when all the diago- nal entries of the coefficient matrix vanish: qii = Q(ei, ei) = 0 for all i. Otherwise the bilinear form Q is called odd. We begin with classifying even non-degenerate forms. As it is implied by the follow- ing theorem, such forms exist only in Z2-spaces of even dimension n = 2k. Theorem. Every even non-degenerate symmetric bilinear n form Q on Z2 in a suitable coordinate system is given by the formula: Q(x, y)= x y + x y + + x y + x y . (i) 1 2 2 1 · · · 2k−1 2k 2k 2k−1 Proof. Pick any f = 0 and find f such that Q(f , f ) = 1. 1 6 2 1 2 Such f2 must exist since Q is non-degenerate. In Span(f1, f2), we have: Q(x1f1 + x2f2,y1f1 + y2f2)= x1y2 + x2y1, since Q is even, i.e. Q(f1, f1)= Q(f2, f2) = 0. Let denote the space of all vectors Q-orthogonal to Span(f , f ). V 1 2 It is given by two linear equations: Q(f1, x) = 0, Q(f2, x) = 0, which are independent (since x = f1 satisfies the first one but not the second, and x = f2 the other way around). Therefore codim = 2. If v is Q-orthogonal to all vectors from , then beingV Q- ∈ V V orthogonal to f1 and f2, it lies in Ker Q, which is trivial. This shows that the restriction of the bilinear form Q to is non-degenerate. V We can continue our constriction inductively, i.e. find f3, f4 such that Q(f , f ) = 1, take their Q-orthogonal complement in∈V, and 3 4 V so on. At the end we obtain a basis f1, f2,..., f2k−1, f2k such that Q(f2i−1, f2i)=1= Q(f2i, f2i−1) for i = 1,...,k, and Q(fi, fj)=0 for all other pairs of indices. In the coordinate system corresponding to this basis, the form Q is given by (i).  3. The Inertia Theorem 111 Whenever Q is given by the formula (i), let us call the basis a Darboux basis8 of Q. n Consider now the case of odd non-degenerate forms. Let Z2 denote the subspace given by one linear equation Q(x) = 0.W It ⊂ has dimension n 1. The restriction to it of the bilinear form Q is even, but possibly− degenerate. Consider vectors y Q-orthogonal to all vectors from . They are given by the system of n 1 linear equations: Q(w W, y) = = Q(w , y) = 0, where w ,...,− w 1 · · · n−1 1 n−1 is any basis of . Since Q is non-degenerate, and wi are linearly independent, theseW linear equations are also independent, and hence the solution space has dimension 1. Let f0 be the non-zero solution vector, i.e. f0 = 0, and Q(f0, x) = 0 for all x . There are two cases: f (Figure6 39) and f / (Figure 40).∈ W 0 ∈W 0 ∈W f f 0

Figures 39 and 40. The two cases: f_0 ∈ W (Figure 39) and f_0 ∉ W (Figure 40).

In the first case, f0 spans the kernel of the form Q restricted to . We pick f such that Q(f, f0) = 1. Such f exists (since Q is non- Wdegenerate), but f / , i.e. Q(f) = 1. Let consist of all vectors of which are Q-orthogonal∈ W to f. It is a subspaceV of codimension 1 in W, which does not contain f . Therefore the restriction of Q to W 0 V is non-degenerate (and even). Let f1,..., f2k (with 2k = n 2) be { } −n a Darboux basis in . Then f, f0, f1,..., f2k form a basis in Z2 such that in the correspondingV coordinate system: k Q = xy + xy0 + x0y + (x2i−1y2i + x2iy2i−1). (ii) Xi=1 In the second case, Q(f0, f0) = 1, so that the restriction of Q to is non-degenerate (and even). Let f1,..., f2k (with 2k = n 1) W { } n − be a Darboux basis in . Then f0, f1,..., f2k form a basis in Z2 such that in the correspondingW coordinate system: k Q = x0y0 + (x2i−1y2i + x2iy2i−1). (iii) Xi=1 8After a French mathematician Jean-Gaston Darboux (1842–1917). 112 Chapter 3. SIMPLE PROBLEMS Corollary. A non-degenerate symmetric bilinear form in Zn is equivalent to (iii) when n is odd, and to one of the forms (i) or (ii) when n is even.
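For a concrete symmetric matrix over Z_2, the data that selects between the normal forms (i), (ii), and (iii), namely evenness and non-degeneracy, can be computed directly. A small sketch (helper names are ad hoc; the rank is computed by Gaussian elimination mod 2):

def rank_mod2(M):
    M = [row[:] for row in M]
    n, rank = len(M), 0
    for col in range(n):
        pivot = next((r for r in range(rank, n) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(n):
            if r != rank and M[r][col]:
                M[r] = [(a + b) % 2 for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

def classify_z2_form(Q):
    n = len(Q)
    even = all(Q[i][i] == 0 for i in range(n))     # even iff all diagonal entries vanish
    nondegenerate = rank_mod2(Q) == n
    return even, nondegenerate

print(classify_z2_form([[0, 1], [1, 0]]))   # (True, True): the Darboux form (i)
print(classify_z2_form([[1, 0], [0, 1]]))   # (False, True): an odd non-degenerate form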

The Case of K = Q All previous problems of this section belong to Linear Algebra, which is Geometry, and hence are relatively easy. Classification of rational quadratic forms belongs to Arithmetic and is therefore much harder. Here we can only hope to whet reader’s appetite for the theory which is one of the pinnacles of classical Number Theory, and refer to [5] for a serious introduction. Of course, every non-degenerate quadratic form Q on Qn has a Q-orthogonal basis and hence can be written as

Q = a x2 + + a x2 , 1 1 · · · n n where a1,...,an are non-zero rational numbers. Furthermore, by rescaling xi we can make each ai integer and square free (i.e. ex- pressed as a signed product p1 ...pk of distinct primes). The prob- lem is that such quadratic forms± with different sets of coefficients can sometimes be transformed into each other by transformations mixing up the variables, and it is not obvious how to determine if this is the case. A necessary condition is that the inertia indices of equivalent rational forms must be the same, since such forms are equivalent over R. However, there are many other requirements. To describe them, let us start with writing integers and fractions using the binary number system, e.g.: 1 2009 = 11111011001 , = .010101 . . . . (10) (2) −3 − (2) We usually learn in school that every rational (and even real) number can be represented by binary sequences, which are either finite or infinite to the right. What we usually don’t learn in school is that rational numbers can also be represented by binary sequences infinite to the left. For instance, 1 1 = =1+22 + 24 + 26 + = . . . 1010101. −3 1 22 · · ·(10) (2) − For this, one should postulate that powers 2k of the base become smaller (!) as k increases, and moreover: lim 2k = 0 as k + . → ∞ 3. The Inertia Theorem 113 Just as the standard algorithms for the addition and multiplication of finite binary fractions can be extend to binary fractions infinite to the right, they can be extended to such fractions infinite to the left. While the former possibility leads to completing the field Q into R, the latter one gives rise to another completion, denoted Q(2). In fact the same construction can be repeated with any prime base p each time leading to a different completion, Q(p), called the field of p-adic numbers. If two quadratic forms with rational coefficients are equivalent over Q they must be equivalent not only over R (denoted in this context by Q(∞)), but also over Q(p) for each p = 2, 3, 5, 7, 11,... . Classification of quadratic forms over each Q(p) is relatively tame. For instance, it can be shown that over Q(2), there are 16 (respec- tively 15 and 8) equivalence classes of quadratic forms of rank r when r > 2 (respectively r = 2 and r = 1). However, the classification of quadratic forms over Q is most concisely described by the following celebrated theorem.9 Theorem. Two quadratic forms with rational coefficients are equivalent over Q if and only if they are equivalent over each Q , p = 2, 3, 5, 7,..., . (p) ∞ These infinitely many equivalence conditions are not indepen- dent. Remarkably, if all but any one of them are satisfied, then the last one is satisfied too. It follows, for example, that if two rational quadratic forms are equivalent over every p-adic field, then they are equivalent over R.
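The digits of such left-infinite expansions can be generated one at a time. The sketch below (a toy illustration; it assumes the denominator b is odd, so that no division by 2 is ever needed) produces the first few 2-adic digits of a rational number a/b, reproducing the expansions of −1/3 and −1 quoted above.

def two_adic_digits(a, b, N=12):
    """Digits d_0, d_1, ... with a/b = sum d_k 2^k in Q_(2); b must be odd."""
    digits = []
    for _ in range(N):
        d = a % 2                 # since b is odd, a/b ≡ a (mod 2)
        digits.append(d)
        a = (a - d * b) // 2      # (a/b - d)/2, again a fraction with the odd denominator b
    return digits

print(two_adic_digits(-1, 3))     # [1, 0, 1, 0, ...]: -1/3 = ...0101010101_(2)
print(two_adic_digits(-1, 1))     # [1, 1, 1, 1, ...]: -1   = ...1111111111_(2)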

EXERCISES 211. Find orthogonal bases and inertia indices of quadratic forms: 

x_1x_2 + x_2^2,   x_1^2 + 4x_1x_2 + 6x_2^2 − 12x_2x_3 + 18x_3^2,   x_1x_2 + x_2x_3 + x_3x_1.

212. Prove that Q = Σ_{1≤i≤j≤n} x_i x_j is positive definite.
213. A minor of a square matrix formed by rows and columns with the same indices is called principal. Prove that all principal minors of the coefficient matrix of a positive definite quadratic form are positive.
214.⋆ Let a_1, ..., a_p and b_1, ..., b_q be linear forms in R^n, and let Q(x) = a_1^2(x) + ⋯ + a_p^2(x) − b_1^2(x) − ⋯ − b_q^2(x). Prove that the positive and negative inertia indices of Q do not exceed p and q respectively.

9This is essentially a special case of the Minkowski–Hasse theorem named after Hermann Minkowski (1864–1909) and Helmut Hasse (1898–1979). 114 Chapter 3. SIMPLE PROBLEMS 215. Find the place of surfaces x x +x x = 1 and x x +x x +x x = 1 2 2 3 ± 1 2 2 3 3 1 1 in the classification of conics in R3. ± 216. Examine normal forms of hyperboloids in R4 and find out how many connected components (“sheets”) each of them has.  217. Find explicitly a C-linear transformation that identifies the sets of complex solutions to the equations xy = 1 and x2 + y2 = 1. 218. Find the rank of the quadratic form z2 +2iz z z2. 1 1 2 − 2 219. Define the kernel of an anti-symmetric bilinear form A on a vector space as the subspace Ker A := z A(z, x) = 0 for all x , and V { ∈ V | ∈ V} prove that the form descends to the quotient space / Ker A. V 220. Classify conics in C2 up to linear inhomogeneous transformations.  221. Find the place of the complex conic z2 2iz z z2 = iz + z in the 1 − 1 2 − 2 1 2 classification of conics in C2.  222. Classify all conics in C3 up to linear inhomogeneous transformations. 223. Prove that there are 3n 1 equivalence classes of conics in Cn. − t 224. Check that P ∗ = P if and only if A is real. 225. Show that diagonal entries of an Hermitian matrix are real, and of anti-Hermitian imaginary. 226. Find all complex matrices which are symmetric and anti-Hermitian simultaneously.  227.⋆ Prove that a sesquilinear form P of z, w Cn can be expressed in ∈ terms of its values at z = w, and find such an expression.  228. Define sesquilinear forms P : Cm Cn C of pairs of vectors (z, w) × → taken from two different spaces, and prove that P (z, w)= z, P w , where h i P is the m n-matrix of coefficients of the form, and , is the Hermitian × h· ·i dot product in Cm. 

229. Prove that under changes of variables v = Dv′, w = Cw′ the coeffi- cent matrices of sesquilinear forms are transformed as P D∗P C. 7→ 230. Prove that Az, w = z,Bw for all z Cm, w Cn if and only if h i h i ∈ ∈n m A = B∗. Here , denote Hermitian dot products in C or C .  h· ·i 231. Prove that (AB)∗ = B∗A∗.  232. Prove that for (anti-)Hermitian matrices A and B, the commutator matrix AB BA is (anti-)Hermitian. − 233. Find out which of the following forms are Hermitian or anti-Hermitian and transform them to the appropriate normal forms:  z¯ z z¯ z , z¯ z +¯z z , z¯ z + iz¯ z iz¯ z z¯ z . 1 2 − 2 1 1 2 2 1 1 1 2 1 − 1 2 − 2 2 234. Prove that for every symmetric matrix Q all of whose leading minors are non-zero there exists a unipotent upper triangular matrix C such that 3. The Inertia Theorem 115 D = CtQC is diagonal, and express the diagonal entries of D in terms of the leading minors.  235. Use Sylvester’s rule to find inertia indices of quadratic forms:  x2 +2x x +2x x +2x x , x x x2 + x2 +2x x + x2. 1 1 2 2 3 1 4 1 2 − 2 3 2 4 4 236. Compute determinants and inertia indices of quadratic forms: 2 2 2 2 2 x1 x1x2 + x2, x1 + x2 + x3 x1x2 x2x3. − −n 2 − 237. Prove positivity of the quadratic form i=1 xi 1 i

239. In Z11, compute multiplicative inverses of all non-zero elements, find all non-square elements, and find out if any of the quadratic forms x1x2, x2 + x x +3x2, 2x2 + x x 2x2 are equivalent to each other in Z2 . 1 1 2 2 1 1 2 − 2 11 240. Prove that when p is a prime of the form 4k 1, then every non- n − degenerate quadratic form in Zp is equivalent to one of the two normal 2 2 2 forms x1 + x2 + + xn.  ± · · · 3 241. Prove that in a sutable coordinate system (u,v,w) in the space Zp of a b symmetric 2 2-matrices over Z , the determinant ac b2 takes × b c p −   on the form u2 + v2 + w2. 242. For x Zn, show that a x2 = a x . ∈ 2 i i i i 243. Show that on Z2, there are 4 non-degenerate symmetric bilinear forms, 2 P P and find how they are divided into 2 equivalence classes. 

244. Let Q be a quadratic form in n variables x1,...,xn over Z2, i.e. a sum of monomials x x . Associate to it a function of x, y Zn given by i j ∈ 2 the formula: BQ(x, y) = Q(x + y)+ Q(x)+ Q(y). Prove that BQ is an even bilinear form. 245.⋆ Let and denote vector Z -spaces of quadratic and symmetric Q B 2 bilinear forms on Zn respectively. Denote by p : the linear map 2 Q→B Q B , and by q : the linear map that associates to a symmetric 7→ Q B→Q bilinear form B the quadratic form Q(x)= B(x, x). Prove that the range of p consists of all even forms and thus coincides with the kernel of q, and that the range of q in its turn coincides with the kernel of p, i.e. consists 2 of all “diagonal” quadratic forms Q = aixi . 246. Show that in Q , the field of 2-adic numbers, 1= . . . 11111. (2) P ⋆ − 247. Compute the (unsigned!) binary representation of 1/3 in Q(2).  ⋆ 248. Prove that every non-zero 2-adic number is invertible in Q(2).  249.⋆ Prove that a 2-adic unit 1. (where is a wild card) is a ···∗∗∗ ∗ square in Q(2) if and only if it has the form 001. ⋆ ···∗∗∗ 250. Prove that over the field Q(2), there are 8 equivalence classes of quadratic forms in one variable.  116 Chapter 3. SIMPLE PROBLEMS

⋆ 2 2 251. Let K, K×, (K×) and := K×/(K×) stand for: any field, the set V of non-zero elements in it, complete squares in K×, and equivalence classes of all non-zero elements modulo complete squares. Show that , equipped V with the operation, induced by multiplication in K×, is a Z2-vector space. Show that when K = C, R, or Q , dim =0, 1 and 3 respectively. (2) V 252. Let Q and Q′ be quadratic forms in n variables with integer coeffi- cients. Prove that if these forms can be transformed into each other by linear changes of variables with coefficients in Z, then det Q = det Q′. (Thus, det Q, which is called the discriminant of Q, depends only on the equivalence class of Q.) Chapter 4

Eigenvalues

1 The Spectral Theorem

Hermitian Spaces Given a C-vector space , an Hermitian inner product in is defined as a Hermitian symmetricV sesquilinear form such thatVthe corresponding Hermitian quadratic form is positive definite. A space equipped with an Hermitian inner product , is called a Her- Vmitian space.1 h· ·i The inner square z, z is interpreted as the square of the length z of the vector z. Respectively,h i the distance between two points z and| | w in an Hermitian space is defined as z w . Since the Hermitian inner product is positive, distance is well-defined,| − | symmetric, and positive (unless z = w). In fact it satisfies the triangle inequality2:

|z − w| ≤ |z| + |w|.

|⟨z, w⟩|^2 ≤ ⟨z, z⟩ ⟨w, w⟩,

|z − w|^2 = ⟨z − w, z − w⟩ = ⟨z, z⟩ − ⟨z, w⟩ − ⟨w, z⟩ + ⟨w, w⟩ ≤ |z|^2 + 2|z||w| + |w|^2 = (|z| + |w|)^2.

1 Other terms used are unitary space and finite dimensional Hilbert space.
2 This makes a Hermitian space a metric space.

117 118 Chapter 4. EIGENVALUES To prove the Cauchy–Schwarz inequality, note that it suffices to consider the case w = 1. Indeed, when z = 0, both sides vanish, and when w = 0, both| | sides scale the same way when w is normalized to the unit length.6 So, assuming w = 1, we put λ := w, z and consider the projection λw of the| vector| z to the line spannedh i by w. The difference z λw is orthogonal to w: w, z λw = w, z λ w, w = 0.− From positivity of inner squares,h we− have:i h i− h i 0 z λw, z λw = z, z λw, z = z, z λ z, w . ≤ h − − i h − i h i− h i Since z, w = w, z = λ¯, we conclude that z 2 z, w 2 as required.h Noticei h thati the equality holds true only| when| ≥z |h= λwi|. All Hermitian spaces of the same dimension are iso- metric (or Hermitian isomorphic), i.e. isomorphic through iso- morphisms respecting Hermitian inner products. Namely, as it fol- lows from the Inertia Theorem for Hermitian forms, every Hermitian space has an orthonormal basis, i.e. a basis e1,..., en such that ei, ej = 0 for i = j and = 1 for i = j. In the coordinate system hcorrespondingi to an6 orthonormal basis, the Hermitian inner product takes on the standard form: z, w =z ¯ w + +z ¯ w . h i 1 1 · · · n n An orthonormal basis is not unique. Moreover, as it follows from the proof of Sylvester’s rule, one can start with any basis f1,..., fn in and then construct an orthonormal basis e ,..., e such that e V 1 n k ∈ Span(f1,..., fk). This is done inductively; namely, when e1,..., ek−1 have already been constructed, one subtracts from fk its projection to the space Span(e1,..., ek−1): ˜f = f e , f e e , f e . k k − h 1 ki 1 −···−h k−1 ki k−1 The resulting vector ˜fk lies in Span(f1,..., fk−1, fk) and is orthogonal to Span(f1,..., fk−1) = Span(e1,..., ek−1). Indeed,

k−1 e , ˜f = e , f e , f e , e = 0. h i ki h i ki− h j kih i ji Xj=1 for all i = 1,...,k 1. To construct ek, one normalizes ˜fk to the unit length: − e := ˜f , ˜f / ˜f . k h k k | k| The above algorithm of replacing a given basis with an orthonormal one is known as Gram–Schmidt orthogonalization. 1. The Spectral Theorem 119 Normal Operators Let A : be a linear map between two Hermitian spaces. Its HermitianV→W adjoint (or simply adjoint is defined as the linear map A∗ : characterized by the property W→V A∗w, v = w, Av for all v , w . h i h i ∈V ∈W One way to look at this construction is to realize that an Hermitian inner product on a vector space allows one to assign to each vector z a C-linear function z, (i.e.U the function whose value at∈Uw is equal to z, wh ).·i Moreover, since theU→U inner product is non-degenerate,∈ U each Ch -lineari function on is represented this way by a unique vector. Thus, given w , oneU introduces a C-linear function on : v w, Av , and defines∈ W A w to be the unique vector in thatV represents7→ h thisi linear function.∗ V Examining the defining identity of the adjoint map in coordinate systems in and corresponding to orthonormal bases, we conclude that the matrixV ofWA∗ in such bases is the Hermitian adjoint to A, i.e. A∗ = A¯t. Indeed, in matrix product notation,

w, Av = w∗Av = (A∗w)∗v = A∗w, v . h i h i Note that (A∗)∗ = A, and that Av, w = v, A∗w for all v and w . h i h i ∈V ∈W Consider now linear maps A : from an Hermitian space to itself. To such a map, one canV →V associate a sesquilinear form: A(z, w) := z, Aw . Vice versa, every sesquilinear form corresponds this way toh a uniquei linear transformation. (It is especially obvious when the inner product is written in the standard form z, w = z∗w using an orthonormal basis.) In particular, one can defineh Heirmitian (A∗ = A) and anti-Hermitian (A∗ = A) linear maps which corre- spond to Hermitian and anti-Hermitian− forms respectively. A linear map A : on a Hermitian vector space is called normal3 if it commutesV →V with its Hermitian adjoint: A∗A = AA∗. Examples 1. Hermitian and anti-Hermitian transformations are normal. Example 2. An invertible linear transformation U : is called unitary if it preserves inner products: V →V

Uz, Uw = z, w for all z, w . h i h i ∈V 3The term normal operator is frequently in use. 120 Chapter 4. EIGENVALUES Equivalently, z, (U ∗U I)w = 0 for all z, w . Taking z = (U ∗U I)w,h we conclude− thati (U ∗U I)w = ∈0 Vfor all w , and hence− U ∗U = I. Thus, for a unitary− map U, U −1 = U ∗.∈ The V converse statement is also true (and easy to check by starting from U −1 = U ∗ and reversing our computation). Since every invertible transformation commutes with its own inverse, we conclude that uni- tary transformations are normal. Example 3. Every linear transformation A : can be uniquely written as the sum A = B + C of HermitianV →V (B = (A + A∗)/2) and and ant-Hermitian (C = A A∗)/2) operators. We claim that an operator is normal if and only− if its Hermitian and anti-Hermitian parts commute. Indeed, A∗ = B C, AA∗ = B2 BC + CB C2, A∗A = B2 + BC CB C2, and− hence AA∗ = A∗−A if and only− if BC = CB. − −
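Example 3 is easy to test numerically. In the sketch below (illustrative only; both matrices are chosen arbitrarily), A is split into its Hermitian and anti-Hermitian parts, and normality of A is compared with commutation of the two parts.

import numpy as np

def hermitian_parts(A):
    B = (A + A.conj().T) / 2      # Hermitian part
    C = (A - A.conj().T) / 2      # anti-Hermitian part
    return B, C

def is_normal(A):
    return np.allclose(A.conj().T @ A, A @ A.conj().T)

A = np.array([[1, 2 + 1j], [2 + 1j, 1]])      # its parts B = [[1,2],[2,1]] and C = i[[0,1],[1,0]] commute
B, C = hermitian_parts(A)
print(is_normal(A), np.allclose(B @ C, C @ B))   # True True

A2 = np.array([[1, 1j], [0, 2]])              # an upper triangular, non-normal example
print(is_normal(A2))                           # False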

The Spectral Theorem for Normal Operators Let A : be a linear transformation, v a vector, and λ C aV scalar. →V The vector v is called an eigenvector∈ V of A with the∈ eigenvalue λ, if v = 0, and Av = λv. In other words, A preserves the line spanned6 by the vector and acts on this line as the multiplication by λ. V Theorem. A linear transformation A : on a finite dimensional Hermitian vector space is normalV →V if and only if has an orthonormal basis of eigenvectors of A. V Proof. In one direction, the statement is almost obvious: If a basis consists of eigenvectors of A, then the matrix of A in this basis is diagonal. When the basis is orthonormal, the matrix of the adjoint operator A∗ in this basis is adjoint to the matrix of A and is also diagonal. Since all diagonal matrices commute, we conclude that A is normal. Thus, it remains to prove that, conversely, every normal operator has an orthonormal basis of eigenvectors. We will prove this in four steps. Step 1. Existence of eigenvalues. We need to show that there exists a scalar λ C such that the system of linear equations Ax = λx has a non-trivial∈ solution. Equivalently, this means that the linear transformation λI A has a non-trivial kernel. Since is finite dimensional, this can− be re-stated in terms of the determinaV nt of the matrix of A (in any basis) as

det(λI A) = 0. − 1. The Spectral Theorem 121 This relation, understood as an equation for λ, is called the char- acteristic equation of the operator A. When A = 0, it becomes λn = 0, where n = dim . In general, it is a degree-n polynomial equation V λn + p λn−1 + + p λ + p = 0, 1 · · · n−1 n where the coefficients p1,...,pn are certain algebraic expressions of matrix entries of A (and hence are complex numbers). According to the Fundamental Theorem of Algebra, this equation has a complex solution, say λ0. Then det(λ0I A) = 0, and hence the system (λ A)x = 0 has a non-trivial solution,− v = 0, which is therefore 0 − 6 an eigenvector of A with the eigenvalue λ0. Remark. Solutions to the system Ax = λ0x form a linear sub- space in , namely the kernel of λ I A, and eigenvectors of A W V 0 − with the eigenvalue λ0 are exactly all non-zero vectors in . Slightly abusing terminology, is called the eigenspace of A Wcorrespond- W ing to the eigenvalue λ0. Obviously, A( ) . Subspaces with such property are called A-invariant. ThusW ⊂ eigenspaces W of a linear transformation A are A-invariant. Step 2. A∗-invariance of eigenspaces of A. Let = 0 be the eigenspace of a normal operator A corresponding toW the 6 { eigenvalue} λ. Then for every w , ∈W A(A∗w)= A∗(Aw)= A∗(λw)= λ(A∗w).

Therefore A∗w , i.e. the eigenspace is A∗-invariant. ∈W W Step 3. Invariance of orthogonal complements. Let be a linear subspace. Denote by ⊥ the orthogonal complementW⊂V of the subspace with respectW to the Hermitian inner product: W ⊥ := v w, v = 0 for all w . W { ∈V|h i ∈W } Note that if e ,..., e is a basis in , then ⊥ is given by k linear 1 k W W equations ei, v = 0, i = 1,...,k, and thus has dimension n k. On the otherh hand,i ⊥ = 0 , because no vector≥w =− 0 can be orthogonal toW itself: ∩Ww, w {>}0. It follows from dimension6 counting formulas that dim h ⊥ = ni k. Moreover, this implies that = ⊥, i.e. the wholeW space is− represented as the direct sum ofV twoW⊕W orthogonal subspaces. We claim that if a subspace is both A-and A∗-invariant, then its orthogonal complement is also A- and A∗-invariant. Indeed, suppose that A∗( ) , and v ⊥. Then for any w , we have: W ⊂ W ∈ W ∈ W w, Av = A∗w, v = 0, since A∗w . Therefore Av ⊥, i.e. h i h i ∈ W ∈ W 122 Chapter 4. EIGENVALUES ⊥ is A-invariant. By the same token, if is A-invariant, then ⊥ isWA∗-invariant. W W Step 4. Induction on dim . When dim = 1, the theorem is obvious. Assume that the theoremV is proved forV normal operators in spaces of dimension

Unitary Transformations

Note that if λ is an eigenvalue of a unitary operator U, then |λ| = 1. Indeed, if x ≠ 0 is a corresponding eigenvector, then ⟨x, x⟩ = ⟨Ux, Ux⟩ = λ̄λ⟨x, x⟩, and since ⟨x, x⟩ ≠ 0, it implies λ̄λ = 1.

Corollary 5. A transformation is unitary if and only if in a suitable orthonormal basis its matrix is diagonal, and the diagonal entries are complex numbers of absolute value 1.

On the complex line C, multiplication by λ with |λ| = 1 and arg λ = θ defines the rotation through the angle θ. We will call this transformation on the complex line a unitary rotation. We arrive therefore at the following geometric characterization of unitary transformations.

Corollary 6. Unitary transformations in an Hermitian space of dimension n are exactly unitary rotations (through possibly different angles) in n mutually perpendicular complex directions.
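A quick numerical check of this observation (purely illustrative; the rotation angle is an arbitrary choice): the eigenvalues of a rotation matrix, viewed as a unitary transformation of C^2, are e^{±iθ} and have absolute value 1.

import numpy as np

theta = 0.3
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])      # a rotation of R^2, unitary as a complex matrix
print(np.allclose(U.conj().T @ U, np.eye(2)))        # True: U*U = I
print(np.abs(np.linalg.eigvals(U)))                  # [1. 1.]: eigenvalues e^{±i*theta} lie on the unit circle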

Orthogonal Diagonalization

Corollary 7. A linear operator is Hermitian (respectively anti-Hermitian) if and only if in a suitable orthonormal basis its matrix is diagonal with all real (respectively imag- inary) diagonal entries. Indeed, if Ax = λx and A∗ = A, we have: ± λ x, x = x, Ax = A∗x, x = λ¯ x, x . h i h i h i ± h i Therefore λ = λ¯ provided that x = 0, i.e. eigenvalues of a Hermi- tian operator are± real and of anti-Hermitian6 imaginary. Vice versa, a real diagonal matrix is obviously Hermitian, and imaginary anti- Hermitian. 124 Chapter 4. EIGENVALUES Recall that (anti-)Hermitian operators correspond to (anti-)Her- mitian forms A(x, y) := x, Ay . Applying the Spectral Theorem and reordering the basis eigenvectorsh i in the monotonic order of the corresponding eigenvalues, we obtain the following classification re- sults for forms. Corollary 8. In a Hermitian space of dimension n, an Hermitian form can be transformed by unitary changes of coordinates to exactly one of the normal forms λ z 2 + + λ z 2, λ λ . 1| 1| · · · n| n| 1 ≥···≥ n

Corollary 9. In a Hermitian space of dimension n, an anti-Hermitian form can be transformed by unitary changes of coordinates to exactly one of the normal forms iω z 2 + + iω z 2, ω ω . 1| 1| · · · n| n| 1 ≥···≥ n

Uniqueness follows from the fact that eigenvalues and dimensions of eigenspaces are determined by the operators in a coordinate-less fashion. Corollary 10. In a complex vector space of dimension n, a pair of Hermitian forms, of which the first one is positive definite, can be transformed by a choice of a coordinate system to exactly one of the normal forms: z 2 + + z 2, λ z 2 + + λ z 2, λ λ . | 1| · · · | n| 1| 1| · · · n| n| 1 ≥···≥ n

This is the Orthogonal Diagonalization Theorem for Her- mitian forms. It is proved in two stages. First, applying the Inertia Theorem to the positive definite form one transforms it to the stan- dard form; the 2nd Hermitian form changes accordingly but remains arbitrary at this stage. Then, applying Corollary 8 of the Spectral Theorem, one transforms the 2nd Hermitian form to its normal form by transformations preserving the 1st one. Note that one can take the positive definite sesquilinear form corresponding to the 1st Hermitian form for the Hermitian inner product, and describe the 2nd form as z, Az , where A is an opera- tor Hermitian with respect to this innerh product.i The operator, its eigenvalues, and their multiplicities are thus defined by the given pair of forms in a coordinate-less fashion. This guarantees that pairs with different collections λ1 ...λn of eigenvalues are non-equivalent to each other. ≥ 1. The Spectral Theorem 125 Singular Value Decomposition Theorem. Let A : be a linear map or rank r between Hermitian spacesV of →W dimensions n and m respectively. Then there exist orthonormal bases: v1,..., vn of and w1,..., wm of , and positive reals µ µ , suchV that W 1 ≥···≥ r Av = µ w ,...,Av = µ w , Av = = Av = 0. 1 1 1 r r r r+1 · · · n Proof. For x, y , put4 ∈V H(x, y) := Ax, Ay = x, A∗Ay . h i h i This is an Hermitian sesquilinear form on (thus obtained by pulling the Hermitian inner product on backV to by means of A). It is non-negative (i.e. H(x, x) W0 for all x V ), and corresponds to the Hermitian operator H ≥:= A∗A. The∈ eigenspace V of H corre- sponding to the eigenvalue 0 coincides with the kernel of A. Indeed, Hx = 0 implies Ax = 0, i.e. Ax = 0. | | Applying the Spectral Theorem to H, we obtain an orthonormal basis of consisting of eigenvectors v1,..., vr of H with positive eigenvalues,V λ λ > 0, and eigenvectors v ,..., v lying 1 ≥···≥ r r+1 n in the kernel of A. For i r, put w = λ−1/2Av . We have: ≤ i i i

1 1 ∗ λj wi, wj = Avi, Avj = vi, A Avj = vi, vj , h i sλiλj h i sλiλj h i s λi h i which is equal to 1 when i = j and 0 when i = j. Thus w1,..., wr form an orthonormal basis in the range of A.6 Completing it to an orthonormal basis of , we obtain the required result with µ = √λ . W i i Remark. When Av = µw and A∗w = µv for unit vectors v and w, they are called right and left singular vectors of A correspond- ing to the singular value µ. The next reformulation provides the singular value decomposition of a matrix. Corollary. For every complex m n-matrix A of rank r there exist: unitary matrices U and× V of sizes m and n respectively, and a diagonal r r-matrix M with positive diagonal entries, such that × M 0 A = U ∗ V. 0 0  
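The factorization stated in the Corollary is implemented in standard numerical libraries. The sketch below uses numpy's singular value decomposition (numpy's own conventions and an arbitrarily chosen matrix A, so it is an illustration rather than the book's construction) and checks the defining relations A v_i = μ_i w_i from the theorem.

import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.]])
U, s, Vh = np.linalg.svd(A)        # numpy returns A = U diag(s) Vh
print(s)                            # singular values mu_1 >= mu_2 > 0
# each right singular vector v_i (a row of Vh) satisfies A v_i = mu_i w_i,
# where w_i is the corresponding left singular vector (a column of U):
for mu, v, w in zip(s, Vh, U.T):
    print(np.allclose(A @ v, mu * w))   # True, True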

4Note that the 1st inner product is in W while the 2nd one is in V. 126 Chapter 4. EIGENVALUES Complexification Since R C, every complex vector space can be considered as a real vector⊂ space simply by “forgetting” that one can multiply by non-real scalars. This operation is called realification; applied to a C-vector space , it produces an R-vector space, denoted R, of real dimension twiceV the complex dimension of . V V In the reverse direction, to a real vector space one can associate a complex vector space, C, called the complexificationV of . As a real vector space, it is theV direct sum of two copies of : V V C := (x, y) x, y . V { | ∈ V} Thus the addition is performed componentwise, while the multipli- cation by complex scalars α + iβ is introduced with the thought in mind that (x, y) stands for x + iy: (α + iβ)(x, y) := (αx βy, βx + αy). − This results in a C-vector space C whose complex dimension equals the real dimension of . V V Example. (Rn)C = Cn = x + iy x, y Rn . { | ∈ } A productive point of view on complexification is that it is a com- plex vector space with an additional structure that “remembers” that the space was constructed from a real one. This additional structure is the operation of complex conjugation (x, y) (x, y). The operation in itself is a map σ : C C, satisfying7→σ2 =− id, which is anti-linear over C. The latterV → means V that σ(λz) = λσ¯ (z) for all λ C and all z C. In other words, σ is R-linear, but anti- commutes∈ with multiplication∈ V by i: σ(iz)= iσ(z). − Conversely, let be a complex vector space equipped with an anti-linear operatorW whose square is the identity5: σ : , σ2 = id, σ(λz)= λσ¯ (z) for all λ C, z . W→W ∈ ∈W Let denote the real subspace in that consists of all σ-invariant vectors.V We claim that is canonicallyW identified with the complexification of :W = C. Indeed, every vector z is uniquely written asV theW sum ofV σ-invariant and σ-anti-invariant∈ W vectors: 1 1 z = (z + σz)+ (z σz). 2 2 − 5An transformation whose square is the identity is called an involution. 1. The Spectral Theorem 127 Since σi = iσ, multiplication by i transforms σ-invariant vectors to σ-anti-invariant− ones, and vice versa. Thus, as a real space is the direct sum (i ) = x + iy x, y , whereW multiplication by i acts in theV required⊕ V for{ the complexification| ∈ V} fashion: i(x + iy) = y + ix. − The construction of complexification and its abstract description in terms of the complex conjugation operator σ are the tools that allow one to carry over results about complex vector spaces to real vector spaces. The idea is to consider real objects as complex ones invariant under the complex conjugation σ, and apply (or improve) theorems of complex linear algebra in a way that would respect σ. Example. A real matrix can be considered as a complex one. This way an R-linear map defines a C-linear map (on the complexified space). More abstractly, given an R-linear map A : , one can associate to it a C-linear map AC : C C byVAC →V(x, y) := (Ax, Ay). This map is real in the sense thatV → it commutesV with the complex conjugation: ACσ = σAC. Vice versa, let B : C C be a C-linear map that commutes with σ: σ(Bz) = Bσ(zV) for→ all V z C. When σ(z) = z, we find σ(Bz) = Bz, i.e. the subspaces∈ Vand i of real and± imaginary vectors are±B-invariant. Moreover,V since B Vis C-linear, we find that for x, y , B(x + iy)= Bx + iBy. Thus B = AC where the linear operator∈VA : is obtained by restricting B to . V→V V Euclidean Spaces Let be a real vector space. 
A Euclidean inner product (or Eu- clideanV structure) on is defined as a positive definite symmetric bilinear form , . A realV vector space equipped with a Euclidean in- ner product ish· called·i a Euclidean space. A Euclidean inner product allows one to talk about distances between points and angles between directions: x, y x y = x y, x y , cos θ(x, y) := h i . | − | h − − i x y p | | | | It follows from the Inertia Theorem that every finite dimen- sional space has an orthonormal basis. In coordinates corresponding to an orthonormal basis e1,..., en the in- ner product is given by the standard formula: n x, y = x y e , e = x y + + x y . h i i jh i ji 1 1 · · · n n i,jX=1 128 Chapter 4. EIGENVALUES Thus, every Euclidean space of dimension n can be identified with the coordinate EuclideanV space Rn by an isomorphism Rn respecting inner products. Such an isomorphism is not unique,→ but V can be composed with any invertible linear transformation U : preserving the Euclidean structure: V→V

\[
\langle Ux, Uy\rangle = \langle x, y\rangle \quad \text{for all } x, y \in V.
\]

Such transformations are called orthogonal.

A Euclidean structure on a vector space V allows one to identify the space with its dual V* by the rule that assigns to a vector v ∈ V the linear function on V whose value at a point x ∈ V is equal to the inner product ⟨v, x⟩. Respectively, given a linear map A : V → W between Euclidean spaces, the adjoint map A^t : W* → V* can be considered as a map between the spaces themselves: A^t : W → V. The defining property of the adjoint map reads:

\[
\langle A^t w, v\rangle = \langle w, Av\rangle \quad \text{for all } v \in V \text{ and } w \in W.
\]

Consequently, the matrices of the adjoint maps A and A^t with respect to orthonormal bases of the Euclidean spaces V and W are transposed to each other.

As in the case of Hermitian spaces, one easily derives that a linear transformation U : V → V is orthogonal if and only if U^{-1} = U^t. In the matrix form, the relation U^t U = I means that the columns of U form an orthonormal set in the coordinate Euclidean space.

Our nearest goal is to obtain real analogues of the Spectral Theorem and its corollaries. One way to do it is to combine the corresponding complex results with complexification. Let V be a Euclidean space. We extend the inner product to the complexification V^C in such a way that it becomes an Hermitian inner product. Namely, for all x, y, x', y' ∈ V, put

\[
\langle x+iy,\, x'+iy'\rangle = \langle x, x'\rangle + \langle y, y'\rangle + i\langle x, y'\rangle - i\langle y, x'\rangle.
\]

It is straightforward to check that this form on V^C is sesquilinear and Hermitian symmetric. It is also positive definite, since ⟨x+iy, x+iy⟩ = |x|² + |y|². Note that changing the signs of y and y' preserves the real part and reverses the imaginary part of the form. In other words, for all z, w ∈ V^C, we have:

\[
\langle \sigma(z), \sigma(w)\rangle = \overline{\langle z, w\rangle}\ \ (= \langle w, z\rangle).
\]

This identity expresses the fact that the Hermitian structure of V^C came from a Euclidean structure on V. When A : V^C → V^C is a real operator, i.e. σAσ = A, the Hermitian adjoint operator A* is also real.⁶ Indeed, since σ² = id, we find that for all z, w ∈ V^C

\[
\langle \sigma A^*\sigma z, w\rangle = \langle \sigma w, A^*\sigma z\rangle = \langle A\sigma w, \sigma z\rangle = \langle \sigma A w, \sigma z\rangle = \langle z, Aw\rangle,
\]

i.e. σA*σ = A*. In particular, complexifications of orthogonal (U^{-1} = U^t), symmetric (A^t = A), anti-symmetric (A^t = −A), and normal (A^t A = A A^t) operators in a Euclidean space are respectively unitary, Hermitian, anti-Hermitian, and normal operators on the complexified space, commuting with the complex conjugation.
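These compatibilities are easy to test numerically. The following sketch (Python with NumPy; it is an illustration only and not part of the text, and the variable names and random data are ad hoc) checks that the extended form above coincides with NumPy's Hermitian dot product on the complexification C^n of R^n, and that a real symmetric matrix is Hermitian and a real orthogonal matrix is unitary when viewed as operators on C^n.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4

    # Four real vectors x, y, x', y' in the Euclidean space R^n.
    x, y, xp, yp = rng.standard_normal((4, n))

    # The extension of the Euclidean inner product to the complexification:
    # <x+iy, x'+iy'> = <x,x'> + <y,y'> + i<x,y'> - i<y,x'>.
    lhs = (x @ xp) + (y @ yp) + 1j * (x @ yp) - 1j * (y @ xp)
    # numpy's vdot conjugates its first argument, i.e. it is the Hermitian dot product on C^n.
    rhs = np.vdot(x + 1j * y, xp + 1j * yp)
    assert np.isclose(lhs, rhs)

    # A real symmetric matrix, viewed as an operator on C^n, is Hermitian, ...
    S = rng.standard_normal((n, n))
    S = S + S.T
    assert np.allclose(S.conj().T, S)

    # ... and a real orthogonal matrix (here: the Q factor of a QR decomposition) is unitary.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    assert np.allclose(Q.T @ Q, np.eye(n))         # orthogonal: U^t U = I
    assert np.allclose(Q.conj().T @ Q, np.eye(n))  # unitary on the complexification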

The Real Spectral Theorem Theorem. Let be a Euclidean space, and A : a normal operator. ThenV in the complexification C,V→V there exists an orthonormal basis of eigenvectors of ACVwhich is invariant under complex conjugation and such that the eigenvalues corresponding to conjugated eigenvectors are conjugated. Proof. Applying the complex Spectral Theorem to the normal operator B = AC, we obtain a decomposition of the complexified space C into a direct orthogonal sum of eigenspaces ,..., V W1 Wr of B corresponding to distinct complex eigenvalues λ1,...,λr. Note that if v is an eigenvector of B with an eigenvalue µ, then Bσv = σBv = σ(µv) =µσ ¯ v, i.e. σv is an eigenvector of B with the con- jugate eigenvalueµ ¯. This shows that if λi is a non-real eigenvalue, then its conjugate λ¯i is also one of the eigenvalues of B (say, λj), and the corresponding eigenspaces are conjugated: σ( i)= j. By the same token, if λ is real, then σ( )= . This lastW equalityW means k Wk Wk that k itself is the complexification of a real space, namely of the σ-invariantW part of . It coincides with the space Ker(λ I A) Wk k − ⊂V of real eigenvectors of A with the eigenvalue λk. Thus, to construct a required orthonormal basis, we take: for each real eigenspace k, a Euclidean orthonormal basis in the corresponding real eigenspace,W and for each pair i, j of complex conjugate eigenspaces, an Her- mitian orthonormalW basisW f in and the conjugate basis σ(f ) { α} Wi { α } in j = σ( i). The vectors of all these bases altogether form an W W C orthonormal basis of satisfying our requirements.  V 6This is obvious in the matrix form: In a real orthonormal basis of V (which is a complex orthonormal basis of VC) A has a real matrix, so that A∗ = At. We use here a “hard way” to illustrate how various aspects of σ-invariance fit together. 130 Chapter 4. EIGENVALUES Example 1. Identify C with the Euclidean plane R2 in the usual way, and consider the operator (x + iy) (α + iβ)(x + iy) of mul- tiplication by given complex number α +7→iβ. In the basis 1, i, it has the matrix α β A = . β− α   Since At represents multiplication by α iβ, it commutes with A. Therefore A is normal. It is straightforward− to check that 1 1 1 1 z = and z¯ = √ i √ i 2  −  2   are complex eigenvectors of A with the eigenvalues α + iβ and α iβ respectively, and form an Hermitian orthonormal basis in (R2)C−. n Example 2. If A is a linear transformation in R , and λ0 is a non-real root of its characteristic polynomial det(λI A), then the − system of linear equations Az = λ0z has non-trivial solutions, which cannot be real though. Let z = u + iv be a complex eigenvector of A with the eigenvalue λ = α + iβ. Then σz = u iv is an eigenvector 0 − of A with the eigenvalue λ¯0 = α iβ. Since λ0 = λ¯0, the vectors z and σz are linearly independent− over C, and hence6 the real vectors u and v must be linearly independent over R. Consider the plane Span(u, v) Rn. Since ⊂ A(u iv) = (α iβ)(u iv) = (αu βv) i(βu + αv), − − − − − we conclude that A preserves this plane and in the basis u, v in it α β − (note the sign change!) acts by the matrix . If we assume β− α in addition that A is normal (with respect to the standard Euclidean structure in Rn), then the eigenvectors z and σz must be Hermitian orthogonal, i.e. u iv, u + iv = u, u v, v + 2i u, v = 0. h − i h i − h i h i We conclude that u, v = 0 and u 2 v 2 = 0, i.e. 
u and v are orthogonal and have the same length. Normalizing the length to 1, we obtain an orthonormal basis of the A-invariant plane, in which the transformation A acts as in Example 1. The geometry of this transformation is known to us from studying the geometry of complex numbers: it is the composition of the rotation through the angle arg(λ_0) with the expansion by the factor |λ_0|. We will call such a transformation of the Euclidean plane a complex multiplication, or multiplication by a complex scalar λ_0.

Corollary 1. Given a normal operator on a Euclidean space, the space can be represented as a direct orthogonal sum of invariant lines and planes, on each of which the transformation acts as multiplication by a real or complex scalar respectively.

Corollary 2. A transformation in a Euclidean space is orthogonal if and only if the space can be represented as the direct orthogonal sum of invariant lines and planes, on each of which the transformation acts as multiplication by ±1 and rotation respectively.

Corollary 3. In a Euclidean space, every symmetric operator has an orthonormal basis of eigenvectors.

Corollary 4. Every quadratic form in a Euclidean space of dimension n can be transformed by an orthogonal change of coordinates to exactly one of the normal forms:

λ x2 + + λ x2 , λ λ . 1 1 · · · n n 1 ≥···≥ n Corollary 5. In a Euclidean space of dimension n, every anti-symmetric bilinear form can be transformed by an or- thogonal change of coordinates to exactly one of the normal forms r x, y = ω (x y x y ), ω ω > 0, 2r n. h i i 2i−1 2i − 2i 2i−1 1 ≥···≥ r ≤ Xi=1 Corollary 6. Every real normal matrix A can be written in the form A = U ∗MU where U is an , and M is block-diagonal matrix with each block of size 1, or α β of size 2 of the form − , where α2 + β2 = 0. β α 6   If A is symmetric, then only blocks of size 1 are present (i.e. M is diagonal). If A is anti-symmetric, then blocks of size 1 are zero, 0 ω and of size 2 are of the form , where ω > 0. ω −0   If A is orthogonal, then all blocks of size 1 are equal cos θ sin θ to 1, and blocks of size 2 have the form − , ± sin θ cos θ where 0 <θ<π.   132 Chapter 4. EIGENVALUES Courant–Fisher’s Minimax Principle One of the consequences (equivalent to Corollary 4) of the Spectral Theorem is that a pair (Q,S) of quadratic forms in Rn of which the first one is positive definite can be transformed by a linear change of coordinates to the normal form: Q = x2 + + x2 , S = λ x2 + + λ x2 , λ λ . 1 · · · n 1 1 · · · n n 1 ≥···≥ n The eigenvalues λ1 ...λn form the spectrum of the pair (Q,S). The following result≥ gives a coordinate-less, geometric description of the spectrum. Theorem. The k-th greatest spectral number is given by S(x) λk = max min , W: dim W=k x∈W−0 Q(x) where the maximum is taken over all k-dimensional sub- spaces Rn, and minimum over all non-zero vectors in the subspace.W ⊂ Proof. When is given by the equations xk+1 = = xn = 0, the minimal ratio WS(x)/Q(x), achieved on vectors proportional· · · to ek, is equal to λk because λ x2 + + λ x2 λ (x2 + + x2) when λ λ . 1 1 · · · k k ≥ k 1 · · · k 1 ≥···≥ k Therefore it suffices to prove for every other k-dimensional subspace the minimal ratio cannot be greater than λk. For this, denote Wby the subspace of dimension n k + 1 given by the equations x =V = x = 0. Since λ − λ , we have: 1 · · · k−1 k ≥···≥ n λ x2 + + λ x2 λ (x2 + + x2 ), k k · · · n n ≤ k k · · · n i.e. for all non-zero vectors x in the ratio S(x)/Q(x) λk. Now we invoke the dimension countingV argument: dim + dim≤ = k + (n k +1) = n + 1 > dim Rn, and conclude that Whas a non-trivialV intersection− with . Let x be a non-zero vectorW in . Then S(x)/Q(x) λ , andV hence the minimum of the ratio W∩VS/Q on 0 ≤ k W− cannot exceed λk.  Applying Theorem to the pair (Q, S) we obtain yet another characterization of the spectrum: − S(x) λk = min max . W: dim W=n−k x∈W−0 Q(x) 1. The Spectral Theorem 133 Formulating some applications, we assume that the space Rn is Euclidean, and refer to the spectrum of the pair (Q,S) where Q = x 2, simply as the spectrum of S. | | Corollary 1. When a quadratic form increases, its spec- ′ ′ tral numbers do not decrease: If S S then λk λk for all k = 1,...,n. ≤ ≤ Proof. Indeed, since S/Q S′/Q, the minimum of the ratio S/Q on every k-dimensional subspace≤ cannot exceed that of S′/Q, which in particular remains true for thatW on which the maximum W of S/Q equal to λk is achieved. The following result is called Cauchy’s interlacing theorem.

Corollary 2. Let λ1 λn be the spectrum of a ≥′ ··· ≥ ′ quadratic form S, and λ1 λn−1 be the spectrum of the quadratic form S′ obtained≥ ··· by ≥ restricting S to a given hyperplane Rn−1 Rn. Then: ⊂ λ λ′ λ λ′ λ λ′ λ . 1 ≥ 1 ≥ 2 ≥ 2 ≥···≥ n−1 ≥ n−1 ≥ n Proof. Maximum over all k-dimensional subspaces cannot be smaller than maximum (of the same quantities) over subspaceW s lying ′ inside the hyperplane. This proves that λk λk. Applying the same argument to S and subspaces of dimension≥ n k 1, we conclude that λ − λ′ .  − − − k+1 ≥− k An ellipsoid in a Euclidean space is defined as the level-1 set E = x S(x) = 1 of a positive definite quadratic form, S. It follows from{ the| Spectral} Theorem that every ellipsoid can be transformed by an orthogonal transformation to principal axes: a normal form

\[
\frac{x_1^2}{\alpha_1^2} + \cdots + \frac{x_n^2}{\alpha_n^2} = 1, \qquad 0 < \alpha_1 \leq \cdots \leq \alpha_n.
\]

The vectors x = ±α_k e_k lie on the ellipsoid, and their lengths α_k are called semiaxes of E. They are related to the spectral numbers λ_1 ≥ ··· ≥ λ_n > 0 of the quadratic form by α_k = 1/√λ_k. From Corollaries 1 and 2 respectively, we obtain:

If one ellipsoid is enclosed by another, the semiaxes of the inner ellipsoid do not exceed the corresponding semiaxes of the outer: if E' ⊂ E, then α'_k ≤ α_k for all k = 1, ..., n.

Semiaxes of a given ellipsoid are interlaced by the semiaxes of any section of it by a hyperplane passing through the center: if E' = E ∩ R^{n−1}, then α_k ≤ α'_k ≤ α_{k+1} for k = 1, ..., n − 1.
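These statements are easy to test numerically. The sketch below (Python with NumPy; it is an illustration only and not part of the text, and the construction of the hyperplane basis and the tolerances are ad hoc) builds a positive definite form, reads off its spectrum and the semiaxes of the ellipsoid S(x) = 1, restricts the form to a random hyperplane through the center, and checks both Cauchy's interlacing inequalities and the interlacing of semiaxes.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5

    # A positive definite quadratic form S(x) = <x, Sx> on the Euclidean space R^n.
    M = rng.standard_normal((n, n))
    S = M.T @ M + n * np.eye(n)

    lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # spectrum: lambda_1 >= ... >= lambda_n > 0
    semiaxes = 1.0 / np.sqrt(lam)                # semiaxes of the ellipsoid S(x) = 1 (increasing)

    # Restrict S to a hyperplane through the origin: the columns of H form an
    # orthonormal basis of the orthogonal complement of a random vector v.
    v = rng.standard_normal(n)
    H = np.linalg.svd(v.reshape(1, -1))[2][1:].T   # n x (n-1), an ad hoc construction
    lam_sec = np.sort(np.linalg.eigvalsh(H.T @ S @ H))[::-1]

    # Cauchy's interlacing: lambda_k >= lambda'_k >= lambda_{k+1}.
    eps = 1e-9
    assert np.all(lam[:-1] >= lam_sec - eps) and np.all(lam_sec >= lam[1:] - eps)

    # The semiaxes of the section interlace the semiaxes of the ellipsoid.
    sec_semiaxes = 1.0 / np.sqrt(lam_sec)
    assert np.all(semiaxes[:-1] <= sec_semiaxes + eps) and np.all(sec_semiaxes <= semiaxes[1:] + eps)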

EXERCISES 253. Prove that if two vectors u and v in an Hermitian space are orthog- onal, then u 2 + v 2 = u v 2. Is the converse true?  | | | | | − | 254. Prove that for any vectors u, v in an Hermitian space, u + v 2 + u v 2 =2 u 2 +2 v 2. | | | − | | | | | Find a geometric interpretation of this fact. 

255. Apply Gram–Schmidt orthogonalization to the basis f1 = e1 +2ie2 + 3 2ie3, f2 = e1 +2ie2, f3 = e1 in the coordinate Hermitian space C .

256. Apply Gram–Schmidt orthogonalization to the standard basis e1, e2 of C2 to construct an orthonormal basis of the Hermitian inner product z, w =z ¯ w +2¯z w +2¯z w +5¯z w . h i 1 1 1 2 2 1 2 2 257. Let f be a vector in an Hermitian space, e ,..., e an orthonor- ∈V 1 k mal basis in a subspace . Prove that u = e , v e is the point of W h i i i W closest to v, and that v u is orthogonal to . (The point u is called − W ∈ W the orthogonal projection of v to .) P ⋆ W 258. Let f1,..., fN be a finite sequence of vectors in an Hermitian space. The Hermitian N N-matrix f , f is called the Gram matrix of the × h i j i sequence. Show that two finite sequences of vectors are isometric, i.e. ob- tained from each other by a unitary transformation, if and only if their Gram matrices are the same.

259. Prove that A, B := tr(A∗B) defines an Hermitian inner product on h i the space Hom(Cn, Cm) of m n-matrices. × 260. Let A ,...,A : be linear maps between Hermitian spaces. 1 k V → W Prove that if A∗A = 0, then A = = A = 0. i i 1 · · · k 261. Let A : be a linear map between Hermitian spaces. Show PV → W that B := A∗A and C = AA∗ are Hermitian, and that the corresponding Hermitian forms B(x, x) := x,Bx in and C(y, y) := y, Cy in are h i V h i W non-negative. Under what hypothesis about A is the 1st of them positive? the 2nd one? are they both positive? 262. Let be a subspace in an Hermitian space, and let P : W⊂V V→V be the map that to each vector v assigns its orthogonal projection to ∈ V . Prove that P is an Hermitian operator, and that P 2 = P , and that W Ker P = ⊥.(It is called the orthogonal projector to .) W W 263. Prove that an n n-matrix is unitary if and only if its rows (or × columns) form an orthonormal basis in the coordinate Hermitian space Cn. 264. Prove that the determinant of a unitary matrix is a complex number of absolute value 1. 265. Prove that the Cayley transform: C (I C)/(I+C), well-defined 7→ − for linear transformations C such that I+C is invertible, transforms unitary 1. The Spectral Theorem 135 operators into anti-Hermitian and vice versa. Compute the square of the Cayley transform.  266. Prove that the commutator AB BA of anti-Hermitian operators A − and B is anti-Hermitian. 267. Give an example of a normal 2 2-matrix which is not Hermitian, × anti-Hermitian, or unitary. 268. Prove that for any n n-matrix A and any complex numbers α, β of × absolute value 1, the matrix αA + βA∗ is normal.

269. Prove that A : is normal if and only if Ax = A∗x for all V →V | | | | x . ∈V 270. Prove that the characteristic polynomial det(λI A) of a square ma- − 1 trix A does not change under similarity transformations A C− AC and 7→ thus depends only on the linear operator defined by the matrix. n n 1 271. Show that if λ + p λ − + + p is the characteristic polynomial 1 · · · n of a matrix A, then p = ( 1)n det A, and p = tr A. n − 1 − 272. Prove that all roots of characteristic polynomials of Hermitian matri- ces are real. 273. Find eigenspaces and eigenvalues of an orthogonal projector to a sub- space in an Hermitian space. W⊂V 274. Prove that every Hermitian operator P satisfying P 2 = P is an or- thogonal projector. Does this remain true if P is not Hermitian? 275. Prove directly, i.e. not referring to the Spectral Theorem, that every Hermitian operator has an orthonormal basis of eigenvectors.  276. Prove that if A and B are normal and AB = 0, then BA = 0. 277.⋆ Let A be a normal operator. Prove that the set of complex numbers x, Ax x = 1 is a convex polygon whose vertices are eigenvalues of {h i | | | } A.  278. Prove that two (or several) commuting normal operators have a com- mon orthonormal basis of eigenvectors. 

279. Prove that if A is normal and AB = BA, then AB∗ = B∗A, A∗B = BA∗, and A∗B∗ = B∗A∗.

280. Prove that if (λ λ1) (λ λn) is the characteristic polynomial of − · · · 2− a normal operator A, then λ = tr(A∗A). | i| 281. Classify up to linearP changes of coordinates pairs (Q, A) of forms, where S is positive definite Hermitian, and A anti-Hermitian. 282. An Hermitian operator S is called positive (written: S 0) if ≥ x,Sx) 0 for all x. Prove that for every positive operator S there is h ≥ a unique positive square root (denoted by √S), i.e. a positive operator whose square is S. 136 Chapter 4. EIGENVALUES 283. Using Singular Value Decomposition theorem with m = n, prove that every linear transformation A of an Hermitian space has a polar decom- position A = SU, where S is positive, and U is unitary. 284. Prove that the polar decomposition A = SU is unique when A is 1 invertible; namely S = √AA∗, and U = S− A. What are polar decompo- sitions of non-zero 1 1-matrices? × 285. Describe the complexification of Cn considered as a real vector space. 286. Let σ be the complex conjugation operator on Cn. Consider Cn as a real vector space. Show that σ is symmetric and orthogonal. 287. Prove that every orthogonal transformation in R3 is either a rotation through an angle θ, 0 θ π, about some axis, or the composition of such ≤ ≤ a rotation with the reflection about the plane perpendicular to the axis. 288. Find an orthonormal basis in Cn in which the transformation defined by the cyclic permutation of coordinates: (z ,z ,...,z ) (z ,...,z ,z ) 1 2 n 7→ 2 n 1 is diagonal. 289. In the coordinate Euclidean space Rn with n 4, find real and com- ≤ plex normal forms of orthogonal transformations defined by various permu- tations of coordinates. 290. Transform to normal forms by orthogonal transformations: (a) x x + x x , (b) 2x2 4x x + x2 4x x , 1 2 3 4 1 − 1 2 2 − 2 3 (c) 5x2 +6x2 +4x2 4x x 8x x 4x x . 1 2 3 − 1 2 − 1 3 − 2 3 291. In Euclidean spaces, classify all operators which are both orthogonal and anti-symmetric. 292. Let and be two subspaces of dimension 2 in the Euclidean 4- U V space. Consider the map T : defined as the composition: R4 V→V V ⊂ → R4 , where the arrows are the orthogonal projections to and U ⊂ → V U respectively. Prove that T is positive, and that its eigenvalues have the V form cos φ, cos ψ where φ, ψ are certain angles, 0 φ, ψ π/2. ≤ ≤ 293. Solve Gelfand’s problem: In the Euclidean 4-space, classify pairs of planes passing through the origin up to orthogonal transformations of the space.  294. Prove that every ellipsoid in Rn has pairwise perpendicular hyper- planes of bilateral symmetry. 295. Prove that every ellipsoid in R4 has a plane section which is a circle. Does this remain true for ellipsoids in R3? 296. Formulate and prove counterparts of Courant–Fisher’s minimax prin- ciple and Cauchy’s interlacing theorem for Hermitian forms. 297. Prove that semiaxes α α . . . of an ellipsoid in Rn and semiaxes 1 ≤ 2 ≤ α′ α′ . . . of its section by a linear subspaces of codimension k are k ≤ 2 ≤ related by the inequalities: α α′ α , i =1,...,n k. i ≤ i ≤ i+k − 2. Jordan Canonical Forms 137 2 Jordan Canonical Forms

Characteristic Polynomials and Root Spaces

Let V be a finite dimensional K-vector space. We do not assume that V is equipped with any structure in addition to the structure of a K-vector space. In this section, we study geometry of linear operators on V. In other words, we study the problem of classification of linear operators A : V → V up to similarity transformations A ↦ C^{-1}AC, where C stands for arbitrary invertible linear transformations of V.

Let n = dim V, and let A be the matrix of a linear operator with respect to some basis of V. Recall that

\[
\det(\lambda I - A) = \lambda^n + p_1\lambda^{n-1} + \cdots + p_{n-1}\lambda + p_n
\]

is called the characteristic polynomial of A. In fact, it does not depend on the choice of a basis. Indeed, under a change x = Cx' of coordinates, the matrix of a linear operator x ↦ Ax is transformed into the matrix C^{-1}AC similar to A. We have:

\[
\det(\lambda I - C^{-1}AC) = \det[C^{-1}(\lambda I - A)C] = (\det C^{-1})\,\det(\lambda I - A)\,(\det C) = \det(\lambda I - A).
\]

Therefore, the characteristic polynomial of a linear operator is well-defined (by the geometry of A). In particular, coefficients of the characteristic polynomial do not change under similarity transformations.
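As a quick sanity check, the invariance can be verified numerically. The sketch below (Python with NumPy; illustrative only, and it assumes the random matrix C is invertible, which is generically the case) compares the coefficients of det(λI − A) for a random matrix and for a similar one. NumPy's poly routine computes these coefficients from the eigenvalues, so the agreement holds up to rounding.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    A = rng.standard_normal((n, n))
    C = rng.standard_normal((n, n))       # generically invertible
    B = np.linalg.inv(C) @ A @ C          # a matrix similar to A

    # np.poly(X) returns the coefficients [1, p_1, ..., p_n] of det(lambda*I - X),
    # computed from the eigenvalues of X.
    assert np.allclose(np.poly(A), np.poly(B))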

Let λ_0 ∈ K be a root of the characteristic polynomial. Then det(λ_0 I − A) = 0, and hence the system of homogeneous linear equations Ax = λ_0 x has a non-trivial solution, x ≠ 0. As before, we call any such solution an eigenvector of A, and call λ_0 the corresponding eigenvalue. All solutions to Ax = λ_0 x (including x = 0) form a linear subspace in V, called the eigenspace of A corresponding to the eigenvalue λ_0.

Let us change slightly our point of view on the eigenspace. It is the null space of the operator A − λ_0 I. Consider powers of this operator and their null spaces. If (A − λ_0 I)^k x = 0 for some k > 0, then (A − λ_0 I)^l x = 0 for all l ≥ k. Thus the null spaces are nested:

\[
\operatorname{Ker}(A - \lambda_0 I) \subset \operatorname{Ker}(A - \lambda_0 I)^2 \subset \cdots \subset \operatorname{Ker}(A - \lambda_0 I)^k \subset \cdots
\]

On the other hand, since dim V < ∞, nested subspaces must stabilize, i.e. starting from some m > 0, we have:

:= Ker(A λ I)m = Ker(A λ I)m+1 = Wλ0 − 0 − 0 · · · We call the subspace a root space of the operator A, corre- Wλ0 sponding to the root λ0 of the characteristic polynomial. m Note that if x λ0 , then Ax λ0 , because (A λ0I) Ax = m ∈W ∈W − A(A λ0I) x = A0 = 0. Thus a root space is A-invariant. Denote − m by λ0 the range of (A λ0I) . It is also A-invariant, since if x = (A U λ I)my, then Ax =−A(A λ I)my = (A λ I)m(Ay). − 0 − 0 − 0 Lemma. = . V Wλ0 ⊕Uλ0 m Proof. Put B := (A λ0I) , so that λ0 = Ker B, λ0 = B( ). Let x = By Ker B−. Then Bx = 0,W i.e. y Ker B2U. But V 2 ∈ ∈ Ker B = Ker B by the assumption that Ker B = λ0 is the root space. Thus y Ker B, and hence x = By = 0. ThisW proves that Ker B B( )=∈ 0 . Therefore the subspace in spanned by Ker B and (∩ ) isV their{ direct} sum. On the other hand,V for any operator, dim KerB VB + dim B( ) = dim . Thus, the subspace spanned by Ker B and B( ) is theV whole spaceV . V V Corollary 1. Root spaces of the restriction of A to Uλ0 are exactly those root spaces λ of A in the whole space , which correspond to roots λ =Wλ . V 6 0 Proof. It suffices to prove that if λ = λ is another root of the 6 0 characteristic polynomial, then λ λ0 . Indeed, λ is invariant with respect to A λ I, but containsW ⊂ U no eigenvectorsW of A with − 0 eigenvalue λ0. Therefore A λ0I and all powers of it are invertible on . Thus lies in the− range of B = (A λ I)m. Wλ Wλ Uλ0 − 0 Corollary 2. Suppose that (λ λ )m1 . . . (λ λ )mr is the − 1 − r characteristic polynomial of A : , where λ1,...,λr K are pairwise distinct roots. ThenV→Vis the direct sum of root∈ spaces: V = . V Wλ1 ⊕···⊕Wλr Proof. By induction on r, it follows from Corollary 1 that the subspace spanned by root spaces is their direct sum W ⊂V Wλi and moreover, = , where is the intersection of all λi . Furthermore, theV characteristicW⊕U polynomialU of A restricted to U Uhas none of λi as a root. From our assumption that the characteristic polynomial of A factors into λ λ , it follows however that dim = 0. − i U 2. Jordan Canonical Forms 139 Indeed, the characteristic polynomial of A is equal to the product of the characteristic polynomials of its restrictions to and (because the matrix of A in a suitable basis of a direct sumW is blockU diagonal). Hence the factor corresponding to must be of degree 0. U Remarks. (1) We will see later that dimensions of the root spaces coincide with multiplicities of the roots: dim = m . Wλi i (2) The restriction of A to has the property that some power Wλi of A λiI vanishes. A linear operator some power of which vanishes is called− nilpotent. Our next task will be to study the geometry of nilpotent operators. (3) Our assumption that the characteristic polynomial factors completely over K is automatically satisfied in the case K = C due to the Fundamental Theorem of Algebra. Thus, we have proved for every linear operator on a finite dimensional complex vector space, that the space decomposes in a canonical fashion into the direct sum of invariant subspaces on each of which the operator differs from a nilpotent one by scalar summand.
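The stabilization of the nested null spaces, and the claim of Remark (1) above about multiplicities, can be watched on an example. The following sketch (Python with NumPy; not part of the text, with ad hoc names and a crude numerical rank tolerance) hides a matrix with prescribed structure behind a random change of basis and computes dim Ker(A − λI)^k for each root λ; the dimensions grow and then stabilize at the multiplicity of the root.

    import numpy as np

    def dim_ker(M):
        # dimension of the null space = size - numerical rank (crude tolerance)
        tol = 1e-8 * max(1.0, np.linalg.norm(M))
        return M.shape[0] - np.linalg.matrix_rank(M, tol=tol)

    # A matrix with prescribed structure: a Jordan cell of size 3 with eigenvalue 2
    # and a Jordan cell of size 2 with eigenvalue 5, hidden by a random change of basis.
    J = np.diag([2., 2., 2., 5., 5.]) + np.diag([1., 1., 0., 1.], k=1)
    rng = np.random.default_rng(3)
    C = rng.standard_normal((5, 5))
    A = C @ J @ np.linalg.inv(C)

    for lam, mult in [(2.0, 3), (5.0, 2)]:
        dims = [dim_ker(np.linalg.matrix_power(A - lam * np.eye(5), k)) for k in range(1, 6)]
        print(lam, dims)         # the nested null spaces grow, then stabilize
        assert dims[-1] == mult  # the root space dimension equals the multiplicity of the root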

Nilpotent Operators

Example. Introduce a nilpotent linear operator N : K^n → K^n by describing its action on vectors of the standard basis:

\[
N e_n = e_{n-1},\quad N e_{n-1} = e_{n-2},\quad \ldots,\quad N e_2 = e_1,\quad N e_1 = 0.
\]

Then N^n = 0 but N^{n−1} ≠ 0. We will call N, as well as any operator similar to it, a regular nilpotent operator. The matrix of N in the standard basis has the form

0 1 0 . . . 0 0 0 1 . . . 0  . . .  .  0 0 . . . 0 1   0 0 . . . 0 0      It has the range of dimension n 1 spanned by e ,..., e and the − 1 n−1 null space of dimension 1 spanned by e1. Proposition. Let N : be a nilpotent operator on a K-vector space of finiteV→V dimension. Then the space can be decomposed into the direct sum of N-invariant subspaces, on each of which N is regular. 140 Chapter 4. EIGENVALUES Proof. We use induction on dim . When dim = 0, there is nothing to prove. Now consider the caseV when dim V> 0. V The range N( ) is N-invariant, and dim N( ) < dim V (since otherwise N couldV not be nilpotent). By the inductionV hypothe- sis, there space N( ) can be decomposed into the direct sum of N- invariant subspaces,V on each of which N is regular. Let l be the num- (i) (i) ber of these subspaces, n1,...,nl their dimensions, and e1 ,..., eni a basis in the ith subspace such that N acts on the basis vectors as in Example: e(i) e(i) 0. ni 7→ · · · 7→ 1 7→ (i) (i) Since each eni lies in the range of N, we can pick a vector eni+1 (i) (i) (1) (l) ∈V such that Neni+1 = eni . Note that e1 ,..., e1 form a basis in (Ker N) N( ). We complete it to a basis ∩ V (1) (l) (l+1) (r) e1 ,..., e1 , e1 ,..., e1 (i) of the whole null space Ker N. We claim that all the vectors ej form a basis in , and therefore the l + r subspaces V (i) (i) (i) Span(e1 ,..., eni , eni+1), i = 1,...,l,l + 1,...,r, (of which the last r l are 1-dimensional) form a decomposition of into the direct sum− with required properties. V To justify the claim, notice that the subspace spanned U ⊂V by n + + n = dim N( ) vectors e(i) with j > 1 is mapped by 1 · · · l V j N onto the space N( ). Therefore: those vectors form a basis of , dim = dim N( ),V and Ker N = 0 . On the other hand, U U(i) V U ∩ { } vectors ej with j = 1 form a basis of Ker N, and since dim Ker N + dim N( ) = dim , together with the above basis of , they form a basis ofV .  V U V Corollary 1. The matrix of a nilpotent operator in a suitable basis is block diagonal with regular diagonal blocks (as in Example) of certain sizes n n > 0. 1 ≥···≥ r The basis in which the matrix has this form, as well as the de- composition into the direct sum of invariant subspaces as described in Proposition, are not canonical, since choices are involved on each step of induction. However, the dimensions n1 nr > 0 of the subspaces turn out to be uniquely determined≥ by··· the ≥ geometry of the operator. 2. Jordan Canonical Forms 141 To see why, introduce the following Young tableaux (Figure 41). It consist of r rows of identical square cells. The lengths of the rows represent dimensions n1 n2 nr > 0 of invariant subspaces, and in the cells of each≥ row≥···≥ we place the basis vectors of the corresponding subspace, so that the operator N sends each vector to its left neighbor (and vectors of the leftmost column to 0).
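Before looking at Figure 41, readers who like to experiment may find the following sketch useful (Python with NumPy; not part of the text, and the helpers block_diag and dim_ker are ad hoc). It assembles a nilpotent matrix from regular blocks of sizes 4, 2, 1, disguises it by a random change of basis, and recovers the block sizes from the dimensions of the null spaces of its powers, which is exactly the bookkeeping that the tableau below encodes.

    import numpy as np

    def shift_block(n):
        # matrix of a regular nilpotent operator: N e_k = e_{k-1}, N e_1 = 0
        return np.diag(np.ones(n - 1), k=1)

    def block_diag(*blocks):
        # assemble a block-diagonal matrix (ad hoc helper)
        n = sum(b.shape[0] for b in blocks)
        M = np.zeros((n, n))
        i = 0
        for b in blocks:
            k = b.shape[0]
            M[i:i + k, i:i + k] = b
            i += k
        return M

    def dim_ker(M):
        tol = 1e-8 * max(1.0, np.linalg.norm(M))
        return M.shape[0] - np.linalg.matrix_rank(M, tol=tol)

    # A nilpotent operator with regular blocks of sizes 4, 2, 1 (the partition 7 = 4 + 2 + 1),
    # disguised by a random change of basis.
    rng = np.random.default_rng(4)
    C = rng.standard_normal((7, 7))
    N = C @ block_diag(shift_block(4), shift_block(2), shift_block(1)) @ np.linalg.inv(C)

    # Column heights of the tableau: m_k = dim Ker N^k - dim Ker N^(k-1).
    dims = [0] + [dim_ker(np.linalg.matrix_power(N, k)) for k in range(1, 8)]
    m = [dims[k] - dims[k - 1] for k in range(1, 8) if dims[k] > dims[k - 1]]

    # Transposing the partition m_1 >= m_2 >= ... recovers the row lengths, i.e. the block sizes.
    rows = [sum(1 for h in m if h >= j) for j in range(1, max(m) + 1)]
    assert m == [3, 2, 1, 1] and rows == [4, 2, 1]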

[Figure 41: a Young tableau with rows of lengths n_1 ≥ n_2 ≥ ··· ≥ n_r; the cells of the i-th row contain the basis vectors e_1^{(i)}, e_2^{(i)}, ..., e_{n_i}^{(i)} of the i-th invariant subspace.]

The format of the tableaux is determined by the partition of the total number n of cells (equal to dim V) into the sum n_1 + ··· + n_r of positive integers. Reading the same format by columns, we obtain another partition n = m_1 + ··· + m_d, called transposed to the first one, where m_1 ≥ ··· ≥ m_d > 0 are the heights of the columns. Obviously, two transposed partitions determine each other. It follows from the way the cells are filled with the vectors e_j^{(i)} that the vectors in the columns 1 through k form a basis of the space Ker N^k. Therefore

\[
m_k = \dim\operatorname{Ker} N^k - \dim\operatorname{Ker} N^{k-1}, \qquad k = 1, \ldots, d.
\]

Corollary 2. Consider the flag of subspaces defined by a nilpotent operator N : V → V:

\[
\operatorname{Ker} N \subset \operatorname{Ker} N^2 \subset \cdots \subset \operatorname{Ker} N^d = V,
\]

and the partition of n = dim V into the summands m_k = dim Ker N^k − dim Ker N^{k−1}. Then summands of the transposed partition n = n_1 + ··· + n_r are the dimensions of the regular nilpotent blocks of N (described in Proposition and Corollary 1).

Corollary 3. The number of equivalence classes of nilpotent operators on a vector space of dimension n is equal to the number of partitions of n.

The Jordan Canonical Form Theorem

We proceed to classification of linear operators A : C^n → C^n up to similarity transformations.

Theorem. Every complex matrix is similar to a block diagonal normal form with each diagonal block of the form:

\[
\begin{pmatrix}
\lambda_0 & 1 & 0 & \ldots & 0 \\
0 & \lambda_0 & 1 & \ldots & 0 \\
  &   & \ddots & \ddots &  \\
0 & 0 & \ldots & \lambda_0 & 1 \\
0 & 0 & \ldots & 0 & \lambda_0
\end{pmatrix}, \qquad \lambda_0 \in \mathbb{C},
\]

and such a normal form is unique up to permutations of the blocks.

The block diagonal matrices described in the theorem are called Jordan canonical forms (or Jordan normal forms). Their diagonal blocks are called Jordan cells.

It is instructive to analyze a Jordan canonical form before going into the proof of the theorem. The characteristic polynomial of a Jordan cell is (λ − λ_0)^m, where m is the size of the cell. The characteristic polynomial of a block diagonal matrix is equal to the product of the characteristic polynomials of the diagonal blocks. Therefore the characteristic polynomial of the whole Jordan canonical form is the product of factors (λ − λ_i)^{m_i}, one per Jordan cell. Thus the diagonal entries of Jordan cells are roots of the characteristic polynomial. After subtracting the scalar matrix λ_0 I, the Jordan cells with λ_i = λ_0 (and only these cells) become nilpotent. Therefore the root space W_{λ_0} is exactly the direct sum of those subspaces on which the Jordan cells with λ_i = λ_0 operate.

Proof of Theorem. Everything we need has been already established in the previous two subsections.

Thanks to the Fundamental Theorem of Algebra, the characteristic polynomial det(λI − A) of a complex n×n-matrix A factors into the product of powers of distinct linear factors: (λ − λ_1)^{n_1} ··· (λ − λ_r)^{n_r}. According to Corollary 2 of Lemma, the space C^n is decomposed in a canonical fashion into the direct sum W_{λ_1} ⊕ ··· ⊕ W_{λ_r} of A-invariant root subspaces. On each root subspace W_{λ_i}, the operator A − λ_i I is nilpotent. According to Proposition, the root space W_{λ_i} is represented (in a non-canonical fashion) as the direct sum of invariant subspaces on each of which A − λ_i I acts as a regular nilpotent. Since the scalar operator λ_i I leaves every subspace invariant, this means that W_{λ_i} is decomposed into the direct sum of A-invariant subspaces, on each of which A acts as a Jordan cell with the eigenvalue λ_0 = λ_i. Thus, existence of a basis in which A is described by a Jordan normal form is established.

To prove uniqueness, note that the root spaces λi are intrin- sically determined by the operator A, and the partitionW of dim Wλi into the sizes of Jordan cells with the eigenvalue λi is uniquely deter- mined, according to Corollary 2 of Proposition, by the geometry of the operator A λiI nilpotent on λi . Therefore the exact structure of the Jordan normal− form of A (i.e.W the numbers and sizes of Jordan cells for each of the eigenvalues λi) is uniquely determined by A, and only the ordering of the diagonal blocks remains ambiguous.  Corollary 1. Dimensions of root spaces coincide Wλi with multiplicities of λi as roots of the characteristic poly- nomial. Corollary 2. If the characteristic polynomial of a com- plex matrix has only simple roots, then the matrix is diag- onalizable, i.e. is similar to a diagonal matrix. Corollary 3. Every operator A : Cn Cn in a suitable basis is described by the sum D + N of two→ commuting ma- trices, of which D is diagonal, and N strictly upper trian- gular. Corollary 4. Every operator on a complex vector space of finite dimension can be represented as the sum D + N of two commuting operators, of which D is diagonalizable and N nilpotent. Remark. We used that K = C only to factor the characteris- tic polynomial of the matrix A into linear factors. Therefore the same results hold true over any field K such that all non-constant polynomials from K[λ] factor into linear factors. Such fields are called algebraically closed. In fact (see [8]) every field K is con- tained in an algebraically closed field. Thus every linear operator A : Kn Kn can be brought to a Jordan normal form by transfor- → −1 mations A C AC, where however entries of C and scalars λ0 in Jordan cells7→ may belong to a larger field F K.7 We will see how this works when K = R and F = C. ⊃ 7For this, F does not have to be algebraically closed, but only needs to contain all roots of det(λI − A). 144 Chapter 4. EIGENVALUES The Real Case Let A : Rn Rn be an R-linear operator. It acts8 on the com- plexification →Cn of the real space, and commutes with the complex conjugation operator σ : x + iy x iy. 7→ − The characteristic polynomial det(λI A) has real coefficients, − but its roots λi can be either real or come in pair of complex con- jugated roots (of the same multiplicity). Consequently, the complex root spaces , which are defined as null spaces in Cn of sufficiently Wλi high powers of A λiI, come in two types. If λi is real, then the root space is real in the− sense that it is σ-invariant, and thus is the com- plexification of the real root space Rn. If λ is not real, and λ¯ is Wλi ∩ i i its complex conjugate, then λi and λ¯i are different root spaces of A, but they are transformedW into eachW other by σ. Indeed, σA = Aσ, ¯ d and σλi = λiσ. Therefore, if z λi , i.e. (A λiI) z = 0 for some d ∈W ¯ d − d, then 0 = σ(A λiI) z = (A λiI) σz, and hence σz λ¯i . This allows one to− obtain the following− improvement for the∈ Jordan W Canonical Form Theorem applied to real matrices. Theorem. A real linear operator A : Rn Rn can is repre- sented by the matrix in a Jordan normal→ form with respect to a basis of the complexified space Cn invariant under com- plex conjugation.

Proof. In the process of construction bases in λi in which A has a Jordan normal form, we can use the following procedure.W When λ is real, we take the real root space Rn and take in it a real i Wλi ∩ basis in which the matrix of A λiI is block diagonal with regular nilpotent blocks. This is possible− due to Proposition applied to the case K = R. This real basis serves then as a σ-invariant complex basis in the complex root space λi . When λi and λ¯i is a pair of complex conjugated root spaces,W then we takeW a requiredW basis in one of them, and then apply σ to obtain such a basis in the other. Taken together, the bases form a σ-invariant set of vectors.  Of course, for each Jordan cell with a non-real eigenvalue λ0, there is another Jordan cell of the same size with the eigenvalue λ¯0. Moreover, if e1,..., em is the basis in the A-invariant subspace of the first cell, i.e. Aek = λ0ek + ek−1, k = 2,...,m, and Ae1 = λ0e1, then the A-invariant subspace corresponding to the other cell comes with the complex conjugate basis e¯1,..., e¯m, where e¯k = σek. The direct sum := Span(e ,..., e , e¯ ,..., e¯ ) of the two subspaces is U 1 m 1 m 8Strictly speaking, it is the complexification AC of A that acts on Cn =(Rn)C, but we will denote it by the same letter A. 2. Jordan Canonical Forms 145 both A- and σ-invariant and thus is a complexification of the real A-invariant subspace Rn. We use this to describe a real normal form for the action ofUA ∩on this subspace. Namely, let λ0 = α iβ, and write each basis vector ek in terms of its real and imaginary− part: e = u iv . Then the real vectors u k k − k k and vk form a real basis in the real part of the complex 2-dimensional space spanned by ek ande ¯k = uk + ivk. Thus, we obtain a basis n u1, v1, u2, v2,..., um, vm in the subspace R . The action of A on this basis is found from the formulas: U ∩

Au + iAv = (α iβ)(u + iv ) = (αu + βv )+ i( βu + αv ), 1 1 − 1 1 1 1 − 1 1 Au + iAv = (αu + βv + u )+ i( βu + αv + v ),k> 1. k k k k k−1 − k k k−1 Corollary 1. A linear operator A : Rn Rn is repre- sented in a suitable basis by a block diagonal→ matrix with the diagonal blocks that are either Jordan cells with real eigenvalues, or have the form (β = 0): 6 α β 1 0 0 . . . 0 0 β− α 0 1 0 . . . 0 0  0 0 α β 1 0 . . . 0  −  0 0 β α 0 1 . . . 0     . . .  .    0 . . . 0 α β 1 0   0 . . . 0 β− α 0 1     0 0 . . . 0 0 α β   −   0 0 . . . 0 0 β α      Corollary 2. If two real matrices are related by a com- plex similarity transformation, then they are related by a real similarity transformation. Remark. The proof meant here is that if two real matrices are similar over C then they have the same Jordan normal form, and thus they are similar over R to the same real matrix, as described in Corollary 1. However, Corollary 2 can be proved directly, without a reference to the Jordan Canonical Form Theorem. Namely, if two real matrices, A and A′, are related by a complex similarity trans- formation: A′ = C−1AC, we can rewrite this as CA′ = AC, and taking C = B + iD where B and D are real, obtain: BA′ = AB and DA′ = AD. The problem now is that neither B nor D is guaranteed to be invertible. Yet, there must exist an invertible linear combina- tion E = λB + µD, for if the polynomial det(λB + µD) of λ and µ 146 Chapter 4. EIGENVALUES vanishes identically, then det(B + iD) = 0 too. For invertible E, we have EA′ = AE and hence A′ = E−1AE.
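Computer algebra systems provide ready-made implementations of the Jordan normal form, which can be used to check the statements above on examples. The sketch below uses SymPy's jordan_form routine (one available implementation, not the construction given in the text): first on a matrix with a repeated eigenvalue, then on a real 2×2 matrix of the form [[α, −β], [β, α]], whose complex Jordan form consists of a pair of 1×1 cells with conjugate eigenvalues α ± iβ, in agreement with the discussion of the real case.

    from sympy import Matrix, I

    # Exact Jordan form of a matrix with a repeated eigenvalue: A = P*J*P^(-1).
    A = Matrix([[2, 1, 0],
                [0, 2, 0],
                [0, 0, 3]])
    P, J = A.jordan_form()
    assert P * J * P.inv() == A

    # A real 2x2 matrix of the form [[alpha, -beta], [beta, alpha]] with beta != 0:
    # over C its Jordan form consists of two conjugate 1x1 cells alpha +/- i*beta,
    # while over R the matrix itself is already in the real normal form above.
    alpha, beta = 3, 4
    B = Matrix([[alpha, -beta],
                [beta, alpha]])
    _, JB = B.jordan_form()
    assert {JB[0, 0], JB[1, 1]} == {alpha + beta * I, alpha - beta * I}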

EXERCISES 298. Let A, B : be two commuting linear operators, and p and q V →V two polynomials in one variable. Show that the operators p(A) and q(B) commute. 299. Prove that if A commutes with B, then root spaces of A are B- invariant.

300. Let λ0 be a root of the characteristic polynomial of an operator A, and m its multiplicity. What are possible values for the dimension of the eigenspace corresponding to λ0? 301. Let v be a non-zero vector, and a : K a non-zero linear ∈ V V → function. Find eigenvalues and eigenspaces of the operator x a(x)v. 7→ 302. Let K[x] be the space of all polynomials of degree < n. Prove Vn ⊂ that the differentiation d : is a regular nilpotent operator. dx Vn →Vn 303. Prove that a square matrix satisfies its own characteristic equation; namely, if p denotes the characteristic polynomial of a matrix A, then p(A) = 0. (This identity is called the Hamilton–Cayley equation.) 304. Is there an n n-matrix A such that A2 = 0 but A3 = 0: (a) if n = 2? × 6 (b) if n = 3? 305. Classify similarity classes of nilpotent 4 4-matrices. × 306. Prove that the number of similarity classes of unipotent n n-matrices × is equal to the number of partitions of n. 307. Find Jordan normal forms of the following matrices:  1 20 4 60 13 16 16 (a) 0 20 , (b) 3 5 0 , (c) 5 7 6 , " 2 2 1 # " −3 −6 1 # " −6 −8 −7 # − − − − − − − 3 0 8 4 2 10 7 12 2 (d) 3 1 6 , (e) −43 7 , (f) 3 − 4− 0 , " 2− 0 −5 # " −31 7 # " 2− 0 2 # − − − − 2 8 6 0 3 3 1 1 1 (g) −4 10 6 , (h) 1 8 6 , (i) 3 3− 3 , " −4 8 4 # " −2 14 10 # " −2 −2 2 # − − − − − − 1 1 2 1 1 1 3 7 3 (j) 3 −3 6 , (k) −5 21 17 , (l) 2 5− 2 , " 2 −2 4 # " −6 26 21 # " −4 −10 3 # − − − − − 8 30 14 9 22 6 4 5 2 (m) 6 19− 9 , (n) 1 4− 1 , (o) 2 2− 1 . " −6 −23 11 # " −8− 16 5 # " −1 −1 1 # − − − − − 2. Jordan Canonical Forms 147 308. Find all matrices commuting with a regular nilpotent. 309. Compute powers of Jordan cells. 310. Prove that if some power of a complex matrix is the identity, then the matrix is diagonalizable. 311. Prove that transposed square matrices are similar.

312. Prove that tr A = λi and det A = λi, where λ1, ˙,λn are all roots of the characteristic polynomial (not necessarily distinct). P Q 313. Find complex eigenvectors and eigenvalues of rotations on the Eu- clidean plane. 314. Classify all linear operators in R2 up to linear changes of coordinates. a b 315. Consider real traceless 2 2-matrices as points in the 3- × c a − dimensional space with coordinates a,b,c. Sketch the partition of this space into similarity classes. 148 Chapter 4. EIGENVALUES Hints

8. AA−−→′ =3−−→AM/2. 14. cos2 θ 1. ≤ 19. Rotate the regular n-gon through 2π/n. 22. Take X = A first. 23. −−→AB = −−→OB −→OA. − 24. If a + b + c + d = 0, then 2(b + c)(c + d)= a2 b2 + c2 d2. − − 25. XA2 = −−→OX −→OA 2 =2R2 2 −−→OX, −→OA if O is the center. | − | − h i 26. Consider projections of the vertices to any line. 27. Project faces to an arbitrary plane and compute signed areas. 28. Q( x, y)= Q(x, y) for a quadratic form Q. − − 31. AX2 + CY 2 is even in X and Y . 32. Q( x, y)= Q(x, y) only if b = 0. − 35. cos θ =4/5. 38. x2 +4xy = (x +2y)2 4y2. − 42. e = AB / AC is the slope of the secting plane. | | | | 43. Show that the sum of distances to the foci of an ellipse from points on a tangent line is minimal at the tangency point; then reflect the source of light about the line. 52. Compute (cos θ + i sin θ)3. 54. Divide P by z z with a remainder. − 0 57. (z z )(z z¯ ) is real. − 0 − 0 58. Apply Vieta’s formula. 67. Draw the regions of the plane where the quadratic forms Q> 0. 2 2 70. First apply the Inertia Theorem to make Q = x1 + x2. k m k 74. In each a b − , each a comes from one of m factors (a + b).

76. Take x_1 = x, x_2 = ẋ.


2 80. 3n(n+1) n2 = n(n+3) > n . 2 − 2 2 83. Write x = xiei. 94. Pick two 2 2-matrices at random. P× 97. Compute BAC, where B and C are two inverses of A. 1 1 1 1 98. Compute (B− A− )(AB) and (AB)(B− A− ). 99. Consider 1 2 and 2 1 matrices. 1× × 101. Solve D− AC = I for D and C. 104. Which 2 2-matrices are anti-symmetric? × 109. Start with any non-square matrix. 110. Each elementary product contains a zero factor. 113. There are 12 even and 12 odd permutations. 1 2 ... n − 1 n 115. Permutation has length n . „ n n − 1 . . . 2 1 « 2 1  117. Pairs of indices inverted by σ and σ− are the same. 118. The transformation: logarithm lagorithm algorithm consists → → of two transpositions. 123. 247 = 2 100+4 10+7. · · 127. det( A) = ( 1)n det A. − − 139. Apply the 1st of the “cool formulas.” 140. Compute the determinant of a 4 4-matrix the last two of whose × rows repeat the first two. 141. Apply Binet–Cauchy’s formula to the 2 3 matrix whose rows are × xt and yt.

142. Show that ∆n = an∆n 1 + ∆n 2. − − m 144. Use the defining property of Pascal’s triangle: k = m 1 m 1 k− + k −1 . −  145. Use the fact of algebra that a polynomial in (x1,...,xn), which vanishes when x = x , is divisible by x x . i j i − j 146. Divide the kth column by k and apply Vandermonde’s identity. 147. Redefine multiplication by scalars as λu = 0 for all λ K and all ∈ u . ∈V 148. 0u = (0 + 0)u =0u +0u. 149. ( 1)u + u = ( 1+1)u =0u = 0. − − 150. Use the Euclidean algorithm (see e.g. [3]) to show that for relatively prime k and p, the least positive integer of the form mk + np is equal to 1. 153. There is no 0 among polynomials of a fixed degree. π f 160. To a linear form f : / K, associate / K. V W → V →V W → Hints 151

162. Given B : ∗, show that evaluation (Bv)(w) of the linear V → W form Bv ∗ on w defines a bilinear form. ∈ W ∈ W 164. Translate given linear subspaces by any vector in the intersection of the affine ones. 166. A linear function on is uniquely determined by its restrictions V ⊕W to and . V W 171. Compute q1q2q2∗q1∗. 174. Introduce multiplication by scalars q H as A q∗A. ∈ 7→ 176. q 2 = 1 = 1. | | |− | 181. cos 4θ + i sin 4θ = (cos θ + i sin θ)4. 182. H : R7 R1. 184. The space→ spanned by columns of A and B contains columns of A+B. t n m 188. (A a)(x)= a(Ax) for all x K and a (K )∗. ∈ ∈ 196. Prove that in a finite field, multiples of 1 form a subfield isomorphic to Zp where p is a prime. 203. Modify the Gaussian elimination algorithm of Section 2 by permuting unknowns (instead of equations). 204. Apply the LUP decomposition to M t. 1 205. Apply LPU, LUP , and PLU decompositions to M − . 210. When K has q elements, each cell of dimension l has ql elements. 214. Consider the map Rn Rp+q defined by the linear forms. → 227. P = A + iB where both A and B are Hermitian.

228. x, y = x∗y. h i 230. z, w = z∗w. h i 231. Use the previous exercise. 233. The normal forms are: i Z 2 i Z 2, Z 2 Z 2, Z 2. | 1| − | 2| | 1| − | 2| | 1| 238. Take the linear form for one of new coordinates. 240. Prove that 1 is a non-square. − 243. There are 1 even and 3 odd non-degenerate forms. 247. 1 =1 2 . 3 − 3 248. Start with inverting 2-adic units 1. ( is a wild card). ···∗∗∗ ∗ 250. Use the previous exercise. 275. Find an eigenvector, and show that its orthogonal complement is in- variant. 277. Compute x, Ax for x = t v , where v is an orthonormal basis h i i i { i} of eigenvectors, and t2 = 1. i P 278. If AB = BA, then eigenspaces of A are B-invariant. P 293. Use the previous exercise. 152 Answers 309. Use Newton’s binomial formula. Answers

1. mg/2, mg√3/2. 2. 18 min (reloading time excluded). 6. It rotates with the same angular velocity along a circle centered at the barycenter of the triangle formed by the centers of the given circles. 10. 3/4.

13. 7−→OA = OA−−→′ +2OB−−→′ +4OC−−→′ for any O. 15. 3/2. 17. 2 u, v = u + v 2 u 2 v 2. h i | | − | | − | | 18. (b) No. 30. (a) ( α2 β2, 0), (b) ( α2 + β2, 0). ± − ± 2 2 33. Yes, k(px + y ). p 34. y = x; level curves of (a) are ellipses, of (c) hyperbolas. ± 35. 2X2 Y 2 = 1. − 38. 2nd ellipse, 1st & 4th hyperbolas. 39. A pair of intersecting lines, x 1= (2y 1). − ± − 44. Yes; Yes (0); Yes. 1+5i 1 i√3 45. (a) 13 ; (b) −2 . 2 47. z − . | | 48. i√3. ± 50. (a) √2, π/4; (b) 2, π/3. − − 1+i√3 51. − 2 . 1 √5 √6 i√2 55. 2 i; i ± ; 1+ i, 1+ i; 1 − . ± 2 ± 2 √3 i √3 i √3 i √3 i 56. 2, 1 i√3; i, ± − ; i√2, i√2; ± , ± ; i, ± . − ± 2 ± ± 2 2 ± ± 2 k 59. ak = ( 1) 1 i1<

153 154 Answers 63. Yes, some non-equivalent equations represent the empty set. 65. m = n = r = 2. 69. n(n + 1).

71. The ODEsx ˙ = ax andx ˙ = a′x are non-equivalent whenever a = a′. 6 77. X˙ = iX , X˙ = iX . 1 1 2 − 2 λt λt λt λt 78. x1 = c1e , x2 = c2e , and y1 = (c1 + c2t)e , y2 = c2e . n(n+1) 79. 2 . 81. yes, no, no, yes, no, yes, yes, no.

84. E = va, i.e. eij = viaj , i =1,...,m, j =1,...,n. cos θ sin θ 85. . sin θ −cos θ   86. Check your answers using associativity, e.g. (CB)A = C(BA). 3 1 0 2 Other answers: BAC = , ACB = . 3 3 3− 6    

cos 1◦ sin 1◦ 91. − . sin 1◦ cos 1◦   k 92. If B = A , then bi,i+k = 1, and all other bij = 0. 93. (a) If A is m n, B must be n m. (b) Both must be n n. × × × 95. Exactly when AB = BA. 96. Those with all a = 0. ii 6 102. No; yes (of x, y R1); yes (in R2). ∈ 103. I. 105. S =2x y + x y + x y , A = x y x y . 1 1 1 2 2 1 1 2 − 2 1 106. ( xi)( yi) and i=j xiyj /2. 6 107. No,P onlyP if AB = BAP . 111. 1; 1; cos(x + y). 112. 2, 1; 2. − 114. k(k 1)/2. − 116. n(n 1)/2 l. − − 119. E.g. τ14τ34τ25. 120. No; yes. 121. + (6 inverted pairs); + (8 inverted pairs). 122. 1522200; 29400000. − − 124. 0. 125. Changes sign, if n = 4k + 2 for some k, and remains unchanged otherwise. Answers 155

126. x = a1,...,an. 128. Leave it unchanged. n(n 1)/2 131. (a) ( 1) − a a a , (b) (ad bc)(eh fg). − 1 2 · · · n − − 132. (a) 9, (b) 5. 5 7 1 18 18 18 1 1 0 − 1 5 7 − 133. (a) 18 18 18 , (b) 0 1 1 .  7 − 1 5  " 0 0− 1 # 18 18 − 18 134. (a) x = [ 2 , 1 , 1 ]t, (b) x = [1, 1, 1]t. − 9 3 9 − n 1 136. (det A) − , where n is the size of A. 137. Those with determinants 1. ± 138. (a) x1 =3, x2 = x3 = 1, (b) x1 =3, x2 =4, x3 = 5. 139. (a) x2 x2 , (b) (ad bc)3. − 1 −···− n − n n 1 143. λ + a1λ − + + an 1λ + an. · · · − 144. 1. 146. 1!3!5! (2n 1)!. · · · − 152. pn; pmn. n(n 1)/2 155. p − . 157. Ker D = K[xp] K[x]. ⊂ 163. Points, lines, and planes. t t 168. Ker A = A( )⊥, A ( ∗) = ( / Ker A)∗ ∗. V W V ⊂V 170. 1; q. − − z w¯ 175. . w − z¯   176. bi + cj + dk b2 + c2 + d2 =1 . { | } 178. 2. 179. 2, if n> 1. 180. xk = xkL (x)+ + xk L (x), k =0,...,n 1. 1 1 · · · n n − 182. 1. 185. The system is consistent whenever b1 + b2 + b3 = 0. 187. (a) Yes, (b) yes, (c) yes, (d) no, (e) yes, (f) no, (g) no, (h) yes. 191. 0 codim k. ≤ ≤ 193. Two subspaces are equivalent if and only if they have the same di- mension. 194. The equivalence class of an ordered pair , of subspaces is deter- U V mined by k := dim U, l := dim , and r := dim( + ), where k,l r n V U V ≤ ≤ can be arbitrary. 156 Answers

n n n n 2 n n 1 195. (a) q ; (b) (q 1)(q q)(q q ) (q q − ); n n n − 2 −n r −1 · · · − (c) (q 1)(q q)(q q ) (q q − ); n − n − −n · ·r · 1 −r r r r 1 (d) (q 1)(q q) (q q − )/(q 1)(q q) (q q − ). − − · · · − − − · · · − 197. x =3, x =1, x = 1; inconsistent; x =1, x =2, x = 2; 1 2 3 1 2 3 − x =2t t , x = t , x = t , x = 1; 1 1 − 2 2 1 3 2 4 x = 8, x = 3+ t, x = 6+2t, x = t; x = x = x = x = 0; 1 − 2 3 4 1 2 3 4 x = 3 t 13 t , x = 19 t 20 t , x = t , x = t ; 1 17 1 − 17 2 2 17 1 − 17 2 3 1 4 2 x = 16+ t + t +5t , x = 23 2t 2t 6t , x = t , x = t , x = t . 1 − 1 2 3 2 − 1 − 2 − 3 3 1 4 2 5 3 198. λ = 5. 199. (a) rk=2, (b) rk=2, (c) rk=3, (d) rk=4, (e) rk=2. 200. Inverse matrices are: 1 1 1 1 2 1 0 0 1 4 3 − − − 1 1 1 1 1 3 200 1 5 3 , − − , − . − − 4  1 1 1 1   31 19 3 4  " 1 6 4 # − − − − − 1 1 1 1 23 14 2 3  − −   − −      n 208. 2 l(σ), where l(σ) is the length of the permutation. − n q 1 209. [n]q! := [1]q[2]q [n]q (called q-factorial), where [k]q := q −1 . · · · − 211. Inertia indices (p, q)=(1, 1), (2, 0), (1, 2). 216. Empty for p = 0, has 2 components for p =1, and 1 for p =2, 3, 4. 2 2 2 2 2 2 2 220. z1 + z2 =1,z1 + z2 =0,z2 = z1 ,z1 =1,z1 = 0. 221. Two parallel lines. 226. Those all of whose entries are imaginary. 227. P (z, w)= 1 [P (z + w, z + w)+ iP (iz + w,iz + w) (1 + i) (P (z, z)+ P (w, w))]. 2 − 234. dii = ∆i/∆i 1. − 235. (p, q)=(2, 2), (3, 1). 253. No. 254. The sum of the squares of all sides of a parallelogram is equal to the sum of the squares of the diagonals. 265. id. 10 0 2 0 0 3 0 0 307. (a) 02 0 (b) −0 1 0 (c) −0 1 1 " 0 0 1 # " 0 0 1 # " 0 0 1 # 2 1 0 − 2 00 1 0 0 10 0 (e) 0 2 1 (f) 0 1 0 (k) −0 0 1 (l) 0 i 0 . " 0 0 2 # " 0− 00 # " 0 0 0 # " 0 0 i # − Bibliography

[1]
[2] D. K. Faddeev, I. S. Sominsky. Problems in Linear Algebra. 8th edition, Nauka, Moscow, 1964 (in Russian).
[3] A. P. Kiselev. Kiselev's Geometry / Book I. Planimetry. Adapted from Russian by Alexander Givental. Sumizdat, El Cerrito, 2006.
[4] A. P. Kiselev. Kiselev's Geometry / Book II. Stereometry. Adapted from Russian by Alexander Givental. Sumizdat, El Cerrito, 2008.
[5] J.-P. Serre. A Course in Arithmetic.
[6] Ronald Solomon. Abstract Algebra. Thomson Brooks/Cole, Belmont, CA, 2002.
[7] Berezin
[8] Van der Waerden
[9] Hermann Weyl. Space. Time. Matter.

157 Index

absolute value, 18, 67 Bruhat cell, 94 addition of vectors, 59 additivity, 5 canonical form, 24 adjoint linear map, 119 canonical projection, 65 adjoint map, 70, 78 Cartesian coordinates, 6 adjoint matrix, 50 Cartesian product, 62 adjoint quaternion, 67 category of vector spaces, 62 adjoint systems, 82 Cauchy, 54 affine subspace, 65 Cauchy – Schwarz inequality, 8, 117 algebraic number, 60 Cauchy’s interlacing theorem, 133 algebraically closed field, 143 Cayley transform, 134 annihilator, 69 change of coordinates, 34 anti-Hermitian form, 105 characteristic equation, 121 anti-Hermitian matrix, 103 characteristic polynomial, 137 anti-symmetric form, 37 Chebyshev, 75 anti-symmetric operator, 129 Chebyshev polynomials, 75 antilinear function, 103 classification theorem, 24 argument of complex number, 19 codimension, 80 associative, 34 cofactor, 49 associativity, 2 cofactor expansion, 49, 52 augmented matrix, 84 column space, 86 axiom, 59 commutative square, 78 axis of symmetry, 14 commutativity, 2 commutator, 114 B´ezout, 22 complementary multi-index, 52 B´ezout’s theorem, 22 complete flag, 92 back substitution, 84 completing squares, 12 barycenter, 7 complex conjugate, 17 basis, 3, 71 complex conjugation, 126 bijective, 63 complex multiplication, 130 bilinear form, 36 complex sphere, 102 bilinearity, 5 complex vector space, 60 Binet, 54 complexification, 126 Binet–Cauchy formula, 54 composition, 33 binomial coefficient, 30 conic section, 9 binomial formula, 30 conics, 100 block, 46 conjugate quaternion, 67 block triangular matrix, 46 coordinate Euclidean space, 128

158 Index 159 coordinate flag, 92 evaluation map, 63 coordinate quaternionic space, 68 even form, 110 coordinate space, 32 even permutation, 42 coordinate system, 3 coordinate vector, 32 field, 18, 60 coordinates, 3, 73 field extension, 64 Cramer’s rule, 51 field of p-adic numbers, 113 cross product, 58 finite dimensional spaces, 72 cylinder, 100 flag, 92 focus of ellipse, 10 Dandelin, 10 focus of hyperbola, 14 Dandelin’s spheres, 9 focus of parabola, 15 Darboux, 111 Darboux basis, 111 Gaussian elimination, 83 Descartes, 6 Gelfand’s problem, 136 determinant, 28, 41 Gram matrix, 134 diagonal, 39 Gram–Schmidt process, 107, 118 diagonalizable matrix, 143 graph, 65 differentiation, 63 Hamilton–Cayley equation, 146 dimension, 72 Hasse, 113 dimension of Bruhat cell, 95 head, 1 direct sum, 65 Hermite, 103 directed segment, 1 Hermite polynomials, 75 directrix, 15 Hermitian adjoint, 103, 119 discriminant, 20, 116 Hermitian adjoint matrix, 103 distance, 117 Hermitian anti-symmetric form, 103 distributive law, 18, 32 Hermitian dot product, 104 distributivity, 2 Hermitian form, 103 dot product, 5, 36 Hermitian inner product, 117 dual basis, 74 Hermitian isomorphic, 118 dual map, 70 Hermitian matrix, 103 dual space, 63 Hermitian quadratic form, 103 Hermitian space., 117 eccentricity, 15 Hermitian symmetric form, 103 eigenspace, 121, 137 Hilbert space, 117 eigenvalue, 26, 120, 137 Hirani, 15 eigenvector, 120, 137 homogeneity, 5 elementary product, 41 homogeneous system, 80 elementary row operations, 83 homomorphism, 63 ellipse, 10 homomorphism theorem, 66 ellipsoid, 133 hyperbola, 10 equivalent, 23 hyperplane, 82 equivalent conics, 100 hypersurface, 100 equivalent linear maps, 78 Euclidean inner product, 127 identity matrix, 34 Euclidean space, 127 identity permutation, 42 Euclidean structure, 127 imaginary part, 17 evaluation, 32 imaginary unit, 17 160 Index inconsistent system, 85 lower triangular, 38, 89 indices in inversion, 42 LPU decomposition, 89 induction hypothesis, 73 LU decomposition, 91 inertia index, 99 LUP decomposition, 95 injective, 62 inner product, 5 M¨obius band, 75 invariant subspace, 121 mathematical induction, 73 inverse matrix, 35 matrix, 31 inverse quaternion, 68 matrix entry, 31 inverse transformation, 35 matrix product, 32, 33 inversion of indices, 42 metric space, 117 involution, 126 Minkowski, 113 isometric Hermitian spaces, 118 Minkowski–Hasse theorem, 113 isomorphic spaces, 63 minor, 49, 52 isomorphism, 63 multi-index, 52 multiplication by scalar, 2 Jordan canonical form, 142 multiplication by scalars, 59 Jordan cell, 27, 142 multiplicative, 18 Jordan normal form, 142 multiplicity, 20 Jordan system, 27 nilpotent operator, 139 non-degenerate Hermitian form, 106 kernel, 62 non-negative form, 125 kernel of form, 100, 114 nontrivial linear combination, 72 norm of quaternion, 67 Lagrange, 52 normal form, 24 Lagrange polynomials, 75 normal operator, 119, 129 Lagrange’s formula, 52 null space, 62, 86 law of cosines, 6 LDU decomposition, 91 odd form, 110 leading coefficient, 84 odd permutation, 42 leading entry, 84 opposite coordinate flag, 93 leading minors, 106 opposite vector, 59 left inverse, 82 orthogonal, 118 left kernel, 69 orthogonal basis, 97 left singular vector, 125 orthogonal complement, 121 left vector space, 68 orthogonal diagonalization, 124 length, 117 orthogonal projection, 134 length of permutation, 42 orthogonal projector, 134 linear combination, 2 orthogonal transformation, 128 linear form, 32, 63 orthogonal vectors, 6 linear function, 32, 63 orthonormal basis, 
107, 118, 127 linear map, 33, 62 linear subspace, 61 parabola, 13 linear transformation, 35 partition, 141 linearity, 32 Pascal’s triangle, 150 linearly dependent, 72 permutation, 41 linearly independent, 71 permutation matrix, 89 Linnaeus, 23 pivot, 84 Index 161 Pl¨ucker, 57 sesquilinear form, 102 Pl¨ucker identity, 57 sign of permutation, 41 PLU decomposition, 95 similarity, 137 polar, 19 similarity transformation, 35 polar decomposition, 136 simple problems, 29 positive definite, 98 singular value, 125 positive Hermitian form, 104 singular value decomposition, 125 positive operator, 135 skew-filed, 67 positivity, 5 span, 71 power of matrix, 39 spectral theorem, 120 principal axes, 14, 133 spectrum, 26, 132 principal minor, 113 square matrix, 34 projection, 118 square root, 21, 135 Pythagorean theorem, 6 standard basis, 33, 71 standard coordinate flag, 92 q-factorial, 95, 156 surjective, 63 quadratic curve, 9 Sylvester, 106 quadratic form, 11, 25, 37 symmetric bilinear form, 37 quadratic formula, 20 symmetric matrix, 38 quaternions, 67 symmetric operator, 129 quotient space, 65 symmetricity, 5 system of linear equations, 38 range, 62 rank, 25, 76 tail, 1 rank of linear system, 80 total anti-symmetry, 44 rank of matrix, 77 transition matrix, 35 real part, 17 transposed bilinear form, 36 real spectral theorem, 129 transposed partition, 141 real vector space, 60 transposition, 36 realification, 126 transposition matrix, 89 reduced row echelon form, 84 transposition permutation, 42 regular nilpotent, 139 triangle inequality, 8, 117 right inverse, 82 unipotent matrix, 91 right kernel, 69 unit vector, 5 right singular vector, 125 unitary rotation, 123 right vector space, 68 unitary space, 117 ring, 31 unitary transformation, 119 , 21 upper triangular, 38, 89 root space, 138 row echelon form, 84 Vandermonde, 58 row echelon form of rank r, 84 Vandermonde’s identity, 58 row space, 86 vector, 1, 59 vector space, 28, 59 scalar, 59, 60 vector sum, 2 scalar product, 5 Vieta, 22 semiaxes, 133 Vieta’s theorem, 22 semiaxis of ellipse, 13 semilinear function, 103 Young tableaux, 141 162 Index zero vector, 2, 59