Orthogonally Diagonalizable Matrices

These notes are about real matrices, that is, matrices in which all entries are real numbers. Complex numbers will come up occasionally, but only in very simple ways, as tools for learning more about real matrices.

Review

An n × n matrix A is called diagonalizable if we can write A = PDP^{-1} where D is an n × n diagonal matrix. This is possible if and only if there is a basis {b_1, b_2, ..., b_n} for ℝ^n in which the b_i's are eigenvectors of A. The corresponding eigenvalues sit along the diagonal of D, and the matrix P = [b_1 b_2 ... b_n]. Thus P = P_ℬ, the "change of coordinates" matrix: P[x]_ℬ = x and P^{-1}x = [x]_ℬ.

A "acts like" a diagonal matrix when we change coordinates: more precisely, the mapping x ↦ Ax (in standard coordinates) is the same as [x]_ℬ ↦ D[x]_ℬ (written in ℬ-coordinates).

An orthogonal matrix is a square matrix U for which U^{-1} = U^T; equivalently, an orthogonal matrix is a square matrix with orthonormal columns.

Definition An n × n matrix A is called orthogonally diagonalizable if there is an orthogonal matrix U and a diagonal matrix D for which A = UDU^{-1} (= UDU^T).

Thus, an orthogonally diagonalizable matrix is a special kind of diagonalizable matrix: not only can we factor A = PDP^{-1}, but we can find an orthogonal matrix U = P that works. In that case, the columns of U form an orthonormal basis for ℝ^n. We want to know which matrices are orthogonally diagonalizable. The Spectral Theorem that appears later in these notes will give us the answer.
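Before going further, a quick numerical illustration may help. This is a minimal sketch assuming NumPy is available, and the matrix A below is just a made-up example, not one from these notes. The routine np.linalg.eigh is designed for symmetric matrices: it returns real eigenvalues together with a matrix whose columns are orthonormal eigenvectors, which is exactly an orthogonal diagonalization.

    import numpy as np

    # An example symmetric matrix (any real symmetric matrix will do).
    A = np.array([[2., 1., 0.],
                  [1., 3., 1.],
                  [0., 1., 2.]])

    eigenvalues, U = np.linalg.eigh(A)   # columns of U are orthonormal eigenvectors
    D = np.diag(eigenvalues)

    print(np.allclose(U @ D @ U.T, A))       # A = U D U^T                 -> True
    print(np.allclose(U.T @ U, np.eye(3)))   # U^T U = I, so U^{-1} = U^T  -> True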

Definition A is called a symmetric matrix if A = A^T.

Notice that a symmetric matrix A must be square (why?).

Example If A is any matrix (square or not), then AA^T is square. AA^T is also symmetric because (AA^T)^T = (A^T)^T A^T = AA^T.

The next result tells us that only a symmetric matrix “has a chance” to be orthogonally diagonalizable. This is the “easy half” of the Spectral Theorem.

Theorem If A is orthogonally diagonalizable, then A must be symmetric.

Proof Suppose that D is diagonal, U orthogonal, and A = UDU^{-1} = UDU^T. Then A^T = (UDU^T)^T = (U^T)^T D^T U^T = UDU^T = A, so A is symmetric. ∎

To completely understand which matrices are orthogonally diagonalizable, we need to know a bit more about symmetric matrices. For instance, a property that characterizes symmetric matrices is how nicely they interact with the dot product.

Theorem An n × n matrix A is symmetric if and only if Ax · y = x · Ay for all vectors x and y in ℝ^n.

Proof i) Let x, y be in ℝ^n. For any n × n matrix A,

Ax · y = (Ax)^T y = x^T A^T y = x · A^T y     (*)

If A is symmetric, then A^T = A and equation (*) becomes Ax · y = x · Ay.

ii) Suppose Ax · y = x · Ay for all vectors x and y in ℝ^n. Let a_1, ..., a_n be the columns of A. Then for all 1 ≤ i, j ≤ n,

Ae_i · e_j = a_i · e_j = a_ji = the (j, i) entry in A, and
e_i · Ae_j = e_i · a_j = a_ij = the (i, j) entry in A,

so these two entries are equal and A is symmetric. ∎

The next result about symmetric matrices uses a few facts about complex numbers.

Review

A complex number has the form z = a + bi where a and b are real and i² = −1. For z = a + bi, the conjugate of z is z̄ = a − bi. Clearly, z is a real number if and only if b = 0, and this happens if and only if z̄ = z.

ℂ is the set of all complex numbers, and ℝ ⊆ ℂ.

For z ∈ ℂ, the magnitude of z is |z| = √(a² + b²) (a real number). Clearly z z̄ = a² + b² = |z|², and for every z, w ∈ ℂ it is easy to check that the conjugate of zw is z̄ w̄. For a matrix A with complex entries, Ā denotes the conjugate matrix, in which each entry a_ij of A has been replaced by its conjugate ā_ij. So A is a real matrix if and only if Ā = A.

We also use the Fundamental Theorem of Algebra (a much deeper result!). It tells us that if we allow complex numbers, then every polynomial factors completely into linear factors. In particular, every characteristic polynomial C(λ) factors completely as C(λ) = (λ − λ_1)(λ − λ_2)...(λ − λ_n). Therefore every n × n matrix A has n eigenvalues if we count by multiplicities; for example, if the factor (λ − λ_i) repeats exactly three times, then λ_i counts as three eigenvalues. Here is the next important fact about symmetric matrices.

Theorem If A is a (real) n × n symmetric matrix, then A has n real eigenvalues (counted by their multiplicities). For each eigenvalue, we can find a real eigenvector associated with it.

Proof According to the Fundamental Theorem of Algebra, A has eigenvalues λ_1, ..., λ_n (possibly with some duplicates listed, because we count by multiplicities). Because A is symmetric, we will show that each λ_i must be a real number.

D" First, notice that for any complex vector Dœã −ß‚8 the ;œE D X D is a real D8 number because ;œ  ; À

 ;œDDDX Eœ†EœE† D DDDDDDDDœ œ†EœX Eœ X E  ; Å Å because E is symmetric because E is real

D" Let -3 be an eigenvalue and let D œ ã be one of its eigenvectors. D8

X  X  X  Then DDEœ DD-3 œ - 3 DD œÐ†Ñ - 3 DD

   # # œ -3""## ÐD †D D †D ÞÞÞD 88 †DÑœ - 3" ÐlDl ÞÞÞlDlÑ 8 .

X # # But DE D is real and, on the right side of the equation, ÐlD" l ÞÞÞlD 8 l Ñ is both real X # # and nonzero (why? ). Therefore -3œD E D ÎÐlDl " ÞÞÞlD 8 lÑ is real.

Since each --3 is real, EM 3 is a real matrix and det ÐEMÑœ! --3 because 3 is an eigenvalue. So the real matrix equation ÐEMÑœ-3 B ! has nonzero real solutions Þ In other words, there are real eigenvectors for eigenvalue -3Þ ñ

We are now ready to prove our main theorem. The set of eigenvalues of a matrix is sometimes called the spectrum of the matrix, and an orthogonal diagonalization of a matrix A factors A in a way that displays all the eigenvalues and their multiplicities. Therefore the theorem is called the Spectral Theorem for real symmetric matrices.

The Spectral Theorem A (real) n × n matrix A is orthogonally diagonalizable if and only if A is symmetric.

Earlier, we made the easy observation that if A is orthogonally diagonalizable, then it is necessary that A be symmetric. The Spectral Theorem says that the symmetry of A is also sufficient: a real symmetric matrix must be orthogonally diagonalizable. This is the hard, and surprising, half of the theorem, because from the definition alone it is not even easy to see that a symmetric matrix must be diagonalizable at all.

The proof is by induction, and it uses some simple facts about partitioned matrices and change of coordinates.

Proof The proof is already half done. We only need to show that a (real) n × n symmetric matrix A is orthogonally diagonalizable.

This is obviously true for every 1 × 1 matrix A: if A = [a], then A = [1][a][1] = UDU^T with U = [1] and D = [a]. Assume now that

(**) every (n − 1) × (n − 1) symmetric matrix is orthogonally diagonalizable.

We will show that (**) forces it to be true that every n × n symmetric matrix ("the next size up") must also be orthogonally diagonalizable.

If we can do this, we will have finished a proof by induction: because the theorem is true whenever A is 1 × 1, it must also be true whenever A is 2 × 2; then, because it is true whenever A is 2 × 2, it must also be true whenever A is 3 × 3; but then, because it is true whenever A is 3 × 3, it must also be true whenever A is 4 × 4; and so on, up to any size n × n.

Consider an n × n symmetric matrix A where n > 1. By the preceding theorem, we can find a real eigenvalue λ_1 of A, together with a real eigenvector v_1. By normalizing, we can assume v_1 is a unit eigenvector. Add vectors to extend {v_1} to a basis for ℝ^n and then use the Gram-Schmidt process to get an orthonormal basis for ℝ^n: ℬ = {v_1, ..., v_n}.

Let P = P_ℬ = [v_1 v_2 ... v_n] = the change of coordinates matrix for ℬ. Because P is orthogonal, P^{-1} = P^T. Now look at the matrix P^{-1}AP.

• P^{-1}AP is symmetric, because (P^{-1}AP)^T = (P^T A P)^T = P^T A^T (P^T)^T = P^T A P = P^{-1}AP, and

• its first column is P^{-1}APe_1 = P^{-1}Av_1 = P^{-1}λ_1 v_1 = λ_1 P^{-1}v_1 = λ_1 [v_1]_ℬ = λ_1 e_1 = (λ_1, 0, ..., 0).

Using the symmetry, partition P^{-1}AP in 2 × 2 block form (rows separated by semicolons) as

P^{-1}AP = [ λ_1  0 ; 0  B ],

where 0 stands for a block of n − 1 zeros and B is a symmetric matrix. Then B has size (n − 1) × (n − 1), so our assumption (**) says that B is orthogonally diagonalizable: there is a diagonal matrix D′ and an (n − 1) × (n − 1) orthogonal matrix Q for which B = QD′Q^{-1}, or Q^{-1}BQ = D′.

In the next set of calculations, you can check that the partitions of the matrices are sized so that each multiplication by blocks is defined: the column partition of the first matrix matches the row partition of the matrix to its right.

"0 " 0 " 0 " ! Define a partitioned 8 ‚ 8 matrix V œ . Since " œ , 0U 0 U 0 U ! M " " 0 we see that V is invertible: Vœ" Þ Let YœTV and 0 U Y is orthogonal ( explain why a product of orthogonal matrices is orthogonal!)

Then Y" EYœÐV "" T ÑEÐTVÑ

- - " "!" 0 " ! " 0 œV Vœ " !F 0 U ! F 0 U Å T" ET

-"!" 0 - " ! -" ! œ" œ " œ w !UF 0 U ! UFU ! H - - " !" " ! so E œYw YÞ Since Hœ w is a diagonal matrix, we !H ! H

have an orthogonal diagonalization of E À E œ Y HY" œ Y HY X ñ
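The induction above is constructive, so it can be imitated in code. The following is a rough sketch (assuming NumPy; the function name is mine, not from the notes) that mirrors the proof: find one real unit eigenvector, extend it to an orthonormal basis using a QR factorization in place of Gram-Schmidt, split off the symmetric (n − 1) × (n − 1) block B, and recurse. It is meant only to illustrate the argument; in practice a library routine such as np.linalg.eigh does the whole job directly.

    import numpy as np

    def orthogonally_diagonalize(A):
        """Return (U, D) with A = U @ D @ U.T, following the induction in the proof.
        Assumes A is a real symmetric NumPy array."""
        n = A.shape[0]
        if n == 1:
            return np.eye(1), A.copy()            # base case: [a] = [1][a][1]
        # One real eigenvalue with a real unit eigenvector (guaranteed because A is symmetric).
        _, V = np.linalg.eig(A)
        v1 = np.real(V[:, 0])
        v1 = v1 / np.linalg.norm(v1)
        # Extend {v1} to an orthonormal basis of R^n; QR stands in for Gram-Schmidt.
        P = np.linalg.qr(np.column_stack([v1, np.eye(n)]))[0]
        T = P.T @ A @ P                           # first row/column is (lambda_1, 0, ..., 0)
        lam1, B = T[0, 0], T[1:, 1:]              # B is (n-1) x (n-1) and symmetric
        U_B, D_B = orthogonally_diagonalize(B)    # induction hypothesis (**)
        R = np.block([[np.eye(1), np.zeros((1, n - 1))],
                      [np.zeros((n - 1, 1)), U_B]])
        U = P @ R                                 # product of orthogonal matrices
        D = np.block([[lam1 * np.eye(1), np.zeros((1, n - 1))],
                      [np.zeros((n - 1, 1)), D_B]])
        return U, D

    A = np.array([[4., 1., 1.], [1., 4., 1.], [1., 1., 4.]])   # a made-up symmetric example
    U, D = orthogonally_diagonalize(A)
    print(np.allclose(U @ D @ U.T, A), np.allclose(U.T @ U, np.eye(3)))   # True True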

The Spectral Decomposition of a Real Symmetric Matrix

If A is a real n × n symmetric matrix, then we can orthogonally diagonalize it as

A = UDU^T = U diag(λ_1, λ_2, ..., λ_n) U^T,   where U = [u_1 u_2 ... u_n]

is an orthogonal matrix and the λ_i's are the eigenvalues corresponding to the u_i's. So

A = [u_1 u_2 ... u_n] diag(λ_1, ..., λ_n) [u_1 u_2 ... u_n]^T
  = [λ_1 u_1  λ_2 u_2  ...  λ_n u_n] [u_1 u_2 ... u_n]^T
  = λ_1 u_1 u_1^T + λ_2 u_2 u_2^T + ... + λ_n u_n u_n^T.     (***)

The subspaces Span{u_i} are n mutually orthogonal straight lines through 0 in ℝ^n, and each n × n matrix u_i u_i^T is the standard matrix of the orthogonal projection of ℝ^n onto Span{u_i}.

So (***) says that A can be written as a linear combination of projections onto n orthogonal "axes" in ℝ^n. This linear combination uses all the eigenvalues (the spectrum) of A as weights, so (***) is called the spectral decomposition of A. We can see this in detail by applying equation (***) to a point x in ℝ^n. We get

Ax = λ_1 u_1 u_1^T x + λ_2 u_2 u_2^T x + ... + λ_n u_n u_n^T x

   = λ_1 (u_1 · x) u_1 + λ_2 (u_2 · x) u_2 + ... + λ_n (u_n · x) u_n

   = λ_1 [(x · u_1)/(u_1 · u_1)] u_1 + λ_2 [(x · u_2)/(u_2 · u_2)] u_2 + ... + λ_n [(x · u_n)/(u_n · u_n)] u_n

   = λ_1 proj_{u_1} x + λ_2 proj_{u_2} x + ... + λ_n proj_{u_n} x.

In other words, we can think of how the transformation x ↦ Ax "works" by saying:

• Rewrite x in ℬ-coordinates, where ℬ = {u_1, ..., u_n}.
• For each i, the ℬ-coordinate corresponding to the "axis" u_i is rescaled by the factor λ_i.
• The result is Ax written in ℬ-coordinates.
• Convert back to the standard-coordinate version of Ax if you like.

In a diagram:

    x  ----------------------------------------------->  Ax = UDU^{-1}x
    |  convert to ℬ-coordinates                           ^  convert back to standard coordinates
    v                                                     |
    [x]_ℬ = U^{-1}x  --- rescale ℬ-coordinates by D --->  D[x]_ℬ = DU^{-1}x

A "acts like" the diagonal matrix D, working in the orthonormal coordinate system with axes established by the vectors u_1, ..., u_n.

Notice that the diagram also indicates how we can think of x ↦ Ax when we have an "ordinary" diagonalization A = PDP^{-1}. But in the ordinary case, the new coordinate axes (determined by the columns of P) may not be orthogonal. Orthogonal diagonalization gives a new orthogonal coordinate system in terms of which we can "picture" the transformation clearly.
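The diagram can be checked numerically. The sketch below (assuming NumPy; the 2 × 2 matrix is just a made-up example) computes Ax three ways: directly, by the spectral decomposition (***), and by the "convert, rescale, convert back" route of the diagram.

    import numpy as np

    A = np.array([[2., 1.], [1., 2.]])           # a symmetric example
    lam, U = np.linalg.eigh(A)                   # columns of U are u_1, u_2
    x = np.array([3., -1.])

    Ax = A @ x                                                           # directly
    Ax_spec = sum(lam[i] * (U[:, i] @ x) * U[:, i] for i in range(2))    # (***)
    Ax_diag = U @ (np.diag(lam) @ (U.T @ x))     # convert, rescale by D, convert back

    print(np.allclose(Ax, Ax_spec), np.allclose(Ax, Ax_diag))   # True True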

How to orthogonally diagonalize a symmetric matrix

Now we know exactly which n × n (real) matrices are orthogonally diagonalizable, but the proof of the Spectral Theorem doesn't give us a very effective way to actually do the calculations. Fortunately, the following fact makes the actual computation of an orthogonal diagonalization very similar to the steps we used for any other diagonalization.

Theorem For a symmetric matrix A, two eigenvectors from different eigenspaces must be orthogonal (so any two eigenspaces are orthogonal and, of course, have only the vector 0 in common).

Proof If v_1 and v_2 are from different eigenspaces, they have different eigenvalues λ_1 ≠ λ_2. Then λ_1 v_1 · v_2 = Av_1 · v_2 = v_1 · Av_2 (because A is symmetric) = v_1 · λ_2 v_2 = λ_2 v_1 · v_2, so (λ_1 − λ_2) v_1 · v_2 = 0. But λ_1 − λ_2 ≠ 0, so v_1 · v_2 = 0. ∎

To orthogonally diagonalize an n × n symmetric matrix A, we can:

• Find the eigenvalues. The fact that A is symmetric doesn't really help much here. As for any square matrix, finding the eigenvalues might be difficult; in a practical problem it will probably require computer assistance.

• The sum of the dimensions of the eigenspaces must be n, because we know A is (orthogonally) diagonalizable. Find a basis for each eigenspace, and then use the Gram-Schmidt process, as needed, to convert each basis to an orthonormal basis.

• Unite these orthonormal bases into a single collection ℬ = {u_1, ..., u_n} with n vectors. Any two of these vectors u_i and u_j are orthogonal:

   – because of the Gram-Schmidt construction, if u_i and u_j are from the same eigenspace, or

   – because of the preceding theorem, if u_i and u_j are from different eigenspaces.

Therefore ℬ is an orthonormal basis for ℝ^n, and the matrix U = [u_1 u_2 ... u_n] can be used to write an orthogonal diagonalization of A. (A small computational sketch of this procedure follows.)
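Here is the computational sketch of the three steps, assuming NumPy; the helper name and the tolerance are mine, not from the notes. np.linalg.eig stands in for "find the eigenvalues," and a QR factorization plays the role of Gram-Schmidt within each eigenspace. (np.linalg.eigh would produce an orthogonal diagonalization of a symmetric matrix in a single call; the point here is only to mirror the steps above.)

    import numpy as np

    def orthogonal_diagonalization(A, tol=1e-8):
        """Return U (orthogonal) and d (eigenvalues) with A ~ U @ diag(d) @ U.T,
        assuming A is a real symmetric NumPy array."""
        eigvals, eigvecs = np.linalg.eig(A)
        eigvals, eigvecs = eigvals.real, eigvecs.real
        cols, lams = [], []
        remaining = list(range(len(eigvals)))
        while remaining:
            lam = eigvals[remaining[0]]
            # indices of all eigenvectors belonging to this eigenvalue (one eigenspace)
            idx = [i for i in remaining if abs(eigvals[i] - lam) < tol]
            remaining = [i for i in remaining if i not in idx]
            q, _ = np.linalg.qr(eigvecs[:, idx])  # orthonormalize within the eigenspace
            cols.append(q)
            lams.extend([lam] * q.shape[1])
        return np.hstack(cols), np.array(lams)

    A = np.ones((3, 3))                           # the matrix from Example 1 below
    U, d = orthogonal_diagonalization(A)
    print(np.allclose(U @ np.diag(d) @ U.T, A))   # A = U D U^T      -> True
    print(np.allclose(U.T @ U, np.eye(3)))        # U is orthogonal  -> True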

Example 1 Orthogonally diagonalize A = [1 1 1; 1 1 1; 1 1 1] and write its spectral decomposition.

Find the eigenvalues: The characteristic polynomial is

det[1−λ 1 1; 1 1−λ 1; 1 1 1−λ] = (1 − λ)[(1 − λ)² − 1] − 1[(1 − λ) − 1] + 1[1 − (1 − λ)]

= (1 − λ)[(1 − λ)² − 1] + 2λ = −λ(1 − λ)(2 − λ) + 2λ = −λ(λ² − 3λ) = −λ²(λ − 3),

so the eigenvalues are λ = 0 (multiplicity 2) and λ = 3 (multiplicity 1).

Find bases for the eigenspaces:

For λ = 0: Solve (A − 0I)x = Ax = 0. The augmented matrix

[1 1 1 0; 1 1 1 0; 1 1 1 0]   row reduces to   [1 1 1 0; 0 0 0 0; 0 0 0 0],

so the solutions are

x = (x_1, x_2, x_3) = x_2 (−1, 1, 0) + x_3 (−1, 0, 1).

A basis for the eigenspace is { (−1, 1, 0), (−1, 0, 1) }.

For λ = 3: Solve (A − 3I)x = 0. The augmented matrix

[−2 1 1 0; 1 −2 1 0; 1 1 −2 0]   row reduces to   [1 0 −1 0; 0 1 −1 0; 0 0 0 0],

so the solution is x = (x_1, x_2, x_3) = x_3 (1, 1, 1). A basis for the eigenspace is { (1, 1, 1) }.

(Notice that (1, 1, 1) is orthogonal to each basis vector for the other eigenspace, as must be true for a symmetric matrix. Uniting these bases of the eigenspaces would give a basis of eigenvectors for ℝ³ that we could use to diagonalize A. But this would not orthogonally diagonalize A because, for example, the basis vectors (−1, 1, 0) and (−1, 0, 1) are not orthogonal. We still need to find an orthonormal basis for each eigenspace.)

Find orthonormal bases for the eigenspaces:

For λ = 0: Applying Gram-Schmidt to the basis gives an orthogonal basis { (−1, 1, 0), (−1/2, −1/2, 1) }, so an orthonormal basis is { (−1/√2, 1/√2, 0), (−1/√6, −1/√6, 2/√6) }.

For λ = 3: Normalizing gives an orthonormal basis { (1/√3, 1/√3, 1/√3) }.

Orthogonal Diagonalization of A:

A = UDU^T, where the columns of U are u_1 = (−1/√2, 1/√2, 0), u_2 = (−1/√6, −1/√6, 2/√6), u_3 = (1/√3, 1/√3, 1/√3), and D = diag(0, 0, 3).

Spectral Decomposition of A:

A = 0·u_1u_1^T + 0·u_2u_2^T + 3·u_3u_3^T = 3 u_3 u_3^T.

This shows us exactly how the transformation x ↦ Ax works: project x onto the line Span{(1/√3, 1/√3, 1/√3)} = Span{(1, 1, 1)}, and then multiply by 3.

Quadratic Forms

A degree-two term in the variables x_1, ..., x_n means a term x_i x_j. Examples of degree-two terms are x_1 x_3 or x_1 x_1 = x_1²; but x_1, x_1² x_2, and x_1 x_2 x_3 are not degree-two terms. A quadratic form in n variables is a function Q(x_1, x_2, ..., x_n) that is a linear combination of degree-two terms in x_1, ..., x_n. Such a quadratic form can always be written in the form

Q(x) = x^T A x,   where x = (x_1, x_2, ..., x_n) and A is an n × n symmetric matrix.

(See details in the textbook. The next example illustrates what happens.)

Example 2 A quadratic form in 3 variables:

Q(x) = Q(x_1, x_2, x_3) = x_1² + x_2² + x_3² + 2x_1x_2 + 2x_1x_3 + 2x_2x_3.

(Note that the quadratic form Q(x) is not a linear function on ℝ³.)

B" Using Bœ B# ß we can write UÐ B Ñ œ B$ " " " X ### UÐBßBßBÑ" # $ œ B" " " B œ B" B # B $ #BB"# #BB "$ #BB #$ , " " " as you can check directly. How was the matrix E created? The entries + 33 are the coefficients # (possibly ! ) of the terms B3 in UÐB Ñà the coefficient of a “cross-product” term BB3 4 in UÐB Ñ is “split in half” to form the two entries +34 and + 43 in EÞ From Example 1,

"""  "" ! # ' $ ! ! ! # # """ ""# E œ  ! ! !   œ Y HY X # '$ ''' ! #"! ! $ """ '$ $$$

A change of coordinates now lets us understand the quadratic form much better: the columns of U are an orthonormal basis ℬ = {u_1, u_2, u_3} for ℝ³. For notational convenience, we will use y to describe ℬ-coordinates: [x]_ℬ = y = (y_1, y_2, y_3). The change of coordinates matrix U_ℬ = U relates the old and new coordinates: x = U_ℬ[x]_ℬ = Uy. When we make this change of coordinates we get

Q(x) = x_1² + x_2² + x_3² + 2x_1x_2 + 2x_1x_3 + 2x_2x_3

     = x^T A x = (Uy)^T A (Uy) = y^T U^T A U y = y^T D y

     = [y_1 y_2 y_3] [0 0 0; 0 0 0; 0 0 3] (y_1, y_2, y_3) = 0y_1² + 0y_2² + 3y_3² = 3y_3².

In the new coordinates, all the cross-product terms like x_1x_2 in the quadratic form have disappeared, and only a linear combination of "pure" terms y_1², y_2², y_3² remains. (In this particular example, y_1² and y_2² also drop out because λ_1 = λ_2 = 0, as read from the diagonal of D.)
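This change of variables is easy to confirm numerically. The sketch below (assuming NumPy) uses np.linalg.eigh to produce its own orthonormal eigenvector basis for the matrix of Example 2; the specific u_i may differ from the ones chosen above within the 0-eigenspace, but the identity Q(x) = λ_1y_1² + λ_2y_2² + λ_3y_3² = 3y_3² holds either way.

    import numpy as np

    A = np.ones((3, 3))                  # the symmetric matrix of Q from Example 2
    lam, U = np.linalg.eigh(A)           # eigenvalues sorted ascending: 0, 0, 3

    x = np.array([0.3, -1.2, 2.0])       # any point, in standard coordinates
    y = U.T @ x                          # the same point in B-coordinates

    Q_x = x @ A @ x                                             # x^T A x
    Q_y = lam[0]*y[0]**2 + lam[1]*y[1]**2 + lam[2]*y[2]**2      # = 3 * y3^2
    print(np.isclose(Q_x, Q_y))          # True: same value, different formula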

You can imagine that we took the standard x_1x_2x_3 axes and repositioned them (still orthogonal to each other) in ℝ³ to set up a new y_1y_2y_3 coordinate system. Consider a geometric point P in ℝ³, with no coordinates assigned to P. What is the value of the quadratic form Q at this point? A formula to find the value of Q at the point P depends on the coordinates we choose:

In standard coordinates: the "coordinate name" of P is (x_1, x_2, x_3), and

Q(P) = Q(x_1, x_2, x_3) = x_1² + x_2² + x_3² + 2x_1x_2 + 2x_1x_3 + 2x_2x_3.

In ℬ-coordinates: the "coordinate name" of that same point P is now (y_1, y_2, y_3), and

Q(P) = 3y_3².

Be sure you understand that the numeric value of Q at P does not change:

x_1² + x_2² + x_3² + 2x_1x_2 + 2x_1x_3 + 2x_2x_3 = 3y_3²

because of how the coordinate systems are related. It is the formula used to evaluate Q at P that changes when the coordinate system changes.

One coordinate system may give us better insight. For example, switching into the new y-coordinates makes it clear that there are infinitely many points where the quadratic form has value 0: for example, at every point P with y-coordinates (y_1, y_2, y_3) = (c, d, 0). In y-coordinates, the formula also makes it clear that this quadratic form never has a negative value. Example 2 illustrates some general facts about quadratic forms.

Every quadratic form in n variables can be written in the form Q(x) = x^T A x, where x = (x_1, x_2, ..., x_n) and A is an n × n symmetric matrix. A can be orthogonally diagonalized as UDU^T, where the columns of U are an orthonormal basis ℬ for ℝ^n and their corresponding eigenvalues are listed along the diagonal of the diagonal matrix D. If we write ℬ-coordinates as y = (y_1, y_2, ..., y_n), then the substitution x = Uy converts the formula for evaluating Q into y-coordinates:

Q(x) = y^T U^T A U y = y^T D y = λ_1 y_1² + ... + λ_n y_n².

This fact is called the Principal Axes Theorem for quadratic forms: there is a new orthogonal coordinate system in which the formula for Q contains no mixed terms. When Q is written in this form Q(y) = λ_1 y_1² + ... + λ_n y_n², we see important information that comes from the eigenvalues of A.

If every eigenvalue λ_i > 0, then Q is positive at every point except the origin y = 0, and we say that the quadratic form is positive definite.

Similarly, if every eigenvalue λ_i < 0, then Q is negative at every point except the origin y = 0, and we say that the quadratic form is negative definite.

If every eigenvalue λ_i ≥ 0, then Q is always ≥ 0, but there may be many points where Q = 0 (see Example 2). When this happens, Q is called positive semidefinite.

If every eigenvalue λ_i ≤ 0, then Q is always ≤ 0, but there may be many points where Q = 0. When this happens, Q is called negative semidefinite.

If A has an eigenvalue λ_i > 0 and also another eigenvalue λ_j < 0, then Q will have some positive values and some negative values; Q is called an indefinite quadratic form. For example, if Q(y) = y_1² − y_2² and c > 0, then Q(c, 0) is always positive and Q(0, c) is always negative.
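These five cases depend only on the signs of the eigenvalues, so they are easy to check by machine. The following is a rough sketch (assuming NumPy; the function name and the tolerance for treating an eigenvalue as zero are mine) that classifies a quadratic form from the eigenvalues of its symmetric matrix.

    import numpy as np

    def classify_quadratic_form(A, tol=1e-10):
        """Classify Q(x) = x^T A x from the signs of the eigenvalues of the symmetric A."""
        lam = np.linalg.eigvalsh(A)          # real eigenvalues of a symmetric matrix
        if np.all(lam > tol):
            return "positive definite"
        if np.all(lam < -tol):
            return "negative definite"
        if np.all(lam >= -tol):
            return "positive semidefinite"
        if np.all(lam <= tol):
            return "negative semidefinite"
        return "indefinite"

    print(classify_quadratic_form(np.array([[2., 0.], [0., 3.]])))    # positive definite
    print(classify_quadratic_form(np.ones((3, 3))))                   # positive semidefinite (Example 2)
    print(classify_quadratic_form(np.array([[1., 0.], [0., -1.]])))   # indefinite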

Example 3 We have seen examples earlier in the course where a coordinate change simplifies the appearance of an equation. For instance, when simplified in the right way, it's easy to see that the set of points in ℝ² that satisfy the following equation is an ellipse:

5x² − 4xy + 5y² = 63.     (*****)

In general, when A is an n × n symmetric matrix, we say that an equation

Q(x) = x^T A x = c

describes a quadric surface in ℝ^n. The ellipse is a quadric "surface" in ℝ².

We saw earlier in the course that a rotation of axes by π/4 creates a new coordinate system x′, y′ in which the equation (*****) becomes

3(x′)² + 7(y′)² = 63.

This is recognizable as an ellipse whose major and minor axes lie on the x′ and y′ axes. In our earlier look at this example, we did not describe how we found that particular change of coordinates.

Since the left side of (*****) is a quadratic form, the Principal Axes Theorem suggests how to eliminate the cross-product term. Using the notation x = (x, y), we can write

5x² − 4xy + 5y² = x^T [5 −2; −2 5] x.

The eigenvalues of the matrix are λ = 3 and λ = 7. Since each eigenspace is one-dimensional, it is easy to find an orthonormal basis for each of them: {(1/√2, 1/√2)} and {(−1/√2, 1/√2)}. So we can orthogonally diagonalize A as

A = [5 −2; −2 5] = UDU^T,   where U = [1/√2 −1/√2; 1/√2 1/√2] and D = [3 0; 0 7].

If we let x′ = (x′, y′) and make the substitution x = Ux′ then, as in Example 2 (and as promised by the Principal Axes Theorem), we get that

5x² − 4xy + 5y² = 63

is the same as

3(x′)² + 7(y′)² = 63.

The new coordinate axes x′ and y′ are orthogonal lines that contain the vectors (1, 1) and (−1, 1) (the columns of U, but rescaled for neatness). It is clear (draw the new axes in the picture!) that the x′-y′ axes are the result of rotating the original x-y axes by π/4.
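A quick numerical check of this change of variables may be reassuring; the sketch below (assuming NumPy) picks a few random points, converts their (x′, y′) coordinates to standard coordinates with x = Ux′, and confirms that 5x² − 4xy + 5y² agrees with 3(x′)² + 7(y′)².

    import numpy as np

    A = np.array([[5., -2.], [-2., 5.]])          # 5x^2 - 4xy + 5y^2 = x^T A x
    U = np.array([[1, -1], [1, 1]]) / np.sqrt(2)  # columns: eigenvectors for 3 and 7

    rng = np.random.default_rng(0)
    for _ in range(5):
        xp, yp = rng.standard_normal(2)           # a point in the new (x', y') coordinates
        x, y = U @ np.array([xp, yp])             # the same point in standard coordinates
        print(np.isclose(5*x**2 - 4*x*y + 5*y**2, 3*xp**2 + 7*yp**2))   # True each time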

Example 4 This example gives us a chance to look at the significance of orthogonal diagonalization in a different way. It is based on an earlier example from the course, partly reproduced here, in which we looked at the linear transformation T: ℝ² → ℝ² that reflects ℝ² across the line y = x/2, and found the matrix A for which T(x) = Ax.

T(x) = 1·x if and only if x is on the line y = x/2, so we see (geometrically) that this line is an eigenspace corresponding to eigenvalue λ_1 = 1. Similarly, T(x) = (−1)x if and only if x is on the perpendicular line y = −2x, so that line is an eigenspace corresponding to eigenvalue λ_2 = −1. Explain why there are no other eigenvalues/eigenspaces.

# " & & It's easy to give orthonormal bases for these eigenspacesÀ " and # Þ   & &

This information is enough to immediately write down E in orthogonally diagonalized form:

#" #"  " ! E œ Y HYX œ && && . "#!  "  "# && &&

The primary point of this example is to think about how the geometry is "exhibited" in the orthogonal diagonalization: the matrix U "shows" you the new coordinate axes, and D shows you how x ↦ Ax rescales the coordinates along those axes. We can multiply to get the single matrix A = [0.6 0.8; 0.8 −0.6], but looking at the orthogonally diagonalized form should tell you more than looking at A itself.
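A short sketch (assuming NumPy) that assembles A from U and D and confirms the geometry: A fixes vectors on the line y = x/2 and reverses vectors on the perpendicular line y = −2x.

    import numpy as np

    s = np.sqrt(5.0)
    U = np.array([[2/s, -1/s],
                  [1/s,  2/s]])                  # orthonormal eigenvectors
    D = np.diag([1.0, -1.0])                     # eigenvalues +1 and -1
    A = U @ D @ U.T

    print(np.allclose(A, [[0.6, 0.8], [0.8, -0.6]]))   # the single-matrix form
    print(np.allclose(A @ [2, 1], [2, 1]))             # fixes the line y = x/2
    print(np.allclose(A @ [-1, 2], [1, -2]))           # flips the perpendicular line y = -2x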

Constrained Maxima and Minima of Quadratic Forms

Example 5 In general, a quadratic form Q(x) defined on ℝ^n might have values that are arbitrarily large or small. For example, consider Q(x) = x² + 4xy − y² = x^T [1 2; 2 −1] x. For a point x = (a, 0) moving out the positive x-axis, Q(x) = a² → ∞. Similarly, if x = (0, a), then Q(x) = −a² → −∞ as x moves up the positive y-axis.

In this example, the matrix A for the quadratic form has a positive eigenvalue √5 and a negative eigenvalue −√5. What can you say about the values of a quadratic form Q if, say, all eigenvalues are nonnegative?

An application might be set up so that we are only interested in the values Q(x) for the x's in "the unit circle" S¹ = {x ∈ ℝ² : ‖x‖ = 1}. In that case, Q has a maximum and a minimum value: among x's with ‖x‖ = 1,

• the maximum value of Q(x) = √5 = the largest eigenvalue of A, and
• the minimum value of Q(x) = −√5 = the smallest eigenvalue of A.

Where exactly on S¹ do these maximum and minimum values occur?

Q(x) = √5 at those points x where the eigenspace for √5 intersects S¹ (that is, for x's with unit length in this eigenspace), and

Q(x) = −√5 at those points x where the eigenspace for −√5 intersects S¹ (that is, for x's with unit length in this eigenspace).

In this example, the eigenspaces are one-dimensional (straight lines through 0), so Q(x) reaches its maximum value at the two diametrically opposite points where the eigenspace for √5 meets the unit circle S¹, and similarly for the minimum value.
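Here is a rough numerical check of these claims (assuming NumPy): sample Q at many points of the unit circle and compare the largest and smallest sampled values with the eigenvalues ±√5.

    import numpy as np

    A = np.array([[1., 2.], [2., -1.]])              # Q(x) = x^2 + 4xy - y^2
    lam = np.linalg.eigvalsh(A)                      # ascending: [-sqrt(5), sqrt(5)]

    t = np.linspace(0, 2*np.pi, 2000)
    pts = np.column_stack([np.cos(t), np.sin(t)])    # unit vectors on S^1
    values = np.einsum('ij,jk,ik->i', pts, A, pts)   # Q at each sample point

    print(values.max(), lam[-1])    # both approximately  sqrt(5) ~  2.236
    print(values.min(), lam[0])     # both approximately -sqrt(5) ~ -2.236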

Can you find the maximum and minimum values of Q(x) on S¹, and where they occur, using calculus but no linear algebra?

We verify these statements by proving a theorem that describes what always happens. The proof uses only linear algebra (no calculus).

In the theorem, the unit sphere in ℝ^n refers to the set of all points that are at distance 1 from the origin. We write the unit sphere as S^{n−1} = {x ∈ ℝ^n : ‖x‖ = 1}. For example, S² is the unit sphere in ℝ³; visualize it as the surface of a basketball centered at the origin.

Theorem Let Q(x) = x^T A x be a quadratic form, where A is a symmetric n × n matrix and x ∈ ℝ^n. Suppose the eigenvalues of A, listed by size, are λ_n ≤ ... ≤ λ_2 ≤ λ_1. Then

a) λ_n ≤ Q(x) ≤ λ_1 for all x in the unit sphere S^{n−1}, and

b) Q(x) = λ_i for all x in the intersection of S^{n−1} and the eigenspace for λ_i (that is, for all x in the eigenspace with ‖x‖ = 1). In particular,

• the maximum value λ_1 of Q(x) on S^{n−1} happens at those x in the eigenspace for λ_1 that are also on S^{n−1}, and

• the minimum value λ_n of Q(x) on S^{n−1} happens at those x in the eigenspace for λ_n that are also on S^{n−1}.

Proof Since A is symmetric, there is an orthogonal matrix U such that

A = UDU^T = U diag(λ_1, λ_2, ..., λ_n) U^T.

U and U^T are both orthogonal, so multiplication by either one preserves length; that is, if x = Uy or, equivalently, y = U^T x, then ‖x‖ = ‖y‖. Therefore the change of coordinates x = Uy doesn't change which points are on the unit sphere: ‖x‖ = 1 if and only if ‖y‖ = 1.

a) For any x in S^{n−1}, we calculate as before that

Q(x) = x^T A x = (Uy)^T (UDU^T)(Uy) = y^T (U^T U) D (U^T U) y = y^T D y

     = y^T diag(λ_1, ..., λ_n) y = λ_1 y_1² + λ_2 y_2² + ... + λ_n y_n²,

so

Q(x) ≤ λ_1 y_1² + λ_1 y_2² + ... + λ_1 y_n² = λ_1 (y_1² + y_2² + ... + y_n²) = λ_1

because ‖y‖ = 1. Similarly, λ_n ≤ Q(x) for any x in S^{n−1}.

b) If x is in S^{n−1} and x is an eigenvector for λ_i, then

Q(x) = x^T A x = x^T λ_i x = λ_i x^T x = λ_i ‖x‖² = λ_i · 1 = λ_i. ∎

Example 6 Consider again the quadratic form

Q(x_1, x_2, x_3) = x^T [1 1 1; 1 1 1; 1 1 1] x = x_1² + x_2² + x_3² + 2x_1x_2 + 2x_1x_3 + 2x_2x_3,

where we saw that A = UDU^T with u_1 = (−1/√2, 1/√2, 0), u_2 = (−1/√6, −1/√6, 2/√6), u_3 = (1/√3, 1/√3, 1/√3), and D = diag(0, 0, 3).

On S² = the unit sphere in ℝ³, the maximum and minimum values of Q are 3 and 0.

• The eigenspace for the largest eigenvalue, 3, is a straight line through 0 in the direction of the vector (1, 1, 1). This eigenspace intersects S² in two points, the two endpoints of a diameter of S², and Q has its maximum value, 3, at those two points.

• The eigenspace for the smallest eigenvalue, 0, is two-dimensional: a plane through the origin perpendicular to the line that is the other eigenspace. This eigenspace intersects S² in a circle (a "great circle" on S²). Q has its minimum value, 0, at all the points lying on this circle.

• For any other x's on S², we know that 0 < Q(x) < 3.

Example 7 Consider the quadratic form Q from Example 6 and the new system of ℬ-coordinates, where ℬ is the orthonormal basis of eigenvectors u_1, u_2, u_3 (the columns of U). For this example, relabel them so that the corresponding eigenvalues are listed in decreasing order, as in the Theorem: λ_1 = 3 ≥ λ_2 = 0 ≥ λ_3 = 0, with u_1 the unit eigenvector in the direction of (1, 1, 1).

What is the maximum value of Q subject to the two constraints x · x = 1 and x · u_3 = 0? The constraint x · x = 1 means that we consider only vectors x on the unit sphere S² in ℝ³.

Since U and U^T are orthogonal, inner products are preserved as we move back and forth between standard coordinates and ℬ-coordinates, so x · u_3 = 0 if and only if

[x]_ℬ · [u_3]_ℬ = (y_1, y_2, y_3) · (0, 0, 1) = 0,

which means that [x]_ℬ = (y_1, y_2, 0); that is, x has y_3-coordinate 0.

So both constraints together mean that we are considering only the x's on the unit sphere with y_3-coordinate 0, that is, the x's on the great circle where S² intersects the y_1y_2-plane.

For such an x, the change of coordinates x = Uy shows that the value of Q is

Q(y) = λ_1 y_1² + λ_2 y_2² + λ_3 (0)² ≤ λ_1 y_1² + λ_1 y_2² = λ_1 (y_1² + y_2²) = λ_1,

since ‖x‖ = ‖y‖ = ‖(y_1, y_2, 0)‖ = 1.

So, subject to the constraints, the maximum value of Q is λ_1 = 3 (the largest eigenvalue remaining after λ_3, the eigenvalue corresponding to the eliminated axis u_3, is removed).

A similar argument shows that the minimum value of Q subject to these two constraints is 0, namely λ_2 (the smallest eigenvalue remaining after λ_3 is removed).

Explain why Q is constantly 0 everywhere on the circle where S² intersects the y_2y_3-plane.
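A small numerical sketch of Example 7 (assuming NumPy; the relabeling step is only there to match the decreasing order used above): restrict to unit vectors orthogonal to u_3 and observe that the sampled maximum and minimum of Q are approximately λ_1 = 3 and λ_2 = 0.

    import numpy as np

    A = np.ones((3, 3))                          # the quadratic form from Examples 2 and 6
    lam, V = np.linalg.eigh(A)                   # eigh sorts ascending: [0, 0, 3]
    order = np.argsort(lam)[::-1]                # relabel in decreasing order, as in Example 7
    lam, U = lam[order], V[:, order]
    u1, u2, u3 = U.T

    # Unit vectors x with x . u3 = 0 are exactly cos(t) u1 + sin(t) u2.
    t = np.linspace(0, 2*np.pi, 1000)
    X = np.outer(np.cos(t), u1) + np.outer(np.sin(t), u2)
    values = np.einsum('ij,jk,ik->i', X, A, X)

    print(values.max())   # approximately 3 = lambda_1, the largest remaining eigenvalue
    print(values.min())   # approximately 0 = lambda_2, the smallest remaining eigenvalue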

-" ! ! ! ! !-# !!! X eigenvalues -&ŸŸŸŸÞ - % - $ - # - " Write EœY!!-$ !! Y where Y !!!-% ! - ! ! ! ! &

& is an orthogonal matrixand let U œÖ?" ?? #$ ? % ? & × be a new orthogonal basis for ‘ Þ

After substituting Bœ Y C we get that U has value

-# - # - # - # - # UÐÑœC " C"# C # $ C $ % C % & C &

X Arguing as in Example 7, constraint= BBBBœ†œ"߆ B?" œ!, and B? † 5 œ! would restrict us to those vectors B on the unit sphere W % in ‘ & that have C coordinates % C" œ C & œ !  in other words, to the points in the intersection of W with the subspace spanned by ?#ß ? $ and ? % . And, mimicking the argument in Example 7, the maximum and minimum values for U under these constraints will be -# and - % Ð the largest and smallest of the eigenvalues that remain after -" and - & are eliminated).