Operator Theory: Advances and Applications, Vol. 259, 533–559 © 2017 Springer International Publishing

Commutator Estimates Comprising the Frobenius Norm – Looking Back and Forth

Zhiqin Lu and David Wenzel

A tribute to Albrecht Böttcher on his 60th birthday

Abstract. The inequality $\|XY - YX\|_F \le \sqrt{2}\,\|X\|_F \|Y\|_F$ has some history to date. The growth of the task will be highlighted, supplemented by a look at future developments. Along the way, we meet different forms and give an insight into various consequences of it. The collection of results will be enriched by introductory explanations. We also cross other fields that are important for theory and applications, and even uncover less known relationships.

Subject Classification (2010). Primary 15A45; Secondary 15-02.

Keywords. Commutator, Frobenius norm, BW and DDVV conjectures.

1. Revelations

We want to take the opportunity of Albrecht Böttcher’s sixtieth birthday to look at one particular topic that started with a vague idea about a dozen years ago. First, we had a conjecture, this conjecture was proven after quite a while, and several follow-ups extended the problem thereafter. And who, if not two of the guys responsible for developing completely different proofs, could give you some insight into this story and an overview of the achievements? There we are: Albrecht’s long-time research associate David Wenzel, who went this path together with him; and Zhiqin Lu, coming from geometry and opening a (to most of us) new perspective. We thank Koenraad Audenaert for some entry points into physics that unveil links to a connatural field.

At first, let’s go back to the beginning. Assume two square matrices X, Y are given. We all know that, occasionally, XY and YX do not coincide. But how different can they be? For measuring the distance between them, we can take a look at $\|XY - YX\|_F$, where $\|\cdot\|_F$ is the easily computable Frobenius norm.

The author Z. Lu is partially supported by an NSF grant DMS-1510232.


The object inside is the famous commutator.¹ What’s the deal? It can’t be hard! Just using the triangle inequality and the sub-multiplicativity, one clearly has

$$\|XY - YX\|_F \le 2\,\|X\|_F \|Y\|_F. \tag{1.1}$$

You’re right. But the problem is that we were unable to find even one example in which equality actually holds.² The best (meaning “biggest” in our case) we got even from extended experiments with a computer – operating systematically or at random, no matter the size taken – was a factor $\sqrt{2}$ instead of the trivially obtained 2. A claim was born. The one that it’s all about.

Theorem 1.1 (Böttcher–Wenzel conjecture). Suppose $n \in \mathbb{N}$ is arbitrary. Let X and Y be two n × n matrices. Then,

$$\|[X, Y]\|_F \le \sqrt{2}\,\|X\|_F \|Y\|_F. \tag{1.2}$$

Well, you probably want to throw the operator norm into the discussion. As many investigations are done in this one, it seems like a fixed option. However, the example

$$\left[\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\right] = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}$$

shows that we are stuck with the trivial estimate (1.1). Here, the constant cannot be improved. The Frobenius norm is more selective.

The commutator is somewhat strange. As a map turning two matrices into a third one, it is not injective. The kernel is non-zero. In particular, every matrix commutes with itself. For this reason, $\|[X, Y]\|_F = \|[X, Y + \alpha X]\|_F$. In consequence, we can replace the right-hand side of (1.2) by virtually any of the terms $\sqrt{2}\,\|X\|_F \|Y + \alpha X\|_F$. Since α is arbitrary, choosing $\alpha = -\langle X, Y\rangle / \|X\|_F^2$ is possible. This is indeed a minimizer of $\|Y + \alpha X\|_F$ and would further reduce the (squared³) upper bound (if it really should be valid) by an inner product⁴:

$$\|[X, Y]\|_F^2 \le 2\,\|X\|_F^2 \|Y\|_F^2 - 2\,|\langle X, Y\rangle|^2. \tag{1.3}$$

Why should (1.2), or equivalently (1.3), be true? Here is a demonstration for n = 2.

Proof. Put $X = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, $Y = \begin{pmatrix} e & f \\ g & h \end{pmatrix}$. An elementary calculation delivers

$$\begin{aligned} &2\,\|X\|_F^2 \|Y\|_F^2 - 2\,|\langle X, Y\rangle|^2 - \|XY - YX\|_F^2 \\ &\qquad = 2\,|ah - de|^2 + |(a + d)f - (e + h)b|^2 + |(a + d)g - (e + h)c|^2, \end{aligned} \tag{1.4}$$

which clearly is a sum of non-negative terms.
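The identity (1.4) can be checked numerically. The following is only a sanity-check sketch in plain Python, not part of the original argument; the helper names (`frob2`, `inner`, `lhs_14`, `rhs_14`) and the choice of Gaussian random entries are assumptions made for this illustration.

```python
import random

def frob2(M):
    """Squared Frobenius norm of a matrix stored as nested lists."""
    return sum(abs(z) ** 2 for row in M for z in row)

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

def inner(A, B):
    """<A, B> = Tr(B^* A), i.e. the sum of a_jk * conj(b_jk)."""
    return sum(a * b.conjugate() for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def lhs_14(X, Y):
    # Left-hand side of (1.4): reduced bound minus squared commutator norm.
    return 2 * frob2(X) * frob2(Y) - 2 * abs(inner(X, Y)) ** 2 - frob2(commutator(X, Y))

def rhs_14(X, Y):
    # Right-hand side of (1.4): the explicit sum of squared moduli.
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return (2 * abs(a * h - d * e) ** 2
            + abs((a + d) * f - (e + h) * b) ** 2
            + abs((a + d) * g - (e + h) * c) ** 2)

def random_complex_2x2(rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
            for _ in range(2)]

rng = random.Random(0)  # fixed seed, only for reproducibility
max_gap = 0.0
for _ in range(1000):
    X, Y = random_complex_2x2(rng), random_complex_2x2(rng)
    max_gap = max(max_gap, abs(lhs_14(X, Y) - rhs_14(X, Y)))
print("largest |LHS - RHS| over 1000 random complex pairs:", max_gap)

# The operator-norm example from the text: the commutator is diag(2, -2).
print(commutator([[0, 1], [1, 0]], [[0, -1], [1, 0]]))
```

The gap stays at the level of rounding errors, which is what one expects from an exact algebraic identity.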

¹ Due to algebra, the Lie bracket notation [X, Y] is a common abbreviation.
² ... except for the case involving zero matrices, which is pretty lame, for obvious reasons.
³ This is always a good idea because of the definition $\|A\|_F := \big(\sum_{j,k} |a_{jk}|^2\big)^{1/2}$ ... simplifying transformations and avoiding an unnecessary flood of root symbols.
⁴ An advantage of the Frobenius over the operator norm: it is induced by the scalar product $\langle A, B\rangle := \operatorname{Tr}(B^* A)$, where $\operatorname{Tr} C := \sum_i c_{ii}$ denotes the trace.


Easy as the last lines emerged, it must be clarified that the same idea refuses to work for matrices of size 3 × 3 and beyond. It’s not that we are unable to show anything related; an analogous statement simply is not true. Also the original attempt from [6, Theorem 4.2] cannot be transferred into higher dimensions.⁵

Of course, there was something that made us believe in the validity of (1.2). During the initial hunt for “utterly non-commuting” matrix pairs, we made overview plots for the ratio $\|[X, Y]\|_F^2 / (\|X\|_F^2 \|Y\|_F^2)$ and generated pictures like these:

[Figure: overview plots of the ratio $\|[X, Y]\|_F^2 / (\|X\|_F^2 \|Y\|_F^2)$ for random matrix pairs of several sizes n.]

Undoubtedly, the values for n = 2 may reach the constant $\sqrt{2}$ (squared, of course), but won’t go further. Yet, it is another point that catches the eye. Apparently, random size 3 matrices have difficulties in producing big commutators. The clustering effect at very small values becomes even stronger when n increases.

That situation left us torn between astonishment and annoyance. On the one side, the seemingly whole-range-filling 2 × 2 matrix case (the one where actually something happens) was tackled completely. But on the other, although large sized matrices evidently are far away from causing any hassle, this wasn’t usable for obtaining a general proof. And even though n = 2 later turned out to be the most interesting case, there should remain lots of stuff to do!
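An experiment of this kind is easy to repeat. The sketch below is not the authors' original setup: the Gaussian entries, the sample sizes and all function names are assumptions chosen for illustration. It samples random real pairs and records the mean and the maximum of the squared ratio for a few sizes n.

```python
import random

def frob2(M):
    """Squared Frobenius norm."""
    return sum(abs(z) ** 2 for row in M for z in row)

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

def ratio_sq(X, Y):
    """||[X, Y]||_F^2 / (||X||_F^2 ||Y||_F^2); bounded by 2 according to (1.2)."""
    return frob2(commutator(X, Y)) / (frob2(X) * frob2(Y))

def random_matrix(n, rng):
    return [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]

rng = random.Random(1)  # fixed seed, only for reproducibility
summary = {}
for n in (2, 3, 5, 8):
    ratios = [ratio_sq(random_matrix(n, rng), random_matrix(n, rng))
              for _ in range(300)]
    summary[n] = (sum(ratios) / len(ratios), max(ratios))
    print(f"n = {n}: mean ratio {summary[n][0]:.3f}, largest ratio {summary[n][1]:.3f}")
```

With such samples, the largest observed value stays below 2 and the mean drops as n grows, in line with the clustering described above.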

2. Widening the scope

At the start, we had a conjecture about a general norm bound, and could prove it only in some special cases (for size 2, or if one of the matrices is normal). Well, additionally, we were able to show that (1.1) must be too weak. Mastering determinants as he usually does, Albrecht could validate the estimate with 2 replaced by $\sqrt{3}$ ... that’s halfway to what we wished for, at least.

⁵ The proof relies on the matrices’ trace being spread over only two entries.


Of course, there were also the overwhelming observations that commutators typically avoid large values. So, we wanted to give this aspect a foundation. For this, we could revert to a previous work with Grudsky [5] on a bound for norms of random vectors under a linear map. Luckily, when interested in the Frobenius norm, we can look at a matrix as if it is a vector. It was shown in [6] that, taking the expected value, the ratio $\|[X, Y]\|_F^2 / (\|X\|_F^2 \|Y\|_F^2)$ tends to zero as $n \to \infty$, and the convergence is linear. In conclusion, the norm of the commutator is small compared to the norms of the involved matrices:

$$E\big(\|XY - YX\|_F^2\big) = O\!\left(\tfrac{1}{n}\right) \cdot E\big(\|X\|_F^2 \|Y\|_F^2\big), \tag{2.1}$$

under very weak assumptions on the underlying distribution.⁶

The efforts afterwards concentrated on the minimal bound. It stayed open for quite a while. Then, simultaneously, three really distinct proofs for (1.2) saw the light of day ([7], [26], [34]). So, the constant $\sqrt{2}$ in the inequality is now shown, and it is known to be best-possible.⁷ Sure, with that, only the first step on an even longer road up to the present was done.

2.1. Illuminating the representation

In 1900, Hilbert presented the famous list with 23 problems that should heavily influence the following century of mathematics. About half of them are completely solved by now, and still a handful of them are way too far from being understood. A particular problem, the 17th, prepares the ground for this part. It reads as follows.

Consider a multivariate polynomial or rational function. If it takes only nonnegative values over $\mathbb{R}$, can it be represented as a sum of squares of rational functions?

It was confirmed “only” three decades later.⁸ In the formulation, Hilbert already has taken into account that there definitely exist nonnegative polynomials that are not a SOS⁹ of other polynomials. So, finding a polynomial SOS is always on the agenda whenever a polynomial in n variables is nonnegative.

The commutator inequality (1.2) is equivalent to

$$2\,\|X\|_F^2 \|Y\|_F^2 - \|XY - YX\|_F^2 \ge 0.$$

On the left-hand side, there obviously is a quartic polynomial. And we claimed that it is nonnegative. The investigated form was further reduced a little more to (1.4). And in the proof for n = 2, we indeed obtained a sum of polynomial squares (actually of quartics, again). It is even the complex version. Naturally, this gave hope also for larger matrices. László [24] proved that, as remarked, the reduced form is not SOS. Here is his result.

⁶ It’s not even necessary that X and Y are chosen independently.
⁷ A suitable example that gives equality can be found on the second page.
⁸ Worked out by Artin in 1927, it was just the second problem to be resolved at that time! An algorithmic solution was given by Delzell in 1984. The minimal number of squares is not yet known, but is limited by $2^{\#\text{variables}}$ due to Pfister 1967. In other fields, the answer may be negative. However, a generalization to symmetric matrices with polynomial entries is known. In complex analysis, the Hermitian analogue considers squared norms.
⁹ In mathematics, this is not a “help me” signal like in navigation. It stands for “sum of squares.”


Theorem 2.1. Suppose $X, Y \in \mathbb{R}^{n \times n}$. The smallest value $\gamma_n$ so that

$$(2 + \gamma_n)\big(\|X\|_F^2 \|Y\|_F^2 - \langle X, Y\rangle^2\big) - \|XY - YX\|_F^2$$

is a sum of squares of polynomials is $\gamma_n = \frac{n-2}{2}$.

Since polynomials are of special interest, explicit sufficient conditions were developed for judging the SOS-property.¹⁰ Studying related optimization problems gives such a technique. The reduction to (1.4) has a benefit over regarding the form associated with the initial inequality (1.2). That only differences $x_{ij} y_{kl} - y_{ij} x_{kl}$ appear¹¹ is used cleverly in proofs.

In this manner, when restricted to special matrix classes, (1.4) admits a decomposition into SOS [25]. Certain prescribed structures may manage to eliminate disturbing variables. Examples of good nature are matrices having non-zero entries only in the first row and the last column. Also, if X and Y are taken from the cyclic(!) Hankel¹² or tridiagonal matrices, the original form is SOS. At last, we want to restate a case that looks like a gift made with Albrecht in mind.

Conjecture 2.2. The form (1.4) generated by two arbitrary real Toeplitz matrices is SOS.

This is currently proven for sizes up to 7 × 7. With n = 8, there is a change in the properties that kills the proof’s idea. Nevertheless, other indicators (esp. in the corresponding semidefinite program) sustain the assertion. The matrix structure cancels many terms that could produce troubles.¹³

2.2. Exploring new measures

There is more than one norm. Two of them were brought forward right from the start. One was interesting, as it apparently profits somehow from the commutator’s structural properties. The other behaved, well, just as trivially as one would expect. Of course, a whole bunch of other norms is still waiting out there. Without any doubt, these may vary in their sensitivity with regard to the comparison of the commutator with its input matrices. So, different metrics will yield different best constants in the inspection of

$$\|XY - YX\| \le C\,\|X\|\,\|Y\|.$$

Since any given norm may be scaled, and this factor would pop up once on the left, but twice on the right, C can be arbitrary, in principle. But this is kind of cheating; an anchor is required for comparing the effects of different norms. For this reason, we put a natural normalization condition onto elementary matrices [35].
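The scaling effect can be written out in one line. The scaled norm $\|\cdot\|'$ and the factor $t > 0$ are notation introduced only for this illustration; they do not appear in the original text.

```latex
\|\cdot\|' := t\,\|\cdot\| \quad (t > 0):
\qquad
\|XY - YX\|' \;=\; t\,\|XY - YX\|
\;\le\; t\,C\,\|X\|\,\|Y\|
\;=\; \frac{C}{t}\,\|X\|'\,\|Y\|' .
```

So the best constant of the scaled norm is $C/t$, which can be pushed to any positive value unless the norm is anchored by a normalization.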

¹⁰ Note that every real nonnegative polynomial can be approximated as closely as desired by SOS-polynomials, but we seek finitely many summands.
¹¹ One easily checks $2\,\|X\|_F^2 \|Y\|_F^2 - 2\,|\langle X, Y\rangle|^2 = \|X \otimes Y - Y \otimes X\|_F^2$, where ⊗ is the Kronecker product. This is actually the Lagrange identity.
¹² Alas, ordinary Hankel matrices will fail to produce SOS, in spite of the fact that they are symmetric.
¹³ Looking at n = 2, e.g., the first summand in the representation (1.4) will vanish.


Theorem 2.3. Let $E_{jk} = e_j \otimes e_k^*$. Suppose $\|E_{jk}\| = 1$ and $\|E_{kj}\| \le 1$ for some $j \ne k$. Then, $C \ge \sqrt{2}$.

What are good norms for matrices? We could generalize the Frobenius norm to vector $\ell_p$ norms. More important in mathematical practice are the $\ell_p$ induced operator norms.¹⁴ But most important for matrices are norms that are invariant under unitary transformations. Via the singular value decomposition, such norms are in one-to-one correspondence with the norms that can be defined for the vector of singular values.¹⁵ Prominent examples are the Ky Fan norms¹⁶

$$\|\cdot\|_{(m)} = \sigma_1 + \dots + \sigma_m \qquad (\sigma_1 \ge \dots \ge \sigma_m \ge \dots \ge \sigma_n \ge 0)$$

and the Schatten norms

$$\|\cdot\|_p = (\sigma_1^p + \dots + \sigma_n^p)^{1/p}.$$

Also a mixture $\|\cdot\|_{(m),p}$ of them (counting only the m largest singular values) has been handled. The Frobenius norm fits into the scheme, too. Because of the alternative representation $\|X\|_F^2 = \|USV\|_F^2 = \sum_i \sigma_i^2$, one has $\|\cdot\|_F = \|\cdot\|_2$.

As time passed, another proof for (1.2) was discovered [2]. Going even further, the tighter inequality

$$\|[X, Y]\|_2 \le \sqrt{2}\,\|X\|_2 \|Y\|_{(2),2} \tag{2.2}$$

has been shown to hold true. Also the Schatten norms are fully treated. As they somehow morph from Frobenius to operator norm ($\|\cdot\|_\infty$), the resulting value $C_p$ is also in between. For this, the Riesz–Thorin theorem¹⁷ got a fruitful application. Knowing the commutator inequalities for 2 and 1, as well as ∞, complex interpolation yields estimates for all indices in between.¹⁸ Luckily, these turn out to be sharp. One has to be careful and to check many assumptions before Riesz–Thorin will release its magic, but collecting all the pieces, together with sophisticated examinations of crucial base inequalities, even an extended problem was broadly solved.

Motivated by the validity of (2.2), one should think about using up to three different norms with this type of inequalities. The constants, denoted by $C_{p,q,r}$, then are determined by the indices p, q and r of the norms of [X, Y], X, and Y.
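For 2 × 2 matrices, the Schatten norms can be evaluated without a full singular value decomposition. The sketch below relies on the elementary relations $\sigma_1^2 + \sigma_2^2 = \|A\|_F^2$ and $\sigma_1 \sigma_2 = |\det A|$, which is a shortcut chosen for this illustration and not a method taken from the text. The two test pairs and all function names are likewise assumptions; the printed ratios are merely evaluated, they are not claimed here to be optimal.

```python
import math

def singular_values_2x2(A):
    """Singular values of a 2x2 matrix from ||A||_F^2 and |det A|."""
    (a, b), (c, d) = A
    f2 = sum(abs(z) ** 2 for z in (a, b, c, d))
    det = abs(a * d - b * c)
    disc = math.sqrt(max(f2 * f2 - 4 * det * det, 0.0))
    return math.sqrt((f2 + disc) / 2), math.sqrt(max((f2 - disc) / 2, 0.0))

def schatten(A, p):
    """Schatten p-norm of a 2x2 matrix; p = math.inf gives the operator norm."""
    s1, s2 = singular_values_2x2(A)
    if p == math.inf:
        return s1
    return (s1 ** p + s2 ** p) ** (1 / p)

def commutator(A, B):
    mul = lambda P, Q: [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
                        for i in range(2)]
    AB, BA = mul(A, B), mul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def ratio(X, Y, p):
    """||[X, Y]||_p / (||X||_p ||Y||_p) with one Schatten index for all three norms."""
    return schatten(commutator(X, Y), p) / (schatten(X, p) * schatten(Y, p))

E12, E21 = [[0, 1], [0, 0]], [[0, 0], [1, 0]]   # rank-one pair
S, J = [[0, 1], [1, 0]], [[0, -1], [1, 0]]      # the pair used for the operator norm
for p in (1, 1.5, 2, 4, math.inf):
    print(p, ratio(E12, E21, p), ratio(S, J, p))
```

The first pair produces the ratio $2^{1/p}$ and the second one $2^{1-1/p}$; both meet at $\sqrt{2}$ for p = 2, the Frobenius case.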

Theorem 2.4. Suppose $n \in \mathbb{N} \setminus \{1\}$. Let $1 \le p, q, r \le \infty$ be indices of Schatten norms in $\mathbb{C}^{n \times n}$, excluding the octant $p > 2$, $q < 2$ and $r < 2$. Then,

¹⁴ Simple exercise: C = 2 for all of them.
¹⁵ In addition, the so-called gauge function has to be invariant under permutations of the entries and under the transformations of any single sign.
¹⁶ Obviously, $\|\cdot\|_{(1)}$ is the classical operator norm (p = 2). So, it happens that $C_{(m)} = 2$, again.
¹⁷ Originally shown as an inequality for $\ell_p$ norms of vectors, it is often used to obtain boundedness of certain operators on $L^p$ or Schatten classes.
¹⁸ This covers the complete norm family. Note that direct interpolation between 1 and ∞ will result in too large constants. Two separate ranges are required.


$$C_{p,q,r} = \begin{cases} 2^{1/p} & \text{in ①} \\ 2^{1-1/q} & \text{in ②} \\ 2^{1-1/r} & \text{in ③} \\ 2^{1+1/p-1/q-1/r} & \text{in ④} \\ 2\,\big(2\lfloor n/2 \rfloor\big)^{1/p-1/q-1/r} & \text{in ⑤} \\ 2\,n^{1/p-1/q-1/r} \cos\frac{\pi}{2n} & \text{in ⑥,} \end{cases} \qquad C_{\infty,1,1} = \frac{\sqrt{27}}{4}.$$

[Figure: the transformed cube of norm indices with axes p, q, r (marks at 1, 2, ∞), showing the regions ① to ⑥.]

The cube $[1, \infty]^3$ of all possible norm indices (p, q, r) is divided into six connected parts. In each of these, the given value realizes the maximum over all six terms. The regions are the result from several equalities of two of them. The nature of the Riesz–Thorin theorem is convexity with respect to inverse norm indices or their duals (usually one writes $1/p' = 1 - 1/p$). That’s the reason why the mapping $p \mapsto 1 - 1/p$ transforms the infinite cube to the bounded $[0, 1]^3$, while orders are preserved, and the value 2 is centered. Furthermore, terms as in the exponents of the six components will yield lines and planes as borders. This pictorial ansatz helped a lot in detailing the steps of the interpolation procedure. And so, a picture also spares us to give in formulas the six regions treated by [12] and [36].

There are three congruent bodies bounding the mostly unknown octant. The numbers ①, ②, ③ are pictured next to the respective outermost point. On the opposite side, they border the fourth region ④. They all share the point (2, 2, 2) in the cube’s middle, which represents the original inequality (1.2). The last two parts are set on top. In contrast to the others, they depend on the dimension!¹⁹ Moreover, a flaw between even and odd sizes becomes lucid. In the pyramid ⑤, $2\lfloor n/2 \rfloor$ evaluates to n or n − 1, respectively. Additionally, if n is even, this region also occupies the last layer, so that ⑥ does not exist. In the odd case, the index on the line (p, ∞, ∞) marking the interface is given by $p_n = \ln\frac{n-1}{n} \,/\, \ln\cos\frac{\pi}{2n} \approx \frac{8}{\pi^2}\,n$, whence ⑥ becomes thinner as $n \to \infty$.

The method also works if one norm on the right-hand side is truncated to a mix norm [18]. Thanks to (2.2), again one has, e.g., $C_{p;p;(2),p} = \max\{2^{1/p}, 2^{1-1/p}\}$.
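The interface index $p_n$ can be evaluated directly. The sketch below assumes that the two dimension-dependent values read $2\,(2\lfloor n/2 \rfloor)^{1/p-1/q-1/r}$ and $2\,n^{1/p-1/q-1/r}\cos\frac{\pi}{2n}$, restricted to the line (p, ∞, ∞) where the exponent is 1/p; the function names are hypothetical.

```python
import math

def interface_index(n):
    """p_n = ln((n-1)/n) / ln(cos(pi/(2n))) for odd n >= 3."""
    return math.log((n - 1) / n) / math.log(math.cos(math.pi / (2 * n)))

def pyramid_value(n, p):
    """Assumed value in region 5 on the line (p, inf, inf)."""
    return 2 * (2 * (n // 2)) ** (1 / p)

def layer_value(n, p):
    """Assumed value in region 6 on the line (p, inf, inf)."""
    return 2 * n ** (1 / p) * math.cos(math.pi / (2 * n))

for n in (3, 5, 11, 101):
    p_n = interface_index(n)
    print(n, p_n, 8 * n / math.pi ** 2, pyramid_value(n, p_n), layer_value(n, p_n))
```

At $p = p_n$ both expressions coincide for odd n, and the quotient of $p_n$ and $8n/\pi^2$ approaches 1 as n grows.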
In the end, with the investigation of only a couple of inequalities (first of all, the Frobenius case in the middle), a crowd of others is delivered “for free.”

Let us return to the case of utilizing only a single norm. With Theorem 2.3 in mind, the Frobenius norm is special. It realizes the minimal constant $C = \sqrt{2}$ in such kind of estimates. So far, we had no luck to meet a comparable norm. In [7], we proved that if n = 2, then the Frobenius norm is the only unitarily invariant norm with this property. Non-invariant norms realizing $\sqrt{2}$ exist. The invariance further imposes strong restrictions: every planar two-coordinate cut through the gauge

¹⁹ The Riesz–Thorin approach may be used naturally within the family of vector $\ell_p$ norms (regarded on matrices). For p < 2, one gets the same constants $C_p$. However, for p > 2, even when all norms coincide, $C_p$ is already size dependent (unlike the Schatten counterpart).


function’s unit ball must be the usual disc.²⁰ That leaves $\|\cdot\|_{(2),2}$ as a suspicious candidate. Though it remains unclear if this serves the purpose for n ≥ 3, by [17], its dual norm is an example with $C = \sqrt{2}$. The unit balls for the three-dimensional cases illustrate that the dual norm (right) is the largest possible norm²¹, and that there is a huge gap to the still open $\|\cdot\|_{(2),2}$ (left) and the norms in between.

[Figure: unit balls in the three-dimensional case for $\|\cdot\|_{(2),2}$ (left) and its dual norm (right).]

2.3. Tackling the equality cases

If one has an inequality like (1.2), a naturally appearing task is the determination of the matrices that actually result in “=”. The aim is to identify matrices that are far away from commutativity.

Definition 2.5. A pair (X, Y) of matrices in $\mathbb{C}^{n \times n}$ is said to be maximal if $X \ne O$, $Y \ne O$, and $\|[X, Y]\|_F = \sqrt{2}\,\|X\|_F \|Y\|_F$ is satisfied.

Similar definitions can be made for the inequalities subject to other norms. For the Schatten norms, we then consider (p, q, r)-maximal pairs. Early, we strengthened (1.2) to (1.3). With this observation, clearly, the additional term must be zero. So, fulfilling (1.2) as an equality will require matrices that are orthogonal to each other in the Frobenius inner product.²² But there is much more to come. The same can be done for any other matrix commuting with one of the two arguments. In particular, the identity matrix I yields vanishing traces. Powers $X^m$ of one matrix are also in the respective commutant. Consequently, there are lots of restrictions for a pair to be maximal [7]. With size n = 2, the sufficiency is verified by a calculation as simple as (1.4).

Theorem 2.6. Two matrices $X, Y \in \mathbb{C}^{2 \times 2}$ form a maximal pair if and only if $\operatorname{Tr} X = \operatorname{Tr} Y = 0$ and $\langle X, Y\rangle = 0$.
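The sufficiency part of Theorem 2.6 is easy to probe numerically. The construction below (a random traceless X, and a traceless Y made orthogonal to X by one Gram–Schmidt step) is an assumption made for this sketch, as are all function names; it is not the argument of [7].

```python
import math
import random

def frob2(M):
    return sum(abs(z) ** 2 for row in M for z in row)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

def inner(A, B):
    """<A, B> = Tr(B^* A)."""
    return sum(a * b.conjugate() for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def ratio(X, Y):
    """||[X, Y]||_F / (||X||_F ||Y||_F)."""
    return math.sqrt(frob2(commutator(X, Y)) / (frob2(X) * frob2(Y)))

def random_traceless(rng):
    a, b, c = (complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3))
    return [[a, b], [c, -a]]

def orthogonal_traceless_partner(X, rng):
    """Remove the X-component from a random traceless Z; the result stays traceless."""
    Z = random_traceless(rng)
    t = inner(Z, X) / frob2(X)
    return [[Z[i][j] - t * X[i][j] for j in range(2)] for i in range(2)]

rng = random.Random(2)  # fixed seed, only for reproducibility
ratios = []
for _ in range(200):
    X = random_traceless(rng)
    ratios.append(ratio(X, orthogonal_traceless_partner(X, rng)))
print("smallest and largest ratio:", min(ratios), max(ratios))

# Orthogonal, but Tr X != 0: this pair is not maximal.
non_maximal = ratio([[1, 0], [0, 0]], [[0, 1], [0, 0]])
print("ratio for (E11, E12):", non_maximal)
```

All sampled pairs hit $\sqrt{2}$ up to rounding, whereas the pair $(E_{11}, E_{12})$, which violates the trace condition, only reaches the ratio 1.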

²⁰ In other words, if all but two singular values are zero, the norm of the vector has to equal its Frobenius norm.
²¹ The mix norm directly follows the necessary cylinder restriction, whereas its dual norm possesses the tightest ball, i.e., the convex hull of the fixed circles.
²² Recall: $\langle X, Y\rangle = \operatorname{Tr} Y^* X = 0$.


Where have all the power conditions gone? For n = 2, due to the theorem of Cayley–Hamilton²³, $\langle X, Y^m\rangle = 0$ with m > 1 is no new restriction. Moreover, taken together, the cases m ≤ 1 already yield a sufficient condition. For n ≥ 3, on the other hand, they do not suffice. In search for more restrictions, by (2.2), we have the chain of inequalities

$$\|[X, Y]\|_F \le \sqrt{2}\,\|X\|_F \|Y\|_{(2),2} \le \sqrt{2}\,\|X\|_F \|Y\|_F.$$

Hence, when having the equality sign in (1.2), $\|Y\|_{(2),2} = \|Y\|_F$ must hold, i.e., Y has no more than two non-trivial singular values, or equivalently, Rk Y ≤ 2. Some of the proofs of (1.2) even unveil that necessarily both X and Y need to have small rank at the same time. This gives a clue on why the following theorem taken from [13] is true.

Theorem 2.7. Two matrices $X, Y \in \mathbb{C}^{n \times n}$ form a maximal pair if and only if there exists a unitary matrix U such that $X = U(X_0 \oplus O)U^*$ and $Y = U(Y_0 \oplus O)U^*$ with a maximal pair $(X_0, Y_0)$ of 2 × 2 matrices.

Surprisingly, maximality is more demanding than hinted before; the large-sized matrices need to be simultaneously unitarily similar to matrices in $\mathbb{C}^{2 \times 2}$, padded with zeros.²⁴ Thus, they do not exceed rank two. Nevertheless, note that the inequality

$$\|[X, Y]\|_2 \le \sqrt{2}\,\|X\|_{(2),2} \|Y\|_{(2),2}$$

suggested by this observation is not true. The best constant in the inequality is furthermore growing with the dimension.

Maximal pairs in the general Schatten case can be determined, too. The premium of interpolation (see Theorem 2.4) is that equality in an intermediate relation will hold only if the base relation also attains the upper bound (with appropriate matrices given by the method). In particular, ranks are kept. In addition, the monotonicity $\|\cdot\|_p \le \|\cdot\|_q$ for p ≥ q is utilized several times. Having equality there forces the rank down to one.²⁵ This applies for at least one of the matrices X, Y, or [X, Y] in ①, ②, and ③. Moreover, multiples of unitaries increasingly appear (in ④ all three must be of this type). Zero traces are preserved, and the orthogonality X ⊥ Y from the central case 2 is tweaked via polar powers.²⁶ Note that interfaces between the regions and constellations involving the index ∞ are less restrictive, because they are not amenable to the utilized laws (cf. [36]).

Our little excursion depicting the major achievements in the direct surrounding of the Böttcher–Wenzel commutator inequality has now come to an end.

²³ As the characteristic polynomial (here, det(tI − Y) is of degree two) returns the zero matrix if applied to Y, indeed, $Y^2$ is linearly dependent on I and Y.
²⁴ In the real case, $X_0$ and $Y_0$ can be chosen real, and as for U, “unitary” should be replaced by “orthogonal.”
²⁵ When looking at vector norms, the number of non-zero entries takes over the role of the rank, just as it already was in the norm definitions.
²⁶ The matrix entries $r e^{i\varphi}$ become $r^P e^{i\varphi}$ with some fixed P.


Instead of detailing the history of the results with a long text, the development will be outlined with the help of a timeline. Figure 1 at the end of the paper marks special cases along the way. An adjoining path concerned with a geometric problem will be discussed in the following section. Thereafter, we will focus our attention on some special interpretations of the original problem, as well as ongoing investigations for further generalizations. Some additional results are also summarized in the recent survey by our companions of the Macau research [11].

3. Progressing into geometry

The inequality was connected to geometry from the very beginning. To be honest, what else could you expect when working with a norm that comes from a scalar product? Actually, the proof of (1.2) from [7] is already based on a strengthening of the Cauchy–Schwarz inequality under special restrictions.

Several variants of the commutator estimate were found that have some geometric meaning. As shown in [7], if three copies are assembled (one for every coordinate), (1.2) yields an inequality involving cross products over a family of usual vectors $v^{(jk)}$:

$$\sum_{i,j=1}^{n} \Big\| \sum_{k=1}^{n} v^{(ik)} \times v^{(kj)} \Big\|^2 \le \sum_{i,k,\ell,j=1}^{n} \big\| v^{(ik)} \times v^{(\ell j)} \big\|^2.$$

This connection wasn’t too surprising, since the cross product and the commutator are close friends; they are the Lie products.

An interpretation in differential geometry is also given. Suppose a differentiable curve $g : (-\varepsilon, \varepsilon) \to \mathbb{C}^{n \times n}$ has all images invertible and is moreover normalized by g(0) = I. If $s(t) = g(t) A g(t)^{-1}$ is the induced map into the so-called similarity orbit of some $A \in \mathbb{C}^{n \times n}$, then the derivatives are determined by commutators. In this context, since $s'(0) = g'(0)A - A g'(0)$, (1.2) becomes

$$\|s'(0)\|_F \le \sqrt{2}\,\|A\|_F\,\|g'(0)\|_F.$$

We have seen already that the commutator inequality implies even better results. The estimate

$$\|XY - YX\|_F^2 \le \big\| \|Y\|_F X + \|X\|_F Y \big\|_F \cdot \big\| \|Y\|_F X - \|X\|_F Y \big\|_F$$

fits directly between (1.3) and (1.2). The two matrices “X + Y” and “X − Y” on the right-hand side (with X and Y adjusted in their norms) are orthogonal. When they are seen just like ordinary vectors, they span a rhombus. Twice its area is written on the right-hand side.

Most notably, a geometric relative of (1.2) is crossing our ways. For quite a long time, most of us who were concerned with the commutator problem hadn’t known of this link. No real wonder, as it originated in higher geometry, where it reads like what follows:


Let a manifold be isometrically immersed into a space of constant curvature c.²⁷ Then, at every point,

$$\rho + \rho^\perp \le \|H\|_F^2 + c = \|H\|^2 + c. \tag{3.1}$$

Here, $\rho$ and $\rho^\perp$ denote the normalized scalar curvatures along the tangential and the normal bundle of the embedding, respectively. H is the mean curvature tensor. Many of our fellows are no experts in this field, so we want to take the chance for pointing out how the connection is established. For those who are not that familiar with these notions, first some basic insight.

3.1. About “curvature(s)”

The curvature of a planar curve is quite well known: it is the signed reciprocal of the radius of the osculating circle. However, more dimensions provide much more freedom. One of the major definitions for arbitrary manifolds is the Riemann curvature tensor. Given by $R(u, v)w = \nabla_u \nabla_v w - \nabla_v \nabla_u w - \nabla_{[u,v]} w$, this fourth-order tensor measures non-commutativity of the second covariant derivatives. It quantifies the extent to which the submanifold is not locally isometric to a Euclidean space, i.e., how “non-flat” it is. When represented in a basis, the tensor is described by $n^4$ numbers $R^l_{ijk}$ and can be reduced to the scalar curvature $R = \sum_{i,k} g^{ik} \sum_j R^j_{ijk}$. The coefficients $g^{ik}$ are derived from the metric tensor.²⁸ The number R is a quantitative characteristic for the deviation between the volumes of a geodesic ball in the manifold and the Euclidean unit ball.

Now, for an n-dimensional manifold, the (normalized) scalar curvature

$$\rho = \frac{2}{n(n-1)} \sum_{1=i<j}^{n} \langle R(e_i, e_j)e_j, e_i \rangle$$

(with $\{e_i\}$ being an orthonormal basis of the tangent space) is a plain multiple of R since, for an isometric immersion as under investigation, $(g_{ik})$ is just the identity matrix.²⁹ The scaling is simply done for better appearance. Similarly, the normal scalar curvature³⁰ is declared by

$$\rho^\perp = \frac{2}{n(n-1)} \sqrt{\sum_{1=i<j}^{n} \sum_{1=r<s}^{m} \langle R^\perp(e_i, e_j)\xi_r, \xi_s \rangle^2}$$

²⁷ Think of the Euclidean space $\mathbb{R}^d$ or the sphere $S^{d-1}$ as typical examples for ambient spaces of constant (sectional) curvature.
²⁸ Actually, $(g^{ik})$ is the inverse of the Gram matrix $(g_{ik})$, which contains the coefficients of the first fundamental form.
²⁹ Clearly, we have $2 \sum_{1=i<j}^{n} \langle R(e_i, e_j)e_j, e_i \rangle = \sum_{i,j=1}^{n} R^j_{iji}$ as diagonal entries are zero.


The second fundamental form (or shape tensor) $h(v, w) = \langle n, \nabla_v w \rangle$ consists of orthogonal projections of covariant derivatives on the normal bundle. The tensor $H = \frac{1}{n} \operatorname{Tr} h$ is obtained by contraction. It is remarkable that $\rho^\perp$ and H depend on the embedding, whereas $\rho$ does not.³² So, there are properties that are inherent to the object itself, and others that change when another point of view is taken. We’ll take a look at illustrating examples in order to better understand the claim.

Note that $\rho$ is undefined for n = 1. Nevertheless, (3.1) makes sense even in that case if n(n − 1) is moved to the right-hand side. Ignoring $\rho^\perp$ for a moment, the relation then becomes R ≤ 0. As a curve is locally flat, indeed $R = R^1_{111} = 0$. So, in the Riemann sense, a curve is not curved. Also note that, for a two-dimensional manifold, R turns out to be twice the Gaussian curvature. And analogously, if R = 0, the surface is developable, i.e., locally flat.

Returning to curves, obviously $\rho^\perp = 0$; so neither $\rho$ nor $\rho^\perp$ is related to a curve’s ordinary curvature. But the idea of incorporating the normal space has been met before with a space curve’s torsion – by looking at the complement, it measures how strongly the curve leaves the plane. Also the behavior of the signs can be compared to their interplay.

The simplest non-trivial cases are surfaces (n = 2) in the usual three-dimensional space. And with this example, we encounter the crowd of curvature notations. As m = 1, still $\rho^\perp = 0$ and (3.1) shortens to

$$\rho \le \|H\|_F^2 + c = H^2 + c. \tag{3.2}$$

Since $\rho^\perp \ge 0$ for all m, the latter is, in fact, a weakening. Back to our special case, with c = 0, we obtain

$$K \le H^2, \tag{3.3}$$

relating the common Gaussian and mean curvature of the surface. Hence, (3.1) is a comprehensive extension of an old, established statement.
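The classical statement (3.3) can be seen in one line. The principal curvatures $\kappa_1, \kappa_2$ are notation brought in only for this worked step; they are not used elsewhere in the text.

```latex
K = \kappa_1 \kappa_2, \qquad H = \tfrac{1}{2}(\kappa_1 + \kappa_2)
\quad\Longrightarrow\quad
H^2 - K = \tfrac{1}{4}(\kappa_1 - \kappa_2)^2 \;\ge\; 0 .
```

Equality $K = H^2$ holds exactly at points where both principal curvatures coincide.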

3.2. Connecting to formulas

How does the geometric inequality (3.1) become a statement similar to (1.2)? Well, as soon as they are represented in coordinates, tensors are manageable algebraic objects. Then long-winded calculations based on h, to which even R and $R^\perp$ can be related, result in the desired norm inequality for matrices.

For granting a brief look, think about the graph of a differentiable parameterized surface (x, y, f(x, y)). Then, h actually is a quadratic form and can be associated with the Hessian matrix $\begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}$.³³ Now, if one has an n-dimensional manifold of codimension m, there are m such matrices of size n × n. Note that the matrices will be symmetric by the natural assumption of interchangeability of second partial derivatives, whence their traces (the components of H) are real.

³² One says the quantities are extrinsic or intrinsic, respectively.
³³ Usually, it is written $h = L\,du^2 + 2M\,du\,dv + N\,dv^2 \sim \begin{pmatrix} L & M \\ M & N \end{pmatrix}$. In this explicit situation, L, M and N can be calculated with a Taylor expansion. A factor containing first derivatives does appear, too, but is equal for all three coefficients.


And how does the commutator come into play? When inspecting the definition of the curvature tensor, we can formally see commutators already. And it happens that they indeed correspond to commutators of matrices built from the tensor components. In the end, (3.1) admits the following algebraic reformulation. We are going to discuss later on in which ways this inequality closely interacts with the similarly looking (1.2).

Theorem 3.1 (Normal scalar curvature conjecture). Let $A_1, \dots, A_m$ be real symmetric n × n matrices. Then,

$$2 \sum_{r<s} \|[A_r, A_s]\|_F^2 \le \Big( \sum_{r=1}^{m} \|A_r\|_F^2 \Big)^2. \tag{3.4}$$
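A quick numerical look at (3.4) is possible with random real symmetric matrices. The sketch assumes the inequality in the form $2\sum_{r<s} \|[A_r, A_s]\|_F^2 \le \big(\sum_r \|A_r\|_F^2\big)^2$; the Gaussian sampling, the sizes and all function names are choices made only for this illustration.

```python
import random

def frob2(M):
    """Squared Frobenius norm."""
    return sum(abs(z) ** 2 for row in M for z in row)

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

def random_symmetric(n, rng):
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            M[i][j] = M[j][i] = rng.gauss(0, 1)
    return M

def ddvv_sides(mats):
    """Both sides of (3.4) for a list of symmetric matrices of equal size."""
    m = len(mats)
    lhs = 2 * sum(frob2(commutator(mats[r], mats[s]))
                  for r in range(m) for s in range(r + 1, m))
    rhs = sum(frob2(A) for A in mats) ** 2
    return lhs, rhs

rng = random.Random(3)  # fixed seed, only for reproducibility
worst = 0.0
for _ in range(200):
    n, m = rng.randint(2, 4), rng.randint(2, 4)
    lhs, rhs = ddvv_sides([random_symmetric(n, rng) for _ in range(m)])
    worst = max(worst, lhs / rhs)
print("largest quotient LHS/RHS over the samples:", worst)

# A pair attaining equality for n = m = 2.
equality_case = ddvv_sides([[[1, 0], [0, -1]], [[0, 1], [1, 0]]])
print("both sides for the equality example:", equality_case)
```

For m = 2 the statement follows from (1.2) together with $4ab \le (a + b)^2$, which is why the symmetric pair above gives equality on both counts.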