<<

Cent. Eur. J. Math. • 9(2) • 2011 • 205-243 DOI: 10.2478/s11533-011-0003-5

Central European Journal of

Fredholm

Invited Review Article

Henry P. McKean1

1 CIMS, 251 Mercer Street, New York, NY 10012 USA

Received 15 November 2010; accepted 16 January 2011

Abstract: The article provides with a down to earth exposition of the with applications to Brownian motion and KdV equation.

MSC: 46E20, 35Q53

Keywords: Fredholm detemninant • Brownian motion • KdV equation © Versita Sp. z o.o.

Preface

The work of I. Fredholm [10] is one of the loveliest chapters in mathematics, coming just at the turn of the century, when serious attention was first given to analysis in infinite-dimensional () . Nowadays, it is explained mostly in an abstract and, if I may say so, a superficial way. So I thought it would be nice to do it from scratch, concretely, following Fredholm in using only the simplest methods and adding a few illustrations of what it is good for. The treatment has much in common with Peter Lax [16] which you may wish to look at, too; see also Courant–Hilbert [6] and/or Whittaker–Watson [25], and, of course Fredholm’s 1903 paper, itself. Cramer’s rule expresses the inverse of a square d by ratios of sub-determinants, as I trust you know. There, the is finite. What Fredholm did was to adapt this rule to infinite , i.e. to function space. Poincaré had posed the problem, giving as his opinion that it would not be solved in his lifetime. Fredholm did it just two years later. Chapter 1 is to remind you about determinants and Cramer’s rule in d < ∞ dimensions, first in the conventional form and then in Fredholm’s form which is specially adapted to the passage from d = 1; 2; 3;::: to d = ∞. Obviously, if this is to work, you want only matrices which are close enough to the identity, i.e. something like I + K where I is the identity, K is “small”, and you need to express det (I + K) as 1 + [terms in K;K 2;::: ] under effective control. This is a matter of bookkeeping. Look at a I + λK with a variable λ, develop its in powers of λ (it is a after all), and put λ back equal to 1. This produces the formula   " # Kii Kij Kik X X Kii Kij X   det (I + K) = 1 + Kij + det + det Kji Kjj Kjk + ::: + det K; Kji Kjj   1≤i≤d 1≤i

205 Fredholm determinants

which is the key to the passage to d = ∞ carried out in Chapter 2. Think now of K as an integral operator

Z1 K : f 7→ K(x; y)f(y) dy; 0

acting on C[0; 1] say, and approximate its action by that of the matrix

  i j   K ; 1 n n n 1≤i;j≤d

acting in dimension d. Then the preceding formula for the determinant of the matrix I + K makes plain the good sense of Fredholm’s recipe for the determinant of the operator I + K:

" # Z Z K x ; x K x ; x I K K x; x dz ( 1 1) ( 1 2) d2x det ( + ) = 1 + ( ) + det K x ; x K x ; x ( 2 1) ( 2 2) ≤x≤ ≤x

Chapter 2 develops this idea along with Cramer’s rule, in detail. (Don’t worry. I’ll make a better notation when the time comes.) Particular attention is paid to the special but important case of symmetric kernels: K(x; y) = K(y; x); of these I give only the simplest examples. Chapter 3 presents three serious examples of what you can do with this machinery. The first application is to Gaussian probabilities with emphasis on Brownian motion, as initiated by Cameron and Martin [3]. The second is to the Kortweg– de Vries equation (KdV) describing long waves in shallow water. Here, I follow mostly Ch. Pöppe’s [21] explanation of a formula of Dyson [8], representing the solution of KdV by means of a special type of Fredholm determinant. The third application is to the N-dimensional unitary ensemble, so-called, i.e. the of N×N unitary matrices equipped with its invariant volume element. What is in question is the distribution of eigenvalues of a typical matrix both for N < ∞ and for N = ∞. Here, the principal names are Dyson [7], Jimbo et al. [12], Mehta [19], and Tracy–Widom [22]. I cannot close this preface without a little joke. I gave a lecture in Paris a while back: something about “Fred Holm’s determinants” the advertisement said. Oh well! The name is Ivar. Don’t forget it.

Prerequisites

Not much in fact. You will need: 1) standard , both real and complex; 2) calculus in several variables, of course; 3) some acquaintance with Lebesgue measure (on the unit only) and the associated space L2[0; 1] of (real) R 1 f2 < ∞ measurable functions with 0 ; 4) elementary complex function-theory. I will explain anything non-standard or more advanced. Good references are: for 1), Lax [15]; for 2), Courant [5]; for 3), Munroe [20]; for 4), Ahlfors [1].

1. An old story with a new twist

d I collect here what you need to know about linear maps of the finite-dimensional space R to itself, first in the conventional format and then in Fredholm’s way.

206 H.P. McKean

1.1. Simplest facts

d d  d L : R → R is the map, R = y : y = Lx for some x ∈ R is its range, N = {x : Lx = 0} its null space, and similarly † † † for the transposed map L : range R and null space N . The figure displays these notions in a convenient way.

† o o † The position of N to the right indicates that it is the annihilator R of R: Indeed, y ∈ R if and only if (Lx; y) = (x; L y) † † o † vanishes for every x, which is to say L y = 0, and similarly to the left: N = (R ) . L kills N and so maps R one- † † † † to-one onto R; likewise, L kills N and maps R one-to-one onto R . It follows that dim R = dim R ≡ r and so also †   N N ≡ n d−r L x → y M mij dim = dim = . The map : may be expressed by a matrix = 1≤i;j≤d relative to the standard frame: e ::: ; e ::: ; : : : ; e ::: 1 = (100 0) 2 = (010 0) d = (000 1) as in d X yi = mij xj ; i = 1; : : : ; d: j=1

† † R is the span of the columns of M, R the span of its rows, and dim R = dim R expresses the equality of “row ” and “column rank”.

1.2. Determinant and Cramer’s rule

d d It is desirable to have a practical way to tell if L : R → R is one-to-one and onto (and so invertible), and also a recipe − for its inverse L 1 in this case. Introduce (out of the air) the determinant:

d X Y M π m ; det = sign iπ(i) π∈Sd i=1 the sum being taken over the “symmetric group” Sd of all permutations π of the “letters” 1; 2; : : : ; d. Then L is one-to-one −1 −1  −1 M 6 L M mij and onto if and only if det = 0, in which case is expressed by the inverse matrix = 1≤i;j≤d formed according to Cramer’s rule: 1) strike out the ith row and jth column of M and take the (d − 1) × (d − 1) determinant of what’s left, this being the ij i j “cofactor” of M, so-called; 2) supply the signature (−1) + ; 3) ; 4) divide by det M. More briefly,

−1 −1 i+j  mij = (det M) (−1) × the ji cofactor of M :

This is, I trust, familiar. If not, don’t worry. I will derive it all in a geometrically appealing way in the next three sections.

207 Fredholm determinants

1.3. Signed volume

The determinant, which seems at first so complicated and unmotivated, has a simple geometrical meaning: It is the V f Le ; : : : ; f Le volume of the parallelepiped spanned by the vectors 1 = 1 d = d, with a suitable signature attached. L R d F f ; : : : ; f Plainly, is one-to-one and onto if and only if dim = , which is to say that the family = [ 1 d] is a “frame”, d i.e. that it spans R , and this in turn is the same as to say that the associated (signed) volume V = V (F) does not vanish. I discuss this signed volume one dimension at a time. d 1: e F f 6 V F f |f | = 1 = +1 is the standard frame; any other frame is just a number 1 = 0, and ( ) = 1 is the length 1 f f f with the signature of 1 attached. The frame 1 is “proper” or “improper” according as 1 points in the same direction as e f 1 = +1 or not. Obviously, you cannot deform a proper into an improper frame without passing through 1 = 0 which is not a frame at all, i.e. the proper and improper frames are disconnected. This is all simple stuff made complicated. The next step conveys the idea more plainly. d = 2: Here is the standard frame.

F f ; f f e A second frame = [ 1 2] can be lined up with it by a (counter-clockwise) rotation aligning 1 with 1 as in the next figure.

f The resulting frame is “proper” or “improper” according as the rotated 2 lies in the upper or the lower half plane. It e f cannot lie on the line directed by 1 as that would make it depend upon 1; in particular, the proper and improper frames are disconnected, one from the other, as in d = 1. The signed volume V is now declared to be the area of the parallelogram spanned by F, taken with the signature plus or minus according as the frame is proper or not, i.e.

V |f | |f | × θ; = 1 2 sin

208 H.P. McKean

so that the plus sign prevails for 0 < θ < π and the minus sign for π < θ < 2π. Let’s spell it out: The absolute volume is

|V | |f | × f f |f | × |f − e f ; e | = 1 the coprojection of 2 on 1 = 1 2 1( 2 1) p |f | × |f |2 − |f |−2 f ; f 2 e f |f | = 1 2 1 ( 2 1) 1 being 1 divided by its length 1 p q |f |2 |f |2 − f ; f 2 f2 f2 f2 f2 − f f f f 2 = 1 2 ( 2 1) = ( 11 + 12)( 21 + 22) ( 21 11 + 22 12) p f f − f f 2 ± f f − f f : = ( 11 22 12 21) = ( 11 22 12 21) d = 3: Here is the standard frame again, obeying the right-hand screw rule.

F f ; f ; f The absolute volume for the frame = [ 1 2 3] is

|V | |f | × f f × f f f ; = 1 coprojection of 2 on 1 coprojection of 3 on the plane of 1 & 2 but what signature should be attributed to V itself? To decide, consider the following four steps: 1) rotate F so as to f e e f align 1 with 1 and adjust its length so they match; 2) rotate about 1 to bring 2 to the “upper” half of the plane of e e e f e e 1 and 2 and deform it to 2 plain; 3) 3 now lies either above or below the plane of 1 & 2 and can be deformed ±e F e ; e ; ±e this to 3: in short, can be deformed (via honest frames) to one of the two frames [ 1 2 3], and it is signature, e V 3 attached to 3, that is ascribed to . In a fancier language, the frames of R fall into two connected deformation classes, “proper” and “improper”, distinguished by the above signature, much as for d ≤ 2, which is to say that R3 is capable of two different “orientations” exemplified by the right- and left-hand screw rules. n d ≥ 4: It is much the same in higher dimensions. A frame determines an increasing family of subspaces R = d span {fk : 1 ≤ k ≤ n}, ending with R itself; the absolute volume is the product, from n = 1 up to d, of the length n− of the coprojection of fn on R 1; and the signature of V is determined by the deformation class, of which there are two, e ; e ;:::; ±e this e as before: Any frame may be deformed to [ 1 2 d], and it is signature, attached to d, that is ascribed to V . The definition of the signed volume is now complete, but what has it to do with the determinant? I postpone the punchline in favor of two simple observations.

Observation 1.

V is “skew” in a sense I will now explain. Let π ∈ Sd be a permutation of the letters 1; : : : ; d. Then V (πF) = V f ; f ;:::  V F V πF π(1) π(2) represents the same absolute volume as ( ): it is only the signature that changes, as in ( ) = χ π V F χ π ± χ S χ π π χ π χ π ( ) ( ) with ( ) = 1. Plainly, this is a character of d meaning that ( 1 2) = ( 1) ( 2), and it is a fact that Sd has only one such character besides the trivial character χ ≡ 1: Obviously, every permutation is a product of “transpositions”, exchanging two letters, fixing the rest. Now you will check that if χ ascribes the value +1 to any transposition, then it is the same for every transposition, in which case χ is trivial, i.e. χ ≡ 1. Otherwise, it ascribes the

209 Fredholm determinants

value −1 to every transposition and χ(π) = (−1)#, # being the number of transpositions in π. This is nothing but the signature of π appearing in the formula for the determinant. The meaning of “V skew” is now this: V (πF) = χ(π)×V (F).

Aside. Here you have a somewhat indirect proof that χ(π) = (−1)#, alias the signature of π, is well-defined, i.e. that the parity of # is always the same, however π may be decomposed into transpositions. Another way is to examine how π Q x − x d x < x < ··· < x changes the signature of the “discriminant” i>j ( i j ) of distinct real numbers 1 2 d. Try it.

Observation 2.

V F f ; : : : ; f V is now extended to any family of vectors = [ 1 d], independent or not, by putting = 0 in the second case, V f ; : : : ; f V when the parallelepiped collapses. Then (most important) is linear in each of its arguments 1 d. Because f d V f f − f f is skew, you have to check this only for d. That is obvious for = 2 from the explicit formula = 11 22 12 21 and extends easily to d ≥ 3: in computing V , the part of fd in the span of fn, 1 ≤ n < d, changes nothing; it is only the f g V F V f ; : : : ; f ; g × f • g projection of d upon the normal to that span which counts, as in ( ) = [ 1 d−1 ] ( d ). This makes the linearity obvious.

1.4. Why V is the determinant

It remains to compute V from the two properties, 1) skewness and 2) the linearity elicited in Section 1.3, plus the d V e ; : : : ; e f P f e ≤ i ≤ d normalization 3) [ 1 d] = 1. Write i = ij j for 1 . Then j=1

d d X X V ::: f j : : : fdj V e j ; : : : ; e j : = 1 1 d [ 1 1 1 d ] by 2) j j 1=1 d=1

j ; : : : ; j V Here, it is only summands with distinct 1 d that survive, being skew by 1), so you may write

d d X Y X Y V V e ; : : : ; e f π f = [ π(1) π(d)] iπ(i) = sign iπ(i) by 1) and 3) π∈Sd i=1 π∈Sd i=1   fij ; = det 1≤i;j≤d

f Le etc. M M† and if now 1 = 1, , so that the preceding matrix is the old transposed, then it is seen that det is nothing but † the signed volume of the image of the unit cube C under L. Actually, det M = det M, as in rule 1 of the next section, so with an imprecise notation, det M = V (LC).

Aside. Besides its interpretation as a signed volume, det M has also a spectral meaning: it is the value of the characteristic polynomial det (M − λI) at λ = 0; as such, it is the product of the (generally complex) eigenvalues of M. These are either real or else they appear in conjugate pairs, M being real, in accord with the reality of the determinant.

1.5. Simplest rules

Here, A; B; C, etc. are d × d matrices.

Rule 1.

† det C = det C: indeed, d d d X Y X Y X −  Y χ π c χ π c − χ π 1 c ( ) iπ(i) = ( ) jπ 1(j) = jπ(j) i=1 j=1 j=1

− − since π 1 runs once through the symmetric group as π does so, and χ(π) = χ(π 1).

210 H.P. McKean

Rule 2. det (AB) = det A × det B = det (BA). This is also pretty clear. The | det C| represents the factor by d which the volume of any geometrical figure in R is changed by application of the associated , as you will see by dividing space up into little cubes, so the formula is correct, up to a signature to the right in case none of the e ; : : : ; e determinants vanish. But then each of the associated frames can be deformed without change of signature to [ 1 d] e ;:::; −e A B or to [ 1 d], and you can check the rest by hand: for example, if is proper (orientation-preserving) and is improper (orientation-reversing), then AB is improper and the signs work out as follows (−1) = (+1) × (−1).

Rule 3.

−  − det C 1 = (det C) 1 if det C =6 0, as is immediate from Rule 2.

Rule 4.

d det C does not depend on the of R . This is important: It means that the determinant is an attribute of the d d associated geometrical transformation R → R and not an artifact of its presentation by the matrix C. The point is − − that under C is replaced by D 1CD with det D 6= 0 and det (D 1CD) = det C by Rules 2 and 3.

Rule 5. ln det C = sp ln C if C is near the identity. Here sp is the trace. Now it is better to write C = I + K with small K so that ln C = ln (I + K) = K − K 2/2 + K 3/3 − ::: , by definition. The identity is trivial at K = 0 (C = I). Let ∆K be a tiny variation of K. Then, with a self-evident notation,

  ∆ ln det (I + K) = ln det (I + K + ∆K)/ det (I + K) −  = ln det I + (I + K) 1∆K by Rules 2 and 3  −  ' ln 1 + sp (I + K) 1∆K why ? −  ' sp (I + K) 1∆K = sp I − K + K 2 − K 3 + ::: ∆K   1 1  = sp ∆K − (K∆K + ∆KK) + K 2∆K + K∆KK + ∆KK 2 − ::: 2 3 = sp [∆ ln (I + K)] = ∆ sp ln (I + K):

The identity sp (AB) = sp (BA) is used in line 5. I leave the rest to you.

1.6. Cramer’s rule

The determinant of M being linear in each of its columns,

d ( ) d ∂ M X Y π X 0 Y det χ π m × 1 if (1) = 1 χ π m 0 ∂m = ( ) kπ(k) π 6 = ( ) kπ (k) 11 π∈S k 0 if (1) = 1 π0∈S k d =2 d−1 =2

0 where π permutes 2; : : : ; d leaving 1 fixed. This is nothing but the 11 cofactor of M. To evaluate ∂ det M/∂mij , put 0 0 i j M m mij the th row/ th column in first place, forming a new matrix with 11 = . This can be done by permutations of i j m m rows and columns requiring + transpositions, counting modulo 2: for example, 23 is brought to the place of 11 by transposing a) columns 3 and 2, b) columns 1 and 2, and c) rows 1 and 2, requiring 5 = (2 + 3) transpositions. It follows 0 i j that det M = (−1) + det M, so the rule is

∂ det M i j = (−1) + × [the ij cofactor of M]: ∂mij

211 Fredholm determinants

Now look at d X ∂ det M × mkj ; ∂mki k=1 in which you see ∂ det M/∂m transposed. This represents the signed volume associated to M with its ith column replaced by its jth column; as such, it is just det M if i = j, as you will see by Observation 2 of Section 1.3, but the parallelepiped collapses if i =6 j and you get 0, i.e., with an abbreviated notation,

 ∂ M † det M M × : ∂m = det [the identity]

− − † That is Cramer’s rule, expressing M 1 as (det M) 1 × (∂ det M/∂m) .

1.7. Fredholm’s idea

To extend this machinery from d = 1; 2; 3, etc. to d = ∞ dimensions, it is clear that M cannot be too far from the identity, i.e. you must have M = I + K with K “small”. For example, the ∞ × ∞ determinant   1 + 1 0 0 0 :::  / ::: ∞  0 1 + 1 2 0 0      Y 1 det  0 0 1 + 1/3 0 ::: = 1 + = +∞   n  0 0 0 1 + 1/4 ::: n=1 :::::::::::::::

makes no sense, but   1 1/2 0 0 0 :::  / / ::: 1 2 1 1 4 0 0    det  0 1/4 1 1/8 0 :::    0 0 1/8 1 1/16 ::: ::::::::::::::::::

should be OK: after all the determinant as volume cannot be more than the product of the lengths of the individual columns, so the square of the determinant should be less than           ∞ ! 1 1 1 1 1 5 5 5 5 X −n 5 5 1 + 1 + + 1 + + · ::: = 1 + 1 + · : : : < exp 5 4 = exp : 4 4 16 16 64 4 16 64 4 n=2 4 12

The task is now to re-express det (I + K) in such a way that the smallness of K can be exploited when d ↑ ∞. Following Fredholm, this is done by introducing a parameter λ, for book-keeping, as in ∆(λ) = det (I + λK), expanding in powers of λ: λ λ λ2 ··· λd ; ∆( ) = ∆0 + ∆1 + ∆2 + + ∆d λ K and putting back equal 1. Here, ∆0 = ∆(0) = 1, ∆d = det as you will check, and the rest is not hard to come by: ≤ p ≤ d ≤ i < i < : : : < i ≤ d i i ; : : : ; i |i| p e I For 1 and 1 1 2 p , write = ( 1 p), and = . Then, with [ ij ] = , the identity matrix,

d d d X Y X X X Y X X I λK χ π e λK  λp χ π K λp K ; det ( + ) = ( ) iπ(i) + iπ(i) = ( ) iπ(i) = det [ ij ]i;j∈i π i=1 p=1 |i|=p π fixing i i∈i p=1 |i|=p

i.e.   " # Kii Kij Kik X X Kii Kij X 2 3   d det (I + λK) = 1 + λ Kii + λ det + λ det Kji Kjj Kjk + ::: + λ det K Kji Kjj   1≤i≤d 1≤i

212 H.P. McKean

Cramer’s rule is to be recast in this format with a view to inverting I + λK if ∆(λ) =6 0. Taking λ into account, the recipe is  ∂ † I λK −1 −1 × 1 × ∆ ; ( + ) = ∆ λ ∂K but I want to spell it out. i un p ≤ i ; : : : ; i ≤ d K K Let be any ordered collection of distinct indices 1 1 p . [ ij ]1≤i;j≤p is abbreviated as ij. Then

X 1 K ∆p = p det ij ! |i|=p i=j with a self-evident notation. Here, the extension of the sum to unordered indices merely produces p! copies of each /p ∂ /∂K sub-determinant in ∆p, whence the factor 1 ! . Now look at ∆p+1 ij on diagonal: with further self-evident notations, ∂ h i ∆p+1 1 = × the sum over i = j 3 i with |i| = p + 1 of the ii cofactor of Kij ∂Kii (p + 1)! 1   X = × the p + 1 ways of placing i in i × det Ki0j0 p 0 0 ( + 1)! i =j 63i 0 |i |=p 1 0 0 = × [the unrestricted sum over i − the restricted sum with i 3 i] p! " # " # 1  0 X Kii Kij 1 X Kii Kij = ∆p − × the p ways of placing i in i × det = ∆p − det ; p Kii Kij p − Kii Kij ! i=j63i ( 1)! i=j3i |i|=p−1 |i|=p−1 from which the restriction i 63 i can be removed since the determinant vanishes if i 3 i. The off-diagonal is much the same: with i < j for definiteness,

 ∂ † ∆p+1 1  i j    = × the sum over i 3 both i and j of (−1) + × the ji cofactor of Kij ∂Kij (p + 1)! X 1 − i+j × = p ( 1) [the determinant of the matrix displayed below] ( + 1)! i=j3i;j

in which you keep ij(•) and remove ji(◦) together with its attendant row and column. Let’s move ij to the upper left-hand corner, keeping the indices k 6= i or j in their original order. This takes i − 1 transpositions for i and j − 2 for j, the intervening ith column having been crossed out, for a total of i − 1 + j − 2 ≡ i + j + 1 (mod 2) changes of sign. Taking i j the factor (−1) + into account leaves just one change of sign, net, and if now you also reflect that there are (p + 1) × p ways of placing i and j in i, you will obtain " #  ∂ † ∆p+1 1 X Kij Kij = − det ; ∂Kij p − Kij Kij ( 1)! i=j63i;j |i|=p−1

213 Fredholm determinants

just as on diagonal, only now the term ∆p is missing. I leave it to you to bring all this into the form   " #  ∂ † d−1 1 ∆ X p X Kij Kij  = ∆ − the matrix λ det : λ ∂K  Kij Kij  p i=j =0 |i| p = 1≤i;j≤d

It is instructive to see what these objects are when expressed more compactly. You know that

 ∂ † I λK λ−1 ∆ ( + ) ∂K = ∆

always, by Cramer’s rule, and also that ∞ − X p (I + λK) 1 = (−λK) p=0 if λ is small, so

∞  †  † ∞ ∞ ∞ X ∂ p ∂ X X X X λp ∆ +1 1 ∆ × I λK −1 λp × −λK q λn −K q; ∂K = λ ∂K = ∆ ( + ) = ∆p ( ) = ∆p( ) p=0 p=0 q=0 n=0 p+q=n

i.e.  † p ∂ p X ∆ +1 −K q: ∂K = ∆p−q( ) q=0 It is this kind of book-keeping that is the key to Fredholm’s success for d = ∞, as you will see in the next chapter.

2. Fredholm in function space

2.1. Where to start

I take the simplest case presenting the least technical difficulties; variants will readily suggest themselves and will be d used in Chapter 3 without further comment. I propose to replace R by the ∞-dimensional space C[0; 1] and K by the integral operator Z1 K : f 7→ K(x; y) f(y) dy 0 with K(x; y), 0 ≤ x; y ≤ 1, of class C[0; 1]2. Here and below, I do not distinguish between the operator and the kernel by means of which it is expressed: K means what it has to mean, by the context.

2.2. Aside on compactness

It is always some kind of compactness that makes things manageable for d = ∞; in effect, it reduces d = ∞ to finite d. Two kinds of compactness are needed here. The proofs are relegated to an appendix.

The space C[0; 1].

This is the space of (real) continuous functions f on the unit interval, provided with the kfk∞ = {|f x | ≤ x ≤ } F ⊂ C d f ; f kf − f k max ( ) : 0 1 . A figure is compact relative to the natural distance ( 1 2) = 1 2 ∞ if it is 1) closed, 2) bounded, and 3) |f(b) − f(a)| ≤ ω(|b − a|) for every f ∈ F, with one and the same, positive, increasing function ω(h), h > 0, subject to ω(0+) = 0. Below, fn → f∞ means this type of convergence.

214 H.P. McKean

2 The space L [0; 1]. R 1 f2 dx < ∞ kfk This is the space of (real) Lebesgue-measurable functions subject to 0 , with the Pythagorean norm 2 = qR R 1 f2 dx f ; f 1 f f dx B {kfk ≤ } 0 and the associated inner product ( 1 2) = 0 1 2 . Here, the unit ball = 2 1 is not compact d f ; f kf − f k B e x nπx relative to the distance ( 1 2) = 1 2 2: in fact, contains the mutually perpendicular functions n( ) = sin , n ≥ 1, and d(ei; ej ) = 1 for any i 6= j. Fortunately, B is weakly compact and this will be enough. It means that if e ∈ B n ≥ n < n < n < e ∈ B e ; f n for 1, then you can pick indices 1 2 3 etc. and a function ∞ so as to make ( n ) tend to e ; f n n ; n ;::: ↑ ∞ f ∈ L2 e e this ( ∞ ) as = 1 2 , for every simultaneously. Below, n ∞ means type of convergence. Take it as an exercise to check that sin nπx 0.

Example.

Obviously, K maps L2 into C. Let’s prove that F = KB is compact in C by verifying 1), 2), 3) in reverse order: 3), 2), 1). 3) For any function f = Ke in KB,

1 s Z Z 1 |f b − f a | ≤ |K b; y − K a; y | |e y | dy ≤ kek |K b; y − K a; y |2 dy ≤ ω |b − a| ( ) ( ) ( ) ( ) ( ) 2 ( ) ( ) ( ) 0 0 with  ω(h) = max |K(b; y) − K(a; y)| : 0 ≤ a; b ≤ 1; |b − a| ≤ h; 0 ≤ y ≤ 1 :

3) follows from the fact that ω(0+) = 0. Why so? 2) More simply, s Z 1  |f(x)| ≤ |K(x; y)|2 dy ≤ max |K(x; y)| : 0 ≤ x; y ≤ 1 ≡ m < ∞; 0 so KB is bounded. 1) KB is closed: in fact, if KB 3 fn = Ken → f∞ and if you assume, as you may, that en e∞, then you will have f∞ = Ke∞, as required.

2.3. Some geometry

† † † Now to business, recapitulating Section 1.1. Let R/R be the range and N/N the null space of I+K/I+K , respectively, in the space C[0; 1].

Item 1.

† N, and likewise N , is finite-dimensional; in particular, the associated (L2) projections map L2 into C.

Proof. Pick ek , 1 ≤ k ≤ n, from N, each of unit length:

Z1 kek k2 e2 2 = k = 1 0 and mutually perpendicular: Z1 (ei; ej ) = eiej = 0; i 6= j: 0

215 Fredholm determinants

Then Kek = −ek , so

n n Z1 X X 2 |ek x |2 K x; · ek ≤ K x; y 2 dy; ( ) = the projection of ( ) upon 2 ( ( )) k k =1 =1 0

this being an instance of Bessel’s inequality, expressive of the Pythagorean rule. Now integrate to obtain

n 1 1 1 X Z Z Z n = |ek |2 dx ≤ |K(x; y)|2 dxdy: k =1 0 0 0

Item 2.

† o † o † N = R and N = (R ) as in the picture, just as for d < ∞; in particular, (I + K): R → R is one-to-one and onto.

Item 3.

† † R, and so also R , is closed as it stands; in particular, C is the perpendicular sum of closed subspaces R ⊕ N or † R ⊕ N .

R 3 f I K e → f ke k ↑ ∞ Proof. Let n = ( + ) n ∞. If n 2 , then

0 fn 0 0 en fn = = (I + K) en with en = ; kenk2 kenk2

0 0 0 0 0 0 0 and you may suppose that en e∞. Now fn → 0, so en = fn − Ken → −Ke∞ on the one hand and, on the other 0 0 hand, to e∞ itself, the upshot being that e∞, which cannot vanish as it is of unit length, belongs to N. But you may e ∈ No e0 ∈ No ke k < ∞ also suppose, from the start, that n , whence ∞ too, and this is contradictory. If now lim inf n 2 you may suppose en e∞. Then Ken → Ke∞ and fn → f∞ force en → e∞ and so also f∞ = (I + K) e∞, as wanted.

† This was all plain sailing with the help of compactness. What is not clear is whether dim N = dim N is still true. Now † † you can’t just count dimensions as for d < ∞ when you could see at once that dim R = dim R . Both dim R and dim R are infinite and it is not obvious what to do.

Item 4.

† † n = dim N and n = dim N are one and the same.

† † † Proof. For conversation’s sake let n be larger than n, let ek , 1 ≤ k ≤ n, and ek , 1 ≤ k ≤ n , be unit mutually † perpendicular functions spanning N and N respectively, and replace K by

n 0 X  † †  K = K + ek ⊗ ek ≡ ek (x)en(y) : k=1

216 H.P. McKean

0 0  † Then I + K has no null space: It maps C one-to-one onto R = R ⊕ span en : 1 ≤ k ≤ n , permitting a reduction to † † † † † the simplified case: n = 0 < n . Pick e from N and discretize (I + K) e = 0 so:

d  i  X  j i   j   i  e† K ; 1 e† ≡ f ≤ i ≤ d d + d d d d d for 1 j=1 with d   2 X † k e 1 d d = 1 k=1 as you may do and   i f o d ↑ ∞: maxi d = (1) as

This can be rewritten: d  i  X   j i   i   j   j  e† K ; − f e† 1 e† : d + d d d d d d = 0 j=1

Now for d < ∞, it is guaranteed that the transposed problem also has a solution, e:

d  i  X   i j   i   j   j  e K ; − e† f 1 e d + d d d d d d = 0 j=1 with d   X k 2 e 1 d d = 1 k=1 if you want, and the idea is to produce a (contradictory) null function e ∈ N by making d ↑ ∞ here. This is easy:

d     2 d   X j j X j 2 f e 1 ≤ f 1 : d d d d d j=1 j=1 is negligible, so d  i  X  i j   j  e K ; e 1 o d + d d d d = (1) j=1 in which the error o(1) is small independently of 1 ≤ i ≤ d. Let fd(x) be the function with values e(i/d) for (i − 1)/d ≤ x < i/d f f kf k , and suppose, as you may, that d ∞, d 2 being unity by display no. 3 above. Then

Z1 fd(x) + K(x; y) fd(y) dy = 0 0 up to a like error, small independently of 0 ≤ x ≤ 1, and if you now make d ↑ ∞, you will find 1) Kfd → Kf∞, 2) fd → f∞ kf k I K f f ∈ C n > with ∞ 2 = 1, and 3) ( + ) ∞ = 0, with the implication ∞ , i.e. 0 which is contradictory.

217 Fredholm determinants

2.4. Fredholm’s determinant

To make a determinant for I +K in the ∞-dimensional space C[0; 1], you do the obvious thing: Replace the function f(x), ≤ x ≤ d f k/d ≤ k ≤ d K d × d K i/d; j/d /d 0 1, by the -dimensional vector ( ), 1 , and the operator by the matrix ( ) 1≤i;j≤d so that d 1 X  i j   j  Z K ; 1 f K x; y f y dy; d d d d approximates ( ) ( ) j =1 0 I K i/d; j/d /d write out Fredholm’s series for the determinant of + ( ) 1≤i;j≤d as in

d     X 1 X i j 1 1 + det K ; ; p d d d i;j∈k p=1 ! |k|=p

and make d ↑ ∞ to produce

∞ Z1 Z1 X 1   p I K ::: K xi; xj d x: det ( + ) = 1 + p det ( ) 1≤i;j≤p p ! =1 0 0

The convergence is obvious term-wise, only you must have an estimate to be sure the tail of the sum is small. This is easy: a finite-dimensional determinant is the signed volume of the parallelepiped spanned by the rows/columns of the matrix in hand so with |k| = p,

v     u   2 p i j Y uX i j p/  m  K ; 1 ≤ t K ; × d−2 ≤ p 2 ; det d d d d d d i;j∈k i∈k j∈k

 in which m = max |K(x; y)| : 0 ≤ x; y ≤ 1 and

    1 X i j 1 det K ; p d d d i;j∈k ! |k|=p

is dominated by

 p p/2 p 1 p/ m p m d d − ::: d − p p 2 . √ p ( 1) ( + 1) d p 1 −p by Stirling’s approximation ! p + 2 e 2π p −p/ < (me) p 2

independently of d. This is plenty. Without further ado, the main rules (1,2,3) of Section 1.5 carry over: †  Rule 1. det (I + K) = det I + K . Rule 2. det [(I + K)(I + G)] = det (I + K) × det (I + G). Rule 3 applies when det (I + K) does not vanish. Then, as will be seen in Section 2.5, I + K has a nice inverse I + G  −  − of the same type as itself, and Rule 2 implies det (I + K) 1 = [det (I + K)] 1, as it should be. Rule 4 is left aside; it’s analogue is not needed. Rule 5. ln det (I + K) = sp ln (I + K) if everything makes sense. This is proved as in Section 2.4 for small K. Take it as G R 1 G x; x dx an exercise. Here and below sp means 0 ( ) .

218 H.P. McKean

0 0  − 0 Rule 6. If K depends upon a parameter h and K = dK/dh is nice, then [ln det (I + K)] = sp (I + K) 1K . I leave it to you to see what is needed and make a proof. The function ∆(λ) = det (I + λK) of λ ∈ C is central to the rest of the discussion: it is an integral function of order ≤ 2 in view of the estimate ∞ X × r p | λ | ≤ (constant ) ≤ A Br2 ∆( ) 1 + p/ exp ( ) p=1 ( 2)! with r = |λ| and suitable constants A and B. Take this also as an exercise. The estimate can be much improved if K x; y C 1 K x ; x ( ) is smooth: for example, if it is of class , then it pays, in the estimation of det [ ( i j )]1≤i;j≤p, to subtract, from each row of index i ≥ 2, the row before. This does not change the determinant and allows a better estimate:

p/ K x ; x ≤ × p 2 det [ ( i j )]1≤i;j≤p [ an unimportant geometrical factor ] very roughly × x − x x − x ::: x − x ( 2 1)( 3 2) ( p p−n)

x < x < : : : < x why? for 1 2 p. Here, line 2 is largest when the points are equally spaced ( ), leading to the estimate

Z Z  p 1 p p p/ 1 1 1 K xi; xj d x ≤ | K xi; xj | d x ≤ p 2 × × ' ; p det [ ( )] det ( ) p p p3p/2 roughly ! x

/  whence |∆(λ)| ≤ A exp Br2 3 , i.e. ∆ is of order 2/3 as the phrase is. Note in this connection, the implication of Hadamard’s theorem (Ahlfors [1, p. 208]): that such a function of order < 1 with ∆(0) = 1 is the product Π(1 − λ/λn) over λ λ etc. its roots 1, 2, counted according to multiplicity.

Example.

K x; y x − y x ≤ y y − x x > y K x ; x Here ( ) = (1 ) if and (1 ) if . Let’s compute ∆ explicitly: det [ ( i j )]1≤i;j≤p is a polynomial x ; x ; : : : ; x p x < x < x in 1 2 p of degree 2 in each variable: for example with = 3 and 1 2 3,     x − x x − x x − x − x − x − x 1(1 1) 1(1 2) 1(1 3) 1 1 1 2 1 3 x − x x − x x − x  x − x × x − x x − x x − x  det  1(1 2) 2(1 2) 2(1 3) = 1(1 3) det  1(1 2) 2(1 2) 2(1 3) x − x x − x x − x x x x 1(1 3) 2(1 3) 3(1 3) 1 2 3     1 1 1 1 0 0 x − x × x x x  x x − x  x x − x x − x − x : = 1(1 3) det  1 2 2 = det  1 2 1 0  = 1( 2 1)( 3 2)(1 3) x1 x2 x3 x1 x2 − x1 x3 − x2

K x ; x x x x x x x Now in general, det [ ( i j )]1≤i;j≤p vanishes: in respect to 1 at 0 and 2; in respect to 2 at 1 and 3; in respect to 3 x x at 2 and 4; etc., so what could it be but a constant multiple of

x x − x x − x ::: x − x − x : 1( 2 1)( 3 2) ( p p−1)(1 p)

This constant is unity, as you will check, so

x x Z Z1 Z 3 Z 2 1 K x ; x dpx − x dx ::: x − x dx x − x x dx 1 : det [ ( i j )]1≤i;j≤p = (1 p) p ( 3 2) 2 ( 2 1) 1 1 = p! (2p + 1)! 0 0 0

In short, ∞ ∞ √ √ X λp X λ 2p+1 λ λ √1 ( ) sh√ ; ∆( ) = p = λ p = λ p=0 (2 + 1)! p=0 (2 + 1)! which is of order 1/2.

219 Fredholm determinants

2.5. Fredholm’s formulas

These implement Cramer’s rule in dimension d = ∞. They follow at once from Section 1.7 in the style of Section 2.4, K x ; x K i; j p as I ask you to check. [ ( i j )]1≤i;j≤p is abbreviated as ( ); the value of will be obvious from the context. Now

∞ 1 1 X Z Z λ λp 1 ::: K i; j dpx; ∆( ) = 1 + ∆p with ∆p = p det ( ) p ! =1 0 0

and 1 1 " #  ∂ † Z Z ∆p+1 1 K(x; y) K(x; j) p− = ∆p − ::: det d 1x ∂K(x; y) (p − 1)! K(i; y) K(i; j) 0 0 with a self-evident notation, a factor of 1/d having been dropped from both sides in comparison to Section 1.7: symbol- ically,  † p ∂ p X ∆ +1 −K q: ∂K = ∆p−q( ) q=0 Also, † ∞ 1 1 " #  ∂  X λp Z Z K x; y K x; j 1 ∆ − λ ::: ( ) ( ) dpx; λ ∂K = ∆ p det K i; y K i; j p ! ( ) ( ) =0 0 0 and  ∂ † I λK 1 ∆ λ × : ( + ) λ ∂K = ∆( ) [the identity]

I leave it to you to check this line by line from Section 1.7. † The last display shows that I + λK is invertible if ∆(λ) =6 0: Then (I + λK)(I + G/∆) = I with ∆ + G = (∂∆/∂K) /λ, and † the same applies to I + λK so that

    † † G †  G I + (I + λK) = I + λK I + = I; ∆ ∆

as it should be. But what if ∆(λ) = 0? Then I + λK kills G(·; y) for every 0 ≤ y ≤ 1 and it is desired to conclude that 1 + λK is not invertible, i.e. N =6 0. This is an automatic if G(x; y) does not vanish identically, and even if it did, there is an easy way out: For large d,   i j   I λ d K ; 1 det + ( ) d d d 1≤i;j≤d vanishes for some λ(d) close to λ, by Hurwitz’s theorem (Ahlfors [1, p. 178]), so there must be an associated null vector P e(k/d), 1 ≤ k ≤ d, which may be adjusted so that |e(k/d)|2/d = 1. Now, upon making d ↑ ∞, the sort of compactness employed in Section 2.3, item 4 produces, from

d  i  X  i j   j  e λ d K ; 1 e ≡ ; d + ( ) d d d d 0 j=1

R e x ≤ x ≤ I λK 1 e2 a genuine null function ( ), 0 1, of + with 0 = 1, and all is well. Here Fredholm’s strategy comes into focus: If λ is small, the series for ∆ + G reduces to ∆×[the standard Neumann − series for (I + λK) 1], as in

∞ p ∞ ∞ X p X q X p X q − ∆ + G = λ ∆p−q(−K) = ∆pλ (−λK) = ∆(I + λK) 1: p=0 q=0 p=0 q=0

220 H.P. McKean

P q But, at best, the sum (−λK) represents the inverse only out to the first root of ∆(λ) = 0 where it fails to exist. Contrariwise, Fredholm’s recipe, expressing the inverse as a “rational” function (∆ + G)/∆, works in the large, provided Pp q Gp p−q −K G only that you avoid roots of ∆. The trick is that the coefficients = q=0 ∆ ( ) of the series for ∆ + may be − expressed by determinants of size [(p/2)!] 1, more or less, providing a technical control not at all apparent at the start. That is the key to Fredholm’s success.

d Aside. The notation ∂∆/∂K is not fanciful. Recall that for smooth functions F : R → R, you have

 F(x + hy) = F(x) + h × the inner product ∂F/∂x • y] + o(h); which you may take as the definition of grad F = ∂F/∂x. It is the same in dimension d = ∞: A reasonable function F : C[0; 1] → R ought to satisfy

  F(f + hg) = F(f) + h × the inner product of something with g + o(h); and you declare the “something” to be grad F = ∂F/∂f(x), as in

1 Z ∂F F f hg F f h g dx o h : ( + ) = ( ) + ∂f + ( ) 0

The idea extends in an obvious way to ∆ which is a function on the space C[0; 1]2 where the kernel K(x; y) lives. You may like to check that with this understanding, the formula for ∂∆/∂K is quite correct.

Some examples.

R x 1. K x Kf x ∗ f ≡ x − y f y dy K x; y x − y + x ≤ y is convolution by , i.e. = 0 ( ) ( ) with kernel ( ) = ( ) vanishing for . Then P p Pp q P q ≡ G λ p−q −K −λK ∆ 1 and ∆ + = q=0 ∆ ( ) reduces to the Neumann series ( ) . This must be OK√ as it stands:√ p− − in fact, from x ∗ · · · ∗ x p-fold = x2 1/(2p − 1)!, it develops that (I + λK) 1 is the convolution operator I − λ sin λx ∗, as you will readily check. 2. K f 7→ ∗ f x R x f y dy K x; y is now Volterra’s operator 1 ( ) = 0 ( ) . This is outside the present rules since ( ) jumps down from 1 to 0 at the diagonal, but never mind: just take K(x; x) = 0 and hope for the best. Then ∆ ≡ 1, and

" # Z K x; y K x; j − p ( ) ( ) dpx ( 1) x − y p x > y det K i; y K i; j = p ( ) or 0 according as or not. x

p x < x x > x > x > y For example, if = 2 and 1 2, the determinant vanishes except on the figure 2 1 where it reduces to

  1 1 1   det 1 0 0 = 1; 1 1 0 and the integral is just the volume of that figure: (x − y)2/2! . Try it for p = 3. The outcome is

∞ " # X Z K x; y K x; j λp ( ) ( ) e−λ(x−y) x > y det K i; y K i; j = or 0 according as or not, 0 x <:::

− −λx i.e. (I + λK) 1 = I − λe ∗. Check this by hand.

221 Fredholm determinants

3. It is amusing to observe that if K is the even more singular operator

Z x f y f 7→ √ ( ) dy x − y 0

− R x of Abel, then (I + λK) 1 may be found by a trick due to himself: K 2f = π f, as you will check, so from (I + λK) f = g R x R x 0 Kf λπ f Kg f − λ2π f I − λK g I λ2π λ2πx ∗ you find + 0 = , i.e. 0 = ( ) , to which the inversion + exp ( ) may be applied to recover f. Check it out. What is ∆ now? 4. K(x; y) = K(y; x) = x (1 − y) if x ≤ y, as in the Example of Section 2.4. I leave it as an exercise to verify the formula

 x  √ Z √ √ Z1 √ − −   (I + λK) 1f = f − ∆ 1 × sh λ (1 − x) sh λy f(y) dy + sh λ x sh λ (1 − y) f(y) dy ;   0 x √ √ in which ∆ is the determinant det (I + λK) = (sh λ)/ λ computed in Section 2.4. Here, Fredholm’s integrals are formidable: even for p = 3 you must evaluate   K x; y K x; x K x; x K x; x ( ) ( 1) ( 2) ( 3) Z K x ; y x − x x − x x − x   ( 1 ) 1(1 1) 1(1 2) 1(1 3) dx dx dx det   1 2 3 K x2; y x1 − x2 x2 − x2 x2 − x3 x

x y < x < x < and the integral has to be split into 16 pieces according to the placement of and among the 5 points 0 1 2 x < p x < y 3 1! Let’s try = 1 only. Then, with ,

Z1 " # Zx x(1 − y) K(x; z) x(1 − y) x3 det dz = z(1 − x)z(1 − y) dz = (1 − x)(1 − y) K(z; y) z(1 − z) 6 3 0 0 y Z  y2 y3 x2 x3  − x(1 − z)z(1 − y) dz = x(1 − y) − − + x 2 3 2 3 Z1 (1 − y)3 − x(1 − z)y(1 − z) dz = xy y 3 1 1 = x3(1 − y) + x(1 − y)3; 6 6 √ √ which is, indeed, the coefficient of λ2 in sh λ x sh λ (1 − y), God be thanked!

2.6. A question of multiplicities

It is natural to ask what if any is the relation between n the dimension of the null space N of I + λK and m the multiplicity of the root of ∆(λ) = 0. So far, it is known only that n = 0 if and only if m = 0. The general rule is n ≤ m with n = 0 only if m = 0. Skip this at a first reading if you like.

Example.

n < m K f ⊗ e f ⊗ e To illustrate the possibility of , consider = 1 1 + 2 2. Taking

 1  " # Z a b G  fiej  ; = = c d 0 1≤i;j≤2

222 H.P. McKean

the question is quickly reduced to the dimension n of the null space of I + λG vs. the multiplicity m of the root, if any, of det (1 + λG). Take a = c = d = 1 and b = 0. Then ∆ = (1 + λ)2 has a double root at −1 but

" # 0 0 I − G = −1 0

 0  has only the one null vector 1 .

Proof of n ≤ m (adapted from Kato [13, p. 40]). If n = 0, there is nothing to prove, so take 1 ≤ n and let ∆ vanish at λ = 1, say, with multiplicity m ≥ 1. Obviously ∆(λ) =6 0 for nearby λ 6= 1. (I + λK) is then invertible, and you can form the operator I 1 − dλ P = − √ · (I + λK) 1 ; 2π −1 λ the integral being taken about a small circle enclosing λ = 1.

Item 1.

P is a projection, i.e. P2 = P. To confirm this, express P2 as a double integral:

I I 1 dλ 1 dµ − − P2 = √ · √ · (I + λK) 1(I + µK) 1 2π −1 λ 2π −1 µ I I   1 1 1 λ µ 1 = √ · dλ √ · dµ − ; 2π −1 2π −1 λ − µ I + λK I + µK λµ where the first/second integral is taken about the outer/inner circle seen in the picture.

− Do the inner integral first to wash out the contribution of (I + λK) 1 (λ is outside). Then do the outer integral of what remains to produce I 1 − dµ − √ · (I + µK) 1 = P: 2π −1 µ

Item 2.

P acts as the identity on the null space N of I + K: indeed, Ke = −e for e ∈ N, so

I 1 − dλ Pe = − √ (1 − λ) 1 e = e; 2π −1 λ as advertised.

223 Fredholm determinants

Item 3. √ −1 H P = − 2π −1 (G/∆) dλ/λ inherits from G a nice kernel of class C[0; 1]2 and so is compact, meaning that the image  R PB B f 1 f2 ≤ L2 ; Q d Why? of the unit null = : 0 1 is compact in [0 1]; as such, its range is of some finite dimension .( ) Now P commutes with K, so you may write K = PKP + (I − P) K (I − P) and infer that

 ∆(λ) = det (I + λK) = det (I + λPKP) × det I + λ(I − P)K(I − P) :

 Here, the second factor cannot vanish at λ = 1 since I +(I −P)K(I −P) f = 0 implies Pf = 0 and so also (I +K) f = 0, i.e. f ∈ N, and f = Pf = 0, by Item 2.

Item 4.

m (I + K) P = 0, m being the multiplicity of the root of ∆: indeed,

I   I 1 I + K dλ 1 (1 − λ)K dλ (I + K)P = − √ · − I = − √ · 2π −1 I + λK λ 2π −1 I + λK λ I   I 1 I + K dλ 1 (1 − λ)2K 2 dλ (I + K)2P = − √ · (1 − λ) K − I = − √ · 2π −1 I + λK λ 2π −1 I + λK λ

and so forth, up to

I m m I m 1 (1 − λ) K dλ 1 m m G dλ (I + K) P = − √ · = − √ · (1 − λ) K : 2π −1 I + λK λ 2π −1 ∆ λ

This vanishes as the integrand has no pole anymore.

Item 5.

m Put (I + K) P = J and keep it in mind that P commutes with K so that J = 0, by Item 4. Then

− I + λPKP = (1 − λ)(I + ΛJ) with Λ = λ(1 − λ) 1

is a d × d matrix, d being the dimension of Q the range of P. It follows that the root-bearing factor det (I + λPKP) of ∆(λ) from Item 3 can be expressed as

d det (1 − λ)(I + ΛJ) = (1 − λ) det (I + ΛJ) for λ 6= 1

But det (I + ΛJ) = 1 being a polynomial in Λ without roots: in fact, a root entails a null function, and (I + ΛJ) f = 0 only m m d if f = −ΛJf = (−Λ) J f = 0. This means that ∆ = (1 − λ) × [a non-vanishing factor], so d = m, and since P = I on N, by Item 2, so Q ⊃ N and m = dim Q ≥ n, as wanted.

2.7. Symmetric K

This is a very special case, but it’s important for all sorts of applications and so worthy to be spelled out. For simplicity K Q f R 1 fKf ≥ and because it is all I need, I take non-negative in the sense that the quadratic form [ ] = 0 is always 0.

224 H.P. McKean

Existence of eigenfunctions.

For d < ∞, the eigenvalue problem Ke = λe has just enough solutions to span the range of K and it is the same here. In keeping with the tone of this chapter, I will obtain the existence of enough eigenfunctions by the simplest means. The quadratic form vanishes identically only if K ≡ 0 (Why?). Otherwise, it is capable of positive values, as I now suppose. R   Q 1 f2 × |K x; y | ≤ x; y ≤ Q f kfk ≤ λ K is controlled by 0 max ( ) : 0 1 , so sup [ ]: 2 1 = is finite and positive. Replace d × d K i/d; j/d /d its e k/d ≤ k ≤ d by the matrix [ ( ) ]1≤i;j≤d. The top value of quadratic form in the ball of vectors ( ), 1 , Pd |e k/d |2/d ≤ λd λ d with k=1 ( ) 1 is its top eigenvalue , which is close to if is large, as I invite you to check; moreover, e e ∈ C ; d d < d < : : : ↑ ∞ you can make the associated eigenvector d approximate a function [0 1] by choice = 1 2 , by the now familiar type of compactness employed in Section 2.3. Then for d = ∞, you have Ke = λe, i.e. e is an R K λ 1 e2 λ λ e e eigenfunction of with eigenvalue and unit length: 0 = 1. Now write = 1, = 1, and look at the reduced K − λ e ⊗e e its operator 1 1 1 acting on the annihilator of 1. It may be that quadratic form is incapable of positive values. Q f λ e ; f 2;K λ e ⊗ e Q Then [ ] = 1( 1 ) = 1 1 1, and you stop. Otherwise, is capable of positive values on the annihilator e e e of 1, in which case you can repeat the procedure to produce a new unit eigenfunction 2, perpendicular to 1, with < λ ≤ λ K − Pd λ e ⊗e e eigenvalue 0 2 1, and so on. If the quadratic form of n n n n vanishes on the annihilator of n, Pd =1 n ≤ d d K ≡ λnen⊗en , the procedure stops at stage , 1 , and “enough” eigenfunctions are in hand. Otherwise, it never stops, producing an infinite number of mutually perpendicular eigenfunctions en, n ≥ 1, with diminishing eigenvalues λ ≥ λ ≥ : : : > 1 2 0.

Mercer’s Theorem.

∞ P 2 Mercer’s Theorem states that K(x; y) = n λnen ⊗ en with of the sum on the square [0; 1] ; P∞ =1 en en; f f C ; f K in particular, n=1 ( ) converges to in [0 1] for every function in the range of : that is what “enough” eigenfunctions means. The case of finite rank when the procedure stops is left aside.

Pd Proof. d < ∞ K − λnen ⊗en Fix . The reduced operator n=1 is non-negative, i.e.

1 1 " d # Z Z X f(x) K(x; y) − λnen ⊗en f(y) dx dy ≥ 0; 0 0 1 from which you learn the following.

Pd − λne2 x ≤ K x; x f h 1 × |x − y| ≤ h/ h 1) n=1 n( ) ( ) by making approximate [the indicator of 2] and taking down to 0+. P∞ R λn ≤ K x; x dx < ∞ 2) n=1 ( ) , by integration. P∞ λnen ⊗en x/y y/x 3) n=1 is now seen to converge in the whole square and, indeed, uniformly in for fixed in view of 1) and the tail estimate

2 X X X X 2 2 2 λnen(x)en(y) ≤ λnen(x) × λnen(y) ≤ e.g. K(x; x) λnen(y): n≥d n≥d n≥d n≥d

P P 4) λnen⊗en f = λnen(en; f) is likewise uniformly convergent by the tail estimate:

2 1 X X X Z 2 2 2 2 λnen(en; f) ≤ λnen (en; f) ≤ λdK(x; x) f dx n≥d n≥d n≥d 0

P 2 and the fact that λd ↓ 0 by 2); alternatively, 1) and the decrease of n≥d(en; f) to 0 do the trick; see Dini’s theorem in the appendix.

225 Fredholm determinants

P∞ f Kf − λnen en; f en n ≥ Q f ≤ λd|f |2 ↓ 5) Now the 0 = n=1 ( ) is perpendicular to , 1, and since [ 0] 0 2 0 d ↑ ∞ Q f Q f hf h as by 2), so [ 0] = 0. Therefore, [ 0 + ] is minimum (0) at = 0, and

1 d Z Z Q f hf f Kf f2 ; dh [ 0 + ] = 2 0 = 2 0 = 0 0

f ≡ i.e. 0 0. It follows that 1 ∞ Z X Kf(x) = λnen(x) en(y) f(y) dy; n 0 =1 P∞ f ∈ C ; ≤ x ≤ K x; y λnen x en y ≤ x; y ≤ for every [0 1] and every 0 1, whence ( ) = n=1 ( ) ( ) at all points 0 1, by 3).

P 2 6) Take y = x. Then n≤d λnen(x) increases with d to the continuous function K(x; x). Dini’s theorem now tells you P that this convergence is uniform. Therefore, the convergence of the sum λnen⊗en for K(x; y) must be uniform, too. Check it out.

Item 6) plus 5) is the content of Mercer’s theorem.

Expressing ∆ as a product. P Because K = λnen ⊗en, with uniform convergence in the square,

d d ∞ Y   Y   Y I λK I λ λnen ⊗en I λ λnen ⊗en λλn det ( + ) =d lim↑∞ det + ( ) =d lim↑∞ det + ( ) = (1 + ) n=1 n=1 n=1

since all the little p × p determinants entering into the series for det (I + λe ⊗ e) vanish for p ≥ 2. Here, there is no P  question as to the convergence of the product in view of 2). Besides |∆(λ)| ≤ exp r λn for r = |λ|, so ∆ is of order 1, only.

Example.

2 I go back to the symmetric kernel of Section 2.4: K(x; y) = x(1 − y) if x ≤ y. This is√ positive, being the inverse of −D acting on the domain C 2[0; 1] ∩ {f : f(0) = 0 = f(1)}, with eigenfunctions en(x) = 2 sin πnx and eigenvalues 1/n2π2. But you know the determinant of 1 + λK from Section 2.4 already, and its present evaluation may be looked upon as a proof of the standard product

∞ √ ∞ λ Y  λ2  λ Y  λ  sin − sh√ : λ = 1 n2π2 in the present form λ = 1 + n2π2 n=1 n=1

3. Fredholm applied

I collect here three applications of Fredholm determinants to: 1) Brownian motion, 2) long surface waves in shallow water, and 3) unitary matrices in high dimension. Quite a variety, no?

3.1. Brownian motion: preliminaries

The Brownian motion describes the path of, e.g. a (visible) dust note in a sun beam, impelled unpredictably, this way and that, by collisions with the (invisible) molecules of the air. Brown [2] studied it first, in Nature; later, Einstein [9] ∂p/∂t 1 ∂2p/∂x2 related it to diffusion and the underlying heat equation = 2 .

226 H.P. McKean

The mathematical -up required can be reduced to the unit interval, equipped with its standard Lebesgue measure. I suppose you know something about that. Only the language changes: a measurable set A ⊂ [0; 1] is now an “event”; the measure of A is its “probability” P(A); a measurable function x : [0; 1] → R is a “random variable”; the integral R 1 x ≡ E x 0 ( ) is its “mean-value” or “expectation”. With this lingo in place, a collection of random variables x is said to be a Gaussian family if any combination k • x = k x ::: k x k ; : : : ; k ∈ n − σ 1 1 + + n n with ( 1 n) R 0 is Gauss-distributed with mean 0 and positive root-mean-square , i.e.

b Z e−x2/2σ2 P a ≤ k • x < b √ dx a < b: ( ) = πσ for any a 2

Here, σ 2 is a quadratic form in k:

+∞ Z e−x2/2σ2 X σ 2 x2 √ dx E k • x 2 kikj E xixj ≡ k • Qk; = πσ = ( ) = ( ) −∞ 2 1≤i;j≤n

Q Q E x x being the (symmetric, positive-definite) matrix of “correlations” ij = [ ( i j )]1≤i;j≤n. Now

+∞ √ Z √ e−x2/2σ2 Ee −1k•x e −1 k•x √ dx e−σ2/2 e−k•Qk/2 = πσ = = −∞ 2

x ;:::; x is the Fourier transform of the joint distribution of 1 n. This may be inverted to obtain their joint probability x x ; : : : ; x ∈ n density as function of = ( 1 n) R :

− Z √ e−x•Q 1x/2 p x 1 e− −1 x•k e−k•Qk/2 dnk : ( ) = n = p π π n/2 Q (2 ) n (2 ) det R

You see how the determinant enters in and can imagine where all this could lead. I spell out the computation: Q can be 0 n n brought to principal axes by a rotation k → k of R . This does not change the volume element d k but simplifies the Pn 0 k • Qk λi k 2 λ Q quadratic form as in = i=1 ( i ) , the ’s being the (positive) eigenvalues of . Then, with the same rotation x → x0 , √ √ Z 0 0 P 0 Z 0 P p x 1 e− −1 x •k e− λi(ki )2/2 dnk0 1 e− −1 x •k e− λi(ki)2/2 dnk: ( ) = n = n (2π) (2π) The joint density is now seen to be the product of

+∞ √ Z 0 1 e −1 xik e−λik2/2 dk π 2 −∞ taken over 1 ≤ i ≤ n, i.e. n 0 − Y e−(xi)2/2λi e−x•Q 1x/2 p x √ ( ) = = n/ p πλi π 2 Q i=1 2 (2 ) det as advertised. The formula itself is nice, and you learn from it two important things: 1) that the statistics of a Gaussian family are completely determined by its correlations, and 2) that two such families are statistically independent if and x ;:::; x x ;:::; x only if they are uncorrelated, one with the other: indeed, if 1 p and p+1 p+q are uncorrelated, then their joint correlation Q breaks up into blocks as in the figure,

227 Fredholm determinants

− Q 1 does likewise, and the joint density splits, which is what independence is all about: if A is any event concerning xi, i ≤ p, and B any event concerning the other xj , j > p, then P(A ∩ B) = P(A) × P(B). A little trick will help to convince you that the simple space [0; 1], equipped with its Lebesgue measure, really is rich x ∈ ; x : x x ::: x n ≥ enough to such families. Let [0 1] be expanded in binary notation: = 1 2 The variables n, 1, are independent (check this!) with common distribution P(xn = 0) = P(xn = 1) = 1/2. Define

x : x x x x :::; 1 = 1 3 6 10 x : x x x :::; 2 = 2 5 9 x : x x :::; 3 = 4 8 x : x :::; 4 = 7

and so on. P x ≤ x x ≤ x ≤ You see the pattern. These new variables arex0 independent, with common distribution ( ) = , 0 1. Then x0 x R n −1/2 −x2/2 the modified variables n defined by n = −∞(2π) e dx are also independent with common probability density − / −x2/ p (2π) 1 2 e 2. Now just one more step: if Q is any n × n, symmetric, positive matrix and Q its symmetric positive 00 Pn p 0 x Q ij x ≤ i ≤ n Q root, then the family i = j=1( ) j , 1 , is Gaussian with correlation . The generalities now are ended and I turn to the (standard, 1-dimensional) Brownian motion itself. This is the Gaussian x t t ≥ E x t x t t t family ( ), 0, with correlation [ ( 1) ( 2)] = the smaller of 1 and 2. P x t ≤ t x t − x t t − t It is immediate that 1) [ (0) = 0] = 1; 2) for 1 2, the increment ( 2) ( 1) has mean-square 2 1; 3) increments < t < taken over disjoint intervals of time are uncorrelated and so also statistically independent. It follows that for 0 1 t < : : : < t x t ; x t ;:::; x t 2 n, the joint probability density of the “observations” ( 1) ( 2) ( n) is

−x2/ t − x −x 2/ t −t − xn−xn− 2/ tn−tn− e 1 2 1 e ( 2 1) 2( 2 1) e ( 1) 2( 1) p x √ ::: : ( ) = πt p π t − t p π t − t 2 1 2 ( 2 1) 2 ( n n−1)

Wiener [26] proved that the Brownian motion has continuous paths. I explain what he meant, technically. The present x · machinery cannot deal directly with the event “ ( ) is continuous” since it involves an uncountable number of observationsp of the Brownian path t 7→ x(t). What is needed is something like this: with probability 1, |x(i/n) − x(j/n)| ≤ 3 i/n − j/n for all 0 ≤ j/n < i/n ≤ 1 with sufficiently small separation max |i/n − j/n| ≤ h, independently of n ≥ 1. This permits you to complete the countable Gaussian family x(k/n), k ≤ n, n ≥ 1, to the uncountable family x(t) = limk/n↓t x(k/n), 0 ≤ t ≤ 1, with the correct joint distributions for any finite number of observations and guaranteed continuity of paths. The proof takes some work; see McKean [18] for Ciesielski’s slick version. It is natural to write the measure induced from [0; 1] onto path-space, symbolically:

∞ − R x• t 2/ dt e 0 [ ( )] 2 dP d∞x: = ∞/ (2π0+) 2

• ∞/ Here nothing is kosher: with probability 1, x (t) does not exist for any t ≥ 0; (2π0+) 2 is silly; and the flat volume ∞ element d x is a bogus, too. Nevertheless, the whole reminds you of what is happening and is even a reliable guide to computation.

228 H.P. McKean

A variant of this “free” Brownian motion figures below. This is the “tied” Brownian motion, alias the free motion, considered for times t ≤ 1 and conditioned so that x(1) = 0. The conditioning does not change the Gaussian character of the family, so only the correlation needs to be found. Here is a neat way to do that. Look at the picture.

The variables x − x t ξ x t − t x ; η x t − x t − (1) ( 1) × t − t ; x = ( 1) 1 (1) = ( 2) ( 1) − t ( 2 1) and (1) 1 1 are uncorrelated, as you will check, and so independent. Now condition by x(1) = 0. The independence of ξ and η is not x t ξ x t ξ − t / − t η spoiled, whence the (conditional) correlation between ( 1) = and ( 2) = (1 2) (1 1) + is just the unconditional E ξ2 − t / − t t − t correlation ( ) (1 2) (1 1) = 1 (1 2). A nice model of the tied Brownian motion is obtained from the free Brownian motion in the form x(t) − tx(1), 0 ≤ t ≤ 1. You have only to check (in one line) that the correlations are what they should be.

3.2. Brownian motion: some applications

Simplest examples.

 R 1  I want to compute the mean-value of F = exp − (λ/2) x2(t) dt for the free Brownian motion. The task is simplified 0  Pn  F Fn − λ/ x2 k/n /n by the continuity of the Brownian path, permitting the reduction of to = exp ( 2) k=1 ( ) with a subsequent passage to the limit n ↑ ∞. Write the joint density for x(k/n), 1 ≤ k ≤ n, as

−    exp − x • Q 1x/2 i j Qij ; ; ≤ i; j ≤ n: n/ p with = min 1 (2π) 2 det Q n n

Then

 s − Z − x • λ/n I Q−1 x/ λ/n I Q−1 1 exp (( ) + ) 2 n det ( ) +  −1/2 E(Fn) = p d x = = det (I + λQ/n) π n/2 Q Q n (2 ) det det R so that   Z1 λ 1 E exp − x2 dt = p 2 det (I + λK) 0 with Fredholm’s determinant and the (symmetric) operator

Z1 K : f 7→ min (x; y) f(y) dy: 0

229 Fredholm determinants

2 2 0 The determinant√ is easy to compute: K is the (symmetric) inverse to −D acting on C [0; 1] ∩ {f : f(0) = f (1) = 0}, with − eigenfunctions 2 sin π(n + 1/2)x and eigenvalues [π (n + 1/2)] 2, n ≥ 0, so

∞ Y  λ  √ I λK λ det ( + ) = 1 + π2 n / 2 = ch n=0 ( + 1 2)

and   Z1 √ λ −1/2 E exp − x2 dt = ch λ : 2 0

The computation for the tied Brownian motion is much the same: Now Qij = (i/n)(i − j/n) for i ≤ j; K(x; y) is the 2 2 symmetric kernel x(1 − y) if x ≤ y; the associated operator√ inverts −D acting on C [0; 1] ∩ {f : f(0) = f(1) = 0}; the − − / eigenvalues are (πn) 2, n ≥ 1; and det (I + λK) = λ 1 2 sh λ, as in Section 2.7, so now

  Z1 √ √ λ −1/2 E exp − x2 dt = λ sh λ : 2 0

P. Lévy’s area.

x t ∈ ; → 2 x x Let : [0 1] R be the plane motion of two independent copies 1 & 2 of the tied Brownian motion and consider P. Lévy’s area [17, p. 254] Z1 Z1 A 1 x dx − x dx x dx A = ( 1 2 2 1) = 1 2 = lim n 2 n↑∞ 0 0 with n− X1  k    k   k  A x x + 1 − x : n = 1 n 2 n 2 n k=0

The Brownian path is too irregular for this to be any kind of classical integral, but the limit exists with probability√ 1 and is declared to be the area. I ask you to believe this and proceed to the evaluation of the Fourier transform E exp ( −1 λA) A x x of the distribution of . The individual Brownian motions 1 and 2 are independent, so their joint probabilities split and x x you may begin by fixing 1 and integrating out 2 with the help of the general rule

− Z −x•P 1x/2 n x•y e d x y•Py/ e √ = e 2: π n/2 P n (2 ) det R

P x A y x k/n ≤ k < n Here, is to be taken as the correlation of the increments of 2 figuring in n, and is the vector 1( ), 0 . P − /n2 /n x You have ij = 1 plus an extra 1 on diagonal, as you will check, so with 1 fixed, you have

$$E\left[e^{\sqrt{-1}\,\lambda A_n}\,\middle|\,x_1(t) : 0\le t\le 1\right] = \exp\left[-\frac{\lambda^2}{2}\left(\frac1n\sum_{k=0}^{n-1}x_1^2\left(\frac kn\right) - \left(\frac1n\sum_{k=0}^{n-1}x_1\left(\frac kn\right)\right)^{\!2}\,\right)\right] = \exp\left(-\frac{\lambda^2}{2}\,y\cdot Py\right).$$

It is convenient to shift the summations above from 0 ≤ k < n to 0 ≤ k ≤ n; this makes no difference to the limit. Then, with the correlations

$Q_{ij} = (i/n)(1 - j/n)$ for $i \le j$, of $x_1(k/n)$, $1 \le k \le n$, alias $y$,


you find
$$E\,e^{\sqrt{-1}\,\lambda A_n} = \int_{\mathbb R^n} e^{-\lambda^2 x\cdot Px/2}\,\frac{e^{-x\cdot Q^{-1}x/2}}{(2\pi)^{n/2}\sqrt{\det Q}}\,d^n x = \sqrt{\frac{\det\left(\lambda^2 P + Q^{-1}\right)^{-1}}{\det Q}} = \frac{1}{\sqrt{\det\left(I + \lambda^2\,QP\right)}},$$
in which $QP = Q/n - (Q\mathbf 1)\otimes\mathbf 1/n^2$. This matrix approximates the operator with the familiar kernel $K(x,y) = x(1-y)$ minus $e\otimes 1$ with $e = K1$, so $E\exp(\sqrt{-1}\,\lambda A)$ is 1 over the square root of Fredholm's determinant

$$\det\left[I + \lambda^2(K - e\otimes 1)\right] = \det\left(I + \lambda^2 K\right)\times\det\left[I - (I + \lambda^2 K)^{-1}\lambda^2\,e\otimes 1\right] = \lambda^{-1}\,\mathrm{sh}\,\lambda\times\left(1 - \mathrm{sp}\left[(I + \lambda^2 K)^{-1}\lambda^2\,e\otimes 1\right]\right).$$
It remains to evaluate the trace (sp), which is not hard: $K = \sum_{n=1}^{\infty}e_n\otimes e_n/n^2\pi^2$ with $e_n = \sqrt2\,\sin n\pi x$ for $n \ge 1$, so

$$\mathrm{sp}\left[(I + \lambda^2 K)^{-1}\lambda^2\,e\otimes 1\right] = \sum_{n=1}^{\infty}\frac{\lambda^2}{n^2\pi^2 + \lambda^2}\left(\int_0^1 e_n\,dx\right)^{\!2} = 8\sum_{n\ \mathrm{odd}}\frac{\lambda^2}{n^2\pi^2\left(n^2\pi^2 + \lambda^2\right)} = 1 - \frac2\lambda\,\frac{\mathrm{sh}\,(\lambda/2)}{\mathrm{ch}\,(\lambda/2)}.$$
The last expression comes from the product $\lambda^{-1}\,\mathrm{sh}\,\lambda = \prod_{n=1}^{\infty}\left(1 + \lambda^2/n^2\pi^2\right)$, by logarithmic differentiation and a bit of manipulation which I leave to you. Then

$$\det\left[I + \lambda^2(K - e\otimes 1)\right]\ \text{reduces to}\ \left(\frac{\mathrm{sh}\,(\lambda/2)}{\lambda/2}\right)^{\!2}$$
for the final evaluation:
$$E\,e^{\sqrt{-1}\,\lambda A} = \frac{\lambda/2}{\mathrm{sh}\,(\lambda/2)}.$$
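As a sanity check on this formula, here is a Monte Carlo sketch (my own illustration, not part of Lévy's or anybody's argument; numpy, all sizes arbitrary) estimating the Fourier transform from simulated areas $A_n$:

```python
import numpy as np

# Sketch: Monte Carlo estimate of E exp(i*lam*A) for P. Levy's area of two
# independent tied Brownian motions, against (lam/2)/sh(lam/2).
rng = np.random.default_rng(1)
n, paths, lam = 500, 40000, 2.0
dt = 1.0 / n
t = np.arange(1, n + 1) * dt

def tied():
    w = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(paths, n)), axis=1)
    return w - np.outer(w[:, -1], t)             # x(t) - t*x(1)

x1, x2 = tied(), tied()
x1_prev = np.hstack([np.zeros((paths, 1)), x1[:, :-1]])   # x1(k/n), k < n
A = np.sum(x1_prev * np.diff(x2, axis=1, prepend=0.0), axis=1)
print(np.mean(np.exp(1j * lam * A)).real, (lam / 2) / np.sinh(lam / 2))
```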

Inverting.

This is all very well, but what is really wanted in these examples is the inverse transform, i.e. the probability density of the variable in hand. P. Lévy's area is the simplest in this regard, the density being $(\pi/2)\times\mathrm{ch}^{-2}(\pi x)$, as is easily checked. I leave it to you: Just compute the inverse transform $(2\pi)^{-1}\int e^{-\sqrt{-1}\,\lambda x}\,(\lambda/2)\left[\mathrm{sh}\,(\lambda/2)\right]^{-1}d\lambda$ for $x < 0$ by residues in the upper half-plane.

The other inversions, of 1) $(\mathrm{ch}\,\sqrt\lambda)^{-1/2}$ and 2) $(\lambda^{-1/2}\,\mathrm{sh}\,\sqrt\lambda)^{-1/2}$, are more difficult. 1) was done by Cameron and Martin [3, pp. 195–209]. The present computation is more economical:
$$\frac{1}{\sqrt{\mathrm{ch}\,\sqrt\lambda}} = \frac{\sqrt2\,e^{-\sqrt\lambda/2}}{\sqrt{1 + e^{-2\sqrt\lambda}}} = \sqrt2\,\sum_{n=0}^{\infty}(-1)^n\,\frac{1\cdot3\cdot5\cdots(2n-1)}{2\cdot4\cdot6\cdots2n}\,e^{-(2n+1/2)\sqrt\lambda}$$
$$= \sqrt2\,\sum_{n=0}^{\infty}(-1)^n\,\frac{1\cdot3\cdot5\cdots(2n-1)}{2\cdot4\cdot6\cdots2n}\left(n + \frac14\right)\int_0^{\infty}\frac{e^{-(n+1/4)^2/x}}{\sqrt{\pi x^3}}\,e^{-\lambda x}\,dx,$$
where
$$\frac{1}{\sqrt{1+x}} = \sum_{n=0}^{\infty}(-1)^n\,\frac{1\cdot3\cdot5\cdots(2n-1)}{2\cdot4\cdot6\cdots2n}\,x^n$$
is used for the second equality and
$$\int_0^{\infty}\frac{e^{-x^2/2t}}{\sqrt{2\pi t^3}}\,x\,e^{-\lambda t}\,dt = e^{-x\sqrt{2\lambda}}$$
for the third one; the latter is easily deduced from the fact that $p = (2\pi t)^{-1/2}e^{-x^2/2t}$ solves the heat equation $\partial p/\partial t = \frac12\,\partial^2 p/\partial x^2$. The probability density of $\frac12\int_0^1 x^2(t)\,dt$ for the free Brownian motion may now be read off:
$$\sqrt2\,\sum_{n=0}^{\infty}(-1)^n\,\frac{1\cdot3\cdot5\cdots(2n-1)}{2\cdot4\cdot6\cdots2n}\times\left(n + \frac14\right)\frac{e^{-(n+1/4)^2/x}}{\sqrt{\pi x^3}}.$$
It is related to one of Jacobi's theta functions. Not very attractive, but there it is. The inversion of 2) $(\lambda^{-1/2}\,\mathrm{sh}\,\sqrt\lambda)^{-1/2}$ is worse, so I leave the matter here.
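The first of these inversions can also be confirmed by brute-force quadrature; a short sketch of my own (numpy, with an arbitrary truncation of the $\lambda$-integral):

```python
import numpy as np

# Sketch: numerical inverse Fourier transform of (lam/2)/sh(lam/2),
# against the stated density (pi/2) * ch(pi x)^(-2) of P. Levy's area.
lam = np.linspace(-200.0, 200.0, 400001)
phi = np.ones_like(lam)
nz = lam != 0.0
phi[nz] = (lam[nz] / 2) / np.sinh(lam[nz] / 2)
for x in (0.0, 0.3, 1.0):
    inv = np.trapz(np.exp(-1j * lam * x) * phi, lam).real / (2 * np.pi)
    print(inv, (np.pi / 2) / np.cosh(np.pi * x) ** 2)
```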


3.3. Robt. Brown’s Brownian motion

I come to the subject by indirection. Let’s modify the symbolic expression of the free Brownian measure for “short” paths x(t), 0 ≤ t ≤ 1, by the introduction of a density function, as in

 1   1  m m2 Z Z em/2 × − x2 − x2 × 1 − 1 x• 2 d∞x: exp  (1)  ∞/ exp  ( )  2 2 (2π0+) 2 2 0 0

The exponent is of degree 2 in $x$, so the measure is still Gaussian, up to a normalizer to make the total mass be 1. That is the role of the factor $e^{m/2}$: The total mass of the rest, alias the free Brownian mean-value of $\exp\left(-m x^2(1)/2 - (m^2/2)\int_0^1 x^2\right)$, may be reduced, in the now familiar way, to the reciprocal square root of $\Delta = \det\left[I + m^2K + mKe\otimes e\right]$ with $K(x,y) = \min(x,y)$ and $e$ the unit mass at $x = 1$, i.e. $\int_0^1 fe = f(1)$. The symbol $e\otimes e$ is not quite kosher but don't worry: it is only of rank 1 and will take care of itself. The computation of this determinant follows:

$$\Delta = \det\left(I + m^2K\right)\times\det\left[I + (I + m^2K)^{-1}mKe\otimes e\right] = \mathrm{ch}\,m\times\left(1 + \mathrm{sp}\left[(I + m^2K)^{-1}mKe\otimes e\right]\right)$$
$$= \mathrm{ch}\,m\times\left(1 + \left[(I + m^2K)^{-1}mK\right]\ \text{evaluated at}\ (1,1)\right).$$

Now you need to know $(I + m^2K)^{-1} \equiv I + G$. Here, $K$ is inverse to $-D^2$ acting on $C^2[0,1]\cap\{f : f(0) = 0 = f'(1)\}$, so $G$ is inverse to $D^2/m^2 - I$ acting on the same domain, and an easy computation produces its (symmetric) kernel

$$G(x,y) = -\frac{m}{\mathrm{ch}\,m}\,\mathrm{sh}\,mx\;\mathrm{ch}\,m(1-y)\quad\text{if}\ x\le y.$$

Then
$$(I + m^2K)^{-1}K = \frac{1}{m^2}\left[I - (I + m^2K)^{-1}\right] = -\frac{G}{m^2},$$
so $\Delta = \mathrm{ch}\,m\times\left(1 - G(1,1)/m\right) = \mathrm{ch}\,m\times(1 + \mathrm{th}\,m) = e^m$, as wanted, i.e. the measure is normalized.
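Here is a quick discretized confirmation of that (my own sketch; numpy, grid size arbitrary), the rank-one piece being handled by adding $mK(x,1)$ to the last column, where $f(1)$ sits:

```python
import numpy as np

# Sketch: Nystrom discretization of Delta = det(I + m^2 K + m K e (x) e)
# with K(x,y) = min(x,y) and e the unit mass at x = 1, against e^m.
n, m = 2000, 1.3
t = np.arange(1, n + 1) / n
K = np.minimum.outer(t, t)
M = m**2 * K / n                 # the honest integral-operator part
M[:, -1] += m * K[:, -1]         # rank one: f -> m K(x, 1) f(1)
print(np.linalg.det(np.eye(n) + M), np.exp(m))
```

But what Gaussian measure is this anyhow? The correlation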

  1   m m2 Z Q em/2E − x2 − x2 x t x t = exp  (1)  ( 1) ( 2) 2 2 0

will tell, or, what is just as good, the associated quadratic form
$$(f, Qf) = e^{m/2}\,E\left[\exp\left(-\frac m2\,x^2(1) - \frac{m^2}{2}\int_0^1 x^2\right)\left(\int_0^1 f\,x\right)^{\!2}\,\right],$$

which may be obtained from
$$Z = e^{m/2}\,E\left[\exp\left(-\frac m2\,x^2(1) - \frac{m^2}{2}\int_0^1 x^2 - \frac\lambda2\left(\int_0^1 f\,x\right)^{\!2}\,\right)\right]$$
by taking $-2\,\partial/\partial\lambda$ at $\lambda = 0$. The computation follows the pattern for the normalizer: $Z$ is reduced to

$$e^{m/2}\,\det\left[I + m^2K + mKe\otimes e + \lambda\,Kf\otimes f\right]^{-1/2},$$

from which rule 6 of Section 2.4 produces

$$(f, Qf) = -2e^{m/2}\times\left(-\frac12\right)\det\left[I + m^2K + mKe\otimes e\right]^{-3/2}\times\det\left[I + m^2K + mKe\otimes e\right]\times\mathrm{sp}\left[\left(I + m^2K + mKe\otimes e\right)^{-1}Kf\otimes f\right].$$


Now $\det\left[I + m^2K + mKe\otimes e\right] = e^m$, as you already know, so the determinant factors cancel against $e^{m/2}$, leaving only the trace, which is to say

$$Q = \left[I + m^2K + mKe\otimes e\right]^{-1}K = \left[I + (I + m^2K)^{-1}mKe\otimes e\right]^{-1}\left(I + m^2K\right)^{-1}K.$$

Here, $(I + m^2K)^{-1}K = -G/m^2$ as before, and the simple rule

$$(I + a\otimes b)^{-1} = I - \left(1 + (a,b)\right)^{-1}a\otimes b$$
with $a = -Ge/m$ and $b = e$ produces
$$Q = \left[I - \left(1 - \tfrac1m\,G(1,1)\right)^{-1}\left(-\tfrac1m\,Ge\otimes e\right)\right]\times\left(-\tfrac1{m^2}\,G\right)$$
$$= -\frac1{m^2}\,G(x,y) - \frac{e^{-m}\,\mathrm{ch}\,m}{m}\times\left(-\frac1m\,G(x,1)\right)\times\left(-\frac1m\,G(y,1)\right) = \frac1m\,\mathrm{sh}\,mx\;e^{-my}\quad\text{for}\ x\le y,$$

as you will check; only now you should put $x = t_1$ and $y = t_2$. OK. But, still, what is that? Something more tangible is wanted. A symbolic calculation will guide us: Let $F(x)$ be any nice function of the short path $x(t)$, $0 \le t \le 1$. Then

 1  Z F x m m2 Z  Z  Z F x  Z  em/2 ( ) − x2 − x2 − 1 x• 2 d∞x em/2 ( ) − 1 x• mx 2 d∞x: ∞/ exp  (1)  exp ( ) = ∞/ exp ( + ) (2π0+) 2 2 2 2 (2π0+) 2 2 0

Put $x^\bullet + mx = y^\bullet$, or, what is the same for $x(0) = y(0) = 0$,

$$x(t) = e^{-mt}*y^\bullet = y(t) - m\,e^{-mt}*y,$$
where $*$ stands for convolution. The prior display now takes the form

 1  Z F e−mt ∗ y• Z  ∂x  em/2 ( ) − 1 y• 2 d∞y ∞/ exp  ( )  det (2π0+) 2 2 ∂y 0 with the formal Jacobian determinant of x in respect to y:

$$\frac{\partial x(t_1)}{\partial y(t_2)} = \begin{cases}\ [\text{the identity}] - m\times e^{m(t_2 - t_1)} & \text{if}\ t_2 < t_1,\\[2pt]\ 0 & \text{otherwise}.\end{cases}$$

This is ambiguous on the diagonal, but if you take it to be $-m/2$ by way of compromise, then the Jacobian determinant is $e^{-m/2}$, and the expectation assumes its final form:

  1  2 Z m/ m m  −mt •  e 2 E F(x) exp − x2(1) − x2 = E F(e ∗ y ) 2 2 0 with the free Brownian mean-value to the right. The formula originates with Cameron and Martin [3]; it suggests that the Gaussian probabilities

 1   1  m m2 Z Z em/2 − x2 − x2 1 − 1 x• 2 d∞x exp  (1)  ∞/ exp  ( )  2 2 (2π0+) 2 2 0 0


are one and the same as what the free Brownian $y$ induces on path space by the transformation $y(t) \to x(t) = e^{-mt}*y^\bullet$; in fact, for these paths $x$,

$$E\left[x(t_1)\,x(t_2)\right] = e^{-mt_1}\,e^{-mt_2}\int_0^{t_1}e^{2mt'}\,dt' = \frac1m\,\mathrm{sh}\,(mt_1)\,e^{-mt_2}\quad\text{for}\ t_1\le t_2,$$

as it ought to be. Take that as an exercise. Finally, I come to the "real" Brownian motion: Ornstein–Uhlenbeck [23] advocated $x$ as a model of the velocity of the Brownian motion of Robt. Brown. I change notation accordingly, from $x$ to $v$. Then the position of the Brownian particle, starting from rest at 0, is
$$x(t) = \int_0^t v(t')\,dt'$$
and you have, symbolically, Newton's law:

$$[\text{mass}\ 1]\times[\text{acceleration}] = \frac{d^2x}{dt^2} = \frac{dv}{dt} = -mv + \frac{dy}{dt} = [\text{force}]$$

in which −mv is descriptive of air resistance (drag) and dy/dt of the (thermal) agitation of the air molecules. Now intensify this agitation by a factor m, so that drag and agitation are comparable, and make m ↑ ∞. Then

$$\frac{d^2x}{dt^2} = -m\,\frac{dx}{dt} + m\,\frac{dy}{dt},$$

and with x(0) = v(0) = 0, you have

$$x(t) = y + \left[\text{a negligible error of root-mean-square}\ \sqrt{\tfrac{1}{2m}\left(1 - e^{-2mt}\right)}\,\right].$$

In this way, Ornstein–Uhlenbeck reconciled their "real" Brownian motion $x$ with that of Einstein [9] who took it to be the free Brownian motion $y$, plain.
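The exercise above also yields to simulation. This sketch (mine; numpy, with arbitrary parameters) builds $x = e^{-mt}*y^\bullet$ by an exponentially damped running sum and checks the stated correlation:

```python
import numpy as np

# Sketch: x(t) = int_0^t exp(-m(t-s)) dy(s); check that
# E[x(t1) x(t2)] = sh(m t1) exp(-m t2) / m for t1 <= t2.
rng = np.random.default_rng(2)
n, paths, m = 1000, 20000, 1.5
dt = 1.0 / n
t = np.arange(1, n + 1) * dt
dy = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
x = np.empty((paths, n))
acc = np.zeros(paths)
for k in range(n):                               # damped running sum
    acc = acc * np.exp(-m * dt) + dy[:, k]
    x[:, k] = acc
i1, i2 = n // 4 - 1, 3 * n // 4 - 1              # t1 = 1/4, t2 = 3/4
print(np.mean(x[:, i1] * x[:, i2]),
      np.sinh(m * t[i1]) * np.exp(-m * t[i2]) / m)
```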

3.4. Shallow water: background and preliminaries

The equation of Korteweg–de Vries (1895) is descriptive of the leading edge of long surface waves in shallow water; in convenient units, it reads $\partial v/\partial t + v\,\partial v/\partial x = \partial^3 v/\partial x^3$, $v(t,x)$ being the velocity of the wave, taken to be independent of $y$ as in the figure.


Here, you have a competition between 1) $\partial v/\partial t + v\,\partial v/\partial x = 0$ and 2) $\partial v/\partial t = \partial^3 v/\partial x^3$. 1) exhibits "shocks". 2) is "dispersive". I explain: 1) states that points at height $h$ on the graph of $v$ move to the right at speed $h$, so if $f = v(0+,\cdot)$ is increasing nothing bad happens, but if it is decreasing, then high values overtake low values and pass beyond, so that $v(t,\cdot)$ ceases to be a single-valued function. The inception of this turning over is a "shock". As to 2), $v(t,x) = \sin k(x - k^2t)$ is a solution, showing that waves of different frequencies ($k$) move at different speeds ($k^2$). This is "dispersion". In KdV, dispersion overcomes the shocks and the solution is perfectly nice for all times $t \ge 0$ and, indeed, for $t \le 0$, too. The explicit solution of KdV by Gardner–Greene–Kruskal–Miura [11] makes a fascinating story. I cannot tell it here, but see Lamb [14] for the best introduction to what has come to be called "KdV and all that". But one remarkable aspect of it can be told quite simply, following a beautiful computation of Ch. Pöppe [21]. Take a solution $w = w(t,x)$ of KdV with the non-linearity $v\,\partial v/\partial x$ crossed out, i.e. $\partial w/\partial t = \partial^3 w/\partial x^3$, and use it to make an operator $W$ on the half-line $[0,\infty)$, depending upon $0 \le t < \infty$ and $-\infty < x < \infty$, as in

$$W : f \in C[0,\infty) \mapsto \int_0^\infty w(t, x + \xi + \eta)\,f(\eta)\,d\eta.$$

Now let $\Delta = \Delta(t,x)$ be the determinant $\det(I + W)$, introduced into the subject by Dyson [8]. Then, miraculously, $v(t,x) = -12\,\partial^2\ln\Delta/\partial x^2$ solves KdV, which is to say that this machinery $w \to \Delta \to v$ converts the (easy) linear flow $\partial w/\partial t = \partial^3 w/\partial x^3$ into the (hard) non-linear KdV flow $\partial v/\partial t + v\,\partial v/\partial x = \partial^3 v/\partial x^3$, but more of this below. Of course, you must take care that $v$ makes sense: The function little $w$ had better decay at $x = +\infty$ so that $\Delta$ is kosher as a Fredholm determinant, as would be the case if $w(0+,\cdot)$ were smooth and rapidly vanishing at $\pm\infty$ and you took

$$w(t,x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}e^{-\sqrt{-1}\,kx}\,e^{\sqrt{-1}\,k^3t}\,\widehat w(0+,k)\,dk,$$

in which $\widehat{\ }$ signifies the transform $f \mapsto \int e^{\sqrt{-1}\,kx}f(x)\,dx$. Then $\Delta = 1$ at $x = +\infty$, but you still need to keep $\Delta$ positive elsewhere so that $\ln\Delta$ is nice, which is to say that $I + W$ has never any null-space. Now if $f \in C[0,\infty)$ is such a (real) null-function, then the rapid vanishing of $w$ at $+\infty$ imparts the same decay to $f$ itself, and with the understanding that $f(\xi)$ vanishes for $\xi \le 0$, you may conclude from

$$f(\xi) + \frac{1}{2\pi}\int e^{-\sqrt{-1}\,k\xi}\,e^{-\sqrt{-1}\,kx}\,e^{\sqrt{-1}\,k^3t}\,\widehat w(0+,k)\,\widehat f(-k)\,dk = 0$$
that
$$\int_0^\infty f^2 = \frac{1}{2\pi}\int_{-\infty}^{+\infty}|\widehat f|^2 = \frac{1}{2\pi}\int_{-\infty}^{+\infty}|\widehat w|^2\,|\widehat f|^2\,dk,$$
in view of $\widehat f(k) = \widehat f{}^{\,*}(-k)$, $f$ being real. This cannot be maintained if $|\widehat w| < 1$ unless $\widehat f$, and so also $f$, vanishes: in fact, $|\widehat w| \le 1$ suffices and is necessary, as well, but that is a little longer story which I omit. This ends the preparations.

3.5. Pöppe’s computation

Pöppe showed, by direct manipulation of determinants, that $-12\,(\ln\Delta)''$ solves KdV. The simpler function $V \equiv -(\ln\Delta)'$ is better suited to the proof. It is $-\,\mathrm{sp}\,(I + W)^{-1}DW$ with $D = d/dx$, by rule 6 of Section 2.4, and may be reduced to $\frac12\left[(I + W)^{-1}W\right](0,0)$ by means of the "Hankel" character of $W$, by which I mean that its kernel is a function of $\xi + \eta$ alone: in fact, the kernel of $(I + W)^{-1}DW$ is $(\partial/\partial\eta)\left[(I + W)^{-1}W\right](\xi,\eta)$, so

$$V = -\,\mathrm{sp}\,(I + W)^{-1}DW = -\frac12\,\mathrm{sp}\left[DW(I + W)^{-1} + (I + W)^{-1}DW\right]$$
$$= -\frac12\,\mathrm{sp}\left[D(I + W)^{-1}W + (I + W)^{-1}DW\right] = -\frac12\int_0^\infty\frac{d}{d\xi}\left[(I + W)^{-1}W\right](\xi,\xi)\,d\xi = \frac12\left[(I + W)^{-1}W\right](0,0).$$
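If you like, this reduction can be spot-checked by machine before proceeding. The sketch below is my own (numpy; the one-term kernel $w(x) = m\,e^{-kx}$ at $t = 0$, the cutoff $L$, and the grid are arbitrary choices): it differences the Nyström determinant for $-(\ln\Delta)'$ and compares with $\frac12\left[(I + W)^{-1}W\right](0,0)$.

```python
import numpy as np

# Sketch: for w(x) = m*exp(-k*x), compare -(ln Delta)' with the reduced
# form (1/2) [(I+W)^{-1} W](0,0) on a truncated half-line [0, L].
m, k, L, n = 2.0, 1.0, 30.0, 1500
h = L / n
xi = (np.arange(n) + 0.5) * h                    # midpoint grid

def Wmat(x):
    return m * np.exp(-k * (x + xi[:, None] + xi[None, :])) * h

def logdelta(x):
    return np.log(np.linalg.det(np.eye(n) + Wmat(x)))

x0, dx = 0.7, 1e-4
lhs = -(logdelta(x0 + dx) - logdelta(x0 - dx)) / (2 * dx)
W = Wmat(x0)
rhs = 0.5 * np.linalg.solve(np.eye(n) + W, W)[0, 0] / h   # kernel at (0, 0)
print(lhs, rhs)                                  # both ~ 0.332 here
```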


I will need $V'$, $V''$, $V'''$, $(V')^2$, and also $V^\bullet$. The first three are easy, as is the last: with the notation $W' = DW$ etc. and abbreviating $[\ \cdot\ ](0,0)$ by $[\ \cdot\ ]$ plain, you find

$$V' = -\frac12\left[(I + W)^{-1}W'(I + W)^{-1}W\right] + \frac12\left[(I + W)^{-1}W'\right] = \frac12\left[(I + W)^{-1}W'(I + W)^{-1}\right];$$
$$V'' = -\left[(I + W)^{-1}W'(I + W)^{-1}W'(I + W)^{-1}\right] + \frac12\left[(I + W)^{-1}W''(I + W)^{-1}\right];$$
$$V''' = 3\left[(I + W)^{-1}W'(I + W)^{-1}W'(I + W)^{-1}W'(I + W)^{-1}\right] - 3\left[(I + W)^{-1}W''(I + W)^{-1}W'(I + W)^{-1}\right] + \frac12\left[(I + W)^{-1}W'''(I + W)^{-1}\right];$$

and

$$V^\bullet = \frac12\left[(I + W)^{-1}W'''(I + W)^{-1}\right].$$

The reduction of $(V')^2$ to like form employs the Hankel trick again: Let the operators $A, B, C$ and $D$ have nice kernels and good decay at $+\infty$, abandoning temporarily the prior meaning of $D$, and let $B$ and $C$ be Hankel besides. Then, with an obvious notation,
$$[AB](0,0)\,[CD](0,0) = -\int_0^\infty\frac{d}{d\xi}[AB](0,\xi)\times[CD](\xi,0)\,d\xi - \int_0^\infty[AB](0,\xi)\times\frac{d}{d\xi}[CD](\xi,0)\,d\xi$$
$$= -\int_0^\infty[AB'](0,\xi)\,[CD](\xi,0)\,d\xi - \int_0^\infty[AB](0,\xi)\,[C'D](\xi,0)\,d\xi = -\left[A\left(B'C + BC'\right)D\right](0,0).$$
Now write $2V'$ in its initial form:

$$2V' = \left[(I + W)^{-1}W'\right] - \left[(I + W)^{-1}W'(I + W)^{-1}W\right],$$

dropping the (0; 0) as before, and square:

$$4(V')^2 = \left[(I + W)^{-1}W'\right]\left[W'(I + W)^{-1}\right] - 2\left[(I + W)^{-1}W'\right]\left[W(I + W)^{-1}W'(I + W)^{-1}\right]$$
$$+ \left[(I + W)^{-1}W'(I + W)^{-1}W\right]\left[W(I + W)^{-1}W'(I + W)^{-1}\right],$$

in which the symmetric character of $W$, $W'$, and $(I + W)^{-1}$ is used to manipulate the second factors one by one. Then apply the ABCD rule to produce

$$4(V')^2 = -\left[(I + W)^{-1}\left(W''W' + W'W''\right)(I + W)^{-1}\right] + 2\left[(I + W)^{-1}\left(W''W + (W')^2\right)(I + W)^{-1}W'(I + W)^{-1}\right]$$
$$- \left[(I + W)^{-1}W'(I + W)^{-1}\left(W'W + WW'\right)(I + W)^{-1}W'(I + W)^{-1}\right],$$

and write out $V^\bullet - V''' + 6(V')^2$ in the present notation:

$$V^\bullet - V''' + 6(V')^2$$
$$= \tfrac12\left[(I + W)^{-1}W'''(I + W)^{-1}\right] \hfill (1)$$
$$- 3\left[(I + W)^{-1}W'(I + W)^{-1}W'(I + W)^{-1}W'(I + W)^{-1}\right] \hfill -(2)$$
$$+ 3\left[(I + W)^{-1}W''(I + W)^{-1}W'(I + W)^{-1}\right] \hfill (3)$$
$$- \tfrac12\left[(I + W)^{-1}W'''(I + W)^{-1}\right] \hfill -(4)$$
$$- 3\left[(I + W)^{-1}W''W'(I + W)^{-1}\right] \hfill -(5)$$
$$+ 3\left[(I + W)^{-1}W''\left(I - (I + W)^{-1}\right)W'(I + W)^{-1}\right] \hfill (6) - (7)$$
$$+ 3\left[(I + W)^{-1}(W')^2(I + W)^{-1}W'(I + W)^{-1}\right] \hfill (8)$$
$$- 3\left[(I + W)^{-1}W'(I + W)^{-1}W'\left(I - (I + W)^{-1}\right)W'(I + W)^{-1}\right] \hfill -(9) + (10)$$


in which (1) and (4) cancel, also (5) and (6), (3) and (7), (8) and (9), (2) and (10), i.e. the whole thing vanishes. Remarkable! Now take $v = V'$. Then $v^\bullet - v''' + 12vv' = 0$, which is KdV aside from the nuisance factor 12, and this is removed by changing $v$ into $12v$.

Example: solitons.

Take, as solution of $\partial w/\partial t = \partial^3 w/\partial x^3$, the function
$$w(t,x) = \sum_{i=1}^{n}m_i\,\exp\left(-k_ix - k_i^3t\right)$$
with positive numbers $m$ and distinct positive $k$'s. Then $v = -12\,(\ln\Delta)''$ solves KdV. Here, $\Delta$ can be expressed pretty explicitly. Let's look for eigenfunctions of $W$ at $t = 0$:
$$W : f \mapsto \sum_i m_i\,e^{-k_i\xi}\int_0^\infty e^{-k_i\eta}\,f(\eta)\,d\eta,$$

so any such eigenfunction is of the form $f(\xi) = \sum_{j=1}^{n}f_j\,e^{-k_j\xi}$, and a self-evident computation shows that $Wf = \lambda f$ is the same as to say
$$\lambda f_i = \sum_{j=1}^{n}\frac{m_i}{k_i + k_j}\,f_j\quad\text{for}\ 1\le i\le n,$$
i.e. $\Delta = \det(I + W)$ is the ordinary determinant
$$\det\left[I + \frac{m_i}{k_i + k_j}\right]_{1\le i,j\le n},$$
which may be spelled out in Fredholm's way as

$$1 + \sum_{p=1}^{n}\ \sum_{|\mathbf n| = p}\left(\frac{m}{2k}\right)^{\!\mathbf n}\prod_{\substack{i<j\\ i,j\,\in\,\mathbf n}}\left(\frac{k_i - k_j}{k_i + k_j}\right)^{\!2},$$

in which $\mathbf n$ is a collection of indices $1 \le n_1 < n_2 < \cdots < n_p \le n$ and $(m/2k)^{\mathbf n}$ means $(m_{n_1}/2k_{n_1})\times(m_{n_2}/2k_{n_2})\times$ etc. I leave this to you as an exercise with a hint to be verified first:
$$\det\left[\frac{1}{k_i + k_j}\right]_{1\le i,j\le n} = \prod_{i<j}(k_i - k_j)^2\Big/\prod_{1\le i,j\le n}(k_i + k_j).$$
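The hint is a classical Cauchy determinant; if you don't feel like verifying it by hand, here is a three-line spot check of my own (numpy, random-ish $k$'s):

```python
import numpy as np

# Sketch: det[1/(k_i + k_j)] against prod_{i<j}(k_i - k_j)^2 / prod_{i,j}(k_i + k_j).
k = np.array([0.5, 1.3, 2.0, 3.7])
S = k[:, None] + k[None, :]
num = np.prod([(k[i] - k[j]) ** 2
               for i in range(len(k)) for j in range(i + 1, len(k))])
print(np.linalg.det(1.0 / S), num / np.prod(S))
```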

Omitting the corrective factor 12, the corresponding solution of KdV is then $v(t,x) = -(\ln\Delta)''$ with
$$\Delta = \det\left[I + \frac{m_i}{k_i + k_j}\,e^{-k_ix}\,e^{-k_i^3t}\right]_{1\le i,j\le n}.$$

For $n = 1$, $\Delta$ is simply $1 + (m/2k)\exp(-kx - k^3t)$, and

$$v(t,x) = -\frac{k^2}{4}\,\mathrm{ch}^{-2}\left(\frac{kx}{2} + \frac{k^3t}{2} + \frac12\ln\frac{2k}{m}\right),$$
as you will check. This is the famous "soliton" of "KdV and all that", representing, e.g. a solitary wave (tsunami) in the open ocean, traveling east to west; compare Lamb [14]. For $n \ge 2$, $v$ is a non-linear superposition of such solitons. Here is why: The soliton has amplitude $k^2/4$ and speed $k^2$, i.e. big solitons go fast, small solitons go slow. Now you can start off a solution of KdV by placing widely-spaced solitons in order of diminishing amplitude/speed, right to left. They hardly feel each other: the individual soliton dies off fast from its peak value, so the non-linearity $vv'$ does not count for much, and you can simply add the $n$ solitons together. Now turn on KdV: The big solitons catch up to the little ones, and a complicated (non-linear) interaction takes place, but by and by you see the original solitons emerging to the left, moving off to $-\infty$ in order of increasing amplitude/speed, left to right, with a little evanescent "radiation" in between. It is this evolution that $v = -(\ln\Delta)''$ describes. I omit the computation, but try it for $n = 2$; a numerical sketch follows.
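Before trying $n = 2$, you can let the machine check the $n = 1$ formula; this sketch is mine (numpy, arbitrary $m$, $k$, and base point), differencing $\ln\Delta$ and comparing with the displayed profile. The same few lines, with the $2\times2$ determinant in place of the scalar $\Delta$, let you watch the two-soliton interaction.

```python
import numpy as np

# Sketch: v = -(ln Delta)'' for Delta = 1 + (m/2k) exp(-k x - k^3 t),
# against -(k^2/4) ch^(-2)(kx/2 + k^3 t/2 + (1/2) ln(2k/m)).
m, k = 1.5, 0.8
t0, x0, h = 0.4, -1.0, 1e-4

def logdelta(x):
    return np.log(1 + (m / (2 * k)) * np.exp(-k * x - k**3 * t0))

v = -(logdelta(x0 + h) - 2 * logdelta(x0) + logdelta(x0 - h)) / h**2
s = k * x0 / 2 + k**3 * t0 / 2 + 0.5 * np.log(2 * k / m)
print(v, -(k**2 / 4) / np.cosh(s) ** 2)          # agree to ~1e-7
```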


Coda.

Now the only trouble is that you don't know how to pick $w(0+,\cdot)$ to produce the initial velocity profile $v(0+,\cdot)$ you want. That is a pretty long story, how it's done. Here, I am after determinants and just refer you to Lamb [14] and Dyson [8] for the full story. It involves an elaborate detour via a sort of non-linear Fourier transform ("scattering"), quite remarkable in itself but, to my mind, a little wrong-headed. If only you could invert the map $w(0+,\cdot) \to v(0+,\cdot)$ directly, in real space so to say, then all would be well. I like to make this caricature: To solve $x^\bullet = 1$ is easy even if you know nothing; $x^\bullet = \sqrt{1 - x^2}$ looks harder, but if you have the wit to introduce the special function $y = \sin^{-1}x = \int_0^x(1 - r^2)^{-1/2}\,dr$, then $y^\bullet = 1$, $x = \sin y$, and you're back in business. The map $v \to w$ is just a variant of this device, i.e. it is an $\infty$-dimensional "special function", inverse to $w \to v = -(\ln\Delta)''$. How bad can it be? The question merits a direct answer.

3.6. The unitary ensemble: background and preparations

The idea to model atomic energy levels by the spectra of random matrices is due to Wigner [27]. The subject has been much developed since then; see Mehta [19] for the best general account. Curiously, there even seems to be a mysterious connection to Gauss's prime number theorem, $\#\{\text{primes} \le n\} \simeq n/\log n$, mediated by Riemann's zeta function, for which see Conrey [4]. The group $G$ of $N\times N$ unitary matrices provides the simplest example of what people do. $G$ is equipped with its (one and only) translation-invariant volume element, normalized to make the total volume equal 1. That is the only natural way to introduce probabilities into the picture, and it is the statistics of the eigenvalues $e^{\sqrt{-1}\,\theta}$ of a "typical" unitary matrix which is sought, especially in high dimensions $N \uparrow \infty$. The pretty fact is that these statistics can be expressed by means of Fredholm determinants: for example, for $N < \infty$, the probability that no eigenvalue falls in the circular arc $\left\{e^{\sqrt{-1}\,\theta} : \alpha \le \theta \le \beta\right\}$ is
$$\det\left[I - \frac{1}{2\pi}\,\frac{\sin N(\theta - \varphi)/2}{\sin(\theta - \varphi)/2}\right]_{\alpha\le\theta,\varphi\le\beta}.$$

The discussion is adapted from Tracy–Widom [22], more or less.

The invariant volume.
I follow Weyl [24, pp. 194–198]. Let $D \subset G$ be the commutative subgroup of diagonal unitaries $e = e^{\sqrt{-1}\,\theta}$, by which I mean $e^{\sqrt{-1}\,\theta_n}$, $1 \le n \le N$, on the diagonal.

The figure depicts $G$ fibered by $D$ over the base $G/D$. To understand what is meant, take a general element $g \in G$ and bring it, as you may, to diagonal form by a unitary similarity: $g = heh^{-1}$ with $e \in D$ and $h \in G$. Here, $h$ can be multiplied from the right by any $e' \in D$ without changing $g$, so it is only the coset $hD$ that counts. In fact, for $g$ in "general position", both this coset and the covering point $e$ figuring in $g = heh^{-1}$ are fully determined in the small by the requirement that $e$ not change much in response to a little change in $g$ – $e$ is made from the eigenvalues of $g$ which can only be permuted, and for $g$ in general position, such a permutation is a big change, so if now $h$ is determined at


the base, then also $e$ is known, and likewise in a small patch about $g = heh^{-1}$, with $e$ and $h$ changing only a little, by $de$ and $dh$, in response to a little change $dg$ in $g$. Obviously
$$g^{-1}\,dg = he^{-1}h^{-1}\,dh\,e\,h^{-1} + he^{-1}\,de\,h^{-1} - dh\,h^{-1} = h\left(e^{-1}h^{-1}dh\,e - h^{-1}dh + \sqrt{-1}\,d\theta\right)h^{-1},$$
in which you may reduce the apparent supernumerary (real) degrees of freedom ($N^2$ for $dh$ plus $N$ more for $d\theta$) to the correct number ($N^2$) by the observation that $h^{-1}dh$ is pure imaginary on the diagonal, permitting the restriction $h^{-1}dh = -\sqrt{-1}\,d\theta$ there. The volume element $\Omega$ is now produced by wedging the several 1-forms $g^{-1}dg$ up to top dimension. Each of these is insensitive to left translations, so $\Omega$ inherits this feature. Besides, the associated line element
$$\mathrm{sp}\left[(g^{-1}dg)^{\dagger}(g^{-1}dg)\right] = \mathrm{sp}\left[(dg)^{\dagger}\,dg\right] \equiv |dg|^2$$

is insensitive to the presence of $h$ to the left and $h^{-1}$ to the right, so these can be dropped and the bare 1-forms $e^{-1}h^{-1}dh\,e - h^{-1}dh + \sqrt{-1}\,d\theta$ wedged up to top dimension to produce the volume element:
$$\Omega = \prod_{i\ne j}\left(\frac{e_j}{e_i} - 1\right)d\theta_1\,d\theta_2\cdots\times\omega\quad[\omega\ \text{a volume element on}\ G/D],$$
or, taking the factors of the product by pairs,
$$\Omega = \prod_{i<j}|e_i - e_j|^2\,d\theta\times\omega.$$

The spectral density.

The form of ω does not matter anymore. What is wanted is the volume element induced upon the eigenvalues:

$$Z^{-1}\times\prod_{i<j}\left|e^{\sqrt{-1}\,\theta_i} - e^{\sqrt{-1}\,\theta_j}\right|^2\,\prod_k d\theta_k \equiv p(\theta)\,d\theta,$$

in which $Z$ is a normalizing factor, to be determined presently, fixing the total volume at 1 so that these are probabilities. To put them in a better form, introduce the matrix $Q = \left[e^{\sqrt{-1}\,(i-1)\theta_j}\right]_{1\le i,j\le N}$ with the Vandermonde-type determinant:

$$\det Q = \prod_{i<j}\left(e^{\sqrt{-1}\,\theta_j} - e^{\sqrt{-1}\,\theta_i}\right).$$

Then
$$\prod_{i<j}\left|e^{\sqrt{-1}\,\theta_j} - e^{\sqrt{-1}\,\theta_i}\right|^2 = \det Q^{\dagger}\times\det Q = \det Q^{\dagger}Q;$$


Now
$$\det\left[\sum_{k=0}^{N-1}e^{\sqrt{-1}\,(\theta_i - \theta_j)k}\right]_{1\le i,j\le N} = \sum_{k_1=0}^{N-1}\sum_{k_2=0}^{N-1}\cdots\ \det\left[e^{\sqrt{-1}\,(\theta_i - \theta_j)k_i}\right]_{1\le i,j\le N} = \sum_{k_1=0}^{N-1}\sum_{k_2=0}^{N-1}\cdots\ \det\left[e^{\sqrt{-1}\,\theta_i(k_i - k_j)}\right]_{1\le i,j\le N},$$

and as the latter determinant vanishes unless the k’s are distinct, so Z reduces to the sum over such k’s of

$$\sum_{\pi}\frac{\chi(\pi)}{(2\pi)^N}\int\prod_{i=1}^{N}e^{\sqrt{-1}\,\theta_i(k_i - k_{\pi i})}\,d\theta.$$

But all these integrals vanish unless $\pi$ is the identity, so the whole reduces to the number of choices of $N$ distinct $k$'s from $0, 1, \dots, N-1$, i.e. $Z = N!$. The same style of computation produces a nice formula for the marginal density $p(\theta_1,\dots,\theta_n)$, the other angles $\theta_k$, $N \ge k > n$, being integrated out:
$$p(\theta_1,\dots,\theta_n) = \frac{(N-n)!}{N!}\,\det\left[K(\theta_i - \theta_j)\right]_{1\le i,j\le n},\quad\text{with}\ K(\theta) = \frac{1}{2\pi}\,\frac{\sin N\theta/2}{\sin\theta/2}\ \text{as before}.$$
I leave that to you as an exercise.
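For $N = 2$ the normalization $Z = N!$ can be checked by direct quadrature; a two-line sketch of my own:

```python
import numpy as np

# Sketch: (2 pi)^(-2) * int int |e^{i a} - e^{i b}|^2 da db = 2 = 2!.
a = np.linspace(0.0, 2 * np.pi, 401)
f = np.abs(np.exp(1j * a)[:, None] - np.exp(1j * a)[None, :]) ** 2
print(np.trapz(np.trapz(f, a), a) / (2 * np.pi) ** 2)   # ~ 2.0
```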

3.7. Occupation numbers, determinants and the scaling limit

Now Fredholm determinants enter the story. Let # be the number of angles θ that fall within an interval J ⊂ [0; 2π). It is desired to evaluate the probabilities P(# = n) for n ≥ 0. Everything is symmetric in the angles and there are “N choose n” ways of specifying which particular angles fall in J, so

$$P(\# = n) = \binom Nn\,P\left[\ \bigcap_{i=1}^{n}E_i\ \cap\bigcap_{j=n+1}^{N}E_j'\ \right],$$

in which $E_i$ is the event "$\theta_i \in J$" and $E_j'$ the complement of $E_j$, i.e. "$\theta_j \notin J$". This yields to the principle of inclusion-exclusion: with $\bigcap_{i=1}^{n}E_i = E$ for short, you have

$$P\left[E\cap\bigcap_{j=n+1}^{N}E_j'\right] = P(E) - P\left[E\cap\bigcup_{j=n+1}^{N}E_j\right]$$
$$= P(E) - \sum_{i>n}P(E\cap E_i) + \sum_{j>i>n}P(E\cap E_i\cap E_j) - \sum_{k>j>i>n}P(E\cap E_i\cap E_j\cap E_k) + \cdots$$
$$= \sum_{p=n}^{N}(-1)^{p-n}\binom{N-n}{p-n}P\left[E\cap E_{n+1}\cap\cdots\cap E_p\right]\quad\text{by symmetry}$$
$$= \sum_{p=n}^{N}(-1)^{p-n}\binom{N-n}{p-n}\frac{(N-p)!}{N!}\int_{J^p}\det\left[K(\theta_i - \theta_j)\right]_{1\le i,j\le p}\,d\theta,$$

which reduces, after multiplication by "N choose n", to

$$P(\# = n) = \sum_{p=n}^{N}\frac{(-1)^{p-n}}{n!\,(p-n)!}\int_{J^p}\det\left[K(\theta_i - \theta_j)\right]_{1\le i,j\le p}\,d\theta.$$


Here is the punch-line: multiply by $(1 - \lambda)^n$, sum from $n = 0$ to $N$, and reverse the two summations to produce

$$\sum_{n=0}^{N}(1 - \lambda)^n\,P(\# = n) = \sum_{p=0}^{N}\left[\sum_{n=0}^{p}(1 - \lambda)^n(-1)^{p-n}\frac{p!}{n!\,(p-n)!}\right]\frac{1}{p!}\int_{J^p}\det\left[K(\theta_i - \theta_j)\right]_{1\le i,j\le p}\,d\theta$$
$$= \sum_{p=0}^{N}\frac{(-\lambda)^p}{p!}\int_{J^p}\det\left[K(\theta_i - \theta_j)\right]_{1\le i,j\le p}\,d\theta = \det\left(I - \lambda\,(K\ \text{restricted to}\ J\times J)\right),$$

$K$ being of rank $N$, i.e. $P(\# = n)$ is the coefficient of $(1 - \lambda)^n$ in $\det\left[I - \lambda\,(K\ \text{restricted to}\ J\times J)\right]$.
Now let $0 < \theta_1^* < \theta_2^* < \cdots$ be the angles taken in order. Obviously, the mean-value of $\theta_2^* - \theta_1^*$ is $2\pi/N$, by symmetry, so if you want to see what happens for infinite dimensions, i.e. for $N \uparrow \infty$, you'd better scale the angles up by the factor $N/2\pi$, as in

$$P\left[\#[0, 2\pi L/N) = n\right] = \text{the coefficient of}\ (1 - \lambda)^n\ \text{in}$$
$$\sum_{p=0}^{N}\frac{(-\lambda)^p}{p!}\int_0^{2\pi L/N}\!\!\!\cdots\int_0^{2\pi L/N}\det\left[\frac{1}{2\pi}\,\frac{\sin N(\theta_i - \theta_j)/2}{\sin(\theta_i - \theta_j)/2}\right]_{1\le i,j\le p}d\theta$$
$$= \sum_{p=0}^{N}\frac{(-\lambda)^p}{p!}\int_0^{L}\!\!\cdots\int_0^{L}\det\left[\frac{1}{2\pi(N/2\pi)}\,\frac{\sin\pi(x_i - x_j)}{\sin\left(\pi(x_i - x_j)/N\right)}\right]_{1\le i,j\le p}dx$$
$$\simeq \sum_{p=0}^{\infty}\frac{(-\lambda)^p}{p!}\int_0^{L}\!\!\cdots\int_0^{L}\det\left[\frac{\sin\pi(x_i - x_j)}{\pi(x_i - x_j)}\right]_{1\le i,j\le p}dx\quad\text{for large}\ N$$
$$= \det\left[I - \lambda\,\frac{\sin\pi(x - y)}{\pi(x - y)}\right]_{0\le x,y\le L} = \det(I - \lambda K)$$
with
$$K(x,y) = \frac{\sin\pi(x - y)}{\pi(x - y)}\ \text{restricted to}\ 0\le x,y\le L.$$
It follows that
$$\lim_{N\uparrow\infty}P\left[\#[0, 2\pi L/N) = n\right]$$
exists for each $n \ge 0$, and it is a source of satisfaction that these numbers add up to unity, i.e. they represent a genuine probability distribution. This is because $\det(I - \lambda K)$ reduces to 1 at $\lambda = 0$.
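All of this can be watched numerically. The sketch below is my own (numpy; $N$, the arc length $s$, the trial count, and the quadrature grid are arbitrary): it samples Haar-distributed unitaries by QR with the standard phase fix, estimates the probability of an empty arc $[0, s)$, and compares with the Nyström value of $\det(I - K)$ for the kernel above.

```python
import numpy as np

# Sketch: Monte Carlo gap probability for the N x N unitary ensemble,
# against the Fredholm determinant with kernel
# sin(N(t - p)/2) / (2 pi sin((t - p)/2)) on the arc [0, s).
rng = np.random.default_rng(3)
N, s, trials = 8, 1.0, 20000

def haar(N):
    z = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # phase fix for Haar measure

hits = sum(
    not np.any(np.angle(np.linalg.eigvals(haar(N))) % (2 * np.pi) < s)
    for _ in range(trials))
n = 400
t = (np.arange(n) + 0.5) * s / n
d = t[:, None] - t[None, :]
Kmat = np.full((n, n), N / (2 * np.pi))           # diagonal limit of the kernel
off = ~np.eye(n, dtype=bool)
Kmat[off] = np.sin(N * d[off] / 2) / (2 * np.pi * np.sin(d[off] / 2))
print(hits / trials, np.linalg.det(np.eye(n) - Kmat * s / n))
```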

Jimbo et al. [12] discovered the surprising fact that $\Delta = \det(I - \lambda K)$, as a function of $L$, solves a differential equation – and a famous one too, called Painlevé 5 – but that lies outside the scope and purpose of these notes. A nice account may be found in Tracy–Widom [22]. Here I note only one particular consequence: For $\lambda = 1$, $\Delta(L)$ is just the probability that no (scaled) angle $\theta$ is found in the interval between 0 and $L$, and this has a Gaussian tail, to wit, $\Delta(L) \simeq \exp\left(-\pi^2L^2/8\right)$.

Appendix on compactness

The space C[0,1].
The statement is that the figure $F \subset C$ is compact relative to the distance $d(f_1, f_2) = \|f_1 - f_2\|_\infty$ if (and only if) it is 1) closed, 2) bounded, and 3) $|f(b) - f(a)| \le \omega(|b - a|)$ for every $f \in F$ with some universal, positive, increasing function $\omega(h)$, $h > 0$, subject to $\omega(0+) = 0$. I sketch the proof.


Sufficiency.

Let $r_1, r_2, \dots$ be a list of the (countably many) rational points $0 \le r \le 1$. The numbers $f_n(r_m)$ are bounded for fixed $m \ge 1$, so you can pick 1) a subsequence $n_1 = (n_{11}, n_{12}, \dots)$ of $n = (1, 2, 3, \dots)$ to make $f_n(r_1)$, $n \in n_1$, tend to some number $f_\infty(r_1)$, 2) a further subsequence $n_2 = (n_{21}, n_{22}, \dots) \subset n_1$ to make $f_n(r_2)$, $n \in n_2$, tend to some number $f_\infty(r_2)$, and so on, and if you now pass to the diagonal $d = (n_{11}, n_{22}, n_{33}, \dots)$, then $f_d(r_m)$ will tend to $f_\infty(r_m)$ for every $m = 1, 2, 3, \dots$ simultaneously. 3) does the rest.

Necessity.

3) states that
$$\sup\left\{|f(b) - f(a)| : f \in F,\ 0 \le a < b \le 1,\ (b - a) \le h\right\} \downarrow 0\ \text{with}\ h.$$

Its negation asserts the existence of a number $\omega > 0$ of this nature: that for every $n \ge 1$, you can find $f_n \in F$ and $0 \le a_n < b_n \le 1$ with $b_n - a_n \le 1/n$ so as to make $|f_n(b_n) - f_n(a_n)| \ge \omega$. But if $F$ were compact, you could weed out to make $f_n \to f_\infty$ and, at the same time, make $a_n$ & $b_n$ tend to a common value $c$, $[0,1]$ being compact. Then $0 < \omega \le |f_n(b_n) - f_n(a_n)|$ tends to 0, which is contradictory. Dini's Theorem, used twice in Section 2.7, belongs to the same circle of ideas. It states that if $f_n \in C[0,1]$ decreases to 0, then $f_n \to 0$ uniformly, i.e. $\|f_n\|_\infty \downarrow 0$. Otherwise, for each $n \ge 1$, $f_n(x_n)$ exceeds some fixed number $\omega > 0$ for some $0 \le x_n \le 1$. Then you can weed out to make $x_n$ tend to $x_\infty$ as $n \uparrow \infty$, with the contradictory result that

$$f_m(x_\infty) = \lim_{n\uparrow\infty}f_m(x_n) \ge \lim_{n\uparrow\infty}f_n(x_n) \ge \omega,$$

i.e. fm(x∞) does not decrease to 0.

The space L²[0,1].
Now comes the "weak" compactness of the unit ball $B = \left\{e : \int_0^1 e^2 \le 1\right\}$. The key to this is the fact that $L^2$ contains a countable dense family of functions $f_n$, $n \ge 1$, such as the broken lines with corners at rational places and rational slopes in between. Take $e_n$, $n \ge 1$, from $B$. The diagonal method employed above permits you to weed out so as to make $(e_n, f_m)$ tend, as $n \uparrow \infty$, to some number $L(f_m)$ for every $m = 1, 2, 3, \dots$, simultaneously. Obviously, $|L(f_i) - L(f_j)| \le \|f_i - f_j\|_2$, whence $L$ extends to a bounded linear map from $L^2[0,1]$ to $\mathbb R$ and $(e_n, f)$ tends to the extended $L(f)$ for every $f \in L^2$, simultaneously. The final step employs the fact that any such $L$ is of the form $L(f) = (e_\infty, f)$ with a fixed function $e_\infty \in L^2[0,1]$. The proof is easy. Let $N$ be the null space of $L$ and $N^\circ$ its annihilator. Then either $L(f) \equiv 0$, i.e. $e_n \to 0$ weakly, or else $L(e) \ne 0$ for some non-vanishing $e \in N^\circ$. Pick such an $e = e_1$. Then for any other $e_2 \in N^\circ$, $L$ sends $L(e_1)e_2 - L(e_2)e_1$ to 0, from which you infer that $\dim N^\circ = 1$, i.e. $N^\circ = \mathbb Re$ with $\|e\|_2 = 1$ if you want. Now split $f$ into its projection $e(e, f)$ onto $N^\circ$ plus the rest, viz. $f - e(e, f) \in N$. Then $L(f) = L(e)(e, f)$, which is to say $e_n \to L(e) \times e$ weakly. This is the weak compactness of the unit ball. That's all you need to know about compactness, but I will add a simple

Exercise.
The functions $e_n = \sqrt2\,\sin n\pi x$, $n \ge 1$, form a unit perpendicular base of $L^2[0,1]$, i.e. every $f \in L^2$ can be expanded as $f = \sum\hat f(n)\,e_n$ with $\hat f(n) = (e_n, f)$. Prove that the ellipsoid $E = \left\{f : \sum|\hat f(n)|^2/l_n^2 \le 1\right\}$ is (truly) compact in $L^2[0,1]$ if and only if the positive numbers $l_n$ tend to 0 as $n \uparrow \infty$.


References

[1] Ahlfors L.V., Complex Analysis, 3rd ed., Internat. Ser. Pure Appl. Math., McGraw-Hill, New York, 1979
[2] Brown R., A brief account of microscopical observations made in the months of June, July, and August, 1827, on the particles contained in the pollen of plants, etc., Philosophical Magazine, 1828, 4, 161–173
[3] Cameron R.H., Martin W.T., The Wiener measure of Hilbert neighborhoods in the space of real continuous functions, Journal of Mathematics and Physics Mass. Inst. Tech., 1944, 23, 195–209
[4] Conrey J.B., The Riemann hypothesis, Notices Amer. Math. Soc., 2003, 50(3), 341–353
[5] Courant R., Differential and Integral Calculus, vol. 2, Wiley Classics Lib., John Wiley & Sons, New York, 1988
[6] Courant R., Hilbert D., Methoden der Mathematischen Physik, Springer, Berlin, 1931
[7] Dyson F.J., Statistical theory of the energy levels of complex systems. I, J. Mathematical Phys., 1962, 3, 140–156
[8] Dyson F.J., Fredholm determinants and inverse scattering problems, Comm. Math. Phys., 1976, 47(2), 171–183
[9] Einstein A., Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen, Ann. Phys., 1905, 17, 549–560; reprinted in: Investigations on the Theory of the Brownian Movement, Dover, New York, 1956
[10] Fredholm I., Sur une classe d'équations fonctionnelles, Acta Math., 1903, 27(1), 365–390
[11] Gardner C.S., Greene J.M., Kruskal M.D., Miura R.M., Method for solving the Korteweg–de Vries equation, Phys. Rev. Lett., 1967, 19(19), 1095–1097
[12] Jimbo M., Miwa T., Môri Y., Sato M., Density matrix of an impenetrable Bose gas and the fifth Painlevé transcendent, Phys. D, 1980, 1(1), 80–158
[13] Kato T., A Short Introduction to Perturbation Theory for Linear Operators, Springer, New York–Berlin, 1982
[14] Lamb G.L., Jr., Elements of Soliton Theory, Pure Appl. Math. (N. Y.), John Wiley & Sons, New York, 1980
[15] Lax P.D., Linear Algebra, Pure Appl. Math. (N. Y.), John Wiley & Sons, New York, 1997
[16] Lax P.D., Functional Analysis, Pure Appl. Math. (N. Y.), John Wiley & Sons, New York, 2002
[17] Lévy P., Le Mouvement Brownien, Mémoir. Sci. Math., 126, Gauthier-Villars, Paris, 1954
[18] McKean H.P., Jr., Stochastic Integrals, Probab. Math. Statist., Academic Press, New York–London, 1969
[19] Mehta M.L., Random Matrices, 2nd ed., Academic Press, Boston, 1991
[20] Munroe M.E., Introduction to Measure and Integration, Addison–Wesley, Cambridge, 1953
[21] Pöppe Ch., The Fredholm determinant method for the KdV equations, Phys. D, 1984, 13(1–2), 137–160
[22] Tracy C.A., Widom H., Introduction to random matrices, In: Geometric and Quantum Aspects of Integrable Systems, Scheveningen, 1992, Lecture Notes in Phys., 424, Springer, Berlin, 1993, 103–130
[23] Uhlenbeck G.E., Ornstein L.S., On the theory of the Brownian motion, Phys. Rev., 1930, 36, 823–841
[24] Weyl H., The Classical Groups, Princeton University Press, Princeton, 1939
[25] Whittaker E.T., Watson G.N., A Course of Modern Analysis, 4th ed., Cambridge University Press, New York, 1962
[26] Wiener N., Differential space, Journal of Mathematics and Physics Mass. Inst. Tech., 1923, 2, 131–174
[27] Wigner E., On the distribution of the roots of certain symmetric matrices, Ann. of Math., 1958, 67, 325–326
