Cent. Eur. J. Math. • 9(2) • 2011 • 205-243 DOI: 10.2478/s11533-011-0003-5
Central European Journal of Mathematics
Fredholm determinants
Invited Review Article
Henry P. McKean1
1 CIMS, 251 Mercer Street, New York, NY 10012 USA
Received 15 November 2010; accepted 16 January 2011
Abstract: The article provides a down-to-earth exposition of the Fredholm theory, with applications to Brownian motion and the KdV equation.
MSC: 46E20, 35Q53
Keywords: Fredholm determinant • Brownian motion • KdV equation © Versita Sp. z o.o.
Preface
The work of I. Fredholm [10] is one of the loveliest chapters in mathematics, coming just at the turn of the century, when serious attention was first given to analysis in infinite-dimensional (function) space. Nowadays, it is explained mostly in an abstract and, if I may say so, a superficial way. So I thought it would be nice to do it from scratch, concretely, following Fredholm in using only the simplest methods and adding a few illustrations of what it is good for. The treatment has much in common with Peter Lax [16], which you may wish to look at too; see also Courant–Hilbert [6] and/or Whittaker–Watson [25] and, of course, Fredholm's 1903 paper itself. Cramer's rule expresses the inverse of a square $d \times d$ matrix by ratios of sub-determinants, as I trust you know. There, the dimension is finite. What Fredholm did was to adapt this rule to infinite dimensions, i.e. to function space. Poincaré had posed the problem, giving as his opinion that it would not be solved in his lifetime. Fredholm did it just two years later. Chapter 1 is to remind you about determinants and Cramer's rule in $d < \infty$ dimensions, first in the conventional form and then in Fredholm's form, which is specially adapted to the passage from $d = 1, 2, 3, \dots$ to $d = \infty$. Obviously, if this is to work, you want only matrices which are close enough to the identity, i.e. something like $I + K$ where $I$ is the identity, $K$ is "small", and you need to express $\det(I + K)$ as $1 + [\text{terms in } K, K^2, \dots]$ under effective control. This is a matter of bookkeeping. Look at $I + \lambda K$ with a variable $\lambda$, develop its determinant in powers of $\lambda$ (it is a polynomial after all), and put $\lambda$ back equal to $1$. This produces the formula
$$\det(I + K) = 1 + \sum_{1 \le i \le d} K_{ii} + \sum_{1 \le i < j \le d} \det \begin{bmatrix} K_{ii} & K_{ij} \\ K_{ji} & K_{jj} \end{bmatrix} + \sum_{1 \le i < j < k \le d} \det \begin{bmatrix} K_{ii} & K_{ij} & K_{ik} \\ K_{ji} & K_{jj} & K_{jk} \\ K_{ki} & K_{kj} & K_{kk} \end{bmatrix} + \dots + \det K,$$
which is the key to the passage to $d = \infty$ carried out in Chapter 2.
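As a quick sanity check on this bookkeeping, here is a small Python experiment (my own illustration, not part of the article; the $3 \times 3$ matrix is an arbitrary example): $\det(I + K)$ equals $1$ plus the sum of all principal minors of $K$, verified in exact rational arithmetic.

```python
# Verify det(I + K) = 1 + (sum of principal minors of K of all orders)
# for an arbitrary 3 x 3 rational matrix K.
from fractions import Fraction
from itertools import combinations, permutations

def det(M):
    # determinant as the signed sum over permutations, as in the text
    d = len(M)
    total = Fraction(0)
    for p in permutations(range(d)):
        # signature = (-1)^(number of inversions)
        sign = (-1) ** sum(p[i] > p[j] for i in range(d) for j in range(i + 1, d))
        term = Fraction(1)
        for i in range(d):
            term *= M[i][p[i]]
        total += sign * term
    return total

K = [[Fraction(1, 2), Fraction(1, 3), Fraction(0)],
     [Fraction(1, 4), Fraction(1, 5), Fraction(1, 6)],
     [Fraction(1, 7), Fraction(0), Fraction(1, 8)]]
d = len(K)
I_plus_K = [[K[i][j] + (1 if i == j else 0) for j in range(d)] for i in range(d)]

# right-hand side: 1 plus every principal minor of K, order by order
rhs = Fraction(1)
for order in range(1, d + 1):
    for rows in combinations(range(d), order):
        rhs += det([[K[i][j] for j in rows] for i in rows])

assert det(I_plus_K) == rhs
```

Running the same experiment with $\lambda K$ in place of $K$ exhibits the minors of each order as the coefficients of the polynomial $\det(I + \lambda K)$.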
Think now of $K$ as an integral operator
$$K : f \mapsto \int_0^1 K(x, y) f(y)\, dy,$$
acting on $C[0, 1]$ say, and approximate its action by that of the matrix
$$\left[\frac{1}{n} K\!\left(\frac{i}{n}, \frac{j}{n}\right)\right]_{1 \le i, j \le n},$$
acting in dimension $d = n$. Then the preceding formula for the determinant of the matrix $I + K$ makes plain the good sense of Fredholm's recipe for the determinant of the operator $I + K$:
$$\det(I + K) = 1 + \int_0^1 K(x, x)\, dx + \iint_{0 \le x_1 < x_2 \le 1} \det \begin{bmatrix} K(x_1, x_1) & K(x_1, x_2) \\ K(x_2, x_1) & K(x_2, x_2) \end{bmatrix} d^2 x + \cdots.$$
Chapter 2 develops this idea along with Cramer's rule, in detail. (Don't worry. I'll make a better notation when the time comes.) Particular attention is paid to the special but important case of symmetric kernels: $K(x, y) = K(y, x)$; of these I give only the simplest examples. Chapter 3 presents three serious examples of what you can do with this machinery. The first application is to Gaussian probabilities with emphasis on Brownian motion, as initiated by Cameron and Martin [3]. The second is to the Korteweg–de Vries equation (KdV) describing long waves in shallow water. Here, I follow mostly Ch. Pöppe's [21] explanation of a formula of Dyson [8], representing the solution of KdV by means of a special type of Fredholm determinant. The third application is to the $N$-dimensional unitary ensemble, so-called, i.e. the group of $N \times N$ unitary matrices equipped with its invariant volume element. What is in question is the distribution of eigenvalues of a typical matrix, both for $N < \infty$ and for $N = \infty$. Here, the principal names are Dyson [7], Jimbo et al. [12], Mehta [19], and Tracy–Widom [22]. I cannot close this preface without a little joke. I gave a lecture in Paris a while back: something about "Fred Holm's determinants", the advertisement said. Oh well! The name is Ivar. Don't forget it.

Prerequisites

Not much, in fact.
You will need: 1) standard linear algebra, both real and complex; 2) calculus in several variables, of course; 3) some acquaintance with Lebesgue measure (on the unit interval only) and the associated space $L^2[0, 1]$ of (real) measurable functions with $\int_0^1 f^2 < \infty$; 4) elementary complex function-theory. I will explain anything non-standard or more advanced. Good references are: for 1), Lax [15]; for 2), Courant [5]; for 3), Munroe [20]; for 4), Ahlfors [1].

1. An old story with a new twist

I collect here what you need to know about linear maps of the finite-dimensional space $\mathbb{R}^d$ to itself, first in the conventional format and then in Fredholm's way.

1.1. Simplest facts

$L : \mathbb{R}^d \to \mathbb{R}^d$ is the map, $R = \{y : y = Lx \text{ for some } x \in \mathbb{R}^d\}$ is its range, $N = \{x : Lx = 0\}$ its null space, and similarly for the transposed map $L^\dagger$: range $R^\dagger$ and null space $N^\dagger$. The figure displays these notions in a convenient way. The position of $N^\dagger$ to the right indicates that it is the annihilator $R^o$ of $R$: indeed, $y \in R^o$ if and only if $(Lx, y) = (x, L^\dagger y)$ vanishes for every $x$, which is to say $L^\dagger y = 0$, and similarly to the left: $N = (R^\dagger)^o$. $L$ kills $N$ and so maps $R^\dagger$ one-to-one onto $R$; likewise, $L^\dagger$ kills $N^\dagger$ and maps $R$ one-to-one onto $R^\dagger$. It follows that $\dim R = \dim R^\dagger \equiv r$ and so also $\dim N = \dim N^\dagger \equiv n = d - r$. The map $L : x \to y$ may be expressed by a matrix $M = [m_{ij}]_{1 \le i, j \le d}$ relative to the standard frame
$$e_1 = (1\,0\,0 \dots 0),\quad e_2 = (0\,1\,0 \dots 0),\quad \dots,\quad e_d = (0\,0\,0 \dots 1),$$
as in
$$y_i = \sum_{j=1}^d m_{ij} x_j, \qquad i = 1, \dots, d.$$
$R$ is the span of the columns of $M$, $R^\dagger$ the span of its rows, and $\dim R = \dim R^\dagger$ expresses the equality of "row rank" and "column rank".

1.2. Determinant and Cramer's rule

It is desirable to have a practical way to tell if $L : \mathbb{R}^d \to \mathbb{R}^d$ is one-to-one and onto (and so invertible), and also a recipe for its inverse $L^{-1}$ in this case.
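The equality of row rank and column rank noted in Section 1.1 is easy to watch numerically; a minimal sketch (my own illustration, with an arbitrary sample matrix), computing the rank of $M$ and of its transpose by Gaussian elimination:

```python
# Row rank equals column rank: rank(M) == rank(M transposed).
def rank(M):
    A = [row[:] for row in M]
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        # find a pivot at or below row r in column c
        piv = next((i for i in range(r, rows) if abs(A[i][c]) > 1e-12), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, rows):
            m = A[i][c] / A[r][c]
            for k in range(c, cols):
                A[i][k] -= m * A[r][k]
        r += 1
    return r

M = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],   # twice the first row, so the rank drops to 2
     [0.0, 1.0, 1.0]]
Mt = [list(col) for col in zip(*M)]
assert rank(M) == rank(Mt) == 2
```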
Introduce (out of the air) the determinant:
$$\det M = \sum_{\pi \in S_d} \operatorname{sign} \pi \prod_{i=1}^d m_{i\pi(i)},$$
the sum being taken over the "symmetric group" $S_d$ of all permutations $\pi$ of the "letters" $1, 2, \dots, d$. Then $L$ is one-to-one and onto if and only if $\det M \ne 0$, in which case $L^{-1}$ is expressed by the inverse matrix $M^{-1} = [m^{-1}_{ij}]_{1 \le i, j \le d}$ formed according to Cramer's rule: 1) strike out the $i$th row and $j$th column of $M$ and take the $(d-1) \times (d-1)$ determinant of what's left, this being the $ij$ "cofactor" of $M$, so-called; 2) supply the signature $(-1)^{i+j}$; 3) transpose; 4) divide by $\det M$. More briefly,
$$m^{-1}_{ij} = (\det M)^{-1} (-1)^{i+j} \times \text{the } ji \text{ cofactor of } M.$$
This is, I trust, familiar. If not, don't worry. I will derive it all in a geometrically appealing way in the next three sections.

1.3. Signed volume

The determinant, which seems at first so complicated and unmotivated, has a simple geometrical meaning: it is the volume of the parallelepiped spanned by the vectors $f_1 = Le_1, \dots, f_d = Le_d$, with a suitable signature attached. Plainly, $L$ is one-to-one and onto if and only if $\dim R = d$, which is to say that the family $F = [f_1, \dots, f_d]$ is a "frame", i.e. that it spans $\mathbb{R}^d$, and this in turn is the same as to say that the associated (signed) volume $V = V(F)$ does not vanish. I discuss this signed volume one dimension at a time.

$d = 1$: $e_1 = +1$ is the standard frame; any other frame is just a number $f_1 \ne 0$, and $V(F) = f_1$ is the length $|f_1|$ with the signature of $f_1$ attached. The frame $f_1$ is "proper" or "improper" according as $f_1$ points in the same direction as $e_1 = +1$ or not. Obviously, you cannot deform a proper into an improper frame without passing through $f_1 = 0$, which is not a frame at all, i.e. the proper and improper frames are disconnected. This is all simple stuff made complicated. The next step conveys the idea more plainly.

$d = 2$: Here is the standard frame.
A second frame $F = [f_1, f_2]$ can be lined up with it by a (counter-clockwise) rotation aligning $f_1$ with $e_1$, as in the next figure. The resulting frame is "proper" or "improper" according as the rotated $f_2$ lies in the upper or the lower half plane. It cannot lie on the line directed by $e_1$, as that would make it depend upon $f_1$; in particular, the proper and improper frames are disconnected, one from the other, as in $d = 1$. The signed volume $V$ is now declared to be the area of the parallelogram spanned by $F$, taken with the signature plus or minus according as the frame is proper or not, i.e.
$$V = |f_1|\,|f_2| \times \sin \theta,$$
so that the plus sign prevails for $0 < \theta < \pi$ and the minus sign for $\pi < \theta < 2\pi$. Let's spell it out: the absolute volume is
$$|V| = |f_1| \times \text{the coprojection of } f_2 \text{ on } f_1 = |f_1| \times |f_2 - \hat e_1 (f_2, \hat e_1)|, \quad \hat e_1 \text{ being } f_1 \text{ divided by its length } |f_1|,$$
$$= |f_1| \times \sqrt{|f_2|^2 - |f_1|^{-2} (f_2, f_1)^2} = \sqrt{|f_1|^2 |f_2|^2 - (f_2, f_1)^2} = \sqrt{(f_{11}^2 + f_{12}^2)(f_{21}^2 + f_{22}^2) - (f_{21} f_{11} + f_{22} f_{12})^2}$$
$$= \sqrt{(f_{11} f_{22} - f_{12} f_{21})^2} = \pm (f_{11} f_{22} - f_{12} f_{21}).$$

$d = 3$: Here is the standard frame again, obeying the right-hand screw rule. The absolute volume for the frame $F = [f_1, f_2, f_3]$ is
$$|V| = |f_1| \times \text{the coprojection of } f_2 \text{ on } f_1 \times \text{the coprojection of } f_3 \text{ on the plane of } f_1 \ \& \ f_2,$$
but what signature should be attributed to $V$ itself? To decide, consider the following three steps: 1) rotate $F$ so as to align $f_1$ with $e_1$ and adjust its length so they match; 2) rotate about $e_1$ to bring $f_2$ to the "upper" half of the plane of $e_1$ and $e_2$ and deform it to $e_2$ plain; 3) $f_3$ now lies either above or below the plane of $e_1$ & $e_2$ and can be deformed to $\pm e_3$: in short, $F$ can be deformed (via honest frames) to one of the two frames $[e_1, e_2, \pm e_3]$, and it is this signature, attached to $e_3$, that is ascribed to $V$.
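The $d = 2$ computation above is easy to confirm numerically; a small sketch of my own (the two vectors are an arbitrary example): the signed area $f_{11} f_{22} - f_{12} f_{21}$ agrees with $|f_1|\,|f_2| \sin \theta$, where $\theta$ is the angle from $f_1$ to $f_2$.

```python
# Signed area of the parallelogram spanned by f1, f2:
# f11*f22 - f12*f21 versus |f1| |f2| sin(theta).
import math

f1 = (3.0, 1.0)
f2 = (1.0, 2.0)
signed_area = f1[0] * f2[1] - f1[1] * f2[0]
theta = math.atan2(f2[1], f2[0]) - math.atan2(f1[1], f1[0])  # angle from f1 to f2
other = math.hypot(*f1) * math.hypot(*f2) * math.sin(theta)
assert abs(signed_area - other) < 1e-12
```

Swapping $f_1$ and $f_2$ flips the sign of both expressions, which is the skewness of $V$ in miniature.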
In a fancier language, the frames of $\mathbb{R}^3$ fall into two connected deformation classes, "proper" and "improper", distinguished by the above signature, much as for $d \le 2$, which is to say that $\mathbb{R}^3$ is capable of two different "orientations", exemplified by the right- and left-hand screw rules.

$d \ge 4$: It is much the same in higher dimensions. A frame determines an increasing family of subspaces $\mathbb{R}^n = \operatorname{span}\{f_k : 1 \le k \le n\}$, ending with $\mathbb{R}^d$ itself; the absolute volume is the product, from $n = 1$ up to $d$, of the length of the coprojection of $f_n$ on $\mathbb{R}^{n-1}$; and the signature of $V$ is determined by the deformation class, of which there are two, as before: any frame may be deformed to $[e_1, e_2, \dots, \pm e_d]$, and it is this signature, attached to $e_d$, that is ascribed to $V$. The definition of the signed volume is now complete, but what has it to do with the determinant? I postpone the punchline in favor of two simple observations.

Observation 1. $V$ is "skew" in a sense I will now explain. Let $\pi \in S_d$ be a permutation of the letters $1, \dots, d$. Then $V(\pi F) = V[f_{\pi(1)}, f_{\pi(2)}, \dots]$ represents the same absolute volume as $V(F)$: it is only the signature that changes, as in $V(\pi F) = \chi(\pi) V(F)$ with $\chi(\pi) = \pm 1$. Plainly, this $\chi$ is a character of $S_d$, meaning that $\chi(\pi_1 \pi_2) = \chi(\pi_1) \chi(\pi_2)$, and it is a fact that $S_d$ has only one such character besides the trivial character $\chi \equiv 1$: obviously, every permutation is a product of "transpositions", exchanging two letters, fixing the rest. Now you will check that if $\chi$ ascribes the value $+1$ to any transposition, then it is the same for every transposition, in which case $\chi$ is trivial, i.e. $\chi \equiv 1$. Otherwise, it ascribes the value $-1$ to every transposition and $\chi(\pi) = (-1)^\#$, $\#$ being the number of transpositions in $\pi$. This is nothing but the signature of $\pi$ appearing in the formula for the determinant. The meaning of "$V$ skew" is now this: $V(\pi F) = \chi(\pi) \times V(F)$.

Aside.
Here you have a somewhat indirect proof that $\chi(\pi) = (-1)^\#$, alias the signature of $\pi$, is well-defined, i.e. that the parity of $\#$ is always the same, however $\pi$ may be decomposed into transpositions. Another way is to examine how $\pi$ changes the signature of the "discriminant" $\prod_{i > j} (x_i - x_j)$ of $d$ distinct real numbers $x_1 < x_2 < \dots < x_d$. Try it.

Observation 2. $V$ is now extended to any family of vectors $F = [f_1, \dots, f_d]$, independent or not, by putting $V = 0$ in the second case, when the parallelepiped collapses. Then (most important) $V$ is linear in each of its arguments $f_1, \dots, f_d$. Because $V$ is skew, you have to check this only for $f_d$. That is obvious for $d = 2$ from the explicit formula $V = f_{11} f_{22} - f_{12} f_{21}$ and extends easily to $d \ge 3$: in computing $V$, the part of $f_d$ in the span of $f_n$, $1 \le n < d$, changes nothing; it is only the projection of $f_d$ upon the normal $g$ to that span which counts, as in $V(F) = V[f_1, \dots, f_{d-1}, g] \times (f_d \cdot g)$. This makes the linearity obvious.

1.4. Why $V$ is the determinant

It remains to compute $V$ from the two properties, 1) skewness and 2) the linearity elicited in Section 1.3, plus the normalization 3) $V[e_1, \dots, e_d] = 1$. Write $f_i = \sum_{j=1}^d f_{ij} e_j$ for $1 \le i \le d$. Then
$$V = \sum_{j_1 = 1}^d \cdots \sum_{j_d = 1}^d f_{1 j_1} \cdots f_{d j_d}\, V[e_{j_1}, \dots, e_{j_d}] \qquad \text{by 2)}.$$
Here, it is only summands with distinct $j_1, \dots, j_d$ that survive, $V$ being skew by 1), so you may write
$$V = \sum_{\pi \in S_d} V[e_{\pi(1)}, \dots, e_{\pi(d)}] \prod_{i=1}^d f_{i\pi(i)} = \sum_{\pi \in S_d} \operatorname{sign} \pi \prod_{i=1}^d f_{i\pi(i)} \quad \text{by 1) and 3)} \quad = \det [f_{ij}]_{1 \le i, j \le d},$$
and if now $f_1 = Le_1$, etc., so that the preceding matrix is the old $M$ transposed, then it is seen that $\det M^\dagger$ is nothing but the signed volume of the image of the unit cube $C$ under $L$. Actually, $\det M^\dagger = \det M$, as in Rule 1 of the next section, so, with an imprecise notation, $\det M = V(LC)$.

Aside.
Besides its interpretation as a signed volume, $\det M$ has also a spectral meaning: it is the value of the characteristic polynomial $\det(M - \lambda I)$ at $\lambda = 0$; as such, it is the product of the (generally complex) eigenvalues of $M$. These are either real or else they appear in conjugate pairs, $M$ being real, in accord with the reality of the determinant.

1.5. Simplest rules

Here, $A, B, C$, etc. are $d \times d$ matrices.

Rule 1. $\det C^\dagger = \det C$: indeed,
$$\det C^\dagger = \sum_{\pi \in S_d} \chi(\pi) \prod_{i=1}^d c_{\pi(i) i} = \sum_{\pi \in S_d} \chi(\pi) \prod_{j=1}^d c_{j \pi^{-1}(j)} = \sum_{\pi \in S_d} \chi(\pi^{-1}) \prod_{j=1}^d c_{j \pi(j)} = \det C,$$
since $\pi^{-1}$ runs once through the symmetric group as $\pi$ does, and $\chi(\pi) = \chi(\pi^{-1})$.

Rule 2. $\det(AB) = \det A \times \det B = \det(BA)$. This is also pretty clear. The absolute value $|\det C|$ represents the factor by which the volume of any geometrical figure in $\mathbb{R}^d$ is changed by application of the associated linear map, as you will see by dividing space up into little cubes, so the formula is correct up to a signature on the right, in case none of the determinants vanish. But then each of the associated frames can be deformed without change of signature to $[e_1, \dots, e_d]$ or to $[e_1, \dots, -e_d]$, and you can check the rest by hand: for example, if $A$ is proper (orientation-preserving) and $B$ is improper (orientation-reversing), then $AB$ is improper and the signs work out as follows: $(-1) = (+1) \times (-1)$.

Rule 3.