Dirichlet Sobolev Spaces on [0, 1] and the Brownian Bridge

Robert L. Wolpert
Department of Statistical Science, Duke University, Durham, NC, USA
Version: June 22, 2017, 14:28

1 Motivation: Donsker’s Theorem

If $\{X_i\}$ are iid random variables with common CDF $F(x) := P[X_i \le x]$, then the empirical CDF
$$\hat F_n(x) := \frac{\#\{i \le n : X_i \le x\}}{n}$$
converges pointwise almost-surely to $F(x)$ by the strong law of large numbers. For finite $n$ the distribution of $n\hat F_n(x) := \#\{i \le n : X_i \le x\}$ is $\mathrm{Bi}\big(F(x),\,n\big)$, so also
$$\sqrt n\,\big[\hat F_n(x) - F(x)\big] \Rightarrow \mathrm{No}\big(0,\ F(x)[1 - F(x)]\big)$$
by the central limit theorem. This is helpful but not quite enough to help us study the probability distribution of functionals like the Kolmogorov-Smirnov statistic

$$D_n := \sup_x \big|\hat F_n(x) - F(x)\big|. \tag{1}$$

Donsker's theorem asserts that the distribution of
$$\Delta_n(t) := \sqrt n\,\big[\hat G_n(t) - t\big], \qquad 0 \le t \le 1, \tag{2b}$$
converges to that of the "Brownian Bridge" $B(t)$, a mean-zero Gaussian process on the unit interval that we will construct below, and hence that for any continuous CDF $F(\cdot)$, $\Delta_n(t)$ converges in distribution to $B(t)$ and $\sqrt n\,D_n$ to $\sup_{0\le t\le 1}|B(t)|$.
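Donsker's statement can be checked empirically. The following Python sketch (the sample size, replication count, and seed are arbitrary choices of mine) simulates $\sqrt n\,D_n$ for uniform samples and compares its empirical CDF at one point with the limiting Kolmogorov law $P[\sup_t|B(t)| \le x] = 1 - 2\sum_{k\ge 1}(-1)^{k-1}e^{-2k^2x^2}$:

```python
import numpy as np

def sqrt_n_ks_statistic(x):
    """sqrt(n) * D_n for a sample x against the U(0,1) CDF F(t) = t."""
    x = np.sort(x)
    n = len(x)
    i = np.arange(1, n + 1)
    # D_n is attained at a jump of F_hat; check both one-sided gaps
    d = max(np.max(i / n - x), np.max(x - (i - 1) / n))
    return np.sqrt(n) * d

def kolmogorov_cdf(x, terms=100):
    """Limiting law P[sup_t |B(t)| <= x] = 1 - 2 sum (-1)^(k-1) exp(-2 k^2 x^2)."""
    k = np.arange(1, terms + 1)
    return 1.0 - 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * x**2))

rng = np.random.default_rng(0)
n, reps = 500, 2000
stats = np.array([sqrt_n_ks_statistic(rng.uniform(size=n)) for _ in range(reps)])
print(np.mean(stats <= 1.0), kolmogorov_cdf(1.0))  # both near 0.73
```

Even at $n = 500$ the finite-sample and limiting probabilities agree to about two decimal places.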

2 The Brownian Bridge

The Brownian Bridge is a stochastic process $B_t$ defined on the unit interval $[0,1]$ and characterized in any of the following ways:

- A continuous-path Gaussian process with mean zero and covariance $\gamma(s,t) = \mathrm E\,B_s B_t$ given by
  $$\gamma(s,t) := s \wedge t - st; \tag{3a}$$
- A diffusion with initial value $B_0 = 0$, drift $a_t(x) = -\frac{x}{1-t}$, and diffusion rate $b_t^2(x) = 1$;
- A process related to the Wiener process $W(t)$ on $\mathbb R_+$ by the relations
  $$B_t = (1-t)\,W\!\Big(\frac{t}{1-t}\Big), \qquad W_s = (1+s)\,B\!\Big(\frac{s}{1+s}\Big); \tag{3b}$$
- A process related to Brownian Motion $W(t)$ on $[0,1]$ by the relation $B_t = W(t) - t\,W(1)$;
- A process related to Brownian Motion $W(t)$ on $[0,1]$ by the relation
  $$B_t \sim W(t) \mid W(1) = 0. \tag{3c}$$

We will construct it quite explicitly in Section (3.1) below, so for now we can view these as heuristic and see where Eqn (3a) leads us. Denote by $\mathcal H_0$ the real Hilbert space of all square-integrable real-valued Borel functions on the unit interval, with inner product

$$\langle f, g\rangle_0 := \int f(t)\,g(t)\,dt \tag{4}$$
(where this and all similar unlabeled integrals below are taken over the unit interval $[0,1]$) and norm

$$\|f\|_0 := \langle f, f\rangle_0^{1/2}.$$

All continuous functions on the compact set $[0,1]$ are bounded and so are in $\mathcal H_0$, and the space $\mathcal C_c^\infty$ of smooth compactly-supported functions $\phi : (0,1) \to \mathbb R$ is dense in $\mathcal H_0$ (so is the larger space $\mathcal C([0,1])$ of continuous functions, of course).

Let $t \mapsto B(t)$ be a "Brownian Bridge", a normally-distributed random element of $\mathcal C([0,1])$ with mean $\mathrm E\,B(t) = 0$ and covariance $\mathrm E\,B(s)B(t) = \gamma(s,t)$ of Eqn (3a). We are now going to define a "generalized random process" or "random field" based on $B$, a linear transformation that takes elements $\phi \in \mathcal C_c^\infty$ into the Gaussian random variables
$$B[\phi] := \int \phi(t)\,B(t)\,dt = \langle \phi, B\rangle_0 \tag{5}$$
by integrating against the Brownian Bridge $B(t)$. These each have mean zero, and covariance which we will denote by

$$\langle \phi, \psi\rangle_{-1} := \mathrm E\,B[\phi]\,B[\psi] = \iint \phi(s)\,\gamma(s,t)\,\psi(t)\,ds\,dt. \tag{6}$$
Let $\mathcal H_{-1}$ denote the Hilbert space completion of $\mathcal C_c^\infty$ in the inner product $\langle\cdot,\cdot\rangle_{-1}$, with norm $\|\phi\|_{-1} := \langle \phi, \phi\rangle_{-1}^{1/2}$.

Evidently we may write the $\mathcal H_{-1}$ inner product in the form

$$\langle \phi, \psi\rangle_{-1} = \int \phi(s)\,\Big(\int \gamma(s,t)\,\psi(t)\,dt\Big)\,ds = \langle \phi, G\psi\rangle_0, \tag{7}$$
where "$G$" denotes the linear operator given by

$$G\psi(s) := \int \gamma(s,t)\,\psi(t)\,dt = \int_0^s t(1-s)\,\psi(t)\,dt + \int_s^1 s(1-t)\,\psi(t)\,dt. \tag{8}$$
It will prove useful to explore properties of the operator $\psi \mapsto G\psi$. First, note from Eqn (8) that for any $\psi \in \mathcal H_0$ the function $g(s) := G\psi(s)$ satisfies the Dirichlet boundary conditions $g(0) = 0$ and $g(1) = 0$. It is also continuous and differentiable (even if $\psi$ wasn't), with

$$g'(s) = s(1-s)\psi(s) - \int_0^s t\,\psi(t)\,dt - s(1-s)\psi(s) + \int_s^1 (1-t)\,\psi(t)\,dt = -\int_0^s t\,\psi(t)\,dt + \int_s^1 (1-t)\,\psi(t)\,dt$$
for every $s$ and, for any point $s$ where $\psi$ is continuous,

$$g''(s) = -s\,\psi(s) - (1-s)\,\psi(s) = -\psi(s). \tag{9}$$
An "eigenfunction" for the operator $G$ is a function $\psi$ for which there is some constant $\lambda \in \mathbb C$ such that $G\psi(s) = \lambda\,\psi(s)$ for all $s$, similar to eigenvectors for matrices. Since $\langle \psi, \psi\rangle_{-1} = \langle \psi, G\psi\rangle_0 = \lambda\,\langle \psi, \psi\rangle_0 > 0$, necessarily $\lambda > 0$. Such an eigenfunction must satisfy $\psi(0) = 0$ and $\psi(1) = 0$, since $g = G\psi$ does, and $\lambda\psi'' = -\psi$. The solutions to $\lambda\psi'' = -\psi$ for $\lambda > 0$ are all of the form $\psi(t) \propto \sin(at + b)$ with $a^2 = 1/\lambda$ and $b \in \mathbb R$ arbitrary. For $\psi(0) = 0$ we need $b = 0$, while for $\psi(1) = 0$ we need $a \in \mathbb N\pi$, and hence $\lambda = (n^2\pi^2)^{-1}$ for some $n \in \mathbb N$. The solutions with unit $\mathcal H_0$ norm are:

$$\psi_n(t) := \sqrt 2\,\sin(n\pi t). \tag{10}$$
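The eigenfunction relation $G\psi_n = (n^2\pi^2)^{-1}\psi_n$ is easy to sanity-check numerically. A small sketch (the grid size and evaluation point $s = 0.3$ are arbitrary choices) approximates $G\psi_n(s) = \int_0^1 \gamma(s,t)\,\psi_n(t)\,dt$ by midpoint quadrature:

```python
import numpy as np

def gamma(s, t):
    """Brownian-bridge covariance of Eqn (3a): gamma(s,t) = min(s,t) - s*t."""
    return np.minimum(s, t) - s * t

def psi(n, t):
    """Unit-norm eigenfunctions of Eqn (10): psi_n(t) = sqrt(2) sin(n pi t)."""
    return np.sqrt(2.0) * np.sin(n * np.pi * t)

N = 20000
t = (np.arange(N) + 0.5) / N          # midpoint quadrature grid on [0, 1]
s = 0.3                               # arbitrary evaluation point

results = {}
for n in (1, 2, 3):
    G_psi = np.sum(gamma(s, t) * psi(n, t)) / N   # (G psi_n)(s) by quadrature
    results[n] = (G_psi, psi(n, s) / (n * np.pi) ** 2)
    print(n, results[n])
```

Each pair agrees to many digits, confirming the eigenvalue $(n^2\pi^2)^{-1}$.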

In fact, these form a complete orthonormal system in $\mathcal H_0$: $\langle \psi_n, \psi_m\rangle_0$ is zero if $n \ne m$ and one if $n = m$ (so $\|\psi_n\|_0 = 1$), and every element $\phi \in \mathcal H_0$ has a unique convergent expansion

$$\phi = \sum_{n\in\mathbb N} a_n\,\psi_n$$
with coefficients given by

$$a_n = \langle \phi, \psi_n\rangle_0 = \int \phi(t)\,\psi_n(t)\,dt,$$
called the Fourier sine series. To see this, start with the trig identities $\cos(x \mp y) = \cos(x)\cos(y) \pm \sin(x)\sin(y)$ (which follow easily from Euler's formula $\exp(i\theta) = \cos(\theta) + i\sin(\theta)$); these give $2\sin(x)\sin(y) = \cos(x-y) - \cos(x+y)$, and so for $n, m \in \mathbb N$,
$$\langle \psi_n, \psi_m\rangle_0 = \int 2\sin(n\pi t)\sin(m\pi t)\,dt = \int \big\{\cos[(n-m)\pi t] - \cos[(n+m)\pi t]\big\}\,dt = 1 \text{ if } n = m, \text{ and } 0 \text{ if } n \ne m.$$
Now comes the fun part. If we expand each of two elements $\phi, \psi$ of $\mathcal H_0$ in their Fourier sine series
$$\phi = \sum_{n\in\mathbb N} a_n\,\psi_n, \qquad \psi = \sum_{m\in\mathbb N} b_m\,\psi_m,$$
then multiply these and integrate, we see that both their inner product and their norms

$$\langle \phi, \psi\rangle_0 := \int \phi(t)\,\psi(t)\,dt = \sum_{n\in\mathbb N}\sum_{m\in\mathbb N} a_n b_m \int \psi_n(t)\,\psi_m(t)\,dt = \sum_{n\in\mathbb N} a_n b_n$$
$$\|\phi\|_0 = \langle \phi, \phi\rangle_0^{1/2} = \Big\{\sum_{n\in\mathbb N} |a_n|^2\Big\}^{1/2}$$

can be evaluated in terms of the Fourier coefficients. Moreover, since $G\psi_n = (n^2\pi^2)^{-1}\psi_n$,

$$G\psi = \sum_{m\in\mathbb N} b_m\,G\psi_m = \sum_{m\in\mathbb N} b_m\,(m^2\pi^2)^{-1}\,\psi_m$$
and so the $\mathcal H_{-1}$ inner product (and covariance) can also be computed from the coefficients, as

$$\langle \phi, \psi\rangle_{-1} = \mathrm E\,B[\phi]\,B[\psi] = \langle \phi, G\psi\rangle_0 = \sum_{n\in\mathbb N} (\pi^2 n^2)^{-1}\,a_n b_n. \tag{11}$$

3 Dirichlet Sobolev Spaces

We have seen two inner products on spaces of functions on the unit interval, each expressible in terms of the Fourier sine coefficients:

$$\langle \phi, \psi\rangle_0 = \int \phi(t)\,\psi(t)\,dt = \sum_{n\in\mathbb N} a_n b_n$$
$$\langle \phi, \psi\rangle_{-1} = \iint \phi(s)\,\gamma(s,t)\,\psi(t)\,ds\,dt = \sum_{n\in\mathbb N} (\pi^2 n^2)^{-1}\,a_n b_n$$
This suggests a generalization:

Definition 1 (Sobolev Space). For $s \in \mathbb R$, let $\mathcal H_s$ be the Hilbert-space completion of $\mathcal C_c^\infty$ in the inner product

$$\langle \phi, \psi\rangle_s = \sum_{n\in\mathbb N} (\pi^2 n^2)^s\,a_n b_n \tag{12}$$

$$\|\phi\|_s = \langle \phi, \phi\rangle_s^{1/2} = \Big\{\sum_{n\in\mathbb N} (\pi^2 n^2)^s\,|a_n|^2\Big\}^{1/2}, \tag{13}$$
where $a_n := \langle \phi, \psi_n\rangle_0 = \int \phi(t)\,\psi_n(t)\,dt$ and $b_m := \langle \psi, \psi_m\rangle_0 = \int \psi(t)\,\psi_m(t)\,dt$. These are the "Dirichlet Sobolev spaces" of order $s$ on the unit interval $[0,1]$. Roughly speaking, as we'll see below, they are the spaces of $s$-times differentiable functions on the interval that vanish at the end-points. We also define

$$\mathcal H_\infty := \bigcap_{s\in\mathbb R} \mathcal H_s; \qquad \mathcal H_{-\infty} := \bigcup_{s\in\mathbb R} \mathcal H_s;$$

each of these is a complete separable metric space (a Fréchet space, in fact), but neither is a Hilbert or Banach space. The elements of $\mathcal H_\infty$, called "test functions", are $\mathcal C^\infty$ functions that vanish to all orders at the boundary of $[0,1]$, while $\mathcal H_{-\infty}$ is a collection of interesting objects called (alas) "distributions" by the mathematical community. We'll look at these more in Section (3.2). For all

$s < t$,
$$\mathcal C_c^\infty \subset \mathcal H_\infty \subset \mathcal H_t \subset \mathcal H_s \subset \mathcal H_{-\infty}.$$
One can construct similar spaces with other boundary conditions, such as Neumann ($\phi'(0) = \phi'(1) = 0$, where $\psi_n(x) \propto \cos(n\pi x)$ for $n \in \mathbb Z_+ = \{0, 1, 2, \dots\}$) or Periodic ($\phi(0) = \phi(1)$), or (the most widely studied) Free boundary conditions on all of $\mathbb R$.

3.1 Sobolev Representation of the Brownian Bridge

If a Brownian Bridge $B(t)$ has continuous sample paths with mean zero and covariance function $\gamma(s,t)$, then its Fourier sine coefficients

$$\zeta_n = B[\psi_n] = \langle B, \psi_n\rangle_0 = \int B(t)\,\psi_n(t)\,dt$$

$$\mathrm E\,\zeta_m \zeta_n = \langle \psi_m, \psi_n\rangle_{-1} = \langle \psi_m, G\psi_n\rangle_0 = \big\langle \psi_m, \tfrac{1}{\pi^2 n^2}\,\psi_n\big\rangle_0 = (\pi^2 n^2)^{-1} \text{ if } m = n, \text{ and } 0 \text{ otherwise.}$$

Thus $Z_n := n\pi\,\zeta_n \overset{\text{iid}}{\sim} \mathrm{No}(0,1)$ and we can recover $B(t)$ as the $L_2$-convergent sum
$$B(t) = \sum_{n\in\mathbb N} \frac{\sqrt 2}{\pi n}\,Z_n\,\sin(n\pi t). \tag{14a}$$
We take this as our constructive definition of the Brownian Bridge (we'll verify path continuity below). The sample paths lie in $\mathcal H_s$ almost surely for each $s < 1/2$, because
$$\mathrm E\,\|B\|_s^2 = \sum_{n\in\mathbb N} (\pi^2 n^2)^s\,\frac{2}{\pi^2 n^2}\,\mathrm E\,Z_n^2 = 2\sum_{n\in\mathbb N} (\pi^2 n^2)^{(s-1)} < \infty \quad\text{for } s < 1/2. \tag{14b}$$
By Eqns (14a, 14b) we may think of the Brownian Bridge $t \mapsto B(t)$ as a random element of $\mathcal H_s$ for any $s < 1/2$. Its (probability) distribution is the Borel measure $P[B \in A]$ for Borel sets $A \subset \mathcal H_s$. By Eqn (11) we can also view $B$ as the "unit Gaussian process" on the Hilbert space $\mathcal H_{-1}$, i.e., a linear isomorphism $B : \mathcal H_{-1} \to L_2(\Omega, \mathcal F, P)$ from $\mathcal H_{-1}$ into a space of square-integrable random variables with the property that inner products are preserved:

$$\mathrm E\,\big(B[\phi]\,B[\psi]\big) = \langle \phi, \psi\rangle_{-1}.$$
Using Eqn (3b) we can construct the Wiener process $W(t)$ on all of $\mathbb R_+$ from $B$ on $[0,1]$; we can also construct $W(t)$ on $[0,1]$ in a manner similar to that above with mixed boundary conditions $g(0) = 0 = g'(1)$, leading to

$$W(t) = \sum_{n\in\mathbb N} \frac{\sqrt 2}{\pi(n - 1/2)}\,Z_n\,\sin\big((n - 1/2)\pi t\big).$$
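Eqn (14a) is directly usable for simulation. The sketch below (truncation level, grid, and seed are arbitrary choices) draws one approximate bridge path from the truncated series and checks that the truncated covariance $\sum_n \frac{2}{\pi^2 n^2}\sin(n\pi s)\sin(n\pi t)$ is close to $\gamma(s,t)$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000                                # truncation level for Eqn (14a)
n = np.arange(1, N + 1)
t = np.linspace(0.0, 1.0, 501)

# one sample path: B(t) ~= sum_n sqrt(2)/(pi n) Z_n sin(n pi t)
Z = rng.standard_normal(N)
B = (np.sqrt(2.0) / (np.pi * n) * Z) @ np.sin(np.outer(n, np.pi * t))

# truncated covariance at (s,t) = (0.3, 0.7) vs gamma(0.3, 0.7) = 0.09
s0, t0 = 0.3, 0.7
cov = np.sum(2.0 / (np.pi * n) ** 2 * np.sin(n * np.pi * s0) * np.sin(n * np.pi * t0))
print(cov, min(s0, t0) - s0 * t0)
print(B[0], B[-1])   # the bridge is pinned to zero at both endpoints
```

Since every term $\sin(n\pi t)$ vanishes at $t = 0$ and $t = 1$, the simulated path satisfies the Dirichlet boundary conditions exactly at any truncation level.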

3.1.1 Testing $H_0 : X_i \overset{\text{iid}}{\sim} F(x)$

If $\{X_j\} \overset{\text{iid}}{\sim} F(x)$ for a continuous CDF $F(x)$, we've seen that the empirical CDF $\hat G_n(u)$ for $U_j := F(X_j)$ satisfies
$$\Delta_n(t) := \sqrt n\,\big[\hat G_n(t) - t\big] \approx B(t),$$
approximately distributed as a Brownian Bridge for large $n$. On the other hand, if the $\{X_j\}$ are iid with some other continuous CDF $\tilde F$ that is close to $F$, write that CDF in the form

$$\tilde F(x) = F(x) + \epsilon\big(F(x)\big)/\sqrt n$$
for some small function $\epsilon : [0,1] \to \mathbb R$ with $\epsilon(0) = \epsilon(1) = 0$. Under this alternative, the random function $\Delta_n(t)$ is approximately distributed as a Gaussian process with mean $\epsilon(t)$ and with covariance

$$\tilde\gamma(s,t) = \big[s + \epsilon(s)/\sqrt n\big] \wedge \big[t + \epsilon(t)/\sqrt n\big] - \big[s + \epsilon(s)/\sqrt n\big]\big[t + \epsilon(t)/\sqrt n\big] = \gamma(s,t) + O(n^{-1/2}),$$
which will be close to the Brownian Bridge's $\gamma(s,t)$ if $\epsilon$ is small or $n$ is large. For $\epsilon \in \mathcal H_1$, the Fourier sine coefficients would then be distributed approximately as
$$\zeta_j := \langle \Delta_n, \psi_j\rangle_0 \overset{\text{ind}}{\sim} \mathrm{No}\Big(a_j,\ \frac{1}{j^2\pi^2}\Big),$$
where $\{a_j := \langle \epsilon, \psi_j\rangle_0\}$ are the coefficients of $\epsilon(t)$. For large sample size $n$, the likelihood ratio against $H_0 : X_i \overset{\text{iid}}{\sim} F(x)$ on the basis of the first $J$ coefficients is then:

$$\Lambda_n = \prod_{j\le J} \frac{\big(\frac{\pi^2 j^2}{2\pi}\big)^{1/2} \exp\big\{-(\pi j)^2(\zeta_j - a_j)^2/2\big\}}{\big(\frac{\pi^2 j^2}{2\pi}\big)^{1/2} \exp\big\{-(\pi j)^2\zeta_j^2/2\big\}} = \exp\Big\{\sum_{j\le J} (\pi^2 j^2)\big[\zeta_j a_j - \tfrac12|a_j|^2\big]\Big\} \to \exp\Big\{\langle \Delta_n, \epsilon\rangle_1 - \tfrac12\|\epsilon\|_1^2\Big\} \quad\text{as } J \to \infty.$$
Thus the Neyman-Pearson likelihood ratio test of $H_0 : \{X_i\} \sim F$ against $H_1 : \{X_i\} \sim \tilde F$ would reject $H_0$ for large values of
$$\langle \Delta_n, \epsilon\rangle_1 = n^{-1/2} \sum_{j\le n} \epsilon'\big(F(X_j)\big).$$
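Under $H_0$ the rejection statistic is a centered sum of iid terms, so its null distribution is easy to simulate. In this sketch (the direction $\epsilon(t) = \sin(\pi t)$, sample size, and seed are my own illustrative choices), $\langle\Delta_n,\epsilon\rangle_1 = n^{-1/2}\sum_j \epsilon'(U_j)$ should be approximately $\mathrm{No}(0, \|\epsilon\|_1^2)$ with $\|\epsilon\|_1^2 = \int_0^1 \epsilon'(t)^2\,dt = \pi^2/2$:

```python
import numpy as np

rng = np.random.default_rng(2)

def eps_prime(t):
    # derivative of the assumed alternative direction eps(t) = sin(pi t),
    # which satisfies the required boundary conditions eps(0) = eps(1) = 0
    return np.pi * np.cos(np.pi * t)

n, reps = 400, 5000
U = rng.uniform(size=(reps, n))              # U_j = F(X_j) under H_0
T = eps_prime(U).sum(axis=1) / np.sqrt(n)    # the statistic <Delta_n, eps>_1

print(T.mean())                 # approximately 0 under H_0
print(T.var(), np.pi**2 / 2)    # variance approximately |eps|_1^2
```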

3.2 More about Sobolev Spaces

Because $\psi_n(t) := \sqrt 2\sin(n\pi t)$ satisfies $\psi_n''(t) = -(n^2\pi^2)\,\psi_n(t)$, the negative second derivative $-\Delta\psi$ of any function $\psi \in \mathcal C_c^\infty$ with Fourier coefficients $b_n = \langle \psi, \psi_n\rangle_0$ is
$$-\Delta\psi = \sum_{n\in\mathbb N} b_n\,[-\Delta\psi_n] = \sum_{n\in\mathbb N} b_n\,(n^2\pi^2)\,\psi_n,$$
so by Eqn (12) the $\mathcal H_1$ inner product of $\phi \in \mathcal C_c^\infty$ with Fourier sine coefficients $\{a_n\}$ with $\psi$ is
$$\langle \phi, \psi\rangle_1 = \sum_{n\in\mathbb N} (\pi^2 n^2)^1\,a_n b_n = \langle \phi, -\Delta\psi\rangle_0 = -\int \phi(s)\,\psi''(s)\,ds; \quad\text{integrating by parts,}$$
$$= -\phi(s)\,\psi'(s)\Big|_0^1 + \int \phi'(s)\,\psi'(s)\,ds = \langle \phi', \psi'\rangle_0. \tag{15}$$
Thus $\mathcal H_1$ consists precisely of those continuous functions that vanish at the boundary of $[0,1]$ whose first derivative is square-integrable. Similarly $\mathcal H_2$ consists of functions whose second derivative is in $\mathcal H_0 = L_2$ and, for all integers $s > 0$, $\mathcal H_s$ consists of $s$-times differentiable functions. For any

$\phi, \psi$,
$$\langle \phi, \psi\rangle_{-1} = \langle \phi, G\psi\rangle_0, \tag{16}$$
so we now recognize

$$G\psi(s) := \int \gamma(s,t)\,\psi(t)\,dt = (-\Delta)^{-1}\psi(s) \tag{17}$$
as the solution $G\psi = g$ to the equation $(-\Delta)g = \psi$, and $G = (-\Delta)^{-1}$ is the inverse Laplace operator. More specifically, it is the Dirichlet inverse Laplacian that gives the solution $g = G\phi$ of

$-\Delta g = \phi$ with boundary conditions $g(0) = g(1) = 0$. Adding any linear function, $g(x) + a + bx$, would give another solution of $-\Delta g = \phi$ with different boundary conditions.

The operators $(-\Delta)$ and $G$ are continuous isomorphisms of $\mathcal H_s$ onto $\mathcal H_{s-2}$ and $\mathcal H_{s+2}$, respectively, for every $s \in \mathbb R$. The function $\gamma(s,t)$ is called the Dirichlet "Green's function" for $(-\Delta)$, a solution with Dirichlet boundary conditions to the formal equation $-\Delta g(s,t) = \delta(s-t)$ for the Dirac delta function (more on that below).

3.2.1 Duality

For each $s > 0$ there is a natural duality between $\mathcal H_s$ and $\mathcal H_{-s}$: elements of $\mathcal H_{-s}$ may be viewed as continuous linear functionals on $\mathcal H_s$, so $\mathcal H_{-s} = \mathcal H_s^*$ and $\mathcal H_s = \mathcal H_{-s}^*$ as Banach spaces. The pairing of a linear functional $\xi$ with coefficients $b_n = \xi[\psi_n]$ with an element $\phi \in \mathcal H_s$ with coefficients $\{a_n\}$ is given by

$$\langle \phi, \xi\rangle = \sum_{n\in\mathbb N} a_n b_n = \sum_{n\in\mathbb N} \big[(n^2\pi^2)^{s/2} a_n\big]\,\big[(n^2\pi^2)^{-s/2} b_n\big] = \big\langle (-\Delta)^{s/2}\phi,\ (-\Delta)^{-s/2}\xi\big\rangle_0,$$
which satisfies

$$\big|\langle \phi, \xi\rangle\big| \le \|\phi\|_s\,\|\xi\|_{-s}.$$
Of course $\mathcal H_s$, a Hilbert space, is also self-dual in the sense that each continuous linear functional $\xi$ is given by $\xi[\phi] = \langle \phi, \psi\rangle_s$ for some $\psi \in \mathcal H_s$, with coefficients $\{c_n\}$. The connection is given by
$$\langle \phi, \psi\rangle_s = \sum_{n\in\mathbb N} (n^2\pi^2)^s\,a_n c_n = \sum_{n\in\mathbb N} a_n b_n = \langle \phi, \xi\rangle,$$
so $b_n = (n^2\pi^2)^s c_n$ and $\xi = (-\Delta)^s\psi$.

3.2.2 Examples

Here we look at some examples of functions and distributions, and see in which spaces $\mathcal H_s$ they lie.

Constants. The constant function $\phi(s) \equiv 1$, for example, has coefficients

$$a_n = \int \phi(t)\,\sqrt 2\sin(n\pi t)\,dt = \frac{-\sqrt 2}{n\pi}\cos(n\pi t)\Big|_0^1 = 2\sqrt 2/n\pi \text{ for odd } n, \text{ and } 0 \text{ for even } n,$$

and so, by Eqn (13), squared $\mathcal H_s$ norm
$$\|\phi\|_s^2 = \sum_{\text{odd } n} (\pi^2 n^2)^s\,\frac{8}{\pi^2 n^2} = 8 \sum_{\text{odd } n} (n^2\pi^2)^{(s-1)} < \infty \quad\text{if and only if } s < 1/2.$$
The function $\phi(t) \equiv 1$ is as smooth as possible, but doesn't vanish at $t = 0$ and $t = 1$ and so it won't be in all of the $\{\mathcal H_s\}$. Every element $\psi \in \mathcal H_s$ with $s > 1/2$ must be continuous and satisfy those boundary conditions, because then

$$\|\phi\|_s^2 = \sum_{n\in\mathbb N} (n^2\pi^2)^s\,|a_n|^2 < \infty$$

$$\sum_{n\in\mathbb N} |a_n| = \sum_{n\in\mathbb N} \big[|a_n|\,(n\pi)^s\big]\,(n\pi)^{-s} \le \Big\{\sum_{n\in\mathbb N} |a_n|^2\,(n^2\pi^2)^s\Big\}^{1/2} \Big\{\sum_{n\in\mathbb N} (n^2\pi^2)^{-s}\Big\}^{1/2} \le \|\phi\|_s\,\pi^{-s}\,\big(1 - 1/2s\big)^{-1/2} < \infty$$
for $s > 1/2$. It follows that

$$\phi(t) = \sum_{n\in\mathbb N} a_n\,\sqrt 2\sin(n\pi t)$$
converges uniformly to a continuous function $\phi$ which, like each $\sin(n\pi t)$, must vanish at $t = 0$ and $t = 1$.

Splines. A spline is a piecewise-polynomial function of some specified degree $d$, whose first $(d-1)$ derivatives are continuous at the "knots" (the boundaries of the intervals on which the function has polynomial form). They pop up a lot in Sobolev theory and function approximation. The simplest non-trivial one would be a linear spline like the tent function

$$h(t) = t \wedge (1-t),$$
a continuous function that rises linearly from zero to a maximum of $h(\tfrac12) = \tfrac12$, then falls linearly to $h(1) = 0$. Its Fourier coefficients $a_n = \int h(t)\,\psi_n(t)\,dt$ vanish for even $n$ (by symmetry) and for odd $n$ are $a_n = (-1)^{(n-1)/2}\sqrt 8/n^2\pi^2$, so its squared Sobolev norm
$$\|h\|_s^2 = 8 \sum_{\text{odd } n} (\pi^2 n^2)^{s-2}$$
is finite for $s < 3/2$; e.g., $\|h\|_1 = \|h'\|_0 = 1$. Every linear spline vanishing at the boundary is in $\mathcal H_s$ for $s < 3/2$; every quadratic (or cubic) spline vanishing to order two (or three) at the boundary is in $\mathcal H_s$ for $s < 5/2$ (or $s < 7/2$), respectively, and so forth.

The derivative $h'$ of a linear spline $h$ is piecewise-constant and orthogonal to the constant function $1$, to ensure that the boundary condition $h(1) = 0$ is met for the function $h(t) = \int_0^t h'(x)\,dx$. Similarly the $d$th derivative of a degree-$d$ spline is piecewise-constant and orthogonal to polynomials of degree $\le d$; in fact that is a succinct characterization of all splines on $[0,1]$ with Dirichlet boundary conditions.
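A quick numerical check of the tent-function coefficients (the grid resolution and the number of retained terms are arbitrary choices): the sine coefficients should alternate in sign over odd $n$, vanish for even $n$, and the partial sums of $(n\pi)^2 a_n^2$ should approach $\|h\|_1^2 = 1$.

```python
import numpy as np

N = 50000
t = (np.arange(N) + 0.5) / N            # midpoint grid on [0, 1]
h = np.minimum(t, 1.0 - t)              # the tent function h(t) = min(t, 1-t)

def coeff(n):
    """a_n = <h, psi_n>_0 by midpoint quadrature."""
    return np.sum(h * np.sqrt(2.0) * np.sin(n * np.pi * t)) / N

a1, a2, a3 = coeff(1), coeff(2), coeff(3)
print(a1, np.sqrt(8.0) / np.pi**2)           # + sqrt(8)/pi^2
print(a2)                                    # ~ 0 for even n
print(a3, -np.sqrt(8.0) / (9.0 * np.pi**2))  # sign alternates over odd n

# |h|_1^2 = sum (n pi)^2 a_n^2, which should be close to |h'|_0^2 = 1
norm1_sq = sum((n * np.pi) ** 2 * coeff(n) ** 2 for n in range(1, 400))
print(norm1_sq)
```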

Dirac Delta Functions. For $t^* \in (0,1)$ the evaluation functional

$$\delta_{t^*}[\phi] := \phi(t^*) \tag{18}$$
has coefficients

$$a_n = \psi_n(t^*) = \langle \delta_{t^*}, \psi_n\rangle_0 = \sqrt 2\sin(n\pi t^*)$$
that satisfy $|a_n| \le \sqrt 2$ for all $n$, and hence
$$\|\delta_{t^*}\|_s^2 = \sum_{n\in\mathbb N} (\pi^2 n^2)^s\,|a_n|^2 \le 2\sum_{n\in\mathbb N} (\pi^2 n^2)^s < \infty \quad\text{for } s < -1/2.$$
This functional, called the "Dirac delta distribution," is a well-defined object in $\mathcal H_s$ for $s < -1/2$ but is not a function like the elements of $\mathcal H_s$ for $s \ge 0$. By the same reasoning, every finite measure $\mu$ on $[0,1]$ is in $\mathcal H_s$ for $s < -1/2$; for example,
$$\|\mu\|_{-1}^2 = \langle \mu, \mu\rangle_{-1} = \iint \mu(ds)\,\gamma(s,t)\,\mu(dt) \le \mu([0,1])^2/4 < \infty.$$
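The borderline exponent $s = -1/2$ shows up clearly in partial sums. This sketch (the point $t^* = 0.3$ and the cutoffs are arbitrary choices) computes partial sums of $\|\delta_{t^*}\|_s^2 = \sum_n (\pi^2 n^2)^s\,2\sin^2(n\pi t^*)$; at $s = -1$ they stabilize near $\gamma(t^*, t^*) = t^*(1 - t^*) = 0.21$, while at $s = -1/2$ they keep growing like $\log n$:

```python
import numpy as np

t_star = 0.3
n = np.arange(1, 100001)
a_sq = 2.0 * np.sin(n * np.pi * t_star) ** 2    # |a_n|^2 = 2 sin^2(n pi t*)

sums = {}
for s in (-1.0, -0.5):
    terms = (np.pi**2 * n.astype(float) ** 2) ** s * a_sq
    sums[s] = (terms[:50000].sum(), terms.sum())   # two partial sums
    print(s, sums[s])

# at s = -1 the series converges (to gamma(t*, t*) = 0.21);
# at s = -1/2 doubling the cutoff adds roughly log(2)/pi to the sum
```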

Derivatives of the Dirac Delta Function. For $t^* \in (0,1)$ the derivative evaluation functional

$$\delta'_{t^*}[\phi] := \phi'(t^*)$$
has coefficients
$$a_n = \psi_n'(t^*) = \langle \delta'_{t^*}, \psi_n\rangle_0 = n\pi\,\sqrt 2\cos(n\pi t^*)$$
that satisfy $|a_n| \le n\pi\sqrt 2$ for all $n$, and hence
$$\|\delta'_{t^*}\|_s^2 = \sum_{n\in\mathbb N} (\pi^2 n^2)^s\,|a_n|^2 \le 2\sum_{n\in\mathbb N} (\pi^2 n^2)^{(s+1)} < \infty \quad\text{for } s < -3/2.$$
This distribution is in $\mathcal H_s$ for $s < -3/2$ but is not even a measure (i.e., is not continuous as a linear functional on $\mathcal C([0,1])$). In a similar way one can construct derivatives $\delta_{t^*}^{(k)}$ in $\mathcal H_s$ for $s < -k - 1/2$ of all orders $k \in \mathbb N$ or, similarly, derivatives of any function $\phi$ or finite measure $\mu$.

Sobolev's Lemma

For integers $k \ge 0$ denote by $\mathcal C_0^k$ the linear space of continuous functions $f(t)$ that vanish to order $k$ at the boundary $t \in \{0, 1\}$, i.e., that satisfy
$$\lim_{t\downarrow 0} t^{-k}|f(t)| = 0 \quad\text{and}\quad \lim_{t\uparrow 1} (1-t)^{-k}|f(t)| = 0.$$
For $k = 0$ the space $\mathcal C_0^0$ consists of all the continuous functions that take the value $f(0) = f(1) = 0$, but to be in $\mathcal C_0^k$ for $k > 0$ both $f(t)$ and $f(1-t)$ must be $o(t^k)$ at $t = 0$. What is the relation of these functions, with $k$ continuous derivatives, and $\mathcal H_k$, whose $k$th derivative is in $L_2$? Sobolev's Lemma gives the answer:

Theorem 1 (Sobolev's Lemma). For each integer $k \ge 0$ and number $s > k + 1/2$, the space $\mathcal C_0^k$ of $k$-times differentiable functions $\phi$ that vanish to order $k$ at the boundary $\partial[0,1]$ and the Sobolev spaces satisfy
$$\mathcal H_s \subset \mathcal C_0^k \subset \mathcal H_k$$
with each inclusion continuous.

So, every element of $\mathcal H_s$ for $s > 1/2$ is a continuous function; every element of $\mathcal H_s$ for $s > 3/2$ is continuously differentiable, and so on.

Proof. First, to illustrate the idea, here's a simple proof of a weaker result, that $\mathcal H_1 \subset \mathcal C_0^0 \subset \mathcal H_0$. For $f \in \mathcal C_c^\infty$ and $t \in [0,1]$,
$$f(t) = \int_0^t f'(x)\,dx = \langle f', 1_{\{x\le t\}}\rangle_0 \le \|f'\|_0\,\big\|1_{\{x\le t\}}\big\|_0 = \|f\|_1\,\sqrt t$$
for every $0 \le t \le 1$, by the Cauchy-Schwarz inequality.

Proof. Now, using a little more Sobolev technology, we can show the stronger result that $\mathcal H_s \subset \mathcal C^0$ for any $s > \frac12$. First calculate the Fourier coefficients for the function $1_{\{x\le t\}}$:
$$c_n := \langle 1_{\{x\le t\}}, \psi_n\rangle_0 = \int_0^t \sqrt 2\sin(n\pi x)\,dx = \sqrt 2\,\big[1 - \cos(n\pi t)\big]/n\pi, \quad\text{so}\quad |c_n|^2 \le 8/n^2\pi^2.$$
Now, express $f \in \mathcal C_c^\infty \subset \mathcal H_s$ for $s > \frac12$ and its derivative $f' \in \mathcal H_{s-1}$ in Fourier form
$$f(t) = \sum_{n\in\mathbb N} a_n\,\sqrt 2\sin(n\pi t), \qquad f'(t) = \sum_{n\in\mathbb N} b_n\,\sqrt 2\sin(n\pi t)$$
with coefficients $\{a_n = \langle f, \psi_n\rangle_0\}$ and $\{b_n = \langle f', \psi_n\rangle_0\}$ satisfying
$$\infty > \|f\|_s^2 = \sum_{n\in\mathbb N} (\pi^2 n^2)^s\,|a_n|^2 = \|f'\|_{(s-1)}^2 = \sum_{n\in\mathbb N} (\pi^2 n^2)^{(s-1)}\,|b_n|^2.$$
For any $0 \le t \le 1$,
$$|f(t)|^2 = \big|\langle f', 1_{\{x\le t\}}\rangle_0\big|^2 = \Big|\sum_{n\in\mathbb N} b_n c_n\Big|^2 = \Big|\sum_{n\in\mathbb N} \big[b_n\,(n\pi)^{s-1}\big]\cdot\big[c_n\,(n\pi)^{1-s}\big]\Big|^2$$
$$\le \Big\{\sum_{n\in\mathbb N} |b_n|^2\,(n^2\pi^2)^{s-1}\Big\}\Big\{\sum_{n\in\mathbb N} |c_n|^2\,(n^2\pi^2)^{1-s}\Big\} \le \|f'\|_{(s-1)}^2 \times \sum_{n\in\mathbb N} (8/n^2\pi^2)(n^2\pi^2)^{1-s} = 8\,\|f\|_s^2 \sum_{n\in\mathbb N} (n^2\pi^2)^{-s},$$
which is finite for $s > \frac12$, so for some finite $c$ we have $\sup_t |f(t)| \le c\,\|f\|_s$.

The extension from $\mathcal C_0^0$ to $\mathcal C_0^k$ is left to the reader.

3.3 Extensions

In some sense the Dirichlet Sobolev spaces $\{\mathcal H_s\}$ are built around the positive definite operator $(-\Delta)$, the Dirichlet Laplacian on $L_2([0,1])$, and its inverse operator $G$. Similar spaces and theory can be built around other positive definite operators on other spaces $L_2(\mathcal T)$. For example, the negative Laplacian on $\mathbb R^d$ with free boundary conditions is only semi-definite, but adding a positive constant $m^2$ will make it positive definite and one can build Sobolev spaces

$$\mathcal H_s = \Big\{h : \int \big(m^2 + |x|^2\big)^s\,\big|\hat h(x)\big|^2\,dx < \infty\Big\}$$
where $\hat h$ is the Fourier transform of $h$; for positive integers $s = k$ these will be functions all of whose $k$-fold derivatives are in $L_2(\mathbb R^d)$ or, for all $s$, those for which $(-\Delta + m^2)^{s/2} h \in L_2$. For $d = 1$ the unit Gaussian process on $\mathcal H_{-1}$ is the Ornstein-Uhlenbeck ("OU") velocity process, while in $d > 1$ it's a special case of the Matérn family.

Another interesting choice is the circle $S^1$ (where instead of the Brownian Bridge we would construct periodic Brownian Motion) or the sphere $S^{d-1}$ in $\mathbb R^d$ (there might be applications for this in astronomy or geology or meteorology with $d = 3$) where again the Laplacian is only semi-definite (since constant functions are harmonic, i.e., satisfy $\Delta h \equiv 0$), so again it's necessary to take $m > 0$ for $(-\Delta + m^2)$. In mathematics it's conventional to take $m = 1$; in quantum field theory applications, $m$ denotes the mass of a particle (hence the notation).

There is also a well-developed theory of $L_p$ Sobolev spaces for $p \ne 2$, with some applications in the theory of differential equations and some use in wavelets, but the $L_2$ theory is simpler and more useful.

4 Reproducing Kernel Hilbert Spaces

For fixed $0 \le s \le 1$ the function
$$\gamma_s(t) := \gamma(s,t) = (s \wedge t) - st$$
is continuous on $[0,1]$, and so is in $\mathcal H_1$ with coefficients
$$a_n = \int \gamma_s(t)\,\psi_n(t)\,dt = \frac{\sqrt 2\sin(n\pi s)}{n^2\pi^2}.$$
Expanding, we find the representation
$$\gamma(s,t) = \sum_{n\in\mathbb N} \frac{2}{n^2\pi^2}\,\sin(n\pi s)\sin(n\pi t). \tag{19}$$
For any $\phi \in \mathcal H_1$ with coefficients $\{b_n\}$,
$$\langle \gamma_s, \phi\rangle_1 = \sum_{n\in\mathbb N} (n^2\pi^2)^1\,\frac{\sqrt 2\sin(n\pi s)}{n^2\pi^2}\,b_n = \sum_{n\in\mathbb N} b_n\,\psi_n(s) = \phi(s),$$

i.e., the $\mathcal H_1$ inner product of $\gamma_s \in \mathcal H_1$ with any $\phi \in \mathcal H_1$ is the evaluation functional $\phi(s)$; inner products with $\gamma_s$ "reproduce" elements of $\mathcal H_1$. For this reason $\gamma(\cdot,\cdot)$ is called a reproducing kernel on $\mathcal H_1$, and $\mathcal H_1$ itself is called a "reproducing kernel Hilbert space" or "RKHS". This could be seen more easily (but more abstractly) from Eqns (16, 17) by writing

$$\phi(s) = G(-\Delta)\phi(s) = \langle \gamma_s, (-\Delta)\phi\rangle_0 = \langle \gamma_s, \phi\rangle_1.$$

Definition 2 (RKHS). A Reproducing Kernel Hilbert Space is a Hilbert space $\mathcal H$ of functions on some set $\mathcal T$ along with a positive-definite function $k : \mathcal T \times \mathcal T \to \mathbb C$ with the property that for each $s \in \mathcal T$ the function $k_s(t) := k(s,t)$ lies in $\mathcal H$ and satisfies
$$\langle k_s, h\rangle = h(s)$$
for every $h \in \mathcal H$ and $s \in \mathcal T$.

Not every Hilbert space of functions can be a RKHS: evaluation at a point is a continuous linear operator on any RKHS, but it isn't on, e.g., $\mathcal H_0 = L_2[0,1]$. In any RKHS $\mathcal H$ the linear span of $\{k_s\}$ is dense in $\mathcal H$. For any CONS $\{e_n(s)\}$ on a RKHS the kernel can be written in the form

$$k(s,t) = \sum_{n\in\mathbb N} e_n(s)\,e_n(t) \tag{20}$$
with the sum converging at every point and also in the Hilbert space $\mathcal H \otimes \mathcal H$ norm. In our Dirichlet unit interval example one particular CONS in $\mathcal H_1$ is given by $e_n(t) = (n\pi)^{-1}\psi_n(t)$, leading to the representation of Eqn (19) for the Brownian Bridge covariance.

Every positive-definite function $k(s,t)$ on any set $\mathcal T$ determines a RKHS $\mathcal H(k)$ of functions on $\mathcal T$ and also determines a Gaussian stochastic process $Z_s$ indexed by $s \in \mathcal T$ with mean zero and covariance $k(s,t)$. RKHSs are helpful in problems where interpolation is needed: one way to construct a function $g : \mathcal T \to \mathbb C$ attaining specified values $g(s_i) = y_i$ for a finite set $\{(s_i, y_i)\} \subset \mathcal T \times \mathbb C$ is to set

$$g(s) = \sum_i \beta_i\,k_{s_i}(s) \tag{21a}$$
where $\{\beta_i\}$ is the vector solution of the matrix equation

$$y_i = \sum_j k(s_i, s_j)\,\beta_j, \tag{21b}$$
and this will be the element $g \in \mathcal H(k)$ of minimum norm that satisfies $g(s_i) = y_i$. It is also identical to the Bayesian posterior expectation

$$g(s) = \mathrm E\big[Z(s) \mid Z(s_i) = y_i,\ i \in I\big] \tag{21c}$$
for the Gaussian Process ("GP") model $Z \sim \mathrm{GP}(0, k)$ with mean zero and covariance $k(\cdot,\cdot)$ (non-zero means are easy to accommodate too).
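Here is a small sketch of Eqns (21a, 21b) with the Brownian-bridge kernel $k = \gamma$ (the data points are made up for illustration). Because each $\gamma(s_i, \cdot)$ is piecewise linear, the minimum-norm interpolant in $\mathcal H_1$ comes out as a linear spline pinned to zero at both endpoints, consistent with the splines discussion above:

```python
import numpy as np

def gamma(s, t):
    """Brownian-bridge reproducing kernel gamma(s,t) = min(s,t) - s*t."""
    return np.minimum(s, t) - s * t

si = np.array([0.2, 0.5, 0.8])        # assumed interpolation sites
yi = np.array([0.3, -0.1, 0.4])       # assumed target values

K = gamma(si[:, None], si[None, :])   # Gram matrix k(s_i, s_j) of Eqn (21b)
beta = np.linalg.solve(K, yi)

def g(s):
    """Minimum-norm interpolant g(s) = sum_i beta_i k(s_i, s), Eqn (21a)."""
    return gamma(np.atleast_1d(np.asarray(s, dtype=float))[:, None], si) @ beta

print(g(si))            # reproduces yi
print(g([0.0, 1.0]))    # Dirichlet boundary: interpolant vanishes at 0 and 1
print(g([0.35]))        # linear between knots: (0.3 + (-0.1)) / 2 = 0.1
```

The same $\beta$ also gives the GP posterior mean of Eqn (21c), since solving the Gram system is exactly the Gaussian conditioning step.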

Similarly, for integers $k \in \mathbb N$ the Sobolev space $\mathcal H_k$ is a RKHS for the positive-definite kernel given recursively by $\gamma^{*1}(s,t) := \gamma(s,t) = (s \wedge t) - st$ and, for $k \in \mathbb N$,
$$\gamma^{*(k+1)}(s,t) := \int_0^1 \gamma(s,u)\,\gamma^{*k}(u,t)\,du.$$
For example,

$$\gamma^{*2}(s,t) = s(1-t)\big(2t - s^2 - t^2\big)/6 \qquad (s \le t),$$
$$\gamma^{*3}(s,t) = s(1-t)\big(3s^4 + 10s^2(-2+t)t + (-2+t)t(-4 + 3(-2+t)t)\big)/360 \qquad (s \le t).$$
The mean-zero Gaussian Process $\{Z_s : s \in [0,1]\}$ with covariance kernel $\gamma^{*k}(s,t)$ has the property that, for any finite sets of times $\{t_i : i \in I\}$ and values $\{z_i : i \in I\}$, the function
$$g(s) := \mathrm E\big[Z_s \mid Z_{t_i} = z_i,\ i \in I\big]$$
is the element in $\mathcal H_k$ of minimal norm achieving the specified values, and is given by an order-$k$ spline that vanishes at $0$ and $1$. Evidently there are deep connections between splines, GPs, and RKHSs. See Wahba (1990, 2003) for more about this, or just Google RKHS to find lots of interesting references.
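The closed form for $\gamma^{*2}$ is easy to verify against direct quadrature of the recursion (the grid size and test points are arbitrary choices):

```python
import numpy as np

def gamma(s, t):
    """gamma(s,t) = min(s,t) - s*t."""
    return np.minimum(s, t) - s * t

N = 200000
u = (np.arange(N) + 0.5) / N            # midpoint grid for the u-integral

checks = {}
for (s, t) in [(0.3, 0.7), (0.5, 0.5), (0.1, 0.9)]:
    quad = np.sum(gamma(s, u) * gamma(u, t)) / N       # the recursion at k = 1
    closed = s * (1 - t) * (2 * t - s**2 - t**2) / 6   # claimed gamma*2, s <= t
    checks[(s, t)] = (quad, closed)
    print(s, t, quad, closed)
```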

5 Mercer Kernels and Karhunen-Loève Expansions

Mercer's Theorem (and a few variations) gives conditions guaranteeing that a covariance function will have a representation like Eqn (20); Karhunen and Loève use that to find a representation for Gaussian processes similar to that of Eqn (14a).

Theorem 2 (Mercer). Let $\mathcal T$ be a compact Hausdorff space, let $\mu : \mathcal B(\mathcal T) \to \mathbb R_+$ be a Borel measure on $\mathcal T$, and let $k : \mathcal T \times \mathcal T \to \mathbb R$ be a symmetric positive-definite continuous function on $\mathcal T \times \mathcal T$. Then there exists an orthonormal set $\{e_n\} \subset L_2(\mathcal T, d\mu)$ of continuous functions on $\mathcal T$ and a summable sequence $\{\lambda_n\} \subset \mathbb R_+$ of positive numbers for which

$$k(s,t) = \sum_{n\in\mathbb N} \lambda_n\,e_n(s)\,e_n(t) \tag{22}$$
for every $s, t \in \mathcal T$. The sum converges uniformly and absolutely.

The functions $e_n(\cdot)$ are in fact the eigenfunctions for the integral operator $K : \phi \mapsto K[\phi] \in L_2$ given by
$$K[\phi](s) = \int_{\mathcal T} k(s,t)\,\phi(t)\,\mu(dt)$$
for $s \in \mathcal T$, $\phi \in L_2$, with positive eigenvalues $\{\lambda_n\}$ whose sum is
$$\sum_n \lambda_n = \int_{\mathcal T} k(t,t)\,\mu(dt) < \infty.$$
In the Brownian Bridge example above we had the compact unit interval $\mathcal T = [0,1]$ with Lebesgue measure $\mu(dx) = dx$ and continuous kernel $k(s,t) = \gamma(s,t) = s \wedge t - st$, but here we see many of the same ideas would work in more general settings. Further generalizations are possible; for example, $k$ need not be continuous as long as

$$\int_{\mathcal T} k(s,s)\,\mu(ds) < \infty \quad\text{and}\quad \iint_{\mathcal T\times\mathcal T} k^2(s,t)\,\mu(ds)\,\mu(dt) < \infty,$$
but then the eigenfunctions $e_n(s)$ may not be continuous, and the convergence of Eqn (22) may not be uniform.

Theorem 3 (Karhunen-Loève). Let $X_t$ be a square-integrable mean-zero stochastic process indexed by a compact Hausdorff space $\mathcal T$, let $\mu : \mathcal B(\mathcal T) \to \mathbb R_+$ be a Borel measure on $\mathcal T$, and suppose that the covariance function $k(s,t) = \mathrm E\,X_s X_t$ is jointly continuous. Then there exists an orthonormal set $\{e_n\} \subset L_2(\mathcal T, d\mu)$ of continuous functions on $\mathcal T$, a summable sequence $\{\lambda_n\} \subset \mathbb R_+$ of positive numbers, and a collection $\{Z_n\}$ of uncorrelated random variables with mean zero and variance one such that

$$X_t = \sum_{n\in\mathbb N} \sqrt{\lambda_n}\,Z_n\,e_n(t).$$
The $\{Z_n\}$ are given by

$$Z_n = \int_{\mathcal T} X_t\,e_n(t)\,\mu(dt),$$
i.e., the inner product $Z_n = \langle X, e_n\rangle$ in $L_2(\mathcal T, d\mu)$. The sum converges uniformly and absolutely. In case the process $X_t$ is Gaussian, the $\{Z_n\}$ will be iid $\mathrm{No}(0,1)$.

6 Tightness & Compactness

6.1 The Kolmogorov Continuity Theorem

Definition 3 (Hölder continuity). A function $f : S \to \mathbb R^d$ is Hölder continuous with index $\alpha > 0$ on the set $S \subset \mathbb R$ if
$$(\exists C < \infty)(\forall s, t \in S)\quad \|f(s) - f(t)\| \le C\,|s - t|^\alpha.$$
A function with bounded continuous first derivative will be Hölder continuous with index $\alpha = 1$, and one whose first $k$ derivatives are continuous and bounded will be Hölder continuous of index $\alpha = k$; this generalizes that idea of smoothness to fractional degrees. Any Hölder continuous function is, in particular, continuous.

Definition 4 (Version). A stochastic process $\{\tilde X_t\}_{t\in\mathcal T}$ is called a version or a modification of a process $\{X_t\}_{t\in\mathcal T}$ if, for every $t \in \mathcal T$, $P[\tilde X_t = X_t] = 1$.

For countable index sets $\mathcal T$ this is the same as $P[(\forall t \in \mathcal T)\ \tilde X_t = X_t] = 1$, but not for uncountable index sets (give an example!).

Theorem 4 (Kolmogorov Continuity Theorem). Let $\alpha, \epsilon, c > 0$. If a $d$-dimensional stochastic process $X_t : [0,1] \to L_\alpha(\Omega, \mathcal F, P)$ satisfies the bound
$$\mathrm E\,\|X_s - X_t\|^\alpha \le c\,|t - s|^{1+\epsilon}$$
uniformly for all $s, t \in [0,1]$, then there exists a version $\tilde X_t$ of $X_t$ whose sample paths are Hölder continuous of index $\gamma$ for every $0 < \gamma < \epsilon/\alpha$.

Proof. Take $d = 1$ (it's easy to extend to $d > 1$). Denote by
$$Q_n := \{j/2^n : 0 \le j \le 2^n\}$$
the dyadic rationals of order $n$ and by $Q := \cup Q_n$ their union, the dense set of all dyadic rationals in the unit interval. For $0 < \gamma < \epsilon/\alpha$ Markov's inequality (Lemma 2 on p. 27) gives

$$P\Big[\max_{1\le j\le 2^n} \big|X_{j/2^n} - X_{(j-1)/2^n}\big| > 2^{-\gamma n}\Big] = P\Big[\bigcup_{1\le j\le 2^n} \Big\{\big|X_{j/2^n} - X_{(j-1)/2^n}\big| > 2^{-\gamma n}\Big\}\Big]$$
$$\le \sum_{j=1}^{2^n} P\Big\{\big|X_{j/2^n} - X_{(j-1)/2^n}\big| > 2^{-\gamma n}\Big\} \le \sum_{j=1}^{2^n} \mathrm E\,\big|X_{j/2^n} - X_{(j-1)/2^n}\big|^\alpha\,2^{\alpha\gamma n} \le 2^n\,c\,2^{-n(1+\epsilon)}\,2^{\alpha\gamma n} = c\,2^{-n(\epsilon - \alpha\gamma)}.$$
Since this is summable in $n$, the Borel-Cantelli Lemma (Lemma 1 on p. 27) asserts that outside some null set $\mathcal N$ there exists a random $N < \infty$ such that for all $n \ge N$
$$\max_{1\le j\le 2^n} \big|X_{j/2^n} - X_{(j-1)/2^n}\big| \le 2^{-\gamma n}.$$

By looking at the maximum over the finitely-many $n < N$, we can find a finite random number $C < \infty$ such that
$$\max_{1\le j\le 2^n} \big|X_{j/2^n} - X_{(j-1)/2^n}\big| \le C\,2^{-\gamma n}$$
for every $n \in \mathbb N$. Now let's show that, outside the null set $\mathcal N$, the paths are $\gamma$-Hölder continuous on $Q$. For any $s, t \in Q$ let $n = \lceil -\log_2|s - t|\rceil$ (where $\lceil x\rceil$ denotes the least integer $\ge x$), so
$$2^{-n} \le |s - t| < 2^{1-n}.$$
The sequence $s_k := \max\{q \in Q_k : q \le s\}$ for $k \ge n$ increases monotonically to $s$ and satisfies $0 \le (s_k - s_{k-1}) \le 1/2^k$; indeed each such term is exactly zero or $1/2^k$, and only finitely-many are nonzero because $s \in Q$. Similarly set $t_k := \max\{q \in Q_k : q \le t\}$ for $k \ge n$. Then
$$X_t - X_s = \sum_{i>n} (X_{s_i} - X_{s_{i-1}}) + \sum_{i>n} (X_{t_i} - X_{t_{i-1}}) + (X_{s_n} - X_{t_n})$$
where only finitely-many terms in each sum are non-zero. Also, $|s_n - t_n| \le |s - t| + 2^{-n} \le 3\cdot 2^{-n}$. Summing,

$$|X_t - X_s| \le \sum_{i>n} |X_{s_i} - X_{s_{i-1}}| + \sum_{i>n} |X_{t_i} - X_{t_{i-1}}| + |X_{s_n} - X_{t_n}| \le 2C\sum_{j>n} 2^{-\gamma j} + 3^\gamma\,C\,2^{-\gamma n}$$
$$= C\Big(\frac{2}{2^\gamma - 1} + 3^\gamma\Big)\,2^{-\gamma n} \le C\Big(\frac{2}{2^\gamma - 1} + 3^\gamma\Big)\,|t - s|^\gamma.$$
Now define the modification $\tilde X_t$ to be zero on $\mathcal N$, and elsewhere

$$\tilde X_t := \lim_{n\to\infty} \big\{X_{q_n} : q_n = \max(Q_n \cap [0,t])\big\},$$
well-defined and continuous since $X_t$ is uniformly continuous on $Q$.

For α = 2 the Brownian Bridge satisfies

$$\mathrm E\,|B_s - B_t|^2 = |s - t|\,\big(1 - |s - t|\big) \le |s - t|,$$
which is bounded by $|s - t|^1$ but not by $c\,|s - t|^{1+\epsilon}$ for any $\epsilon > 0$. For $\alpha = 4$ we do better, though, with
$$\mathrm E\,|B_s - B_t|^4 \le 3\,|s - t|^2,$$
proving that some version of $B_s$ has Hölder-continuous sample paths of any index $\gamma < 1/4$. A similar argument shows for even integers $\alpha \ge 2$ that
$$\mathrm E\,|B_s - B_t|^\alpha = \frac{2^{\alpha/2}}{\sqrt\pi}\,\Gamma\Big(\frac{\alpha + 1}{2}\Big)\,|t - s|^{\alpha/2}\,\big[1 - |t - s|\big]^{\alpha/2} \le c\,|t - s|^{\alpha/2},$$
so the paths are Hölder continuous of any index

$$\gamma < \frac{\alpha/2 - 1}{\alpha} = \frac12 - \frac1\alpha$$
or, taking the supremum as $\alpha \to \infty$, of any index $\gamma < \frac12$. It turns out that this bound is sharp: paths are not Hölder continuous with index $\gamma \ge \frac12$ and, in particular, are not differentiable.

6.2 Compactness in Sobolev Spaces

The normalized difference $\Delta_n$ between the empirical and actual CDFs for $n$ iid uniforms of Eqn (2) may be written

$$\Delta_n = n^{-\frac12} \sum_{i\le n} \big(1_{\{U_i\le t\}} - t\big) = n^{-\frac12} \sum_{i\le n} \phi_i(t)$$

19 where

$$\phi_i(t) = 1_{\{U_i\le t\}} - t = \begin{cases} -t & t < U_i \\ 1 - t & t > U_i \end{cases}$$

$$\langle \phi_i, \psi_n\rangle_0 = \int \phi_i(t)\,\sqrt 2\sin(n\pi t)\,dt = \sqrt 2\cos(n\pi U_i)/n\pi$$

$$\|\phi_i\|_s^2 = \sum_{n\in\mathbb N} (\pi^2 n^2)^s\,\frac{2}{\pi^2 n^2}\cos^2(n\pi U_i)$$

$$\mathrm E\,\|\phi_i\|_s^2 = \sum_{n\in\mathbb N} (\pi^2 n^2)^{s-1}.$$
This is finite for $s < \frac12$, so also $\Delta_k \in \mathcal H_s$ for $s < \frac12$. By independence, for $s < \frac12$,
$$\mathrm E\,\|\Delta_k\|_s^2 = \frac1k \sum_{i\le k} \mathrm E\,\|\phi_i\|_s^2 + \frac1k \sum_{i\ne j} \mathrm E\,\langle \phi_i, \phi_j\rangle_s = \sum_{n\in\mathbb N} (\pi^2 n^2)^{s-1},$$
uniformly in $k$. By Markov's inequality (Lemma 2, on p. 27), for any $s < \frac12$ and $\epsilon > 0$ we can find $r < \infty$ such that for all $n$
$$P\big[\|\Delta_n\|_s \le r\big] \ge 1 - \epsilon,$$
i.e., such that the ball $B_s(r)$ of radius $r$ in $\mathcal H_s$ has probability exceeding $(1 - \epsilon)$ for the distribution measure $\mu_n$ of each $\Delta_n$. Although $B_s(r)$ isn't compact in $\mathcal H_s$, it is compact in $\mathcal H_{s-1}$; hence $\{\mu_n\}$ is tight as a family of measures on any $\mathcal H_r$ for $r < \frac12$ (for example, $\mathcal H_{-1}$). This sequence of probability measures converges to the distribution $\mu$ for the Brownian Bridge, and the distribution of any continuous functional of $\Delta_n$ converges as well. I'd like to build that into a Sobolev-based argument that the distribution of

$$\sqrt n\,D_n = \sup_{0\le t\le 1} |\Delta_n(t)|$$
converges to that of $\sup_{0\le t\le 1} |B_t|$, but it doesn't seem to quite work because "sup" is only continuous as a functional on $\mathcal H_s$ for $s > \frac12$ (try to prove that) but we've only shown $\Delta_n \in \mathcal H_s$ for $s < \frac12$. Close, but not quite good enough. Kolmogorov's criterion will still get tightness on $\mathcal C([0,1])$ and sup is continuous on that, proving that $\sqrt n\,D_n \Rightarrow \sup_{0\le t\le 1} |B_t|$, but I'd still like to get a Sobolev argument to work.

7 Reflecting on Brownian Bridge Extremes

We motivated a study of Dirichlet Sobolev spaces in Section (1) by looking at Donsker's Theorem and the Kolmogorov-Smirnov statistic. In this brief section we derive a closed-form representation of the CDF for the statistic $D_n$ of Eqn (1), based on representation Eqn (3c) and a reflection argument. The approach relies on a feature of Brownian Motion (the "strong Markov property") that we will assert but not prove; ask me for details or references if you would like to look at this more carefully.

First let $W(t)$ be a Wiener process or Brownian Motion, and fix $T > 0$ and $u > 0$. Set $\tau_u := \inf\{t : W(t) \ge u\}$, the first time $W(t)$ exceeds $u$. Then

$$P\Big[\sup_{0\le t\le T} W(t) \ge u\Big] = P[\tau_u \le T].$$
Conditionally on the event $[\tau_u \le T]$, we have $W(\tau_u) = u$ by path continuity and so, by the strong Markov property, the Brownian path on the (random) time interval $[\tau_u, T]$ is equally likely to increase to $W(T) > u$ or decrease to $W(T) < u$. Thus

    P[ sup_{0≤t≤T} W(t) ≥ u ] = 2 P[ {τ_u ≤ T} ∩ {W(T) > u} ]
                              = 2 P[ W(T) > u ]
                              = 2 Φ(−u/√T),

since necessarily τ_u < T if W(T) > u, and since W(T) ∼ No(0, T). This argument is called the "reflection principle," since it relies on the identical distributions of [W(t) − u] and its "reflection" [u − W(t)] on the time interval [τ_u, T]. A conditional version of the same argument will help us with the Brownian Bridge.
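The identity P[sup_{0≤t≤T} W(t) ≥ u] = 2Φ(−u/√T) can be sanity-checked by Monte Carlo. A sketch, assuming a Gaussian-random-walk discretization of W (the step count and sample size are arbitrary, and discretization slightly underestimates the supremum):

```python
import math
import random

random.seed(1)

def running_max_exceeds(u, T=1.0, steps=500):
    """Simulate W on [0, T] as a Gaussian random walk; return True if sup W >= u."""
    dt = T / steps
    w, s = 0.0, math.sqrt(dt)
    for _ in range(steps):
        w += random.gauss(0.0, s)
        if w >= u:
            return True
    return False

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

u, T, trials = 1.0, 1.0, 5_000
mc = sum(running_max_exceeds(u, T) for _ in range(trials)) / trials
exact = 2 * Phi(-u / math.sqrt(T))   # the reflection-principle formula
print(mc, exact)                     # both near 0.317
```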

7.1 Distribution of the First Hitting Time

The random variables {τ_u} themselves make up a stochastic process, indexed by u ≥ 0, with interesting properties. From the argument above, τ_u has CDF

    F_u(t) = P[τ_u ≤ t] = 2 Φ(−u t^{−1/2}) 1{t > 0}

and hence pdf

    f_u(t) = 2 φ(−u t^{−1/2}) · u · (1/2) t^{−3/2} 1{t > 0}
           = (u/√(2π)) t^{−3/2} e^{−u²/2t} 1{t > 0},

rising from zero at t = 0 to a peak of u^{−2} [27/(2π e³)]^{1/2} at t = u²/3, then falling at the rate f_u(t) ∝ t^{−3/2} as t → ∞. Here's a plot for u = 1:

[Plot of f_1(t) for 0 ≤ t ≤ 5: the density rises from 0 to its peak near t = 1/3, then decays slowly.]
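The closed form for f_u and its mode can be verified numerically. An illustrative sketch (the grid and its resolution are arbitrary choices):

```python
import math

def f(u, t):
    """First-hitting-time density f_u(t) = (u/sqrt(2*pi)) t^(-3/2) exp(-u^2/(2t))."""
    return u / math.sqrt(2 * math.pi) * t**-1.5 * math.exp(-u * u / (2 * t))

u = 1.0
# locate the mode by brute force on a fine grid over (0, 10)
ts = [i / 10_000 for i in range(1, 100_000)]
t_star = max(ts, key=lambda t: f(u, t))

peak_formula = u**-2 * math.sqrt(27 / (2 * math.pi * math.e**3))
print(t_star, u**2 / 3)            # mode should sit near u^2/3
print(f(u, t_star), peak_formula)  # peak height should match the closed form
```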

Because this density decreases more slowly than t^{−2}, τ_u has infinite mean, Eτ_u = ∞. This distribution is often called the inverse Gaussian, or sometimes the Lévy distribution, denoted iG(u) or Lv(u); it is a special case of the α-Stable family, the fully-skewed α-Stable with α = 1/2. That can be verified by computing its characteristic function:

    χ_u(ω) = E e^{iωτ_u}
           = (u/√(2π)) ∫₀^∞ t^{−3/2} e^{iωt − u²/2t} dt
           = (u/√(2π)) ∫₀^∞ z^{−3/2} u^{−3} e^{iωu²z − 1/2z} u² dz     (substituting t = u²z)
           = χ_1(ωu²),

where

    χ_1(ω) = (1/√(2π)) ∫₀^∞ z^{−3/2} e^{iωz − 1/2z} dz

does not depend on u. But also the sum of independent random variables τ_u ∼ iG(u) and τ_v ∼ iG(v) has distribution τ_u + τ_v ∼ iG(u + v), because W_t can only reach the level u + v by first reaching the level u and then rising an additional height v, so

    χ_{u+v}(ω) = χ_1(ω(u + v)²) = χ_u(ω) χ_v(ω) = χ_1(ωu²) χ_1(ωv²);

all solutions of this functional equation are of the form

    χ_1(ω) = e^{−γ|ω|^{1/2}}

for some γ ≥ 0. Evaluating the integral above shows that γ = 1, so the inverse Gaussian has ch.f. χ_u(ω) = e^{−u|ω|^{1/2}}, the ch.f. of the fully-skewed α-Stable distribution: iG(u) = StA(α = 1/2, β = 1, γ = u, δ = 0).
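The convolution property τ_u + τ_v ∼ iG(u + v) can be checked by simulation. Note that the CDF F_u(t) = 2Φ(−u t^{−1/2}) derived above says τ_u has the same distribution as u²/Z² for Z ∼ No(0, 1), which gives a cheap sampler. A sketch (the levels u, v and the checkpoint times are arbitrary):

```python
import math
import random

random.seed(2)

def levy_sample(u):
    """Sample tau_u using tau_u =d u^2 / Z^2, Z standard normal
    (this follows from F_u(t) = 2*Phi(-u/sqrt(t)))."""
    z = random.gauss(0.0, 1.0)
    return u * u / (z * z)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

u, v, n = 1.0, 2.0, 100_000
sums = [levy_sample(u) + levy_sample(v) for _ in range(n)]

# compare the empirical CDF of tau_u + tau_v with the closed form for iG(u + v)
for t in (2.0, 5.0, 20.0):
    emp = sum(s <= t for s in sums) / n
    exact = 2 * Phi(-(u + v) / math.sqrt(t))
    print(t, emp, exact)
```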

7.2 Conditional Reflection

Now let x ∈ R and fix a small ε > 0, and consider the conditional probability

    P[ sup_{0≤t≤T} W(t) ≥ u | {|W(T) − x| < ε} ] = P[ τ_u ≤ T | {|W(T) − x| < ε} ].

This probability is one if x > u + ε; for x < u and ε < (u − x),

    P[ τ_u ≤ T | |W(T) − x| < ε ]
      = P[ {τ_u ≤ T} ∩ {|W(T) − x| < ε} ] / P[ |W(T) − x| < ε ]
      = P[ {τ_u ≤ T} ∩ {|W(T) − 2u + x| < ε} ] / P[ |W(T) − x| < ε ]
            (because [W(t) − u] and [u − W(t)] have the same distribution on [τ_u, T])
      = P[ |W(T) − 2u + x| < ε ] / P[ |W(T) − x| < ε ]
            (because τ_u < T if W(T) > 2u − x − ε > u, since ε < (u − x))
      = [ Φ((2u − x + ε)/√T) − Φ((2u − x − ε)/√T) ] / [ Φ((x + ε)/√T) − Φ((x − ε)/√T) ]
      → φ((2u − x)/√T) / φ(x/√T)     as ε → 0
      = [ (1/√(2πT)) e^{−(2u−x)²/2T} ] / [ (1/√(2πT)) e^{−x²/2T} ]
      = exp( −[(2u − x)² − x²]/2T )
      = exp( −2u(u − x)/T ).

Taking T = 1 and x = 0, we find for the Brownian Bridge that

    P[ sup_{0≤t≤1} B_t > u ] = e^{−2u²}

and so

    P[ sup_{0≤t≤1} |B_t| > u ] ≈ 2 e^{−2u²},

a strict upper bound with very small error for u ≫ 0. To do better, we can use the reflection principle multiple times to try to find (with some informality about the notation)

    P[ sup_{0≤t≤1} |B_t| > u ]
      = P[ τ_u ≤ τ_{−u} ∧ 1 | W(1) ∈ 0 + dx ] + P[ τ_{−u} ≤ τ_u ∧ 1 | W(1) ∈ 0 + dx ]
      = 2 P[ τ_u ≤ τ_{−u} ∧ 1 | W(1) ∈ 0 + dx ]
      = 2 P[ {τ_u ≤ τ_{−u} ∧ 1} ∩ {W(1) ∈ 0 + dx} ] / P[ W(1) ∈ 0 + dx ],

where

    P[ W(1) ∈ 0 + dx ] = φ(0) dx

and, by reflection (draw a picture),

    P[ {τ_u ≤ τ_{−u} ∧ 1} ∩ {W(1) ∈ 0 + dx} ] = P[ W(1) ∈ 2u + dx ] − P[ W(1) ∈ 4u + dx ] + · · ·
                                               = φ(2u) dx − φ(4u) dx + · · · ,

and so

    P[ sup_{0≤t≤1} |B_t| > u ] = 2 [ φ(2u) − φ(4u) + · · · ] / φ(0)
                               = 2 [ e^{−2u²} − e^{−8u²} + · · · ]
                               = 2 Σ_{n=1}^∞ (−1)^{n−1} e^{−2n²u²}.

In a bit more detail,
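The alternating series can be compared against a direct simulation of the bridge via the representation B(t) = W(t) − tW(1). A sketch (the path count and step size are arbitrary, and time-discretization biases the simulated supremum slightly downward):

```python
import math
import random

random.seed(3)

def bridge_sup_exceeds(u, steps=1000):
    """Simulate B(t) = W(t) - t*W(1) on a grid and test sup |B| > u."""
    dt = 1.0 / steps
    s = math.sqrt(dt)
    w = [0.0]
    for _ in range(steps):
        w.append(w[-1] + random.gauss(0.0, s))
    w1 = w[-1]
    return any(abs(w[i] - (i * dt) * w1) > u for i in range(steps + 1))

def series(u, terms=10):
    """Truncation of 2 * sum_{n>=1} (-1)^(n-1) exp(-2 n^2 u^2)."""
    return 2 * sum((-1) ** (n - 1) * math.exp(-2 * n * n * u * u)
                   for n in range(1, terms + 1))

u, trials = 1.0, 4000
mc = sum(bridge_sup_exceeds(u) for _ in range(trials)) / trials
print(mc, series(u))  # both near 0.27
```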

    A_j := [ ∃ 0 < t_1 < · · · < t_j < 1 ∋ W(t) visits the levels +u and −u alternately at the times t_1, . . . , t_j ],

    P[ sup_{0≤t≤1} |B_t| ≥ u ] = 2 P[ A_1 ∩ {τ_u < τ_{−u}} ]

and by symmetry, for all n ∈ N,

    P[ A_n ∩ {τ_u < τ_{−u}} ] = P[A_n] − P[ A_n ∩ {τ_{−u} < τ_u} ]
                              = P[A_n] − P[ A_{n+1} ∩ {τ_u < τ_{−u}} ].

By induction, for each n ∈ N,¹

    P[ A_1 ∩ {τ_u < τ_{−u}} ] = P[A_1] − P[A_2] + P[A_3] − · · · + (−1)^{n−1} P[ A_n ∩ {τ_u < τ_{−u}} ]
                              = Σ_{n≥1} (−1)^{n−1} P[A_n]

in the limit as n → ∞, since P[A_n] → 0. By reflecting each time W(t) hits ±u,

    P[A_n] = lim_{ε→0} P[ |W_1 − 2nu| < ε ] / P[ |W_1| < ε ] = e^{−2n²u²}.

¹Note: The sum converges by Leibniz's test, because successive terms decrease in absolute value and alternate in sign. It also converges absolutely, because P[A_n] = e^{−2n²u²} decreases faster than geometrically.

A Appendix

Here we collect definitions and a few properties of some of the kinds of spaces needed above, and some standard results from real and functional analysis. For more details consult any standard text on functional analysis, such as (Reed and Simon, 1980) or (Rudin, 1973). Classic sources for more details about distributions and test functions are (Gel’fand and Vilenkin, 1964) and (Schwartz, 1966).

Definition 5 (Hilbert Space). A Hilbert Space H is a complete separable normed linear vector space over either the real numbers R or the complex numbers C whose norm is given by ‖x‖ = ⟨x, x⟩^{1/2} for an inner product ⟨·, ·⟩ : H × H → C that satisfies, for x, y ∈ H and a, b ∈ C:

    ⟨y, x⟩ = ⟨x, y⟩*  (complex conjugation)
    ⟨x, x⟩ ≥ 0, with ⟨x, x⟩ = 0 ⇔ x = 0
    ⟨ax_1 + bx_2, y⟩ = a⟨x_1, y⟩ + b⟨x_2, y⟩

The inner product may be recovered from the norm via the "polarization identity"

    4⟨x, y⟩ = ‖x + y‖² − ‖x − y‖²

for real Hilbert spaces or, for complex ones,

    4⟨x, y⟩ = ‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖².

This will be a valid inner product (and hence the normed linear space will actually be a Hilbert space) if and only if the norm obeys the (Pythagorean!) parallelogram law,

    ‖x + y‖² + ‖x − y‖² = 2( ‖x‖² + ‖y‖² )

(prove this!) Familiar examples include Rⁿ and Cⁿ, any L₂ space, and the sequence spaces

    ℓ₂ = { a : Σ_n |a_n|² < ∞ },

where the sum may go over any countable set like N or Z.

Every Hilbert space H admits a complete orthonormal system (CONS), a countable collection {x_n} with the properties:
• ⟨x_m, x_n⟩ = 0 if m ≠ n, and 1 if m = n;
• (∀n) ⟨x, x_n⟩ = 0 if and only if x = 0;
• Each x ∈ H admits a unique convergent expansion x = Σ a_n x_n with coefficients a_n = ⟨x, x_n⟩;
• ⟨x, y⟩ = Σ a_n b̄_n if {a_n} and {b_n} are the coefficients corresponding to x and y, respectively.

The basis {x_n} isn't uniquely determined, just as there are many different possible bases for Rⁿ or Cⁿ. In fact any linearly independent collection {y_n} can be turned into an orthonormal system by the Gram-Schmidt process, and extended if necessary to form a CONS.
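The Gram-Schmidt process just mentioned is short enough to sketch in code (plain Python over R², purely illustrative):

```python
def dot(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of R^n vectors."""
    basis = []
    for v in vectors:
        # subtract the projections onto the basis built so far
        w = list(v)
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = dot(w, w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

e1, e2 = gram_schmidt([[1.0, 1.0], [1.0, 0.0]])
print(dot(e1, e1), dot(e2, e2), dot(e1, e2))  # 1, 1, 0 up to rounding
```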

Definition 6 (Banach Space). A Banach Space B is a complete separable normed linear vector space over either the real numbers R or the complex numbers C whose norm satisfies the conditions

    ‖x‖ ≥ 0, with ‖x‖ = 0 ⇔ x = 0
    ‖ax‖ = |a| ‖x‖
    ‖x + y‖ ≤ ‖x‖ + ‖y‖

Every Hilbert space is also a Banach space. Additional examples include the continuous functions C(K) on any compact set K, with ‖f‖ := sup_{x∈K} |f(x)|, and any L_p space with 1 ≤ p ≤ ∞. If B is a Banach space then a "continuous linear functional" on B is a linear mapping ℓ : B → C that is "continuous" in the sense that there exists some constant c < ∞ for which

    |ℓ(x)| ≤ c ‖x‖

for every x ∈ B. The smallest such constant c is the norm of ℓ,

    ‖ℓ‖ := inf{ c < ∞ : (∀x ∈ B) |ℓ(x)| ≤ c‖x‖ } = sup_{0≠x∈B} |ℓ(x)| / ‖x‖.

The space of all such continuous linear functionals is itself a Banach space with this norm, called the "dual space" to B and denoted by B*. Every Hilbert space is "self-dual" in that H* = H, with each continuous linear functional given by the inner product with some element of H: ℓ(x) = ⟨x, y_ℓ⟩ for some y_ℓ ∈ H (a result called the Hilbert-space "Riesz Representation Theorem" or the "Fréchet-Riesz Theorem"). The dual space of the continuous functions B = C(K) on a compact set K is the space of finite Borel measures on K, with linear functionals ℓ(f) = ∫_K f(x) μ_ℓ(dx) (a result also called the "Riesz representation theorem," or the "Riesz-Markov-Kakutani representation theorem"). The dual space of any L_p space with 1 < p < ∞ is L_p* = L_q, with conjugate exponent q = p/(p − 1) (so 1/p + 1/q = 1); the dual of L_1 is L_∞ as well, although the dual of L_∞ is in general strictly larger than L_1.

Proposition 1 (Cauchy-Schwarz). Let H be a real Hilbert space with inner product ⟨x, y⟩ and norm ‖x‖ = ⟨x, x⟩^{1/2}. Then for all x, y ∈ H,

    |⟨x, y⟩| ≤ ‖x‖ ‖y‖.

Proof. The inequality is trivial if x = 0 or y = 0, so write x = ‖x‖ x* and y = ‖y‖ y* for x* := x/‖x‖ and y* := y/‖y‖, each of norm one. From the positivity of

    ‖x* + y*‖² = ‖x*‖² + 2⟨x*, y*⟩ + ‖y*‖² = 2( 1 + ⟨x*, y*⟩ )   and
    ‖x* − y*‖² = ‖x*‖² − 2⟨x*, y*⟩ + ‖y*‖² = 2( 1 − ⟨x*, y*⟩ ),

it follows that |⟨x*, y*⟩| ≤ 1 and hence that

    |⟨x, y⟩| = ‖x‖ ‖y‖ |⟨x*, y*⟩| ≤ ‖x‖ ‖y‖.

Definition 7 (Some properties).
• A metric space X is complete if every Cauchy sequence converges, i.e., if {x_n} has the property that

    (∀ε > 0)(∃N < ∞)(∀n, m ≥ N) d(x_n, x_m) < ε,

then there exists x ∈ X such that x_n → x. The spaces Rⁿ and Cⁿ are complete; the rationals Q are not. Every function in X = L₂([0, 1], dx) is also in L₁([0, 1], dx), so X with distance d₁(x, y) = ‖x − y‖₁ is a metric space, but it is not complete in that metric (although it is complete in the metric d₂(x, y) = ‖x − y‖₂).
• A metric space X is separable if it contains a countable dense set {x_n}, i.e., one for which

    (∀x ∈ X)(∀ε > 0)(∃n ∈ N) ∋ d(x, x_n) < ε.

The spaces L_p are separable for p < ∞ but not for p = ∞.

Lemma 1 (Borel-Cantelli). Let {A_n} be events whose probabilities are summable,

    Σ_{n∈N} P(A_n) < ∞.

Then with probability one only finitely many of the {A_n} occur, i.e.,

    P[ ∩_{n∈N} ∪_{m≥n} A_m ] = 0.

Proof. Denote by A := ∩_{n∈N} ∪_{m≥n} A_m the event that infinitely many A_n occur. By subadditivity, for any finite n ∈ N,

    P[A] ≤ P[ ∪_{m≥n} A_m ] ≤ Σ_{m≥n} P[A_m] → 0

as n → ∞, since Σ_{n∈N} P[A_n] < ∞.

Lemma 2 (Markov's Inequality). Let Y be a random variable and a > 0 a positive constant. Then

    P[Y ≥ a] ≤ E Y₊ / a,

where Y₊ := 0 ∨ Y is the maximum of zero and Y.

Proof. Let Z := a 1{Y ≥ a}. Then Z ≤ Y₊, since either Y ≥ a, in which case Z = a ≤ Y₊, or Y < a, in which case Z = 0 ≤ Y₊. Hence

    a P[Y ≥ a] = E Z ≤ E Y₊.

Dividing by a completes the proof. Note this implies that

    P[X ≥ a] ≤ E φ(X) / φ(a)

for any random variable X, real number a, and nonnegative function φ that is monotone nondecreasing on the interval [a, ∞). For example, taking φ(x) = x² gives P[X ≥ a] ≤ (σ² + µ²)/a² for any a > 0, if X has mean µ and variance σ².
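Markov's inequality is easy to check by simulation; here is a sketch using an Exponential(1) variable (an arbitrary choice) for Y, so that E Y₊ = E Y = 1 and the bound reads P[Y ≥ a] ≤ 1/a:

```python
import random

random.seed(4)

# Y ~ Exponential(1): E[Y+] = E[Y] = 1, so Markov gives P[Y >= a] <= 1/a.
n = 100_000
ys = [random.expovariate(1.0) for _ in range(n)]
for a in (0.5, 1.0, 2.0, 4.0):
    p = sum(y >= a for y in ys) / n
    print(a, p, 1.0 / a)   # empirical P[Y >= a] vs the Markov bound E[Y]/a
```

The bound is loose for small a (it even exceeds 1 when a < 1) but always valid; the exact tail here is e^{−a}.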

References

Donsker, M. D. (1952), "Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems," Annals of Mathematical Statistics, 23, 277–281, doi:10.1214/aoms/1177729445.

Doob, J. L. (1949), "Heuristic approach to the Kolmogorov-Smirnov theorems," Annals of Mathematical Statistics, 20, 393–403, doi:10.1214/aoms/1177729991.

Gel’fand, I. M. and Vilenkin, N. Y. (1964), Generalized Functions: Applications of Harmonic Analysis, volume 4, New York, NY: Academic Press, translated from 1961 Russian version by Amiel Feinstein.

Kolmogorov, A. N. (1933), “Sulla determinazione empirica di una legge di distribuzione,” G. Ist. Ital. Attuari, 4, 83–91.

Reed, M. C. and Simon, B. (1980), Methods of Modern Mathematical Physics, volume I: Functional Analysis, New York, NY: Academic Press, second edition.

Rudin, W. (1973), Functional Analysis, New York, NY: McGraw-Hill.

Schwartz, L. (1966), Théorie des distributions, Paris, FR: Hermann.

Smirnov, N. V. (1939), "On the estimation of the discrepancy between empirical curves of distribution for two independent samples (in Russian)," Byull. Moskov. Gos. Univ. Ser. A, 2, 3–16.

Wahba, G. (1990), Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, volume 59, Philadelphia, PA: SIAM.

Wahba, G. (2003), "An Introduction to Reproducing Kernel Hilbert Spaces and Why They are So Useful," SYSID 2003 Conference Presentation. On-line at http://www.stat.wisc.edu/~wahba/talks1/rotter.03/sysid1.pdf.

Last edited: June 22, 2017
