Function Spaces 1 Hilbert Spaces
Total Page:16
File Type:pdf, Size:1020Kb
Function Spaces A function space is a set of functions F that has some structure. Often a nonparametric regression function or classifier is chosen to lie in some function space, where the assumed structure is exploited by algorithms and theoretical analysis. Here we review some basic facts about function spaces. As motivation, consider nonparametric regression. We observe (X1;Y1);:::; (Xn;Yn) and we want to estimate m(x) = E(Y jX = x). We cannot simply choose m to minimize the P 2 training error i(Yi − m(Xi)) as this will lead to interpolating the data. One approach is P 2 to minimize i(Yi − m(Xi)) while restricting m to be in a well behaved function space. 1 Hilbert Spaces Let V be a vector space. A norm is a mapping k · k : V ! [0; 1) that satisfies 1. kx + yk ≤ kxk + kyk. 2. kaxk = akxk for all a 2 R. 3. kxk = 0 implies that x = 0. k pP 2 An example of a norm on V = R is the Euclidean norm kxk = i xi . A sequence x1; x2;::: in a normed space is a Cauchy sequence if kxm − xnk ! 0 as m; n ! 1. The space is complete if every Cauchy sequence converges to a limit. A complete, normed space is called a Banach space. An inner product is a mapping h·; ·i : V × V ! R that satisfies, for all x; y; z 2 V and a 2 R: 1. hx; xi ≥ 0 and hx; xi = 0 if and only if x = 0 2. hx; y + zi = hx; yi + hx; zi 3. hx; ayi = ahx; yi 4. hx; yi = hy; xi k P An example of an inner product on V = R is hx; yi = i xiyi. Two vectors x and y are orthogonal if hx; yi = 0. An inner product defines a norm kvk = phv; vi. We then have the Cauchy-Schwartz inequality jhx; yij ≤ kxk kyk: (1) 1 A Hilbert space is a complete, inner product space. Every Hilbert space is a Banach space but the reverse is not true in general. In a Hilbert space, we write fn ! f to mean that jjfn − fjj ! 0 as n ! 1. Note that jjfn − fjj ! 0 does NOT imply that fn(x) ! f(x). For this to be true, we need the space to be a reproducing kernel Hilbert space which we discuss later. If V is a Hilbert space and L is a closed subspace then for any v 2 V there is a unique y 2 L, called the projection of v onto L, which minimizes kv − zk over z 2 L. The set of elements orthogonal to every z 2 L is denoted by L?. Every v 2 V can be written uniquely as v = w + z where z is the projection of v onto L and w 2 L?. In general, if L and M are subspaces such that every ` 2 L is orthogonal to every m 2 M then we define the orthogonal sum (or direct sum) as L ⊕ M = f` + m : ` 2 L; m 2 Mg: (2) A set of vectors fet; t 2 T g is orthonormal if hes; eti = 0 when s 6= t and ketk = 1 for all t 2 T . If fet; t 2 T g are orthonormal, and the only vector orthogonal to each et is the zero vector, then fet; t 2 T g is called an orthonormal basis. Every Hilbert space has an orthonormal basis. A Hilbert space is separable if there exists a countable orthonormal basis. Theorem 1 Let V be a separable Hilbert space with countable orthonormal basis fe1; e2;:::g. P1 2 Then, for any x 2 V , we have x = j=1 θjej where θj = hx; eji. Furthermore, kxk = P1 2 j=1 θj , which is known as Parseval's identity. The coefficients θj = hx; eji are called Fourier coefficients. d P The set R with inner product hv; wi = j vjwj is a Hilbert space. Another example of a R b 2 Hilbert space is the set of functions f :[a; b] ! R such that f (x)dx < 1 with inner R a product f(x)g(x)dx. This space is denoted by L2(a; b). 2 Lp Spaces Let F be a collection of functions taking [a; b] into R. The Lp norm on F is defined by 1=p Z b ! p kfkp = jf(x)j dx (3) a where 0 < p < 1. For p = 1 we define kfk1 = sup jf(x)j: (4) x 2 Sometimes we write kfk2 simply as kfk. The space Lp(a; b) is defined as follows: ( ) Lp(a; b) = f :[a; b] ! R : kfkp < 1 : (5) Every Lp is a Banach space. Some useful inequalities are: Cauchy-Schwartz R f(x)g(x)dx2 ≤ R f 2(x)dx R g2(x)dx Minkowski kf + gkp ≤ kfkp + kgkp where p > 1 H¨older kfgk1 ≤ kfkpkgkq where (1=p) + (1=q) = 1. Special Properties of L2. As we mentioned earlier, the space L2(a; b) is a Hilbert space. R b The inner product between two functions f and g in L2(a; b) is a f(x)g(x)dx and the norm 2 R b 2 of f is kfk = a f (x) dx. With this inner product, L2(a; b) is a separable Hilbert space. Thus we can find a countable orthonormal basis φ1; φ2;:::; that is, kφjk = 1 for all j, R b a φi(x) φj(x)dx = 0 for i 6= j and the only function that is orthogonal to each φj is the zero function. (In fact, there are many such bases.) It follows that if f 2 L2(a; b) then 1 X f(x) = θjφj(x) (6) j=1 where Z b θj = f(x) φj(x) dx (7) a are the coefficients. Also, recall Parseval's identity Z b 1 2 X 2 f (x)dx = θj : (8) a j=1 The set of functions ( n ) X ajφj(x): a1; : : : ; an 2 R (9) j=1 P1 is the called the span of fφ1; : : : ; φng. The projection of f = j=1 θjφj(x) onto the span of Pn fφ1; : : : ; φng is fn = j=1 θjφj(x). We call fn the n-term linear approximation of f. Let P1 Λn denote all functions of the form g = j=1 ajφj(x) such that at most n of the aj's are non-zero. Note that Λn is not a linear space, since if g1; g2 2 Λn it does not follow that g1 +g2 is in Λ . The best approximation to f in Λ is f = P θ φ (x) where A are the n n n n j2An j j n indices corresponding to the n largest jθjj's. We call fn the n-term nonlinear approximation of f. 3 The Fourier basis on [0; 1] is defined by setting φ1(x) = 1 and 1 1 φ2j(x) = p cos(2jπx); φ2j+1(x) = p sin(2jπx); j = 1; 2;::: (10) 2 2 The cosine basis on [0; 1] is defined by p φ0(x) = 1; φj(x) = 2 cos(2πjx); j = 1; 2;:::: (11) The Legendre basis on (−1; 1) is defined by 1 1 P (x) = 1;P (x) = x; P (x) = (3x2 − 1);P (x) = (5x3 − 3x);::: (12) 0 1 2 2 3 2 These polynomials are defined by the relation 1 dn P (x) = (x2 − 1)n: (13) n 2nn! dxn The Legendre polynomials are orthogonal but not orthonormal, since Z 1 2 2 Pn (x)dx = : (14) −1 2n + 1 p However, we can define modified Legendre polynomials Qn(x) = (2n + 1)=2 Pn(x) which then form an orthonormal basis for L2(−1; 1). The Haar basis on [0,1] consists of functions ( ) j φ(x); jk(x): j = 0; 1; : : : ; k = 0; 1;:::; 2 − 1 (15) where 1 if 0 ≤ x < 1 φ(x) = (16) 0 otherwise; j=2 j jk(x) = 2 (2 x − k) and 1 −1 if 0 ≤ x ≤ 2 (x) = 1 (17) 1 if 2 < x ≤ 1: This is a doubly indexed set of functions so when f is expanded in this basis we write 1 2j −1 X X f(x) = αφ(x) + βjk jk(x) (18) j=1 k=1 R 1 R 1 where α = 0 f(x) φ(x) dx and βjk = 0 f(x) jk(x) dx. The Haar basis is an example of a wavelet basis. 4 Let [a; b]d = [a; b] × · · · × [a; b] be the d-dimensional cube and define ( Z ) d d 2 L2 [a; b] = f :[a; b] ! R : f (x1; : : : ; xd) dx1 : : : dxd < 1 : (19) [a;b]d Suppose that B = fφ1; φ2;:::g is an orthonormal basis for L2([a; b]). Then the set of functions ( ) d B = B ⊗ · · · ⊗ B = φi1 (x1) φi2 (x2) ··· φid (xd): i1; i2; : : : ; id 2 f1; 2;:::; g ; (20) d is called the tensor product of B, and forms an orthonormal basis for L2([a; b] ). 3 H¨olderSpaces Let β be a positive integer.1 Let T ⊂ R. The Holder space H(β; L) is the set of functions g : T ! R such that jg(β−1)(y) − g(β−1)(x)j ≤ Ljx − yj; for all x; y 2 T: (21) The special case β = 1 is sometimes called the Lipschitz space. If β = 2 then we have jg0(x) − g0(y)j ≤ L jx − yj; for all x; y: Roughly speaking, this means that the functions have bounded second derivatives.