Gaussian Process in Statistical Modeling
STA 941. Surya Tokdar

1 Introduction

Gaussian distributions are important building blocks in statistical modeling. As modeling tools, they arise naturally from first-principles reasoning based on the central limit theorem. They also provide the most natural way to incorporate dependency between multiple quantities. When there are infinitely many such quantities that can be viewed as a map or a function from an input space to an output space, the notion of Gaussianity continues to provide useful tools for constructing joint probability models. In particular, there is a notion of a Gaussian probability distribution on infinite dimensional spaces such as $\mathbb{R}^S$, the space of all functions $f : S \to \mathbb{R}$, where $S$ is a subset of a Euclidean space. Below we build up a general framework for the notion of Gaussianity on arbitrary spaces.

2 Gaussian random elements

As a starting point we only assume the concept of univariate normality. A random variable $X \in \mathbb{R}$ is said to be normally distributed if there exist $\mu \in \mathbb{R}$, $\sigma > 0$ such that the pdf of $X$ is $f(x) = \exp\{-(x - \mu)^2/(2\sigma^2)\}/\sqrt{2\pi\sigma^2}$, $x \in \mathbb{R}$. The distribution of $X$ is denoted $N(\mu, \sigma^2)$. For completeness, we allow the notation $N(0, 0)$ to mean the degenerate distribution under which $X = 0$ with probability 1.

2.1 Formal definition

Below we define the concept of a Gaussian random element in a Banach space $\mathcal{F}$, a (complete) normed linear space with a norm $\|\cdot\|$. Two specific examples of interest to us are

• $\mathcal{F} = \mathbb{R}^p$ for some $p \ge 1$, equipped with the Euclidean norm;

• $\mathcal{F} = C(S)$, the space of all continuous functions from a compact $S \subset \mathbb{R}^d$ to $\mathbb{R}$, equipped with the supremum norm $\|f\|_\infty = \sup_{s \in S} |f(s)|$.

Notice that the first space is also a Hilbert space while the second is not. Additionally, we will occasionally look at

• $\mathcal{F} = L_2(S, \nu)$ for some $S \subset \mathbb{R}^d$ and a measure $\nu$ on $S$, equipped with the $L_2$ norm $\|f\|_2 = \{\int_S f(s)^2\,\nu(ds)\}^{1/2}$. For our discussion this space will mostly serve as a caveat against blind generalizations.

Recall that linear operators on a Banach space $\mathcal{F}$ are linear maps $L : \mathcal{F} \to \mathbb{R}$. A linear operator is continuous if and only if it is bounded, i.e., the operator norm $\|L\| := \sup_{f \in \mathcal{F}} |Lf|/\|f\|$ is finite. The set of all continuous/bounded linear operators is called the dual space $\mathcal{F}^*$ of $\mathcal{F}$. The dual space is itself a Banach space under the operator norm.

Definition 1. A random element $W$ in a Banach space $(\mathcal{F}, \|\cdot\|)$ is said to have a Gaussian law if for every $L \in \mathcal{F}^*$, the scalar random variable $LW$ is normally distributed. $W$ is called centered if $LW$ has mean zero for every such $L$. $W$ is said to be non-degenerate if $LW = 0$ with probability one only when $L = 0$.

When $\mathcal{F} = \mathbb{R}^p$, this defines the $p$-variate Gaussian distribution. In this case, any continuous linear map $L$ is an inner product operation with respect to some vector $a \in \mathbb{R}^p$, $L : f \mapsto a^T f$. The law of $W$ is uniquely characterized by $\mu = E(W)$ and $\Sigma = \mathrm{Var}(W)$. Clearly, $\Sigma$ is a non-negative definite matrix, i.e., for any $u \in \mathbb{R}^p$, $u^T \Sigma u \ge 0$.
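As a quick numerical illustration of Definition 1 in this finite-dimensional case (a sketch only; the particular $p$, $\mu$, $\Sigma$ and $a$ below are arbitrary choices, not part of the notes), one can draw $W \sim N(\mu, \Sigma)$ in $\mathbb{R}^p$ and check that the functional $L : f \mapsto a^T f$ yields a scalar with mean $a^T\mu$ and variance $a^T\Sigma a$:

```python
# A minimal numerical check of Definition 1 when F = R^p: for W ~ N(mu, Sigma),
# the functional L(f) = a^T f gives a scalar LW that is N(a^T mu, a^T Sigma a).
# The values of p, mu, Sigma and a below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

p = 3
mu = np.array([1.0, -0.5, 2.0])
A = rng.standard_normal((p, p))
Sigma = A @ A.T + 0.1 * np.eye(p)          # a non-negative (here positive) definite matrix

W = rng.multivariate_normal(mu, Sigma, size=100_000)   # draws of W in R^p
a = np.array([0.3, -1.2, 0.7])                          # the functional L = <a, .>
LW = W @ a                                              # scalar random variable LW

print("empirical mean of LW:", LW.mean(), " theory:", a @ mu)
print("empirical var of LW :", LW.var(),  " theory:", a @ Sigma @ a)
```

The same check applied to every $a \in \mathbb{R}^p$ is exactly the content of the definition in this space.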
When $\mathcal{F} \subset \mathbb{R}^S$, any random element $W$ in $\mathcal{F}$ is both a random function $W : S \to \mathbb{R}$ and a stochastic process $W = (W(s) : s \in S)$. If $W$ is Gaussian in $\mathcal{F}$, we may as well call it a Gaussian process. When $\mathcal{F} = C(S)$ for some compact $S \subset \mathbb{R}^d$, for every $s \in S$ the evaluation map $\delta_s : f \mapsto f(s)$ is a continuous linear operator (of norm 1). Therefore, $W(s) = \delta_s W$ is normally distributed for any $s \in S$. Moreover, for any $s_1, \ldots, s_n \in S$, $X = (W(s_1), \ldots, W(s_n))^T$ satisfies the Gaussianity condition for random elements in $\mathbb{R}^n$, and hence $X$ is a multivariate normal random variate.

We will show that the law of any centered Gaussian element $W \in C(S)$ is uniquely characterized by the covariance function $k : S \times S \to \mathbb{R}$ given by $k(s, t) = \mathrm{Cov}\{W(s), W(t)\} = E\{W(s)W(t)\}$. Notice that a covariance function must be non-negative definite, i.e., it is symmetric ($k(s, t) = k(t, s)$ for all $s, t \in S$) and

$$\text{for any } n \in \mathbb{N},\ a_1, \ldots, a_n \in \mathbb{R},\ \text{and } s_1, \ldots, s_n \in S, \qquad \sum_{i,j} a_i a_j k(s_i, s_j) \ge 0.$$

We will also show that for any (continuous) non-negative definite function $k$ on $S \times S$, there exists a Gaussian $W \in C(S)$ with covariance function $k$.

Remark 1. When $\mathcal{F} = L_2(S, \nu)$, the evaluation maps $\delta_s$ are no longer continuous, hence for a Gaussian $W \in \mathcal{F}$ it's no longer meaningful to talk about normality of $(W(s_1), \ldots, W(s_n))$. However, for an orthonormal basis $\{\phi_j(s) : j = 1, 2, \ldots\}$ of $\mathcal{F}$, e.g., the Legendre polynomials if $S = [0, 1]$ and $\nu$ = Lebesgue measure, the Fourier coefficients $(\hat{W}_j = \int \phi_j(s) W(s)\,\nu(ds) : j = 1, 2, \ldots) \in \ell_2$ satisfy the Gaussianity condition for a random element in $\ell_2$ (the space of square summable sequences). On $\ell_2$, coordinate projection is continuous, and hence, for any $\{j_1, \ldots, j_n\} \subset \mathbb{N}$, $X = (\hat{W}_{j_1}, \ldots, \hat{W}_{j_n})^T$ has a multivariate normal distribution. Note that $W = \sum_j \hat{W}_j \phi_j$, and, if the latter series converges pointwise, then it's meaningful to define $W(s) = \sum_j \hat{W}_j \phi_j(s)$.

2.2 Gaussianity and Hilbert space

With every centered Gaussian element $W$ in a Banach space $(\mathcal{F}, \|\cdot\|)$ one can associate a Hilbert space[1] $(\mathbb{H}, \langle\cdot,\cdot\rangle)$ where $\mathbb{H}$ is a continuously embedded[2] subset of $\mathcal{F}$. To see this, define the map $g : \mathcal{F}^* \to \mathcal{F}$ given by $L \mapsto gL := E\{(LW)W\}$, with $\langle gL, gL'\rangle = E\{(LW)(L'W)\}$, and take $\mathbb{H}$ to be the completion of $g(\mathcal{F}^*)$ under $\langle\cdot,\cdot\rangle$. It's easy to see that $\|f\| \le \sigma\|f\|_{\mathbb{H}}$ where $\sigma^2 = E\|W\|^2$, and hence $\mathbb{H}$ is continuously embedded into $\mathcal{F}$. Because the Hilbert space norm is stronger than the Banach space norm, the completion of $g(\mathcal{F}^*)$ remains a subset of $\mathcal{F}$. When $W$ is non-degenerate, $\mathbb{H}$ is a dense subset of $\mathcal{F}$.

For this definition to be valid, one must be able to define $E\{(LW)W\}$ and establish that if $h = gL_1 = gL_1'$ and $h' = gL_2 = gL_2'$ then $E\{(L_1 W)(L_2 W)\} = E\{(L_1' W)(L_2' W)\}$, i.e., the inner product is well defined. For $X$ a random element of a Banach space $(\mathcal{F}, \|\cdot\|)$, $E(X)$ is defined as the unique element $f \in \mathcal{F}$ such that $Lf = E(LX)$ for every $L \in \mathcal{F}^*$. Such a unique element exists as long as $E\|X\| < \infty$. For our case, $X = (LW)W$ for some $L \in \mathcal{F}^*$, and hence $\|X\| = |LW|\,\|W\| \le \|L\|\,\|W\|^2$. So we need $E\|W\|^2 < \infty$. This is true for any centered Gaussian element $W$ in a separable Banach space, but we omit a proof[3]. Also notice that if $gL = gL'$, then with $\Delta = L - L' \in \mathcal{F}^*$, $0 = \Delta(g\Delta) = E(\Delta W)^2$, hence $\Delta W = 0$ with probability one. Therefore, if $gL_1 = gL_1'$ and $gL_2 = gL_2'$,

$$E\{(L_1 W)(L_2 W)\} - E\{(L_1' W)(L_2' W)\} = E\{(\Delta_1 W)(L_2 W)\} + E\{(L_1' W)(\Delta_2 W)\} = 0$$

where $\Delta_1 = L_1 - L_1'$ and $\Delta_2 = L_2 - L_2'$.

For any $h = gL \in \mathbb{H}$, define the scalar random variable $U(h) = LW$. This is well defined because, as shown above, if $h = gL = gL'$ then $(L - L')W$ is zero with probability one. Therefore we can associate with $W$ the stochastic process $U = (U(h) : h \in \mathbb{H})$. For any finite set $\{h_1, \ldots, h_n\} \subset \mathbb{H}$, the random vector $(U(h_1), \ldots, U(h_n))$ has an $n$-dimensional normal distribution with mean zero and variance-covariance matrix determined by $E\{U(h_i)U(h_j)\} = \langle h_i, h_j\rangle$. If $(\mathcal{F}, \|\cdot\|)$ is separable, it can be shown that $(\mathbb{H}, \langle\cdot,\cdot\rangle)$ is also separable[4].

[1] Known as the Cameron-Martin space of $W$ in $\mathcal{F}$.
[2] That is, $\mathbb{H} \subset \mathcal{F}$ and for some constant $C$, $\|h\| \le C\|h\|_{\mathbb{H}}$ for every $h \in \mathbb{H}$.
[3] Fernique's theorem says that for any centered Gaussian $W$ in a separable Banach space $(\mathcal{F}, \|\cdot\|)$, $E\exp\{\|W\|^2/(18R^2)\} \le e^{1/2} + \sum_{n=0}^{\infty} (e/3)^{2^n}$ where $R := \inf\{r : \Pr(\|W\| \le r) \ge 0.9\}$; see Stroock (2010), Theorem 8.2.1.
[4] This is not a trivial result and generally is not guaranteed for any Hilbert space that is continuously embedded into a separable Banach space. But it holds for the Cameron-Martin space $\mathbb{H}$ we have here.
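In the finite-dimensional case $\mathcal{F} = \mathbb{R}^p$ with a centered non-degenerate $W \sim N(0, \Sigma)$, the construction above is fully explicit: for $L = \langle a, \cdot\rangle$ one gets $gL = E\{(LW)W\} = \Sigma a$, the inner product is $\langle gL, gL'\rangle = a^T\Sigma b$ (equivalently $h^T\Sigma^{-1}h'$ for $h = \Sigma a$, $h' = \Sigma b$), and $U(gL) = a^T W$. The following sketch, with arbitrary illustrative choices of $\Sigma$, $a$ and $b$ (not taken from the notes), checks the covariance identity $E\{U(h_i)U(h_j)\} = \langle h_i, h_j\rangle$ by Monte Carlo:

```python
# A minimal sketch of the Cameron-Martin construction of Section 2.2 for
# F = R^p with centered W ~ N(0, Sigma). For L = <a,.>, gL = E{(LW)W} = Sigma a,
# <gL, gL'> = E{(LW)(L'W)} = a^T Sigma b, and U(gL) = LW = a^T W.
# The particular Sigma, a, b are illustrative choices only.
import numpy as np

rng = np.random.default_rng(1)

p = 4
B = rng.standard_normal((p, p))
Sigma = B @ B.T                                          # covariance of the centered Gaussian W

a, b = rng.standard_normal(p), rng.standard_normal(p)    # two functionals L = <a,.>, L' = <b,.>
h_a, h_b = Sigma @ a, Sigma @ b                          # gL = Sigma a, gL' = Sigma b

inner_H = a @ Sigma @ b                                  # <gL, gL'> = a^T Sigma b
inner_H_alt = h_a @ np.linalg.solve(Sigma, h_b)          # same thing written as h^T Sigma^{-1} h'

# Monte Carlo check of E{U(h_a) U(h_b)} with U(gL) = a^T W
W = rng.multivariate_normal(np.zeros(p), Sigma, size=200_000)
mc = np.mean((W @ a) * (W @ b))

print("<h_a, h_b> via a^T Sigma b      :", inner_H)
print("<h_a, h_b> via h^T Sigma^{-1} h':", inner_H_alt)
print("E{U(h_a)U(h_b)} (Monte Carlo)   :", mc)
```

Here $\mathbb{H}$ is all of $\mathbb{R}^p$ because $\Sigma$ is (almost surely) nonsingular; for a singular $\Sigma$ it would be the range of $\Sigma$.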
Such a separable $\mathbb{H}$ admits an at most countable orthonormal basis $\{\hat{h}_n : n \ge 1\} \subset \mathbb{H}$. The following result is fundamental (see van der Vaart and van Zanten, 2008, Theorem 4.3).

Theorem 1. The random variables $U(\hat{h}_1), U(\hat{h}_2), \ldots$ are IID standard normal variables and

$$W = \sum_n U(\hat{h}_n)\,\hat{h}_n,$$

where the series converges in the Banach space norm $\|\cdot\|$, almost surely.

Two important consequences of this result are: (1) any centered Gaussian random element $W$ in a separable Banach space $(\mathcal{F}, \|\cdot\|)$ is completely determined by the Hilbert space $(\mathbb{H}, \langle\cdot,\cdot\rangle)$ associated with it, and (2) $W$ can be expanded in a series $W = \sum_n Z_n \hat{h}_n$ for any orthonormal basis of $(\mathbb{H}, \langle\cdot,\cdot\rangle)$, where $Z_1, Z_2, \ldots$ are IID standard normal and the series converges in the norm of the Banach space.
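A discretized illustration of Theorem 1 (a sketch under assumed choices, not a construction from the notes): on a finite grid in $S = [0, 1]$, the Gram matrix $K$ of a squared-exponential covariance function plays the role of $\Sigma$, its scaled eigenvectors $\hat{h}_n = \sqrt{\lambda_n}\, v_n$ form an orthonormal basis of the associated Cameron-Martin space, and truncating $W \approx \sum_{n \le N} Z_n \hat{h}_n$ with IID standard normal $Z_n$ gives an approximate draw of the process on the grid:

```python
# A minimal sketch of the series expansion of Theorem 1 in a discretized setting:
# grid the interval S = [0, 1], form the Gram matrix of a squared-exponential
# covariance function (an assumed choice), and expand W = sum_n Z_n h_n with
# h_n = sqrt(lambda_n) v_n from the eigendecomposition of the Gram matrix.
# Truncating the sum gives an approximate draw of the process on the grid.
import numpy as np

rng = np.random.default_rng(2)

s = np.linspace(0.0, 1.0, 200)                            # grid on S = [0, 1]
k = lambda u, v: np.exp(-(u - v) ** 2 / (2 * 0.1 ** 2))   # assumed covariance function
K = k(s[:, None], s[None, :])                             # non-negative definite Gram matrix

lam, V = np.linalg.eigh(K)                                # eigenvalues ascending
lam, V = lam[::-1], V[:, ::-1]                            # sort descending
lam = np.clip(lam, 0.0, None)                             # guard tiny negative round-off

n_terms = 30                                              # truncation level N of the series
Z = rng.standard_normal(n_terms)                          # IID standard normals U(h_n)
W = (V[:, :n_terms] * np.sqrt(lam[:n_terms])) @ Z         # W ~ sum_n Z_n sqrt(lam_n) v_n

print("grid size:", s.size,
      "| share of total variance kept by", n_terms, "terms:",
      lam[:n_terms].sum() / lam.sum())
```

Increasing the truncation level recovers more of the total variance of the discretized process, mirroring the almost sure convergence of the full series in Theorem 1.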