Gaussian Processes

The class of Gaussian processes is one of the most widely used families of stochastic processes for modeling dependent data observed over time, or space, or time and space. The popularity of such processes stems primarily from two essential properties. First, a Gaussian process is completely determined by its mean and covariance functions. This property facilitates model fitting as only the first- and second-order moments of the process require specification. Second, solving the prediction problem is relatively straightforward. The best predictor of a Gaussian process at an unobserved location is a linear function of the observed values and, in many cases, these functions can be computed rather quickly using recursive formulas.

The fundamental characterization, as described below, of a Gaussian process is that all the finite-dimensional distributions have a multivariate normal (or Gaussian) distribution. In particular, the distribution of each observation must be normally distributed. There are many applications, however, where this assumption is not appropriate. For example, consider observations x_1, ..., x_n, where x_t denotes a 1 or 0, depending on whether or not the air pollution on the tth day at a certain site exceeds a government standard. A model for these data should only allow the values 0 and 1 for each daily observation, thereby precluding the normality assumption imposed by a Gaussian model. Nevertheless, Gaussian processes can still be used as building blocks to construct more complex models that are appropriate for non-Gaussian data. See [3–5] for more on modeling non-Gaussian data.

Basic Properties

A real-valued stochastic process {X_t, t \in T}, where T is an index set, is a Gaussian process if all the finite-dimensional distributions have a multivariate normal distribution. That is, for any choice of distinct values t_1, ..., t_k \in T, the random vector X = (X_{t_1}, ..., X_{t_k})' has a multivariate normal distribution with mean vector m = EX and covariance matrix \Sigma = \mathrm{cov}(X, X), which will be denoted by

    X \sim N(m, \Sigma)

Provided the covariance matrix \Sigma is nonsingular, the random vector X has a Gaussian probability density function given by

    f_X(x) = (2\pi)^{-k/2} (\det \Sigma)^{-1/2} \exp\left[ -\tfrac{1}{2} (x - m)' \Sigma^{-1} (x - m) \right]    (1)

In environmental applications, the subscript t will typically denote a point in time, or space, or space and time. For simplicity, we shall restrict attention to the case of time series for which t represents time. In such cases, the index set T is usually [0, \infty) for time series recorded continuously, or {0, 1, ...} for time series recorded at equally spaced time units. The mean and covariance functions of a Gaussian process are defined by

    \mu(t) = E X_t    (2)

and

    \gamma(s, t) = \mathrm{cov}(X_s, X_t)    (3)

respectively. While Gaussian processes depend only on these two quantities, modeling can be difficult without introducing further simplifications on the form of the mean and covariance functions. The assumption of stationarity frequently provides the proper level of simplification without sacrificing much generalization. Moreover, after applying elementary transformations to the data, the assumption of stationarity of the transformed data is often quite plausible.

A Gaussian time series {X_t} is said to be stationary if

1. \mu(t) = E X_t = \mu is independent of t, and
2. \gamma(t + h, t) = \mathrm{cov}(X_{t+h}, X_t) is independent of t for all h.

For stationary processes, it is conventional to express the covariance function as a function on T instead of on T \times T. That is, we define \gamma(h) = \mathrm{cov}(X_{t+h}, X_t) and call it the autocovariance function of the process. For stationary Gaussian processes {X_t}, we have

3. X_t \sim N(\mu, \gamma(0)) for all t, and
4. (X_{t+h}, X_t)' has a bivariate normal distribution with covariance matrix

    \begin{pmatrix} \gamma(0) & \gamma(h) \\ \gamma(h) & \gamma(0) \end{pmatrix}

for all t and h.
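As a concrete illustration of (1)–(3), the sketch below builds the mean vector and covariance matrix of (X_{t_1}, ..., X_{t_k})' for a stationary Gaussian process and evaluates the joint density (1). The zero mean, the exponential autocovariance \gamma(h) = \sigma^2 \rho^{|h|}, and the particular time points are illustrative choices, not part of the article; NumPy and SciPy are assumed to be available.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Illustrative stationary model (an assumption made for this example):
    # mean mu(t) = 0 and autocovariance gamma(h) = sigma2 * rho**|h|.
    sigma2, rho = 2.0, 0.6
    def gamma(h):
        return sigma2 * rho ** abs(h)

    # Distinct time points t_1, ..., t_k; by stationarity, the covariance
    # matrix in (3) has entries Sigma[i, j] = gamma(t_i - t_j).
    t = [1, 2, 5, 9]
    m = np.zeros(len(t))
    Sigma = np.array([[gamma(ti - tj) for tj in t] for ti in t])

    # Sigma is nonsingular here, so the density (1) can be evaluated directly.
    x = np.array([0.3, -0.1, 1.2, 0.7])
    print(multivariate_normal(mean=m, cov=Sigma).pdf(x))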
A general stochastic process {X_t} satisfying conditions 1 and 2 is said to be weakly or second-order stationary. The first- and second-order moments of weakly stationary processes are invariant with respect to time translations. A stochastic process {X_t} is strictly stationary if the distribution of (X_{t_1}, ..., X_{t_n})' is the same as that of (X_{t_1+s}, ..., X_{t_n+s})' for any s. In other words, the distributional properties of the time series are the same under any time translation. For Gaussian time series, the concepts of weak and strict stationarity coalesce. This result follows immediately from the fact that for weakly stationary processes, (X_{t_1}, ..., X_{t_n})' and (X_{t_1+s}, ..., X_{t_n+s})' have the same mean vector and covariance matrix. Since each of the two vectors has a multivariate normal distribution, they must be identically distributed.

Properties of the Autocovariance Function

An autocovariance function \gamma(\cdot) has the properties:

1. \gamma(0) \ge 0,
2. |\gamma(h)| \le \gamma(0) for all h,
3. \gamma(-h) = \gamma(h), i.e. \gamma(\cdot) is an even function.

Autocovariances have another fundamental property, namely that of non-negative definiteness,

    \sum_{i,j=1}^{n} a_i \gamma(t_i - t_j) a_j \ge 0    (4)

for all positive integers n, real numbers a_1, ..., a_n, and t_1, ..., t_n \in T. Note that the expression on the left of (4) is merely the variance of a_1 X_{t_1} + \cdots + a_n X_{t_n} and hence must be non-negative. Conversely, if a function \gamma(\cdot) is non-negative definite and even, then it must be an autocovariance function of some stationary Gaussian process.

Gaussian Linear Processes

If {X_t, t = 0, \pm 1, \pm 2, ...} is a stationary Gaussian process with mean 0, then the Wold decomposition allows X_t to be expressed as a sum of two independent components,

    X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j} + V_t    (5)

where {Z_t} is a sequence of independent and identically distributed (iid) normal random variables with mean 0 and variance \sigma^2, {\psi_j} is a sequence of square summable coefficients with \psi_0 = 1, and {V_t} is a deterministic process that is independent of {Z_t}. The Z_t are referred to as innovations and are defined by Z_t = X_t - E(X_t | X_{t-1}, X_{t-2}, ...). A process {V_t} is deterministic if V_t is completely determined by its past history {V_s, s < t}. An example of such a process is the random sinusoid, V_t = A \cos(\theta t + \Phi), where A and \Phi are independent random variables with A \ge 0 and \Phi distributed uniformly on [0, 2\pi). In this case, V_2 is completely determined by the values of V_0 and V_1. In most time series modeling applications, the deterministic component of a time series is either not present or easily removed.

Purely nondeterministic Gaussian processes do not possess a deterministic component and can be represented as a Gaussian linear process,

    X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}    (6)

The autocovariance of {X_t} has the form

    \gamma(h) = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+h}    (7)

The class of autoregressive (AR) processes, and its extensions, autoregressive moving-average (ARMA) processes, are dense in the class of Gaussian linear processes. A Gaussian AR(p) process satisfies the recursions

    X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + Z_t    (8)

where {Z_t} is an iid sequence of N(0, \sigma^2) random variables, and the polynomial \phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p has no zeros inside or on the unit circle. The AR(p) process has a linear representation (6) where the coefficients \psi_j are found as functions of the \phi_j (see [2]).
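To make the recursion (8) concrete, here is a minimal simulation sketch for a Gaussian AR(2) process; the coefficients \phi_1 = 0.5 and \phi_2 = -0.3 are illustrative choices for which \phi(z) has no zeros inside or on the unit circle. The last two lines check properties 1 and 2 of the autocovariance function and the non-negative definiteness property (4) on the estimated autocovariances.

    import numpy as np
    from scipy.linalg import toeplitz

    rng = np.random.default_rng(0)
    phi1, phi2 = 0.5, -0.3      # illustrative AR(2) coefficients
    n, burn = 100_000, 500

    # iid N(0, 1) innovations Z_t and the AR(2) recursion (8).
    Z = rng.standard_normal(n + burn)
    X = np.zeros(n + burn)
    for t in range(2, n + burn):
        X[t] = phi1 * X[t - 1] + phi2 * X[t - 2] + Z[t]
    X = X[burn:]                # discard the startup transient

    # Sample autocovariances; dividing by n (rather than n - h) keeps the
    # estimated autocovariance sequence non-negative definite.
    def gamma_hat(h):
        return (X[:n - h] @ X[h:]) / n

    g = np.array([gamma_hat(h) for h in range(21)])
    print(g[0] >= 0, np.all(np.abs(g) <= g[0]))     # properties 1 and 2
    print(np.linalg.eigvalsh(toeplitz(g)).min())    # >= 0, as required by (4)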
Now for any Gaussian linear process, there exists an AR(p) process such that the difference in the two autocovariance functions can be made arbitrarily small for all lags. In fact, the autocovariances can be matched up perfectly for the first p lags.

Prediction

Recall that if two random vectors X_1 and X_2 have a joint normal distribution, i.e.

    \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left( \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)

and \Sigma_{22} is nonsingular, then the conditional distribution of X_1 given X_2 has a multivariate normal distribution with mean

    m_{X_1|X_2} = m_1 + \Sigma_{12} \Sigma_{22}^{-1} (X_2 - m_2)    (9)

and covariance matrix

    \Sigma_{X_1|X_2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}    (10)

The key observation here is that the best mean square error predictor of X_1 in terms of X_2 (i.e. the multivariate function g(X_2) that minimizes E||X_1 - g(X_2)||^2, where ||\cdot|| is Euclidean distance) is E(X_1|X_2) = m_{X_1|X_2}, which is a linear function of X_2. Also, the covariance matrix of the prediction error, \Sigma_{X_1|X_2}, does not depend on the value of X_2. These results extend directly to the prediction problem for Gaussian processes.

Suppose {X_t, t = 1, 2, ...} is a stationary Gaussian time series with mean 0 and autocovariance function \gamma(\cdot). Applying (9) and (10) with X_1 = X_{n+1} and X_2 = (X_n, ..., X_1)', the best predictor of X_{n+1} in terms of the first n observations is the linear function

    \hat{X}_{n+1} = \phi_{n1} X_n + \cdots + \phi_{nn} X_1    (11)

where \phi_n = (\phi_{n1}, \ldots, \phi_{nn})' solves

    \Gamma_n \phi_n = \gamma_n    (12)

with \Gamma_n = [\gamma(i - j)]_{i,j=1}^{n} the covariance matrix of (X_n, ..., X_1)' and \gamma_n = (\gamma(1), \ldots, \gamma(n))'. The corresponding mean square error of prediction is

    v_n = E(X_{n+1} - \hat{X}_{n+1})^2 = \gamma(0) - \gamma_n' \Gamma_n^{-1} \gamma_n    (13)

If \Gamma_n is singular, the predictor can instead be computed from a prediction subset consisting of linearly independent variables. The covariance matrix of this prediction subset will be nonsingular. A mild and easily verifiable condition for ensuring nonsingularity of \Gamma_n for all n is that \gamma(h) \to 0 as h \to \infty with \gamma(0) > 0 (see [1]).

While (12) and (13) completely solve the prediction problem, these equations require the inversion of an n \times n covariance matrix, which may be difficult and time consuming for large n. The Durbin–Levinson algorithm (see [1]) allows one to compute the coefficient vector \phi_n = (\phi_{n1}, \ldots, \phi_{nn})' and the one-step prediction errors v_n recursively from \phi_{n-1}, v_{n-1}, and the autocovariance function.

The Durbin–Levinson Algorithm

The coefficients \phi_n in the calculation of the one-step predictor (11) and the mean square error of prediction (13) can be computed recursively from the equations

    \phi_{nn} = \left[ \gamma(n) - \sum_{j=1}^{n-1} \phi_{n-1,j}\, \gamma(n-j) \right] v_{n-1}^{-1}

    \begin{pmatrix} \phi_{n,1} \\ \vdots \\ \phi_{n,n-1} \end{pmatrix} = \begin{pmatrix} \phi_{n-1,1} \\ \vdots \\ \phi_{n-1,n-1} \end{pmatrix} - \phi_{nn} \begin{pmatrix} \phi_{n-1,n-1} \\ \vdots \\ \phi_{n-1,1} \end{pmatrix}
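As a sketch of the algorithm just described, the recursions can be implemented in a few lines and checked against a direct solve of (12). The update v_n = v_{n-1}(1 - \phi_{nn}^2) for the one-step prediction errors is the standard Durbin–Levinson step, stated here for completeness, and the AR(1) autocovariance \gamma(h) = a^{|h|}/(1 - a^2) used as input is an illustrative choice.

    import numpy as np
    from scipy.linalg import toeplitz

    def durbin_levinson(gamma, n):
        # gamma is the array (gamma(0), ..., gamma(n)); returns
        # phi_n = (phi_n1, ..., phi_nn) and v = (v_0, ..., v_n).
        phi = np.zeros(n + 1)           # phi[j] holds phi_{k,j} at stage k
        v = np.zeros(n + 1)
        v[0] = gamma[0]                 # v_0 = gamma(0)
        for k in range(1, n + 1):
            # phi_kk = [gamma(k) - sum_{j<k} phi_{k-1,j} gamma(k-j)] / v_{k-1}
            phi_kk = (gamma[k] - phi[1:k] @ gamma[k - 1:0:-1]) / v[k - 1]
            # (phi_{k,1}, ..., phi_{k,k-1})' = previous coefficients minus
            # phi_kk times the previous coefficients in reverse order
            phi[1:k] = phi[1:k] - phi_kk * phi[1:k][::-1]
            phi[k] = phi_kk
            v[k] = v[k - 1] * (1.0 - phi_kk ** 2)   # standard error update
        return phi[1:], v

    # Illustrative input: AR(1) autocovariance gamma(h) = a**h / (1 - a**2).
    a, n = 0.7, 10
    gamma = a ** np.arange(n + 1) / (1 - a ** 2)

    phi_n, v = durbin_levinson(gamma, n)
    print(phi_n)    # approximately (0.7, 0, ..., 0), as expected for an AR(1)

    # Check against the direct solution of (12): Gamma_n phi_n = gamma_n.
    phi_direct = np.linalg.solve(toeplitz(gamma[:n]), gamma[1:n + 1])
    print(np.allclose(phi_n, phi_direct))

Each stage of the recursion costs O(n) operations, so all of \phi_1, ..., \phi_n and v_1, ..., v_n are obtained in O(n^2) time, compared with the O(n^3) cost of solving (12) by direct matrix inversion.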
