Stationary Processes and Their Statistical Properties
Brian Borchers
March 29, 2001

1 Stationary processes

A discrete time stochastic process is a sequence of random variables $Z_1, Z_2, \ldots$. In practice we will typically analyze a single realization $z_1, z_2, \ldots, z_n$ of the stochastic process and attempt to estimate the statistical properties of the stochastic process from the realization. We will also consider the problem of predicting $z_{n+1}$ from the previous elements of the sequence.

We will begin by focusing on the very important class of stationary stochastic processes. A stochastic process is strictly stationary if its statistical properties are unaffected by shifting the stochastic process in time. In particular, this means that if we take a subsequence $Z_{k+1}, \ldots, Z_{k+m}$, then the joint distribution of the $m$ random variables will be the same no matter what $k$ is.

Stationarity requires that the mean of the stochastic process be a constant,
$$E[Z_k] = \mu,$$
and that the variance is constant,
$$\mathrm{Var}[Z_k] = \sigma_Z^2.$$
Also, stationarity requires that the covariance of two elements separated by a distance $m$ is constant. That is, $\mathrm{Cov}(Z_k, Z_{k+m})$ is constant. This covariance is called the autocovariance at lag $m$, and we will use the notation $\gamma_m$. Since $\mathrm{Cov}(Z_k, Z_{k+m}) = \mathrm{Cov}(Z_{k+m}, Z_k)$, we need only find $\gamma_m$ for $m \geq 0$.

The correlation of $Z_k$ and $Z_{k+m}$ is the autocorrelation at lag $m$. We will use the notation $\rho_m$ for the autocorrelation. It is easy to show that
$$\rho_k = \frac{\gamma_k}{\gamma_0}.$$

2 The autocovariance and autocorrelation matrices

The covariance matrix for the random variables $Z_1, \ldots, Z_n$ is called an autocovariance matrix,
$$\Gamma_n = \begin{bmatrix} \gamma_0 & \gamma_1 & \gamma_2 & \ldots & \gamma_{n-1} \\ \gamma_1 & \gamma_0 & \gamma_1 & \ldots & \gamma_{n-2} \\ \vdots & & & & \vdots \\ \gamma_{n-1} & \gamma_{n-2} & \ldots & \gamma_1 & \gamma_0 \end{bmatrix}.$$
Similarly, we can form an autocorrelation matrix,
$$P_n = \begin{bmatrix} 1 & \rho_1 & \rho_2 & \ldots & \rho_{n-1} \\ \rho_1 & 1 & \rho_1 & \ldots & \rho_{n-2} \\ \vdots & & & & \vdots \\ \rho_{n-1} & \rho_{n-2} & \ldots & \rho_1 & 1 \end{bmatrix}.$$
Note that
$$\Gamma_n = \sigma_Z^2 P_n.$$

An important property of the autocovariance and autocorrelation matrices is that they are positive semidefinite (PSD). That is, for any vector $x$, $x^T \Gamma_n x \geq 0$ and $x^T P_n x \geq 0$. To prove this, consider the stochastic process $W_k$, where
$$W_k = x_1 Z_k + x_2 Z_{k-1} + \cdots + x_n Z_{k-n+1}.$$
The variance of $W_k$ is given by
$$\mathrm{Var}(W_k) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j \gamma_{|i-j|} = x^T \Gamma_n x.$$
Since $\mathrm{Var}(W_k) \geq 0$, the matrix $\Gamma_n$ is PSD. In fact, the only way that $\mathrm{Var}(W_k)$ can be zero is if $W_k$ is constant. This rarely happens in practice. Challenge: find a nonconstant stochastic process $Z_k$ for which there is some constant $W_k$. Notice that by the definition of stationarity, the process $W_k$ is also stationary!

An important example of a stationary process that we will work with occurs when the joint distribution of $Z_k, \ldots, Z_{k+n}$ is multivariate normal. In this situation, the autocovariance matrix $\Gamma_n$ is precisely the covariance matrix $C$ for the multivariate normal distribution.

3 Estimating the mean, autocovariance, and autocorrelation

Given a realization $z_1, z_2, \ldots, z_n$, how can we estimate the mean, variance, autocovariance, and autocorrelation? We will estimate the mean by
$$\bar{z} = \frac{\sum_{i=1}^{n} z_i}{n}.$$
We will estimate the autocovariance at lag $k$ with
$$c_k = \frac{1}{n} \sum_{i=1}^{n-k} (z_i - \bar{z})(z_{i+k} - \bar{z}).$$
Note that $c_0$ is an estimate of the variance, but it is not the same unbiased estimate that we used in the last lecture. The problem here is that the $z_i$ are correlated, so the formula from the last lecture no longer provides an unbiased estimator. The formula given here is also biased, but is considered to work better in practice.
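As a concrete illustration of these estimators, here is a minimal sketch in Python with NumPy (the function and variable names are our own, not from the notes); it computes the sample mean and the autocovariance estimates $c_k$ for a realization stored in an array.

```python
import numpy as np

def sample_autocovariance(z, max_lag):
    """Autocovariance estimates c_k, k = 0, ..., max_lag, for a realization z.

    Implements c_k = (1/n) * sum_{i=1}^{n-k} (z_i - zbar) * (z_{i+k} - zbar),
    with the 1/n divisor used in the notes (biased, but preferred in practice).
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    zbar = z.mean()            # estimate of the process mean
    d = z - zbar
    c = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        c[k] = np.dot(d[:n - k], d[k:]) / n
    return c

# Illustration on a synthetic realization (any array of observations works):
rng = np.random.default_rng(0)
z = rng.normal(loc=50.0, scale=10.0, size=200)
c = sample_autocovariance(z, max_lag=20)
print(c[0])   # c_0 is the (biased) estimate of the variance
```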
We will estimate the autocorrelation at lag $k$ with
$$r_k = \frac{c_k}{c_0}.$$
The following example demonstrates the computation of autocorrelation and autocovariance estimates.

Example 1. Consider the following time series of yields from a batch chemical process. These data are taken from p. 31 of Box, Jenkins, and Reinsel. Read the table by rows. Figure 1 shows the estimated autocorrelation for this data set. Notice that the autocorrelation tends to alternate between positive and negative values, and that after about $k = 6$, the autocorrelation seems to die out to a random noise level.

47 64 23 71 38 64 55 41 59 48
71 35 57 40 58 44 80 55 37 74
51 57 50 60 45 57 50 45 25 59
50 71 56 74 50 58 45 54 36 54
48 55 45 57 50 62 44 64 43 52
38 59 55 41 53 49 34 35 54 45
68 38 50 60 39 59 40 57 54 23

Table 1: An example time series.

[Figure 1: Estimated autocorrelation $r_k$ for the example data, for lags $k$ from 0 to 20.]

Just as with the sample mean, the autocorrelation estimate $r_k$ is a random quantity with its own standard deviation. It can be shown that
$$\mathrm{Var}(r_k) \approx \frac{1}{n} \sum_{v=-\infty}^{\infty} \left( \rho_v^2 + \rho_{v+k}\rho_{v-k} - 4\rho_k \rho_v \rho_{v-k} + 2\rho_v^2 \rho_k^2 \right).$$
The autocorrelation function typically decays rapidly, so that we can identify a lag $q$ beyond which $r_k$ is effectively 0. Under these circumstances, the formula simplifies to
$$\mathrm{Var}(r_k) \approx \frac{1}{n} \left( 1 + 2 \sum_{v=1}^{q} \rho_v^2 \right), \quad k > q.$$
In practice we don't know the $\rho_v$, but we can use the estimates $r_v$ in the above formula. This provides a statistical test to determine whether or not an autocorrelation $r_k$ is statistically different from 0. An approximate 95% confidence interval for $r_k$ is $r_k \pm 1.96 \sqrt{\mathrm{Var}(r_k)}$. If this confidence interval includes 0, then we can't rule out the possibility that $r_k$ really is 0 and that there is no correlation at lag $k$.

Example 2. Returning to our earlier data set, consider the variance of our estimate of $r_6$. Using $q = 5$, we estimate that $\mathrm{Var}(r_6) = 0.0225$ and that the standard deviation is about 0.14. Since $r_6 = -0.0471$ is considerably smaller in magnitude than the standard deviation, we will decide to ignore $r_k$ for $k \geq 6$.
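The following is a hedged sketch of this significance test in Python with NumPy (the function names are ours, not from the notes). It recomputes the autocorrelation estimates for the Table 1 series and the approximate 95% half-width using $q = 5$; lags whose estimates fall inside the interval are treated as zero.

```python
import numpy as np

def autocorrelations(z, max_lag):
    """Autocorrelation estimates r_k = c_k / c_0 for k = 0, ..., max_lag."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = z - z.mean()
    c = np.array([np.dot(d[:n - k], d[k:]) / n for k in range(max_lag + 1)])
    return c / c[0]

def r_halfwidth(r, q, n):
    """Approximate 95% half-width for r_k when k > q, using
    Var(r_k) ~ (1/n) * (1 + 2 * sum_{v=1}^{q} r_v**2)."""
    var_rk = (1.0 + 2.0 * np.sum(r[1:q + 1] ** 2)) / n
    return 1.96 * np.sqrt(var_rk)

# Batch-yield series from Table 1, read by rows:
z = [47, 64, 23, 71, 38, 64, 55, 41, 59, 48, 71, 35, 57, 40, 58, 44, 80, 55,
     37, 74, 51, 57, 50, 60, 45, 57, 50, 45, 25, 59, 50, 71, 56, 74, 50, 58,
     45, 54, 36, 54, 48, 55, 45, 57, 50, 62, 44, 64, 43, 52, 38, 59, 55, 41,
     53, 49, 34, 35, 54, 45, 68, 38, 50, 60, 39, 59, 40, 57, 54, 23]
r = autocorrelations(z, max_lag=20)
hw = r_halfwidth(r, q=5, n=len(z))
print(r[6], hw)   # a lag-k estimate inside (-hw, +hw) is treated as zero
```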
4 The periodogram

The major alternative to using the autocorrelation/autocovariance to analyze a time series is the periodogram. In this section we will show that there is an equivalence between a form of the periodogram and the autocovariance function. For our purposes, the autocorrelation function is more convenient, so we will continue to use it throughout this sequence of lectures.

The form of the periodogram that we will use is slightly different from the periodogram that we have previously obtained from the FFT. We will fit a Fourier series of the form
$$z_n = a_0 + \sum_{j=1}^{q} \left( a_j \cos(2\pi j n/N) + b_j \sin(2\pi j n/N) \right)$$
to the time series. Here $N$ is the number of data points (which we will assume is even), and $q = N/2$. The coefficients can be obtained by
$$a_0 = \bar{z}$$
$$a_j = \frac{2}{N} \sum_{k=1}^{N} z_k \cos(2\pi j k/N), \quad j = 1, 2, \ldots, q-1$$
$$b_j = \frac{2}{N} \sum_{k=1}^{N} z_k \sin(2\pi j k/N), \quad j = 1, 2, \ldots, q-1$$
$$a_q = \frac{1}{N} \sum_{j=1}^{N} (-1)^j z_j$$
$$b_q = 0.$$
The relationship between these coefficients and the FFT $Z_n$ is
$$a_j = 2\,\mathrm{real}(Z_{j+1} e^{-2\pi i j/N}), \quad j = 1, 2, \ldots, q-1$$
$$b_j = -2\,\mathrm{imag}(Z_{j+1} e^{-2\pi i j/N}), \quad j = 1, 2, \ldots, q-1$$
$$a_q = -Z_{q+1}/N$$
$$b_q = 0.$$
The periodogram is then defined to be
$$I(j) = \frac{N}{2}(a_j^2 + b_j^2), \quad j = 1, 2, \ldots, q-1$$
$$I(q) = N a_q^2.$$
We can go a step further and define the sample spectrum at any frequency $f$, $0 \leq f < 1/2$, as
$$I(f) = \frac{N}{2}(a_f^2 + b_f^2)$$
where
$$a_f = \frac{2}{N} \sum_{j=1}^{N} z_j \cos(2\pi f j)$$
and
$$b_f = \frac{2}{N} \sum_{j=1}^{N} z_j \sin(2\pi f j).$$
The connection between the sample spectrum and our estimate of the autocovariance function is
$$I(f) = 2\left( c_0 + 2 \sum_{k=1}^{N-1} c_k \cos(2\pi f k) \right).$$
The sample spectrum is the Fourier transform of the autocovariance. A proof can be found in Box, Jenkins, and Reinsel.

In practice, the sample spectrum from a short time series is extremely noisy. For this reason, various smoothing techniques have been proposed. These are similar to the techniques that we have previously discussed for computing the PSD. Even with such techniques, it's extremely difficult to make sense of the spectrum. On the other hand, it is much easier to make sense of the autocorrelation function of a short time series. For this reason, we'll focus on the autocorrelation from now on.

Example 3. Figure 2 shows the sample spectrum for our example data. It's very difficult to detect any real features in this spectrum. This is similar to what we saw earlier when we took the FFT of a short time series. The problem is that with a short time series you get little frequency resolution and lots of noise. Longer time series make it possible to obtain both better frequency resolution (by using a longer window) and reduced noise (by averaging over many windows).

[Figure 2: Sample spectrum $I(f)$, in dB, for the example data, for $f$ from 0 to 0.5.]
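To make the stated connection between the sample spectrum and the autocovariance estimates concrete, here is a small sketch in Python with NumPy (the function names are ours, not from the notes). It evaluates the sample spectrum from the autocovariance estimates and, separately, the periodogram from the Fourier-series coefficients; at the harmonic frequencies $f_j = j/N$ the two computations should agree up to floating-point rounding.

```python
import numpy as np

def sample_autocovariance(z):
    """Full set of autocovariance estimates c_0, ..., c_{N-1} (1/N divisor)."""
    z = np.asarray(z, dtype=float)
    N = len(z)
    d = z - z.mean()
    return np.array([np.dot(d[:N - k], d[k:]) / N for k in range(N)])

def sample_spectrum(z, freqs):
    """I(f) = 2 * (c_0 + 2 * sum_{k=1}^{N-1} c_k * cos(2 pi f k))."""
    c = sample_autocovariance(z)
    k = np.arange(1, len(c))
    return np.array([2.0 * (c[0] + 2.0 * np.sum(c[1:] * np.cos(2.0 * np.pi * f * k)))
                     for f in freqs])

def periodogram(z):
    """I(j) = (N/2) * (a_j**2 + b_j**2) from the Fourier-series coefficients,
    for j = 1, ..., N/2 - 1."""
    z = np.asarray(z, dtype=float)
    N = len(z)
    k = np.arange(1, N + 1)
    out = []
    for j in range(1, N // 2):
        a_j = 2.0 / N * np.sum(z * np.cos(2.0 * np.pi * j * k / N))
        b_j = 2.0 / N * np.sum(z * np.sin(2.0 * np.pi * j * k / N))
        out.append(N / 2.0 * (a_j ** 2 + b_j ** 2))
    return np.array(out)

# Cross-check at the harmonic frequencies f_j = j / N (agreement up to rounding):
rng = np.random.default_rng(1)
z = rng.normal(size=64)
f = np.arange(1, len(z) // 2) / len(z)
print(np.allclose(sample_spectrum(z, f), periodogram(z)))
```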