1 Basic Concepts in Time Series - See pp. 1-17

2 Stationary Processes and Time Series I

Suppose that $\{X_t\}$ is a ts such that $E(X_t) = \mu < \infty$ and $E(X_t^2) = C < \infty$, where $\mu$ and $C$ are constants independent of time $t$. To achieve this in practice, it is necessary to apply suitable preliminary transformations as described before. Suppose now that we have a time series with the following properties:

$E(X_t) = \text{constant, independent of } t$, and

$\mathrm{Var}(X_t) = \text{constant, independent of } t$.

It is known that the consecutive values of $\{X_t\}$ are not independent, and hence the mean function alone is not sufficient to specify the second-order properties of a ts. Since the consecutive observations are correlated, we define a measure called the autocovariance function in order to study the serial dependency of a time series. The standardized version of this autocovariance function is called the autocorrelation function.

Note: Autocovariance and autocorrelation functions explain the serial dependence structure of neighbouring observations at different lags. Their values can be compared with the ordinary covariance and correlation functions that appear in bivariate data analysis.

An Illustration
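As a concrete illustration, here is a minimal R sketch (using a simulated AR(1) series as an assumption, since the original data are not specified):

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 200)          # simulated AR(1) series
acf(x, lag.max = 10, type = "covariance", plot = FALSE)  # sample autocovariances
acf(x, lag.max = 10, plot = FALSE)                       # sample autocorrelations
```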

Note: As the lag increases, it is clear from the above that the autocovariance and autocorrelation functions decrease.

2.1 Autocovariance and Autocorrelation Functions

Suppose that $\{X_t\}$ is a ts with $\mu = E(X_t)$ and $\sigma^2 = \mathrm{Var}(X_t)$. The covariance between $X_t$ and $X_{t+k}$, $\mathrm{Cov}(X_t, X_{t+k})$, is called the autocovariance function of $\{X_t\}$ at lag (separation) $k$ and is denoted by $\gamma(k)$ (or $\gamma_k$). It is defined as:

$$\gamma(k) = \mathrm{Cov}(X_t, X_{t+k}) = E[(X_t - \mu)(X_{t+k} - \mu)] = E(X_t X_{t+k}) - \mu^2, \quad \text{since } E(X_t) = \mu \text{ for all } t.$$

Clearly, $\gamma(0) = \mathrm{Var}(X_t) = E(X_t^2) - \mu^2$.

Let $\rho(k) = \mathrm{Corr}(X_t, X_{t+k})$ be the autocorrelation function (acf) at lag $k$. Since $\mathrm{Var}(X_t) = \mathrm{Var}(X_{t+k})$, we have
$$\rho(k) = \frac{\mathrm{Cov}(X_t, X_{t+k})}{\sqrt{\mathrm{Var}(X_t) \cdot \mathrm{Var}(X_{t+k})}} = \frac{\gamma(k)}{\gamma(0)}.$$
Notes:

(1) Some authors replace $\gamma(k)$ by $\gamma_k$ or $\gamma_X(k)$, and $\rho(k)$ by $\rho_k$ or $\rho_X(k)$.

(2) $\rho(k)$ measures the dependence between observations $k$ time points apart in the series; that is, the correlation between the sets $X_t, X_{t-1}, \ldots$ and $X_{t+k}, X_{t+k-1}, \ldots$. This is a measure of the similarity between $\{X_t\}$ and the same realization shifted to the right (or left) by $k$ units.

(3) It can be seen that $|\rho(k)| \le 1$ for all $k$.

(4) Since $\{X_t\}$ has time-invariant statistical properties, $\mathrm{Cov}(X_t, X_{t-k}) = \mathrm{Cov}(X_t, X_{t+k})$. Consequently, $\gamma(k) = \gamma(-k)$.

Estimation of $\gamma(k)$ and $\rho(k)$

Suppose that we have a time series of size $n$, $X_1, X_2, \ldots, X_n$. Let $C_{n,k}$ and $R_{n,k}$ be the sample autocovariance and autocorrelation estimators of $\gamma(k)$ and $\rho(k)$ at lag $k$.

(1) An estimator of γ(k) based on n observations is given by

$$C_{n,k} = \frac{1}{n} \sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X}); \quad k = 0, 1, \ldots, n-1.$$

(See Chatfield, p. 20.)

Note: Although the denominator on the right-hand side should be $n - k$, many authors use $n$ in order to simplify the distributional properties of $C_{n,k}$. (For large $n$, the difference between $n$ and $n - k$ is negligible.)

(2) An estimator of $\rho(k)$ based on $n$ observations is given by

$$R_{n,k} = \frac{C_{n,k}}{C_{n,0}}$$

Note: For simplicity some authors write $C_k$ for $C_{n,k}$ and $R_k$ for $R_{n,k}$.
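These estimators translate directly into code. The following is a minimal R sketch (the function names c_nk and r_nk are our own, chosen for illustration; base R's acf() computes the same quantities):

```r
# Sample autocovariance C_{n,k}, with divisor n as in the definition above.
c_nk <- function(x, k) {
  n <- length(x); xbar <- mean(x)
  sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar)) / n
}

# Sample autocorrelation R_{n,k} = C_{n,k} / C_{n,0}.
r_nk <- function(x, k) c_nk(x, k) / c_nk(x, 0)
```

For example, r_nk(c(1, 4, 5, 7, 4, 4, 8, 1, 2, 4), 1) returns the lag-1 value $-0.0625$ obtained in the worked example below.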

Sample estimates: Suppose that we have $n$ observations $x_1, x_2, \ldots, x_n$. Then the corresponding sample autocovariance and autocorrelation functions (or estimates) at lag $k$ are:

$$c_k = \frac{\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x})}{n} \quad \text{and} \quad r_k = \frac{\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n}(x_t - \bar{x})^2}.$$

For example,

$$c_1 = \frac{\sum_{t=1}^{n-1}(x_t - \bar{x})(x_{t+1} - \bar{x})}{n} \quad \text{and} \quad r_1 = \frac{\sum_{t=1}^{n-1}(x_t - \bar{x})(x_{t+1} - \bar{x})}{\sum_{t=1}^{n}(x_t - \bar{x})^2}$$
are used to estimate $\gamma(1)$ and $\rho(1)$.

Note: It is clear that $r_1$ is approximately the ordinary sample correlation coefficient between the pairs of consecutive values $(x_1, x_2), (x_2, x_3), (x_3, x_4), \ldots, (x_{n-1}, x_n)$. This measures the internal association between the series $x_1, \ldots, x_{n-1}$ and $x_2, \ldots, x_n$, i.e. between successive readings (at step 1 or lag 1).

Similarly, $r_2$ is approximately the ordinary sample correlation coefficient between the pairs of values $(x_1, x_3), (x_2, x_4), (x_3, x_5), \ldots, (x_{n-2}, x_n)$, and so measures the internal association of the series $x_1, \ldots, x_n$ at step 2 (or lag 2) between readings.

Example. Compute $r_1$, $r_2$, and $r_3$ for the time series given below:

t   : 1  2  3  4  5  6  7  8  9  10
x_t : 1  4  5  7  4  4  8  1  2  4

Solution:

$$r_1 = \frac{\sum_{t=1}^{9}(x_t - \bar{x})(x_{t+1} - \bar{x})}{\sum_{t=1}^{10}(x_t - \bar{x})^2} = \frac{c_1}{c_0}.$$

Here $\bar{x} = 4$ and $\sum_{t=1}^{10}(x_t - \bar{x})^2 = 48$. Therefore

$$r_1 = \frac{(1-4)(4-4) + (4-4)(5-4) + \cdots + (2-4)(4-4)}{48} = \frac{-3}{48} = -0.0625.$$

$$r_2 = \frac{(x_1 - \bar{x})(x_3 - \bar{x}) + (x_2 - \bar{x})(x_4 - \bar{x}) + \cdots + (x_8 - \bar{x})(x_{10} - \bar{x})}{\sum_{t=1}^{10}(x_t - \bar{x})^2}$$

$$= \frac{(1-4)(5-4) + (4-4)(7-4) + \cdots + (1-4)(4-4)}{48} = \frac{-11}{48} \approx -0.23.$$

Similarly, $r_3 = 0.0625$, $r_4 = -0.1042$, $r_5 = -0.1875$, etc.

Note: Many computer packages can be used to compute the $c_k$ and $r_k$. For example, the R command is r = acf(data); the numerical values of the acf can then be viewed by simply typing r.
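As an illustration, the worked example above can be checked in R (note that acf() uses the same divisor-$n$ definition of $c_k$ given earlier):

```r
# The series from the worked example.
x <- c(1, 4, 5, 7, 4, 4, 8, 1, 2, 4)
r <- acf(x, lag.max = 5, plot = FALSE)
r  # lag 0 is 1; lags 1-5 give -0.0625, -0.2292, 0.0625, -0.1042, -0.1875
```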

Note: For any time series (stationary or not), $x_1, x_2, \ldots, x_n$, the autocorrelation function (acf) is a very useful diagnostic tool.

We now study the properties of $\bar{X}_n$, $C_{n,k}$ and $R_{n,k}$ and their distributions.

2.2 Sampling Properties of $\bar{X}_n$, $C_{n,k}$ and $R_{n,k}$

Sampling Properties of $\bar{X}_n$

Let $\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$ be an estimator of $\mu$.

Exercise: Show that $E(\bar{X}_n) = \mu$ and $\mathrm{Var}(\bar{X}_n) = \frac{1}{n}\sum_{k=-(n-1)}^{n-1}\left(1 - \frac{|k|}{n}\right)\gamma(k)$.

Now suppose that $\sum |\gamma(k)| < \infty$. Then clearly $\sum_{k=-(n-1)}^{n-1}\left(1 - \frac{|k|}{n}\right)\gamma(k) \to \sum_{k=-\infty}^{\infty} \gamma(k)$ as $n \to \infty$, and therefore we have

$$\mathrm{Var}(\bar{X}_n) \approx \frac{1}{n}\sum_{k=-\infty}^{\infty}\gamma(k)$$
for large $n$.

When $\{X_t\}$ is a Gaussian time series, we have the following approximate normal distribution for $\bar{X}_n$ for large $n$:

$$\bar{X}_n \approx N\left(\mu, \; \frac{1}{n}\sum_{k=-\infty}^{\infty}\gamma(k)\right).$$
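As a quick numerical check of this approximation, here is a simulation sketch for an AR(1) series (the coefficient, series length and replication count are illustrative assumptions; for an AR(1) with coefficient $\phi$ and unit innovation variance, $\sum_k \gamma(k) = 1/(1-\phi)^2$):

```r
set.seed(1)
phi <- 0.6; n <- 500
# Empirical variance of the sample mean over many simulated realizations.
xbars <- replicate(2000, mean(arima.sim(model = list(ar = phi), n = n)))
var(xbars)
# Large-n approximation: (1/n) * sum_k gamma(k) = 1 / (n * (1 - phi)^2) = 0.0125.
1 / (n * (1 - phi)^2)
```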

Sampling Properties of $C_{n,k}$

$$C_{n,k} = \frac{1}{n} \sum_{t=1}^{n-k} (X_t - \bar{X})(X_{t+k} - \bar{X})$$

$$E(C_{n,k}) = \ldots\ldots = \ldots\ldots \to \gamma(k) \quad \text{as } n \to \infty$$

i.e. $C_k$ (or $C_{n,k}$) is asymptotically unbiased for $\gamma(k)$.

Sampling Properties of $R_{n,k}$

It can be seen that, for large $n$, under the null hypothesis $H_0: \rho(k) = 0$, $R_{n,k} \sim N\left(-\frac{1}{n}, \frac{1}{n}\right)$ approximately. Thus, under $H_0: \rho(k) = 0$, approximately 95% of the sample values should fall between the bounds $-\frac{1}{n} \pm \frac{1.96}{\sqrt{n}}$. These bounds are further approximated by $\pm \frac{2}{\sqrt{n}}$ for large $n$; i.e. observed values of $r_k$ which fall outside $\pm \frac{2}{\sqrt{n}}$ are 'significantly' different from zero at the 5% level.

diagram

Note: If $k$ is large relative to the sample size $n$, then the estimates $c_k$ and $r_k$ are very unreliable. A general rule of thumb is to restrict the calculations of $c_k$ and $r_k$ to series with $n \ge 50$ and to lags $k \le n/4$.

The Distribution of $R_k$ under the Null Hypothesis $H_0: \rho_k = 0$

For a purely random process, or under the null hypothesis $H_0: \rho_k = 0$, it can be shown that for large $n$ the $r_k$ are approximately independently and identically distributed normal random variables with mean 0 and variance $1/n$. Hence a simple test of whether a given time series is a realization of a purely random process (independently and identically distributed random variables) is to look at the proportion of individual sample $r_k$'s satisfying

$$|r_k| > 2/\sqrt{n}, \quad k = 1, 2, \ldots$$

If the proportion is 1/20 or less, the sequence is likely a random process.
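A minimal R sketch of this check (the series x below is simulated white noise, an illustrative assumption; the lag range is capped at $n/4$ as suggested above):

```r
set.seed(4)
x <- rnorm(100)  # simulated purely random series; replace with your data
n <- length(x)
r <- acf(x, lag.max = floor(n / 4), plot = FALSE)$acf[-1]  # sample acf, lag 0 dropped
mean(abs(r) > 2 / sqrt(n))  # about 0.05 or less suggests a purely random series
```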

A Portmanteau Test

Suppose that all acf values are zero up to a maximum specified lag $\ell$. Then it can be shown that
$$Q = n \sum_{k=1}^{\ell} r_k^2$$
has an approximate $\chi^2$ distribution with $\ell$ degrees of freedom. This test statistic can be used to test whether all acf values are collectively zero up to a maximum lag $\ell$.
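This $Q$ is the Box-Pierce form of the portmanteau statistic, which is available directly in base R (here x is an assumed numeric series and $\ell = 10$ an illustrative choice of maximum lag):

```r
# Box-Pierce portmanteau test: Q = n * sum(r_k^2), compared with chi-squared(10).
x <- rnorm(100)  # an assumed series; replace with your data
Box.test(x, lag = 10, type = "Box-Pierce")
```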

There is another tool to recognise the serial dependence of time series data. This is known as the partial autocorrelation function (pacf).

The Sample Correlogram

The plot of rk against k is called the sample correlogram of the data. This correlogram can be used to detect the long run behaviour of the autocorrelation structure of the series.
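In R, the correlogram is produced directly by acf(), which by default also draws significance bounds of the $\pm 1.96/\sqrt{n}$ type discussed above:

```r
# acf() plots r_k against k with 95% significance bounds.
x <- rnorm(100)  # an assumed series; replace with your data
acf(x, main = "Sample correlogram")
```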

Detection of Randomness, Short-term and Long-term Correlations of a TS

First obtain the correlogram of the data and mark the $\pm \frac{2}{\sqrt{n}}$ boundaries.

(1) Purely Random Series or No-Memory Series

In this case $\rho(k) = 0$ for all $k$; i.e. the sample acf $r_k$ should not be significant at any lag.

diagram

Note: However, in a plot of the first 20 values of $r_k$ we expect to find one 'slightly significant' value on average, even when the ts really is random (1 in 20 is 5%). If the significant values are at low lags (e.g. one, two), then care must be taken before proceeding. See Chatfield, p. 21, for details.

(2) Short-term or Short-Memory Series: In this situation the $r_k$'s are significant for small $k$ (i.e. lags 1, 2, 3, etc.) and $r_k \approx 0$ for moderate $k$; i.e. the acf should die out quickly.

diagram

Example: If $r_1$ is significant and $r_k \approx 0$ for $k > 1$, this tells us that there is a strong lag-1 dependence (a simulated sketch of this pattern is given after this list).

(3) Long-term with a Linear Trend: The $r_k$'s are significant for many values of $k$; i.e. the acf does not die out quickly.

diagram

(4) Seasonal Fluctuation: If a ts contains a seasonal component, then the correlogram will also exhibit an oscillation at the same frequency.

diagram

(5) Long-Memory Time Series: This type of ts exhibits a hyperbolic decay of the correlogram, with long-tail behaviour.

diagram

The above arguments suggest that the sample correlogram can be used as a useful diagnostic tool to detect the serial correlation structure of a time series.
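As a simulated sketch of pattern (2) above, an MA(1) series (with parameters chosen purely for illustration) has a theoretical acf that is zero beyond lag 1, so typically only $r_1$ shows up as significant:

```r
set.seed(2)
# MA(1) series: short memory; its correlogram should die out after lag 1.
acf(arima.sim(model = list(ma = 0.8), n = 200))
```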

2.3 The Autocorrelation Plot as a Diagnostic Tool

We summarise the above information below:

(1) The acf, like most techniques for time series, requires the series to be 'long' (i.e. $n$ large, say $n \ge 40$).

(2) If a linear trend is present in the series $x_1, \ldots, x_n$, or the series is non-stationary, the plot of $r_k$ against $k$ decreases very slowly with increasing $k$ (see the sketch after this list).

(3) If a periodic component of period $d$ is present in the series $x_1, \ldots, x_n$ (and hence shows up as a periodic effect of period $d$ in a plot of $x_t$ vs. $t$), it also shows up as a periodic effect of period $d$ in the plot of $r_k$ vs. $k$.

(4) If the sequence $x_1, \ldots, x_n$ is stationary, the plot of $r_k$ against $k$ decreases rapidly as $k$ increases.
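A minimal R sketch contrasting items (2) and (4) (the trend slope and noise below are illustrative assumptions):

```r
set.seed(3)
y <- 0.1 * (1:200) + rnorm(200)  # linear trend plus noise: non-stationary
acf(y)        # r_k decreases very slowly with increasing k
acf(diff(y))  # the differenced series is stationary and its acf dies out rapidly
```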

Note: The (sample) autocorrelation at lag $k$, $r_k$, is a measure of dependence between readings a time-distance $k$ apart in a stationary time series, and is defined for $k \ge 1$. It is an estimate of the population quantity $\rho(k)$, or $\rho_k = \mathrm{Corr}(X_t, X_{t+k})$.

Example: Comment on a time series of $n = 100$ with the following acf.

k   : 1    2     3    4    5    6     7     8     9
r_k : 0.7  -0.2  0.1  0.6  0.1  0.02  0.31  0.55  0.31

Solution: The correlogram of the above series, with boundaries $\pm \frac{2}{\sqrt{100}} = \pm 0.2$, is:

diagram

Comments: The $r_k$'s being correlation coefficients, it is clear that there is not a sharp drop-off of $r_k$ with $k$. That is, there are large values of $r_k$ at lags $k = 1, 4, 7, 8, 9$ for this data set, indicating non-stationarity. Alternatively, there could be a periodicity in the data with period 3 or 4. However, this is very hard to justify as we have only 9 acf values.

The Partial Autocorrelation Function (PACF)

Given a stationary time series $\{X_t\}$, the partial autocorrelation at lag $k$ is defined as the additional autocorrelation between $X_t$ and $X_{t+k}$ when the linear dependence on $X_{t+1}$ through to $X_{t+k-1}$ is removed. Equivalently, it is the correlation between $X_t$ and $X_{t+k}$ when the (linear) effect, on each of $X_t$ and $X_{t+k}$, of the intervening readings $(X_{t+1}, X_{t+2}, \ldots, X_{t+k-1})$ has been removed. This quantity is denoted by $\pi_k$ (or $\pi(k)$ by some authors). Analogously to the sample estimate $r_k$ (or $\hat{\rho}_k$) of $\rho_k$, the corresponding sample estimate of $\pi_k$ is denoted by $\hat{\pi}_k$.

It is easy to see that, for a time series $\{X_t\}$ with time-invariant first- and second-order moments and autocovariances,

$$\pi_1 = \rho_1 \quad \text{and} \quad \pi_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}.$$
Therefore, the first two pacf estimates are

$$\hat{\pi}_1 = r_1, \qquad \hat{\pi}_2 = \frac{r_2 - r_1^2}{1 - r_1^2}.$$

The values and plot of $\hat{\pi}_k$, $k \ge 1$ (the sample partial autocorrelation function: PACF) against $k$ serve as a diagnostic tool complementing the role of the ACF. They are produced by many statistical packages with a time-series capability, such as S-Plus, SAS, MINITAB and R.

The R command: acf(data, type = "partial")
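As a check, the formulas for $\hat{\pi}_1$ and $\hat{\pi}_2$ can be compared with R's output (reusing the worked-example series from earlier):

```r
x <- c(1, 4, 5, 7, 4, 4, 8, 1, 2, 4)
r <- acf(x, lag.max = 2, plot = FALSE)$acf
r1 <- r[2]; r2 <- r[3]                       # r[1] is the lag-0 value, which equals 1
c(pi1 = r1, pi2 = (r2 - r1^2) / (1 - r1^2))  # hand formulas for the first two pacf values
acf(x, lag.max = 2, type = "partial", plot = FALSE)  # should agree
```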

We now give the formal definition of a stationary time series based on the moments of $\{X_t\}$.
