1 Stationary Process

Model: time-invariant mean
$y_t = \mu + \varepsilon_t$

1.1 Definitions

1. Autocovariance:
$\gamma_j = E(y_t - \mu)(y_{t-j} - \mu) = E(\varepsilon_t \varepsilon_{t-j})$
2. Stationarity: If neither the mean nor the autocovariance depends on the date $t$, then the process $y_t$ is said to be covariance stationary or weakly stationary:
$E(y_t) = \mu$ for all $t$
$E(y_t - \mu)(y_{t-j} - \mu) = \gamma_j$ for all $t$ and any $j$
3. Ergodicity:

(a) A covariance stationary process is said to be ergodic for the mean if
$\frac{1}{T}\sum_{t=1}^{T} y_t \xrightarrow{p} E(y_t)$.
Alternatively, we have $\sum_{j=0}^{\infty} |\gamma_j| < \infty$.

(b) A covariance stationary process is said to be ergodic for second moments if
$\frac{1}{T-j}\sum_{t=j+1}^{T} (y_t - \mu)(y_{t-j} - \mu) \xrightarrow{p} \gamma_j$ for all $j$.
4. White Noise: A series $\varepsilon_t$ is a white noise process if
$E(\varepsilon_t) = 0$, $E(\varepsilon_t^2) = \sigma^2$, and $E(\varepsilon_t \varepsilon_s) = 0$ for all $t \neq s$.

1.2 Moving Average
The first-order MA process: MA(1)
$y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}, \quad \varepsilon_t \sim iid(0, \sigma_\varepsilon^2)$
$E(y_t - \mu)^2 = \gamma_0 = (1 + \theta^2)\sigma_\varepsilon^2$
$E(y_t - \mu)(y_{t-1} - \mu) = \gamma_1 = \theta\sigma_\varepsilon^2$
$E(y_t - \mu)(y_{t-2} - \mu) = \gamma_2 = 0$

MA(2)
$y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}$
$\gamma_0 = (1 + \theta_1^2 + \theta_2^2)\sigma_\varepsilon^2$
$\gamma_1 = (\theta_1 + \theta_1\theta_2)\sigma_\varepsilon^2$
$\gamma_2 = \theta_2\sigma_\varepsilon^2$
$\gamma_3 = \gamma_4 = \cdots = 0$
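These moment formulas can be checked by a quick simulation. The sketch below (the values of $\theta_1$, $\theta_2$, $\sigma_\varepsilon^2$ are illustrative assumptions, not from the notes) compares sample autocovariances of a simulated MA(2) with the expressions above.

```python
import numpy as np

# Simulate an MA(2) and compare sample autocovariances with theory.
rng = np.random.default_rng(1)
mu, th1, th2, sig2 = 0.0, 0.5, 0.3, 1.0   # illustrative parameter values
T = 500_000
eps = np.sqrt(sig2) * rng.standard_normal(T + 2)
y = mu + eps[2:] + th1 * eps[1:-1] + th2 * eps[:-2]

def acov(x, j):
    # sample autocovariance at lag j
    xd = x - x.mean()
    return np.mean(xd[j:] * xd[: len(xd) - j])

g0_theory = (1 + th1**2 + th2**2) * sig2   # gamma_0
g1_theory = (th1 + th1 * th2) * sig2       # gamma_1
g2_theory = th2 * sig2                     # gamma_2
print(acov(y, 0), acov(y, 1), acov(y, 2), acov(y, 3))
```

The lag-3 sample autocovariance should be near zero, matching $\gamma_3 = \gamma_4 = \cdots = 0$.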
MA($\infty$)
$y_t = \mu + \sum_{j=0}^{\infty} \theta_j\varepsilon_{t-j}$
$y_t$ is stationary if $\sum_{j=0}^{\infty} \theta_j^2 < \infty$ (square summable).

1.3 Autoregressive Process
AR(1)
$y_t = a + u_t, \quad u_t = \rho u_{t-1} + \varepsilon_t$
$y_t = a(1 - \rho) + \rho y_{t-1} + \varepsilon_t$
$y_t = a + \varepsilon_t + \rho\varepsilon_{t-1} + \rho^2\varepsilon_{t-2} + \cdots$
so that $y_t$ is MA($\infty$), and
$\sum_{j=0}^{\infty} \rho^{2j} = \frac{1}{1-\rho^2} < \infty$ if $|\rho| < 1$.
$\gamma_0 = \frac{\sigma_\varepsilon^2}{1-\rho^2}, \quad \gamma_1 = \frac{\rho\,\sigma_\varepsilon^2}{1-\rho^2}, \quad \gamma_t = \rho\gamma_{t-1}$

AR(p)
$y_t = a(1 - \rho) + \rho_1 y_{t-1} + \cdots + \rho_p y_{t-p} + \varepsilon_t$, where
$\rho = \sum_{j=1}^{p} \rho_j.$
$\gamma_t = \rho_1\gamma_{t-1} + \cdots + \rho_p\gamma_{t-p}$: Yule-Walker equation
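To make the Yule-Walker recursion concrete, here is a small sketch for an AR(2), written for autocorrelations $c_j = \gamma_j/\gamma_0$, which satisfy the same recursion. The coefficient values are illustrative assumptions.

```python
import numpy as np

# Yule-Walker recursion for an AR(2) with illustrative coefficients.
rho1, rho2 = 0.5, 0.3

# j = 1: c_1 = rho1 * c_0 + rho2 * c_{-1} = rho1 + rho2 * c_1
# (using c_0 = 1 and c_{-1} = c_1), hence c_1 = rho1 / (1 - rho2).
c1 = rho1 / (1 - rho2)
# j >= 2: c_j = rho1 * c_{j-1} + rho2 * c_{j-2}
c2 = rho1 * c1 + rho2
c3 = rho1 * c2 + rho2 * c1
print(c1, c2, c3)
```

Higher-order autocorrelations follow by iterating the same two-term recursion.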
Augmented Form for AR(2)
$y_t = a(1 - \rho) + (\rho_1 + \rho_2)y_{t-1} - \rho_2 y_{t-1} + \rho_2 y_{t-2} + \varepsilon_t$
$\quad = a(1 - \rho) + \rho y_{t-1} - \rho_2 \Delta y_{t-1} + \varepsilon_t$

Unit Root Testing Form
$\Delta y_t = a(1 - \rho) + (\rho - 1)y_{t-1} - \rho_2 \Delta y_{t-1} + \varepsilon_t$
1.4 Source of MA term

Example 1:
$y_t = \rho y_{t-1} + u_t, \quad x_t = \phi x_{t-1} + e_t$, where
$u_t, e_t$ = white noise.
Consider the variable $z_t$ such that
$z_t = x_t + y_t.$
Now, does $z_t$ follow an AR(1)?
$z_t = \rho(y_{t-1} + x_{t-1}) + (\phi - \rho)x_{t-1} + u_t + e_t = \rho z_{t-1} + \varepsilon_t,$
where
$\varepsilon_t = (\phi - \rho)\sum_{j=0}^{\infty} \phi^j e_{t-j-1} + u_t + e_t,$
so that $z_t$ becomes ARMA(1, $\infty$).

Example 2:
$y_s = \rho y_{s-1} + u_s, \quad s = 1, \ldots, S.$
You observe only the even periods. Then we have
$y_s = \rho^2 y_{s-2} + \rho u_{s-1} + u_s.$
Let
$x_t = y_s$ for $t = 1, \ldots, T$; $s = 2, 4, \ldots, S.$
Then we have
$x_t = \rho^2 x_{t-1} + \varepsilon_t, \quad \varepsilon_t = \rho u_{t-1/2} + u_t,$
so that $x_t$ follows an ARMA(1,1).
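A small simulation sketch of Example 2 (all parameter values are illustrative assumptions): simulate an AR(1) at the original frequency, keep every second observation, and check that the skip-sampled series has autoregressive coefficient $\rho^2$.

```python
import numpy as np

# Simulate y_s = rho * y_{s-1} + u_s, then observe only even periods.
rng = np.random.default_rng(2)
rho, S = 0.8, 400_000
u = rng.standard_normal(S)
y = np.zeros(S)
for s in range(1, S):
    y[s] = rho * y[s - 1] + u[s]

x = y[::2]                        # keep s = 0, 2, 4, ...
# OLS slope of x_t on x_{t-1}: the error rho*u_{s-1} + u_s involves only
# innovations dated after x_{t-1}, so the slope is consistent for rho^2.
slope = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
print(slope)  # close to rho**2 = 0.64
```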
1.5 Model Selection

1.5.1 Information Criteria

Consider a criterion function given by
$c_T(k) = -2\,\frac{\ln L(k)}{T} + k\,\frac{\varphi(T)}{T},$
where $\varphi(T)$ is a deterministic function. The model (lag length) is selected by minimizing this criterion function with respect to $k$; that is,
$\hat{k} = \arg\min_k c_T(k).$
There are three famous criterion functions:

AIC: $\varphi(T) = 2$
BIC (Schwarz): $\varphi(T) = \ln T$
Hannan-Quinn: $\varphi(T) = 2\ln(\ln T)$
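The three criteria can be put side by side in a short sketch. The AR(2) design, the choice of $k_{max}$, and the concentrated Gaussian log likelihood $\ln L = -\frac{n}{2}(\ln 2\pi + \ln\hat\sigma^2 + 1)$ are illustrative assumptions.

```python
import numpy as np

# Lag selection by minimizing c_T(k) = -2 lnL(k)/T + k*phi(T)/T.
rng = np.random.default_rng(3)
T, kmax = 2000, 6
e = rng.standard_normal(T + 100)
y = np.zeros(T + 100)
for t in range(2, T + 100):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + e[t]
y = y[100:]                       # drop burn-in

def criterion(k, phi):
    # OLS AR(k) with intercept on a common sample of size n.
    Y = y[kmax:]
    n = len(Y)
    cols = [np.ones(n)] + [y[kmax - j: len(y) - j] for j in range(1, k + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigma2 = np.mean((Y - X @ beta) ** 2)
    lnL = -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2) + 1)
    return -2 * lnL / n + k * phi(n) / n

k_aic = min(range(kmax + 1), key=lambda k: criterion(k, lambda n: 2.0))
k_bic = min(range(kmax + 1), key=lambda k: criterion(k, lambda n: np.log(n)))
k_hq = min(range(kmax + 1), key=lambda k: criterion(k, lambda n: 2 * np.log(np.log(n))))
print(k_aic, k_bic, k_hq)
```

With a sample this large, BIC and Hannan-Quinn should pick the true lag length 2, while AIC may pick 2 or slightly more.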
Let $k^*$ be the true lag length. Then the likelihood function must be maximized at $k^*$ asymptotically. That is,
$\operatorname{plim}_{T\to\infty} \ln L(k^*) \ge \operatorname{plim}_{T\to\infty} \ln L(k)$ for any $k.$

Now, consider two cases. First, $k < k^*$. Then we have
$\lim_{T\to\infty} \Pr[c_T(k) \le c_T(k^*)] = \lim_{T\to\infty} \Pr\left[-2\frac{\ln L(k)}{T} + k\frac{\varphi(T)}{T} \le -2\frac{\ln L(k^*)}{T} + k^*\frac{\varphi(T)}{T}\right]$
$= \lim_{T\to\infty} \Pr\left[\frac{\ln L(k^*) - \ln L(k)}{T} \le \frac{1}{2}(k^* - k)\frac{\varphi(T)}{T}\right] = 0$
for all $\varphi(T)$'s, since the left-hand side converges to a positive constant while the right-hand side vanishes.

Next, consider the case of $k > k^*$. Then we know that the likelihood ratio test satisfies
$2[\ln L(k) - \ln L(k^*)] \xrightarrow{d} \chi^2_{k-k^*}.$
Now consider AIC first:
$T(c_T(k) - c_T(k^*)) = -2[\ln L(k) - \ln L(k^*)] + 2(k - k^*) \xrightarrow{d} -\chi^2_{k-k^*} + 2(k - k^*).$
Hence we have
$\lim_{T\to\infty} \Pr[c_T(k) \le c_T(k^*)] = \Pr\left[\chi^2_{k-k^*} \ge 2(k - k^*)\right] > 0,$
so that AIC may asymptotically over-estimate the lag length.

Consider the other two criteria. For both cases,
$\lim_{T\to\infty} \varphi(T) = \infty.$
Hence we have
$T(c_T(k) - c_T(k^*)) = -2[\ln L(k) - \ln L(k^*)] + (k - k^*)\varphi(T) \xrightarrow{d} -\chi^2_{k-k^*} + (k - k^*)\varphi(T),$
so that
$\lim_{T\to\infty} \Pr[c_T(k) \le c_T(k^*)] = \lim_{T\to\infty} \Pr\left[\chi^2_{k-k^*} \ge (k - k^*)\varphi(T)\right] = 0.$
Hence BIC and Hannan-Quinn's criteria consistently estimate the true lag length.
1.5.2 General to Specific (GS) Method

In practice, the so-called general-to-specific method is also popularly used. The GS method involves the following sequential steps.

Step 1 Run AR($k_{max}$) and test whether the last coefficient is significantly different from zero.

Step 2 If not, let $k_{max} = k_{max} - 1$ and repeat Step 1 until the last coefficient is significant.

The general-to-specific methodology applies conventional statistical tests. So if the significance level for the tests is fixed, then the order estimator inevitably allows for a nonzero probability of overestimation. Furthermore, as is typical in sequential tests, this overestimation probability is bigger than the significance level when there are multiple steps between $k_{max}$ and $p$, because the probability of false rejection accumulates as $k$ steps down from $k_{max}$ to $p$. These problems can be mitigated (and overcome at least asymptotically) by letting the level of the test depend on the sample size. More precisely, following Bauer, Pötscher and Hackl (1988), we can set the critical value $C_T$ in such a way that (i) $C_T \to \infty$ and (ii) $C_T/\sqrt{T} \to 0$ as $T \to \infty$. The critical value corresponds to the standard normal critical value for the significance level $\alpha_T = 1 - \Phi(C_T)$, where $\Phi(\cdot)$ is the standard normal c.d.f. Conditions (i) and (ii) are equivalent to the requirement that the significance level satisfy
$\alpha_T \to 0 \quad \text{and} \quad \frac{\log \alpha_T}{T} \to 0$
(proved in equation (22) of Pötscher, 1983).
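A sketch of the procedure with a sample-size-dependent critical value. The AR(1) design and the specific choice $C_T = \sqrt{\log T}$ (one choice satisfying conditions (i) and (ii)) are illustrative assumptions.

```python
import numpy as np

# General-to-specific: start at kmax, drop the last lag while its
# t statistic is below a slowly diverging critical value C_T.
rng = np.random.default_rng(4)
T, kmax = 2000, 6
e = rng.standard_normal(T + 100)
y = np.zeros(T + 100)
for t in range(1, T + 100):
    y[t] = 0.6 * y[t - 1] + e[t]
y = y[100:]

def last_t_stat(k):
    # t statistic on the k-th lag in an AR(k) regression with intercept.
    Y = y[k:]
    n = len(Y)
    cols = [np.ones(n)] + [y[k - j: len(y) - j] for j in range(1, k + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (n - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(cov[-1, -1])

C_T = np.sqrt(np.log(T))          # diverges, but C_T / sqrt(T) -> 0
k = kmax
while k > 0 and abs(last_t_stat(k)) < C_T:
    k -= 1
print(k)  # selected lag length
```

Since the true process is AR(1), the procedure should usually stop at $k = 1$.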
References
[1] Bauer, P., Pötscher, B. M., and P. Hackl (1988). Model Selection by Multiple Test Procedures. Statistics, 19, 39–44.
[2] Pötscher, B. M. (1983). Order Estimation in ARMA-models by Lagrangian Multiplier Tests. Annals of Statistics, 11, 872–885.
2 Asymptotic Distribution for Stationary Processes

2.1 Law of Large Numbers for a Covariance Stationary Process

Let us first consider the asymptotic properties of the sample mean:
$\bar{y}_T = \frac{1}{T}\sum_{t=1}^{T} y_t, \quad E(\bar{y}_T) = \mu.$
Next, as $T \to \infty$,
$E(\bar{y}_T - \mu)^2 = E\left[\frac{1}{T}\sum_{t=1}^{T}(y_t - \mu)\right]^2 = \frac{1}{T^2}\left\{T\gamma_0 + 2(T-1)\gamma_1 + \cdots + 2\gamma_{T-1}\right\}$
$= \frac{1}{T}\left\{\gamma_0 + 2\frac{T-1}{T}\gamma_1 + \cdots + \frac{2}{T}\gamma_{T-1}\right\}.$
Hence we have
$\lim_{T\to\infty} T\,E(\bar{y}_T - \mu)^2 = \sum_{j=-\infty}^{\infty}\gamma_j.$

Example: $y_t = u_t$, $u_t = \rho u_{t-1} + \varepsilon_t$, $\varepsilon_t \sim iid(0, \sigma^2)$. Then we have
$E\left(\frac{1}{T}\sum_{t=1}^{T} u_t\right)^2 = \frac{1}{T}\left\{\gamma_0 + 2\frac{T-1}{T}\gamma_1 + \cdots + \frac{2}{T}\gamma_{T-1}\right\}$
$= \frac{1}{T}\,\frac{\sigma^2}{1-\rho^2}\left\{1 + 2\frac{T-1}{T}\rho + 2\frac{T-2}{T}\rho^2 + \cdots + \frac{2}{T}\rho^{T-1}\right\}$
$= \frac{1}{T}\,\frac{\sigma^2}{1-\rho^2}\left\{1 + 2\,\frac{T\rho - T\rho^2 + \rho^{T+1} - \rho}{T(1-\rho)^2}\right\}$
$= \frac{1}{T}\,\frac{\sigma^2}{1-\rho^2}\left\{1 + \frac{2\rho}{1-\rho}\right\} + O(T^{-2})$
$= \frac{1}{T}\,\frac{\sigma^2}{(1+\rho)(1-\rho)}\cdot\frac{1+\rho}{1-\rho} + O(T^{-2}) = \frac{\sigma^2}{T(1-\rho)^2} + O(T^{-2}),$
where note that
$\sum_{t=1}^{T}\frac{T-t}{T}\,\rho^t = \frac{T\rho - T\rho^2 + \rho^{T+1} - \rho}{T(1-\rho)^2}.$
Now as $T \to \infty$, we have
$\lim_{T\to\infty} T\,E\left(\frac{1}{T}\sum u_t\right)^2 = \frac{\sigma^2}{(1-\rho)^2}.$
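The limit in this example can be checked by Monte Carlo. The sketch below (the values of $\rho$, $T$, and the number of replications $R$ are illustrative assumptions) estimates $T\,E(\bar{u}_T)^2$ by averaging over replications.

```python
import numpy as np

# Monte Carlo check: T * E(mean of u_t)^2 -> sigma^2 / (1 - rho)^2.
rng = np.random.default_rng(5)
rho, sigma, T, R = 0.5, 1.0, 2000, 4000
e = sigma * rng.standard_normal((R, T))
u = np.zeros(R)        # current u_t for each replication
s = np.zeros(R)        # running sum of u_t
for t in range(T):
    u = rho * u + e[:, t]
    s += u
means = s / T
lrv = T * np.mean(means ** 2)
print(lrv)  # near sigma^2 / (1 - rho)^2 = 4
```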
2.2 CLT for Martingale Difference Sequences

If $E(y_t) = 0$ and $E(y_t \mid \mathcal{F}_{t-1}) = 0$ for all $t$, then $y_t$ is called a martingale difference sequence (m.d.s.).
If $y_t$ is an m.d.s., then $y_t$ is not serially correlated.

CLT for an m.d.s.: Let $\{y_t\}_{t=1}^{T}$ be a scalar m.d.s. with $\bar{y}_T = \frac{1}{T}\sum_{t=1}^{T} y_t$. Suppose that (a) $E(y_t^2) = \sigma_t^2 > 0$ with $\frac{1}{T}\sum_{t=1}^{T}\sigma_t^2 \to \sigma^2 > 0$, (b) $E|y_t|^r < \infty$ for some $r > 2$ and all $t$, and (c) $\frac{1}{T}\sum_{t=1}^{T} y_t^2 \xrightarrow{p} \sigma^2$. Then
$\sqrt{T}\,\bar{y}_T \xrightarrow{d} N(0, \sigma^2).$

CLT for a stationary stochastic process: Let
$y_t = \mu + \sum_{j=0}^{\infty}\theta_j\varepsilon_{t-j},$
where $\varepsilon_t$ is an iid random variable with $E(\varepsilon_t^2) < \infty$ and $\sum_{j=0}^{\infty}|\theta_j| < \infty$. Then
$\sqrt{T}(\bar{y}_T - \mu) \xrightarrow{d} N\left(0,\ \sum_{j=-\infty}^{\infty}\gamma_j\right).$
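The CLT for a stationary process can be illustrated by simulation: for an AR(1)-type MA($\infty$) process, $\sqrt{T}\,\bar{y}_T$ standardized by the long-run standard deviation $\sigma/(1-\rho)$ should be approximately standard normal. Design values below are illustrative assumptions.

```python
import numpy as np

# Simulate R replications of an AR(1) and standardize sqrt(T)*ybar.
rng = np.random.default_rng(6)
rho, T, R = 0.5, 2000, 4000
e = rng.standard_normal((R, T))
u = np.zeros(R)
s = np.zeros(R)
for t in range(T):
    u = rho * u + e[:, t]
    s += u
# Long-run standard deviation is sigma / (1 - rho) = 2 here.
z = np.sqrt(T) * (s / T) / (1.0 / (1 - rho))
# If the CLT holds, z is approximately N(0, 1).
print(z.mean(), z.std(), np.mean(np.abs(z) < 1.96))
```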
Example 1: $y_t = a + u_t$, $u_t = \rho u_{t-1} + e_t$, $e_t \sim iid(0, \sigma_e^2)$. Then we have
$y_t = a + \sum_{j=0}^{\infty}\rho^j e_{t-j}.$
Hence
$\sqrt{T}(\bar{y}_T - a) \xrightarrow{d} N\left(0,\ \frac{\sigma_e^2}{(1-\rho)^2}\right).$

Example 2: $y_t = a(1-\rho) + \rho y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim iid(0, \sigma^2)$,
$\hat{\rho} = \rho + \frac{\sum(y_{t-1} - \bar{y}_{-1})(\varepsilon_t - \bar{\varepsilon})}{\sum(y_{t-1} - \bar{y}_{-1})^2},$
where $\bar{y}_{-1}$ is the sample mean of the lagged series.
Show the condition under which
$\lim_{T\to\infty}\frac{1}{T}E\sum(y_{t-1} - \bar{y}_{-1})^2 = Q^2 < \infty,$
where $Q^2 = \sigma^2/(1-\rho^2)$.
Calculate
$\lim_{T\to\infty}\frac{1}{\sqrt{T}}E\sum(y_{t-1} - \bar{y}_{-1})(\varepsilon_t - \bar{\varepsilon}).$
Show that
$\sqrt{T}(\hat{\rho} - \rho) \xrightarrow{d} N(0,\ 1 - \rho^2).$

3 Finite Sample Properties
3.1 Calculating Bias by Using a Simple Taylor Expansion

Unknown Constant Case
$y_t = a + \rho y_{t-1} + e_t, \quad e_t \sim iidN(0, 1).$
First, show that
$E\frac{A}{B} = \frac{EA}{EB}\left(1 - \frac{Cov(A, B)}{E(A)E(B)} + \frac{Var(B)}{E(B)^2}\right) + O(T^{-2})$
$= \frac{EA}{EB} - \frac{E(A-a)(B-b)}{E(B)^2} + \frac{EA \cdot E(B-b)^2}{E(B)^3} + O(T^{-2}).$
Let $EA = a$, $EB = b$, and take the Taylor expansion of $A/B$ around $a$ and $b$:
$\frac{A}{B} = \frac{a}{b} + \frac{1}{b}(A-a) - \frac{a}{b^2}(B-b) - \frac{1}{b^2}(A-a)(B-b) + \frac{a}{b^3}(B-b)^2 + R_n.$
Take expectations:
$E\frac{A}{B} = \frac{a}{b} + \frac{1}{b}E(A-a) - \frac{a}{b^2}E(B-b) - \frac{1}{b^2}E(B-b)(A-a) + \frac{a}{b^3}E(B-b)^2 + ER_n$
$= \frac{a}{b} - \frac{1}{b^2}Cov(A, B) + \frac{a}{b^3}Var(B) + O(T^{-2}).$
Now consider
$E\hat{\rho} = E\frac{\sum \tilde{y}_t\tilde{y}_{t-1}}{\sum \tilde{y}_{t-1}^2} = ?$
where $\tilde{y}_t$ denotes the demeaned series. Note that in this example both $E(A)$ and $E(B)$ are $O(T)$ with $E(A) = \rho E(B) + O(1)$, and if the process is Gaussian,
$E(x_t x_{t+k} x_{t+k+l} x_{t+k+l+m}) = \frac{\rho^{k+m}\left(1 + 2\rho^{2l}\right)}{(1-\rho^2)^2}.$
From this, we can calculate all the moments. For example, we have
$E\left(\frac{1}{T}\sum_{t=1}^{T} x_t^2\right)^2 = \frac{1}{T^2}\left[\frac{3T}{(1-\rho^2)^2} + \frac{2}{(1-\rho^2)^2}\sum_{t=1}^{T-1}(T-t)\left(1 + 2\rho^{2t}\right)\right].$
Then we finally have
$E\hat{\rho} = E\frac{\sum \tilde{y}_t\tilde{y}_{t-1}}{\sum \tilde{y}_{t-1}^2} = \rho - \frac{1+3\rho}{T} + O(T^{-2}),$
so that
$E(\hat{\rho} - \rho) = -\frac{1+3\rho}{T} + O(T^{-2}).$
For the non-constant case,
$x_t = \rho x_{t-1} + e_t,$
$E(\hat{\rho} - \rho) = -\frac{2\rho}{T} + O(T^{-2}).$
For a trend case,
$y_t = a + bt + \rho y_{t-1} + e_t,$
$E(\hat{\rho} - \rho) = -\frac{2(1 + 2\rho)}{T} + O(T^{-2}).$

3.2 Approximating Statistical Inference by Using the Edgeworth Expansion
For the non-constant case (Phillips, 1977), we have
$\hat{\rho} = \frac{\sum y_{t-1}y_t}{\sum y_{t-1}^2}, \quad \hat{\rho} - \rho = \frac{\sum y_{t-1}(y_t - \rho y_{t-1})}{\sum y_{t-1}^2} = \frac{\sum y_{t-1}u_t}{\sum y_{t-1}^2},$
so that $(\hat{\rho} - \rho)$ can be expressed as a function of moments. Let
$\sqrt{T}(\hat{\rho} - \rho) = \sqrt{T}\, e(m),$
where $m$ stands for a vector of moments. Then taking a Taylor expansion yields
$\sqrt{T}\, e(m) = \sqrt{T}\left[e_r m_r + \frac{1}{2}e_{rs}m_r m_s + \frac{1}{6}e_{rst}m_r m_s m_t + O_p(T^{-2})\right],$
where
$e_r = \frac{\partial e(0)}{\partial m_r}, \ \ldots \ \text{etc.}$
Solving for all the moments yields
$\Pr\left[\frac{\sqrt{T}(\hat{\rho} - \rho)}{\sqrt{1-\rho^2}} \le w\right] = \Phi(w) + \frac{\phi(w)}{\sqrt{T}}\cdot\frac{\rho(w^2 + 1)}{\sqrt{1-\rho^2}} + \cdots,$
where $w = x/\sqrt{1-\rho^2}$.
For the constant case (Tanaka, 1983), an additional term appears, reflecting estimation of the mean:
$\Pr\left[\frac{\sqrt{T}(\hat{\rho} - \rho)}{\sqrt{1-\rho^2}} \le w\right] = \Phi(w) + \frac{\phi(w)}{\sqrt{T}}\cdot\frac{\rho(w^2 + 1) + (1 + \rho)}{\sqrt{1-\rho^2}} + \cdots.$
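The bias approximation of Section 3.1 can be checked by Monte Carlo; the sketch below targets the fitted-constant case. The design values ($\rho$, $T$, the number of replications, the burn-in) are illustrative assumptions.

```python
import numpy as np

# Monte Carlo check: E(rho_hat - rho) ~ -(1 + 3*rho)/T with fitted constant.
rng = np.random.default_rng(7)
rho, T, R, burn = 0.5, 100, 40_000, 50
e = rng.standard_normal((R, T + burn))
y = np.zeros(R)
ys = np.empty((R, T))
for t in range(T + burn):
    y = rho * y + e[:, t]
    if t >= burn:
        ys[:, t - burn] = y

# Demeaned OLS of y_t on y_{t-1} within each replication.
x, lag = ys[:, 1:], ys[:, :-1]
xd = x - x.mean(axis=1, keepdims=True)
ld = lag - lag.mean(axis=1, keepdims=True)
rho_hat = np.sum(ld * xd, axis=1) / np.sum(ld ** 2, axis=1)
bias_mc = rho_hat.mean() - rho
print(bias_mc, -(1 + 3 * rho) / T)  # both near -0.025
```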
4 Covariance-Stationary Vector Processes

Consider the following simple VAR(1) with two variables:
$y_{1t} = a_1 + b_{11}y_{1t-1} + b_{12}y_{2t-1} + e_{1t}$
$y_{2t} = a_2 + b_{21}y_{1t-1} + b_{22}y_{2t-1} + e_{2t}.$
Alternatively, we can rewrite it as
$y_t = a + b\,y_{t-1} + e_t,$
where
$a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}, \quad b = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}, \quad e_t = \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}.$
Usually,
$E e_t e_s' = \Omega$ for $t = s$, and $= 0$ otherwise.
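A minimal simulation sketch of this bivariate VAR(1); all numerical values ($a$, $b$, $\Omega$, $T$) are illustrative assumptions.

```python
import numpy as np

# Simulate y_t = a + b y_{t-1} + e_t with E e_t e_t' = Omega.
rng = np.random.default_rng(8)
a = np.array([0.1, -0.2])
b = np.array([[0.5, 0.1],
              [0.2, 0.3]])
Omega = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
L = np.linalg.cholesky(Omega)     # e_t = L z_t with z_t ~ N(0, I)

T = 200
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = a + b @ y[t - 1] + L @ rng.standard_normal(2)

eigs = np.linalg.eigvals(b)       # all inside the unit circle => stationary
print(eigs)
```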
4.1 State Space Representation

Consider a VAR(p) with two variables given by
$y_t = a + b_1 y_{t-1} + \cdots + b_p y_{t-p} + e_t.$
Then we can rewrite it as
$\begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix} =
\begin{bmatrix} b_1 & b_2 & \cdots & b_p \\ I_2 & 0 & \cdots & 0 \\ 0 & I_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & I_2 & 0 \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{bmatrix} +
\begin{bmatrix} e_t \\ 0 \\ \vdots \\ 0 \end{bmatrix},$
or
$\xi_t = F\xi_{t-1} + v_t,$
so that any VAR(p) can be rewritten as a VAR(1). We call this form the 'state space representation'.
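The companion form above can be sketched for a bivariate VAR(2): stacking $(y_t', y_{t-1}')'$ turns one step of the VAR(2) into one step of $\xi_t = F\xi_{t-1} + v_t$. The coefficient matrices are illustrative assumptions.

```python
import numpy as np

# Build the companion matrix F for a bivariate VAR(2).
b1 = np.array([[0.4, 0.1],
               [0.0, 0.3]])
b2 = np.array([[0.2, 0.0],
               [0.1, 0.1]])
F = np.block([[b1, b2],
              [np.eye(2), np.zeros((2, 2))]])

# One step of the VAR(2) equals the top block of F @ xi_{t-1}.
y_lag1 = np.array([1.0, -1.0])
y_lag2 = np.array([0.5, 2.0])
direct = b1 @ y_lag1 + b2 @ y_lag2
xi = np.concatenate([y_lag1, y_lag2])
via_F = (F @ xi)[:2]
print(direct, via_F)  # identical: [0.4, -0.05]
```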
4.2 Stationarity Condition

The eigenvalues $\lambda$ of the matrix $F$ satisfy
$\left|\lambda^p I - \lambda^{p-1}b_1 - \lambda^{p-2}b_2 - \cdots - b_p\right| = 0.$
That is, as long as every $|\lambda| < 1$, $y_t$ is stationary.

Vector MA($\infty$) Representation

If the eigenvalues of $F$ all lie inside the unit circle, then $F^s \to 0$ as $s \to \infty$ and $y_t$ can be rewritten as
$y_t = \mu + \sum_{j=0}^{\infty} F^j v_{t-j}$ (reading off the first block),
or equivalently
$y_t = \mu + \sum_{j=0}^{\infty} \Psi_j e_{t-j}.$
If $\sum_{j=0}^{\infty} \|\Psi_j\| < \infty$ (absolute summability), then

1. the autocovariance between the $i$th variable at time $t$ and the $j$th variable $s$ periods earlier, $E(y_{it} - \mu_i)(y_{j,t-s} - \mu_j)$, exists and is given by the row $i$, column $j$ element of
$\Gamma_s = \sum_{k=0}^{\infty} \Psi_{s+k}\,\Omega\,\Psi_k' \quad \text{for } s = 0, 1, \ldots;$

2. the sequence of matrices $\{\Gamma_k\}_{k=0}^{\infty}$ is absolutely summable.
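For a VAR(1), $\Psi_j = b^j$, so $\Gamma_s$ can be computed by truncating the infinite sum; $\Gamma_0$ also satisfies the fixed point $\Gamma_0 = b\,\Gamma_0\,b' + \Omega$, which gives a quick consistency check. All numerical values are illustrative assumptions.

```python
import numpy as np

# Gamma_s = sum_k Psi_{s+k} Omega Psi_k' with Psi_j = b^j for a VAR(1).
b = np.array([[0.5, 0.1],
              [0.2, 0.3]])
Omega = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

Gamma0 = np.zeros((2, 2))
Psi = np.eye(2)
for _ in range(200):              # truncated MA(infinity) sum
    Gamma0 += Psi @ Omega @ Psi.T
    Psi = b @ Psi
Gamma1 = b @ Gamma0               # for a VAR(1), Gamma_s = b^s Gamma_0

# Consistency check: Gamma_0 = b Gamma_0 b' + Omega.
print(np.allclose(Gamma0, b @ Gamma0 @ b.T + Omega))  # True
```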
4.3 Autocovariance

Note that
$E y_{1t} y_{1t-1} = E y_{1t} y_{1t+1},$
but
$E y_{1t} y_{2t-1} \neq E y_{1t} y_{2t+1}.$
Let
$E(y_t) = \mu,$
$E(y_t - \mu)(y_{t-j} - \mu)' = \Gamma_j.$
Then
$E(y_{t+j} - \mu)(y_t - \mu)' = \Gamma_j$ and