
1 Stationary Process

Model: time invariant

y_t = \mu + \varepsilon_t

1.1 Definition:

1. Autocovariance:

\gamma_{jt} = E(y_t - \mu)(y_{t-j} - \mu) = E(\varepsilon_t\varepsilon_{t-j})

2. Stationarity: If neither the mean \mu nor the autocovariance \gamma_{jt} depends on t, then the process y_t is said to be stationary or weakly stationary:

E(y_t) = \mu \text{ for all } t

E(y_t - \mu)(y_{t-j} - \mu) = \gamma_j \text{ for all } t \text{ and any } j

3. Ergodicity:

(a) A covariance stationary process is said to be ergodic for the mean if

\frac{1}{T}\sum_{t=1}^{T} y_t \to_p E(y_t).

A sufficient condition is absolute summability of the autocovariances: \sum_{j=0}^{\infty}|\gamma_j| < \infty.

(b) A covariance stationary process is said to be ergodic for second moments if

\frac{1}{T-j}\sum_{t=j+1}^{T}(y_t - \mu)(y_{t-j} - \mu) \to_p \gamma_j \text{ for all } j.

4. White noise: A series \varepsilon_t is a white noise process if

E(\varepsilon_t) = 0, \quad E(\varepsilon_t^2) = \sigma^2, \quad E(\varepsilon_t\varepsilon_s) = 0 \text{ for all } t \neq s.

1.2 The first-order MA process: MA(1)

y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}, \quad \varepsilon_t \sim iid(0, \sigma_\varepsilon^2)

E(y_t - \mu)^2 = \gamma_0 = (1 + \theta^2)\sigma_\varepsilon^2

E(y_t - \mu)(y_{t-1} - \mu) = \gamma_1 = \theta\sigma_\varepsilon^2

E(y_t - \mu)(y_{t-2} - \mu) = \gamma_2 = 0

MA(2)

y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}

\gamma_0 = (1 + \theta_1^2 + \theta_2^2)\sigma_\varepsilon^2

\gamma_1 = (\theta_1 + \theta_1\theta_2)\sigma_\varepsilon^2

\gamma_2 = \theta_2\sigma_\varepsilon^2

\gamma_3 = \gamma_4 = \cdots = 0
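The MA(1) and MA(2) autocovariances above are easy to verify by simulation. Below is a quick Monte Carlo sketch in Python; the parameter values \theta_1 = 0.5, \theta_2 = 0.3, \sigma_\varepsilon^2 = 1 are illustrative choices, not from the text.

```python
import math
import random

# Simulate an MA(2) and compare sample autocovariances with the
# theoretical values gamma_0, gamma_1, gamma_2, gamma_3 derived above.
random.seed(0)
theta1, theta2, sig2 = 0.5, 0.3, 1.0   # illustrative parameters
T = 200_000
eps = [random.gauss(0.0, math.sqrt(sig2)) for _ in range(T + 2)]
y = [eps[t] + theta1 * eps[t - 1] + theta2 * eps[t - 2] for t in range(2, T + 2)]

def acov(x, j):
    # Sample autocovariance at lag j (biased, divides by len(x)).
    m = sum(x) / len(x)
    return sum((x[t] - m) * (x[t - j] - m) for t in range(j, len(x))) / len(x)

gamma_theory = {
    0: (1 + theta1 ** 2 + theta2 ** 2) * sig2,
    1: (theta1 + theta1 * theta2) * sig2,
    2: theta2 * sig2,
    3: 0.0,
}
gamma_hat = {j: acov(y, j) for j in range(4)}
```

With T this large, every sample autocovariance should sit close to its theoretical counterpart.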

MA(\infty)

y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}

y_t is stationary if \sum_{j=0}^{\infty}\psi_j^2 < \infty (square summable).

1.3 Autoregressive Process

AR(1)

y_t = a + u_t, \quad u_t = \rho u_{t-1} + \varepsilon_t

Equivalently,

y_t = a(1 - \rho) + \rho y_{t-1} + \varepsilon_t

y_t = a + \varepsilon_t + \rho\varepsilon_{t-1} + \rho^2\varepsilon_{t-2} + \cdots

so that y_t = MA(\infty), and

\sum_{j=0}^{\infty}\rho^{2j} = \frac{1}{1 - \rho^2} < \infty

\gamma_0 = \frac{1}{1 - \rho^2}\sigma_\varepsilon^2, \quad \gamma_1 = \frac{\rho}{1 - \rho^2}\sigma_\varepsilon^2, \quad \gamma_t = \rho\gamma_{t-1}

AR(p)

y_t = a(1 - \rho) + \rho_1 y_{t-1} + \cdots + \rho_p y_{t-p} + \varepsilon_t

where

\rho = \sum_{j=1}^{p}\rho_j.

\gamma_t = \rho_1\gamma_{t-1} + \cdots + \rho_p\gamma_{t-p}: \text{ the Yule-Walker equation}
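As a numerical check of the AR(1) autocovariances and the Yule-Walker recursion \gamma_t = \rho\gamma_{t-1}, here is a simulation sketch (the values \rho = 0.7 and \sigma = 1 are illustrative):

```python
import random

# Simulate a long AR(1) and check gamma_0 = sigma^2/(1-rho^2) and the
# Yule-Walker recursion gamma_j / gamma_{j-1} = rho.
random.seed(1)
rho, sigma, T, burn = 0.7, 1.0, 200_000, 500
u, series = 0.0, []
for t in range(T + burn):
    u = rho * u + random.gauss(0.0, sigma)
    if t >= burn:               # drop burn-in so the start-up value washes out
        series.append(u)

def acov(x, j):
    # Sample autocovariance at lag j.
    m = sum(x) / len(x)
    return sum((x[t] - m) * (x[t - j] - m) for t in range(j, len(x))) / len(x)

g0, g1, g2 = acov(series, 0), acov(series, 1), acov(series, 2)
g0_theory = sigma ** 2 / (1.0 - rho ** 2)
```

Both ratios g1/g0 and g2/g1 should be close to \rho.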

Augmented form for AR(2):

y_t = a(1 - \rho) + (\rho_1 + \rho_2)y_{t-1} - \rho_2 y_{t-1} + \rho_2 y_{t-2} + \varepsilon_t

= a(1 - \rho) + \rho y_{t-1} - \rho_2\Delta y_{t-1} + \varepsilon_t

Testing form:

\Delta y_t = a(1 - \rho) + (\rho - 1)y_{t-1} - \rho_2\Delta y_{t-1} + \varepsilon_t

1.4 Source of MA term

Example 1:

y_t = \alpha y_{t-1} + u_t, \quad x_t = \beta x_{t-1} + e_t,

where u_t, e_t are white noise. Consider a variable z_t such that

z_t = x_t + y_t.

Now, does z_t follow an AR(1)?

z_t = \alpha(y_{t-1} + x_{t-1}) + (\beta - \alpha)x_{t-1} + u_t + e_t = \alpha z_{t-1} + \varepsilon_t

where

\varepsilon_t = (\beta - \alpha)\sum_{j=0}^{\infty}\beta^j e_{t-j-1} + u_t + e_t

so that z_t becomes ARMA(1, \infty).

Example 2:

y_s = \rho y_{s-1} + u_s, \quad s = 1, \ldots, S.

Suppose you observe only the even-numbered observations. Then we have

y_s = \rho^2 y_{s-2} + \rho u_{s-1} + u_s

Let

x_t = y_s \text{ for } t = 1, \ldots, T; \; s = 2, 4, \ldots, S.

Then we have

x_t = \rho^2 x_{t-1} + \varepsilon_t, \quad \varepsilon_t = \rho u_{t-1/2} + u_t,

so that x_t follows an ARMA(1,1).

1.5 Lag Length Selection

1.5.1 Information Criteria

Consider a criterion function given by

c_T(k) = -\frac{2\ln L(k)}{T} + k\frac{\varphi(T)}{T}

where \varphi(T) is a deterministic function. The model (lag length) is selected by minimizing the criterion function with respect to k. That is,

\hat{k} = \arg\min_k c_T(k)

There are three famous criterion functions:

AIC: \varphi(T) = 2

BIC (Schwarz): \varphi(T) = \ln T

Hannan-Quinn: \varphi(T) = 2\ln(\ln T)

Let k^* be the true lag length. Then the log-likelihood must be maximized at k^* asymptotically. That is,

\text{plim}_{T\to\infty}\,\frac{1}{T}\ln L(k^*) > \text{plim}_{T\to\infty}\,\frac{1}{T}\ln L(k) \quad \text{for any } k < k^*

Now, consider two cases. First, k < k^*. Then we have

\lim_{T\to\infty}\Pr\left[c_T(k) \le c_T(k^*)\right] = \lim_{T\to\infty}\Pr\left[-\frac{2\ln L(k)}{T} + k\frac{\varphi(T)}{T} \le -\frac{2\ln L(k^*)}{T} + k^*\frac{\varphi(T)}{T}\right]

= \lim_{T\to\infty}\Pr\left[\frac{\ln L(k^*) - \ln L(k)}{T} \le \frac{1}{2}\frac{\varphi(T)}{T}(k^* - k)\right] = 0

for all \varphi(T) with \varphi(T)/T \to 0, since the left-hand side converges to a strictly positive constant while the right-hand side vanishes.

Next, consider the case of k > k^*. Then we know that the likelihood ratio test statistic satisfies

2\left[\ln L(k) - \ln L(k^*)\right] \to_d \chi^2_{k-k^*}

Now consider AIC first:

T\left(c_T(k) - c_T(k^*)\right) = -2\left[\ln L(k) - \ln L(k^*)\right] + 2(k - k^*) \to_d -\chi^2_{k-k^*} + 2(k - k^*)

Hence we have

\lim_{T\to\infty}\Pr\left[c_T(k) \le c_T(k^*)\right] = \Pr\left[\chi^2_{k-k^*} \ge 2(k - k^*)\right] > 0

so that AIC may asymptotically over-estimate the lag length. Consider the other two criteria. For both cases,

\lim_{T\to\infty}\varphi(T) = \infty.

Hence we have

T\left(c_T(k) - c_T(k^*)\right) = -2\left[\ln L(k) - \ln L(k^*)\right] + (k - k^*)\varphi(T) \to_d -\chi^2_{k-k^*} + (k - k^*)\varphi(T)

so that

\lim_{T\to\infty}\Pr\left[c_T(k) \le c_T(k^*)\right] = \lim_{T\to\infty}\Pr\left[\chi^2_{k-k^*} \ge (k - k^*)\varphi(T)\right] = 0.

Hence BIC and Hannan-Quinn's criteria consistently estimate the true lag length.
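The over-fitting tendency of AIC and the heavier BIC penalty can be illustrated numerically. The sketch below fits AR(k) by OLS on a common sample and uses the Gaussian-likelihood equivalence -2\ln L(k)/T = \ln\hat{\sigma}_k^2 + \text{const}, so that c_T(k) = \ln\hat{\sigma}_k^2 + k\varphi(T)/T; the AR(2) coefficients and sample size are illustrative choices.

```python
import math
import random

# Simulate an AR(2) and select the lag length by AIC and BIC,
# using c_T(k) = ln(sigma_hat_k^2) + k * phi(T) / T.
random.seed(2)
T, rho1, rho2 = 400, 0.5, 0.3
y, y1, y2 = [], 0.0, 0.0
for _ in range(T + 100):
    ynew = rho1 * y1 + rho2 * y2 + random.gauss(0.0, 1.0)
    y2, y1 = y1, ynew
    y.append(ynew)
y = y[100:]                      # drop burn-in

def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only).
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def ar_resid_var(series, k, kmax):
    # OLS fit of AR(k) with intercept on a common sample; residual variance.
    rows = list(range(kmax, len(series)))
    X = [[series[t - j - 1] for j in range(k)] + [1.0] for t in rows]
    m = k + 1
    A = [[sum(Xi[a] * Xi[b] for Xi in X) for b in range(m)] for a in range(m)]
    bvec = [sum(X[i][a] * series[t] for i, t in enumerate(rows)) for a in range(m)]
    beta = solve(A, bvec)
    resid = [series[t] - sum(beta[j] * X[i][j] for j in range(m))
             for i, t in enumerate(rows)]
    return sum(e * e for e in resid) / len(resid)

kmax = 6
n_eff = T - kmax
selected = {}
for name, phi in (("AIC", 2.0), ("BIC", math.log(n_eff))):
    selected[name] = min(
        range(1, kmax + 1),
        key=lambda k, phi=phi: math.log(ar_resid_var(y, k, kmax)) + k * phi / n_eff,
    )
```

Since the BIC penalty slope exceeds the AIC slope, the BIC choice is always weakly smaller; here it should also find the true order (2), while AIC may pick more lags.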

1.5.2 General to Specific (GS) Method

In practice, the so-called general-to-specific method is also popularly used. The GS method involves the following sequential steps.

Step 1 Run AR(k_{max}) and test whether the last coefficient is significantly different from zero.

Step 2 If not, let k_{max} = k_{max} - 1 and repeat Step 1 until the last coefficient is significant.

The general-to-specific methodology applies conventional statistical tests. So if the significance level for the tests is fixed, then the order estimator inevitably allows for a nonzero probability of overestimation. Furthermore, as is typical in sequential tests, this overestimation probability is bigger than the significance level when there are multiple steps between k_{max} and p, because the probability of false rejection accumulates as k steps down from k_{max} to p. These problems can be mitigated (and overcome at least asymptotically) by letting the level of the test depend on the sample size. More precisely, following Bauer, Pötscher and Hackl (1988), we can set the critical value C_T in such a way that (i) C_T \to \infty, and (ii) C_T/\sqrt{T} \to 0 as T \to \infty. The critical value corresponds to the standard normal critical value for the significance level \alpha_T = 1 - \Phi(C_T), where \Phi(\cdot) is the standard normal c.d.f. Conditions (i) and (ii) are equivalent to the requirement that the significance level satisfy \alpha_T \to 0 and \frac{\log\alpha_T}{T} \to 0 (proved in equation (22) of Pötscher, 1983).

References

[1] Bauer, P., Pötscher, B. M., and P. Hackl (1988). Model Selection by Multiple Test Procedures. , 19, 39–44.

[2] Pötscher, B. M. (1983). Order Estimation in ARMA-models by Lagrangian Multiplier Tests. Annals of Statistics, 11, 872–885.

2 Asymptotic Distribution for Stationary Process

2.1 LLN for a covariance stationary process

Let us consider first the asymptotic properties of the sample mean.

\bar{y}_T = \frac{1}{T}\sum y_t, \quad E(\bar{y}_T) = \mu

Next, as T \to \infty,

E(\bar{y}_T - \mu)^2 = E\left[\frac{1}{T}\sum(y_t - \mu)\right]^2 = \frac{1}{T}\left\{\gamma_0 + 2\frac{T-1}{T}\gamma_1 + \cdots + \frac{2}{T}\gamma_{T-1}\right\}

\le \frac{1}{T}\left\{|\gamma_0| + 2|\gamma_1| + \cdots + 2|\gamma_{T-1}|\right\}

Hence we have

\lim_{T\to\infty} T\,E(\bar{y}_T - \mu)^2 = \sum_{j=-\infty}^{\infty}\gamma_j

Example: y_t = u_t, u_t = \rho u_{t-1} + \varepsilon_t, \varepsilon_t \sim iid(0, \sigma^2). Then we have

E\left(\frac{1}{\sqrt{T}}\sum u_t\right)^2 = \gamma_0 + 2\frac{T-1}{T}\gamma_1 + \cdots + \frac{2}{T}\gamma_{T-1}

= \frac{\sigma^2}{1-\rho^2}\left[1 + 2\sum_{t=1}^{T-1}\frac{T-t}{T}\rho^t\right]

= \frac{\sigma^2}{1-\rho^2}\left[1 + \frac{2\rho}{1-\rho} - \frac{2\rho(1-\rho^T)}{T(1-\rho)^2}\right]

= \frac{\sigma^2}{1-\rho^2}\cdot\frac{1+\rho}{1-\rho} + O(T^{-1}) = \frac{\sigma^2}{(1-\rho)^2} + O(T^{-1})

where note that

\sum_{t=1}^{T}\frac{T-t}{T}\rho^t = \frac{\rho}{T}\cdot\frac{T(1-\rho) - (1-\rho^T)}{(1-\rho)^2}.

Now as T \to \infty, we have

\lim_{T\to\infty} E\left(\frac{1}{\sqrt{T}}\sum u_t\right)^2 = \frac{\sigma^2}{(1-\rho)^2}.
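The limit just derived is easy to confirm by Monte Carlo. A sketch (with illustrative \rho = 0.5 and \sigma = 1, so the limit is \sigma^2/(1-\rho)^2 = 4):

```python
import random

# Estimate T * E(u_bar_T^2) for an AR(1) by averaging squared sample
# means across replications, and compare with sigma^2 / (1 - rho)^2.
random.seed(3)
rho, sigma, T, reps = 0.5, 1.0, 1000, 2000
vals = []
for _ in range(reps):
    u, s = 0.0, 0.0
    for _ in range(T):
        u = rho * u + random.gauss(0.0, sigma)
        s += u
    vals.append((s / T) ** 2)
scaled = T * sum(vals) / reps           # estimates T * E(u_bar_T^2)
lrv = sigma ** 2 / (1.0 - rho) ** 2     # long-run variance, 4.0 here
```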

2.2 CLT for Martingale Difference Sequence

If E(y_t) = 0 and E(y_t \mid \mathcal{F}_{t-1}) = 0 for all t, where \mathcal{F}_{t-1} is the information set at t-1, then y_t is called a martingale difference sequence (m.d.s.).

If y_t is an m.d.s., then y_t is not serially correlated.

CLT for an m.d.s. Let \{y_t\}_{t=1}^{T} be a scalar m.d.s. with \bar{y}_T = T^{-1}\sum_{t=1}^{T}y_t. Suppose that (a) E(y_t^2) = \sigma_t^2 > 0 with T^{-1}\sum_{t=1}^{T}\sigma_t^2 \to \sigma^2 > 0; (b) E|y_t|^r < \infty for some r > 2 and all t; (c) T^{-1}\sum_{t=1}^{T}y_t^2 \to_p \sigma^2. Then

\sqrt{T}\,\bar{y}_T \to_d N(0, \sigma^2).

CLT for a stationary process. Let

y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}

where \varepsilon_t is iid with E(\varepsilon_t^2) < \infty and \sum_{j=0}^{\infty}|\psi_j| < \infty. Then

\sqrt{T}(\bar{y}_T - \mu) \to_d N\left(0, \sum_{j=-\infty}^{\infty}\gamma_j\right).

Example 1: y_t = a + u_t, u_t = \rho u_{t-1} + e_t, e_t \sim iid(0, \sigma_e^2). Then we have

y_t = a + \sum_{j=0}^{\infty}\rho^j e_{t-j}.

Hence

\sqrt{T}(\bar{y}_T - a) \to_d N\left(0, \frac{\sigma_e^2}{(1-\rho)^2}\right)

Example 2: y_t = a(1-\rho) + \rho y_{t-1} + \varepsilon_t, \varepsilon_t \sim iid(0, \sigma^2).

\hat{\rho} = \rho + \frac{\sum(y_{t-1} - \bar{y}_{T,-1})(\varepsilon_t - \bar{\varepsilon}_T)}{\sum(y_{t-1} - \bar{y}_{T,-1})^2}

Show the condition that

\lim_{T\to\infty}\frac{1}{T}E\sum(y_{t-1} - \bar{y}_{T,-1})^2 = Q^2 < \infty

where Q^2 = \sigma^2/(1-\rho^2). Calculate

\lim_{T\to\infty}\frac{1}{\sqrt{T}}E\sum(y_{t-1} - \bar{y}_{T,-1})(\varepsilon_t - \bar{\varepsilon}_T)

Show that

\sqrt{T}(\hat{\rho} - \rho) \to_d N(0, 1 - \rho^2).

3 Finite Sample Properties

3.1 Calculating Bias by Using a Simple Taylor Expansion

Unknown Constant Case

y_t = a + \rho y_{t-1} + e_t, \quad e_t \sim iidN(0, 1)

First, show that

E\left(\frac{A}{B}\right) = \frac{EA}{EB}\left[1 - \frac{Cov(A,B)}{E(A)E(B)} + \frac{Var(B)}{E(B)^2}\right] + O(T^{-2})

= \frac{EA}{EB} - \frac{E(A-a)(B-b)}{E(B)^2} + \frac{EA \cdot E(B-b)^2}{E(B)^3} + O(T^{-2})

Let EA = a, EB = b, and take the Taylor expansion of A/B around a and b:

\frac{A}{B} = \frac{a}{b} + \frac{1}{b}(A-a) - \frac{a}{b^2}(B-b) - \frac{1}{b^2}(A-a)(B-b) + \frac{a}{b^3}(B-b)^2 + R_n

Take expectations:

E\left(\frac{A}{B}\right) = \frac{a}{b} + \frac{1}{b}E(A-a) - \frac{a}{b^2}E(B-b) - \frac{1}{b^2}E(B-b)(A-a) + \frac{a}{b^3}E(B-b)^2 + ER_n

= \frac{a}{b} - \frac{1}{b^2}Cov(A,B) + \frac{a}{b^3}Var(B) + O(T^{-2})

Now consider

E\hat{\rho} = E\left[\frac{\sum\tilde{y}_t\tilde{y}_{t-1}}{\sum\tilde{y}_{t-1}^2}\right] = \,?

where \tilde{y}_t denotes the demeaned series. Note that in this example, with B = \sum\tilde{y}_{t-1}^2,

\frac{1}{T}E(B) = \frac{\sigma_e^2}{1-\rho^2} - \frac{1}{T}\frac{\sigma_e^2}{(1-\rho)^2} + O(T^{-2}),

and similarly for the numerator, and, if u_t is normal,

E(x_t x_{t+k} x_{t+k+l} x_{t+k+l+m}) = \frac{\rho^{k+m}\left(1 + 2\rho^{2l}\right)}{(1-\rho^2)^2}.

From this, we can calculate all the moments. For example, we have

E\left(\frac{1}{T}\sum x_t^2\right)^2 = \frac{1}{T^2}\left[\frac{3T}{(1-\rho^2)^2} + 2\sum_{i=1}^{T-1}(T-i)\frac{1 + 2\rho^{2i}}{(1-\rho^2)^2}\right]

Then we finally have

E\hat{\rho} = E\left[\frac{\sum\tilde{y}_t\tilde{y}_{t-1}}{\sum\tilde{y}_{t-1}^2}\right] = \rho - \frac{1+3\rho}{T} + O(T^{-2})

so that

E(\hat{\rho} - \rho) = -\frac{1+3\rho}{T} + O(T^{-2})
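The first-order bias -(1+3\rho)/T for the unknown-constant case is easy to check by Monte Carlo. A sketch (illustrative \rho = 0.5 and T = 50, so the approximation is -0.05):

```python
import math
import random

# Average the OLS bias rho_hat - rho over many AR(1) samples (with a
# fitted constant, i.e., the regression is run on demeaned data) and
# compare with the Kendall-type approximation -(1 + 3*rho)/T.
random.seed(4)
rho, T, reps = 0.5, 50, 10000
total = 0.0
for _ in range(reps):
    u = random.gauss(0.0, 1.0 / math.sqrt(1.0 - rho ** 2))  # stationary start
    y = []
    for _ in range(T + 1):
        u = rho * u + random.gauss(0.0, 1.0)
        y.append(u)
    x, z = y[:-1], y[1:]
    mx, mz = sum(x) / T, sum(z) / T
    num = sum((x[i] - mx) * (z[i] - mz) for i in range(T))
    den = sum((x[i] - mx) ** 2 for i in range(T))
    total += num / den - rho
mean_bias = total / reps
approx = -(1.0 + 3.0 * rho) / T
```

The simulated mean bias should be negative and close to the O(T^{-1}) approximation.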

For the case with no constant,

x_t = \rho x_{t-1} + e_t, \qquad E(\hat{\rho} - \rho) = -\frac{2\rho}{T} + O(T^{-2})

For a trend case,

y_t = a + bt + \rho y_{t-1} + e_t, \qquad E(\hat{\rho} - \rho) = -\frac{2(1+2\rho)}{T} + O(T^{-2})

3.2 Approximating by Using an Edgeworth Expansion

For the non-constant case (Phillips, 1977), we have

\hat{\rho} - \rho = \frac{\sum y_{t-1}u_t}{\sum y_{t-1}^2} = \frac{\sum y_{t-1}(y_t - \rho y_{t-1})}{\sum y_{t-1}^2} = \frac{\sum y_{t-1}y_t - \rho\sum y_{t-1}^2}{\sum y_{t-1}^2}

so that (\hat{\rho} - \rho) can be expressed as a function of moments. Let

\sqrt{T}(\hat{\rho} - \rho) = \sqrt{T}\,e(m)

where m stands for a vector of moments. Then taking a Taylor expansion yields

\sqrt{T}\,e(m) = \sqrt{T}\left[e_r m_r + \frac{1}{2}e_{rs}m_r m_s + \frac{1}{6}e_{rst}m_r m_s m_t\right] + O_p(T^{-2})

where

e_r = \frac{\partial e(0)}{\partial m_r}, \ldots \text{ etc.}

Solving all the moments yields

\Pr\left[\frac{\sqrt{T}(\hat{\rho} - \rho)}{\sqrt{1-\rho^2}} \le w\right] = \Phi(w) + \frac{\rho\,\phi(w)}{\sqrt{T}\sqrt{1-\rho^2}}\left(w^2 + 1\right)

where w = x/\sqrt{1-\rho^2}.

For the constant case (Tanaka, 1983), the expansion has the same structure with a different polynomial in w:

\Pr\left[\frac{\sqrt{T}(\hat{\rho} - \rho)}{\sqrt{1-\rho^2}} \le w\right] = \Phi(w) + \frac{\phi(w)}{\sqrt{T}\sqrt{1-\rho^2}}\left(\mu_1 + \mu_2 w^2\right),

where the coefficients \mu_1 and \mu_2 depend on \rho.

4 Covariance-Stationary Vector Processes

Consider the following simple VAR(1) with bivariate variables

y_{1t} = a_1 + b_{11}y_{1t-1} + b_{12}y_{2t-1} + e_{1t}

y_{2t} = a_2 + b_{21}y_{1t-1} + b_{22}y_{2t-1} + e_{2t};

alternatively we can rewrite it as

y_t = a + b\,y_{t-1} + e_t

where

a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}, \quad b = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}, \quad e_t = \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}

Usually,

E(e_t e_s') = \Omega \text{ for } t = s, \quad = 0 \text{ otherwise.}

4.1 State Space Representation

Consider VAR(p) with two variables given by

y_t = a + b_1 y_{t-1} + \cdots + b_p y_{t-p} + e_t

Then we can rewrite it as

\begin{bmatrix} y_t - \mu \\ y_{t-1} - \mu \\ \vdots \\ y_{t-p+1} - \mu \end{bmatrix} = \begin{bmatrix} b_1 & b_2 & \cdots & b_{p-1} & b_p \\ I_2 & 0 & \cdots & 0 & 0 \\ 0 & I_2 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_2 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} - \mu \\ y_{t-2} - \mu \\ \vdots \\ y_{t-p} - \mu \end{bmatrix} + \begin{bmatrix} e_t \\ 0 \\ \vdots \\ 0 \end{bmatrix},

\xi_t = F\xi_{t-1} + v_t

so that any VAR(p) can be rewritten as a VAR(1). We call this form the 'state space representation'.

4.2 Stationarity Condition

The eigenvalues of the matrix F satisfy

\left|\lambda^p I - \lambda^{p-1}b_1 - \lambda^{p-2}b_2 - \cdots - b_p\right| = 0

where the \lambda are the eigenvalues. That is, as long as all |\lambda| < 1, y_t is stationary.

Vector MA(\infty) Representation

If the eigenvalues of F all lie inside the unit circle, then F^s \to 0 as s \to \infty and y_t can be rewritten as

y_t = \mu + \sum_{j=0}^{\infty}F^j v_{t-j}

or equivalently

y_t = \mu + \sum_{j=0}^{\infty}\Psi_j e_{t-j}.

If \sum_{j=0}^{\infty}\|\Psi_j\| < \infty (absolutely summable), then

1. the autocovariance between the ith variable at time t and the jth variable s periods earlier, E(y_{it} - \mu_i)(y_{j,t-s} - \mu_j), exists and is given by the row i, column j element of

\Gamma_s = \sum_{k=0}^{\infty}\Psi_{s+k}\,\Omega\,\Psi_k' \quad \text{for } s = 0, 1, \ldots

2. the sequence of matrices \{\Gamma_k\}_{k=0}^{\infty} is absolutely summable.
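The stationarity condition of Section 4.2 (all eigenvalues inside the unit circle) can be checked numerically. A sketch for the bivariate VAR(1) y_t = a + b y_{t-1} + e_t, using the characteristic polynomial of b; for a VAR(p) one would apply the same check to the companion matrix F. The coefficient matrices are illustrative choices.

```python
import cmath

def eig2(b):
    # Eigenvalues of a 2x2 matrix from lambda^2 - tr(b)*lambda + det(b) = 0.
    tr = b[0][0] + b[1][1]
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

def is_stationary_var1(b):
    # Bivariate VAR(1) is covariance stationary iff all |lambda| < 1.
    return all(abs(lam) < 1.0 for lam in eig2(b))

b_stat = [[0.5, 0.1], [0.2, 0.3]]   # illustrative stationary case
b_unit = [[1.0, 0.0], [0.0, 0.5]]   # unit eigenvalue: not stationary
```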

4.3 Autocovariance

Note that

E(y_{1t}y_{1t-1}) = E(y_{1t}y_{1t+1}),

but

E(y_{1t}y_{2t-1}) \neq E(y_{1t}y_{2t+1}).

Let

E(y_t) = \mu,

E(y_t - \mu)(y_{t-j} - \mu)' = \Gamma_j,

E(y_{t+j} - \mu)(y_t - \mu)' = \Gamma_j',

and in general

\Gamma_j \neq \Gamma_{-j}; \quad \text{instead } \Gamma_{-j} = \Gamma_j'.

Similar to the univariate case, we have

\bar{y}_T \to_p \mu,

and

\lim_{T\to\infty} T\,E(\bar{y}_T - \mu)(\bar{y}_T - \mu)' = \sum_{j=-\infty}^{\infty}\Gamma_j = \Gamma_0 + \sum_{j=1}^{\infty}\left(\Gamma_j + \Gamma_j'\right).

4.4 Model Selection

Suppose that a VAR(1) with two variables is the correct specification. Then we have

y_{1t} = a_1 + b_{11}y_{1t-1} + b_{12}y_{2t-1} + e_{1t}

= a_1 + b_{11}y_{1t-1} + b_{12}(a_2 + b_{21}y_{1t-2} + b_{22}y_{2t-2} + e_{2t-1}) + e_{1t}

= a_1 + a_2b_{12} + b_{11}y_{1t-1} + b_{12}b_{21}y_{1t-2} + b_{12}b_{22}y_{2t-2} + e_{1t} + b_{12}e_{2t-1}

= a_1 + a_2b_{12} + b_{11}y_{1t-1} + b_{12}b_{21}y_{1t-2} + e_{1t} + b_{12}e_{2t-1} + b_{12}b_{22}(a_2 + b_{21}y_{1t-3} + b_{22}y_{2t-3} + e_{2t-2})

\vdots

= a + \sum_{j=1}^{\infty}b_jy_{1t-j} + \sum_{j=1}^{\infty}c_je_{2t-j} + e_{1t}

so that we have an AR(\infty). Similarly, if a VAR(1) with three variables is the true model, then any two variables have a VAR(\infty). Now consider the lag selection criteria.

AIC: c_T(p) = \ln|\hat{\Omega}_p| + 2\frac{1+p}{T}

BIC: c_T(p) = \ln|\hat{\Omega}_p| + \frac{1+p}{T}\ln T

H-Q: c_T(p) = \ln|\hat{\Omega}_p| + 2\frac{1+p}{T}\ln\ln T

For all three criteria, the lag length for each equation becomes identical. However, it is not hard to conjecture that single-equation criteria can be used for selecting individual lag lengths. Alternatively, the GS method is also used here. However, there is no joint criterion available for the GS method.

4.5 Finite Sample Properties

Let b = \sum_{i=1}^{p}b_i where b_i is defined in

y_t = a + b_1y_{t-1} + \cdots + b_py_{t-p} + e_t.

The first-order bias was studied by Nicholls and Pope (1988) and is given by

E(\hat{b} - b) = -\frac{C}{T} + O(T^{-2})

where

C = G\left[(I - b')^{-1} + b'\left(I - (b')^2\right)^{-1} + \sum_{j=1}^{p}\lambda_j\left(I - \lambda_j b'\right)^{-1}\right]\Gamma(0)^{-1}

and the \lambda_j are the eigenvalues of b, and G is the covariance matrix of e_t. Note that this formula is similar to the panel VAR case, which we will consider very soon.

4.6 Granger Causality

There is no Granger Causality if

E(x_{t+s} \mid x_t, x_{t-1}, \ldots) = E(x_{t+s} \mid x_t, x_{t-1}, \ldots, y_t, y_{t-1}, \ldots) \quad \text{for all } s > 0.

Testing: Under the null hypothesis of no Granger causality (from y_{2t} to y_{1t}), all upper off-diagonal elements of the coefficient matrices should be zero.

5 Processes with Deterministic Time Trends

First consider a simple trend regression given by

y_t = bt + \varepsilon_t

where we assume

\varepsilon_t \sim iidN(0, \sigma^2).

Let us derive the limiting distribution.

\hat{b} = b + \frac{\sum t\varepsilon_t}{\sum t^2}

Next, add a constant:

yt = a + bt + "t

Derive the limiting distributions of \hat{a} and \hat{b}:

\begin{bmatrix} \hat{a} - a \\ \hat{b} - b \end{bmatrix} = \begin{bmatrix} T & \sum t \\ \sum t & \sum t^2 \end{bmatrix}^{-1}\begin{bmatrix} \sum\varepsilon_t \\ \sum t\varepsilon_t \end{bmatrix}

\begin{bmatrix} T^{1/2} & 0 \\ 0 & T^{3/2} \end{bmatrix}\begin{bmatrix} \hat{a} - a \\ \hat{b} - b \end{bmatrix} = \left(\begin{bmatrix} T^{1/2} & 0 \\ 0 & T^{3/2} \end{bmatrix}^{-1}\begin{bmatrix} T & \sum t \\ \sum t & \sum t^2 \end{bmatrix}\begin{bmatrix} T^{1/2} & 0 \\ 0 & T^{3/2} \end{bmatrix}^{-1}\right)^{-1}\begin{bmatrix} T^{1/2} & 0 \\ 0 & T^{3/2} \end{bmatrix}^{-1}\begin{bmatrix} \sum\varepsilon_t \\ \sum t\varepsilon_t \end{bmatrix}

Consider another estimator given by

\hat{b} = \frac{y_T - y_1}{T - 1} = b + \frac{\varepsilon_T - \varepsilon_1}{T - 1}

Find the limiting distribution of \hat{b}.
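The T^{3/2} convergence rate of \hat{b} in the constant-plus-trend regression can be seen by simulation: the variance of T^{3/2}(\hat{b} - b) stabilizes, at 12\sigma^2, as T grows (since \sum(t - \bar{t})^2 = T(T^2-1)/12). A sketch with illustrative parameters:

```python
import random

# Draw b_hat - b repeatedly from y_t = a + b*t + eps_t, scale by T^{3/2},
# and check that the scaled variance is stable near 12 * sigma^2.
random.seed(5)
sigma = 1.0

def bhat_dev(T):
    # One draw of b_hat - b from OLS with an intercept (demeaned trend).
    eps = [random.gauss(0.0, sigma) for _ in range(T)]
    tbar = (T + 1) / 2.0
    num = sum((t - tbar) * eps[t - 1] for t in range(1, T + 1))
    den = sum((t - tbar) ** 2 for t in range(1, T + 1))
    return num / den

reps = 4000
scaled_var = {}
for T in (100, 400):
    draws = [T ** 1.5 * bhat_dev(T) for _ in range(reps)]
    mean = sum(draws) / reps
    scaled_var[T] = sum((d - mean) ** 2 for d in draws) / reps
```

The scaled variance should be near 12 at both sample sizes, while the unscaled variance of \hat{b} shrinks like T^{-3}.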

6 Univariate Processes with Unit Roots

y_t = a + u_t, \quad u_t = u_{t-1} + \varepsilon_t

Then we have

y_t = y_{t-1} + \varepsilon_t

Let

y_t = \varepsilon_1 + \cdots + \varepsilon_t

where

\varepsilon_t \sim iidN(0, 1)

Then we have

y_t \sim N(0, t)

and

y_t - y_{t-1} = \varepsilon_t \sim N(0, 1)

y_t - y_s \sim N(0, t - s) \quad \text{for } t > s

We now rewrite this as a continuous-time stochastic process.

Standard Brownian motion: W(\cdot) is a continuous-time stochastic process, associating each date r \in [0,1] with the scalar W(r), such that

1. W(0) = 0

2. W(s) - W(t) \sim N(0, s - t) for s > t, with independent increments

3. W(1) \sim N(0, 1)

Transition from discrete to continuous. First let \varepsilon_t \sim iidN(0,1). Then

\frac{1}{T}\sum_{t=1}^{T}\frac{y_t}{\sqrt{T}} = \frac{1}{T}\sum_{t=1}^{T}\frac{\varepsilon_1 + \cdots + \varepsilon_t}{\sqrt{T}} \to_d \int_0^1 W(r)\,dr

\frac{1}{T^2}\sum_{t=1}^{T}y_t^2 \to_d \int_0^1 W^2\,dr

\frac{1}{T^{5/2}}\sum_{t=1}^{T}t\,y_{t-1} \to_d \int_0^1 rW\,dr

If \varepsilon_t \sim N(0, \sigma^2), then we have

T^{-3/2}\sum y_t \to_d \sigma\int_0^1 W\,dr

etc.

More limiting distributions. Let

y_t = y_{t-1} + e_t, \quad e_t \sim iid(0, \sigma^2), \quad y_0 = O_p(1).

As T \to \infty, we have

T^{-1/2}\sum e_t \to_d \sigma W(1) = N(0, \sigma^2)

T^{-1}\sum y_{t-1}e_t \to_d\ ?

Next, consider

\frac{y_t}{\sqrt{t}} = \frac{1}{\sqrt{t}}\sum_{s=1}^{t}e_s \to_d N(0, \sigma^2)

so that

\frac{y_t^2}{2t} = \frac{1}{2t}\left(\sum_{s=1}^{t}e_s\right)^2 \to_d \frac{\sigma^2}{2}N(0,1)^2 = \frac{\sigma^2}{2}\chi_1^2 \quad \text{for large } t.

Now we are ready to prove

\frac{1}{T}\sum y_{t-1}e_t \to_d \frac{\sigma^2}{2}\left[W(1)^2 - 1\right]

Proof: Consider first

y_t^2 = (y_{t-1} + e_t)^2 = y_{t-1}^2 + e_t^2 + 2y_{t-1}e_t

so that we have

y_{t-1}e_t = \frac{1}{2}\left(y_t^2 - y_{t-1}^2 - e_t^2\right).

Taking the average yields

\frac{1}{T}\sum y_{t-1}e_t = \frac{1}{2}\frac{y_T^2 - y_0^2}{T} - \frac{1}{2}\frac{1}{T}\sum e_t^2

Now let y_0 = 0; then we have

\frac{1}{T}\sum y_{t-1}e_t = \frac{1}{2}\frac{y_T^2}{T} - \frac{1}{2}\frac{1}{T}\sum e_t^2 \to_d \frac{1}{2}\sigma^2\chi_1^2 - \frac{1}{2}\sigma^2 = \frac{\sigma^2}{2}\left[W(1)^2 - 1\right].

Next, consider

\frac{1}{T^{3/2}}\sum y_{t-1} = \frac{1}{T^{3/2}}\left(e_1 + (e_1 + e_2) + \cdots + (e_1 + \cdots + e_{T-1})\right)

= \frac{1}{T^{3/2}}\left((T-1)e_1 + (T-2)e_2 + \cdots + e_{T-1}\right)

= \frac{1}{T^{3/2}}\sum_{t=1}^{T-1}(T-t)e_t = \frac{1}{T^{1/2}}\sum_{t=1}^{T-1}e_t - \frac{1}{T^{3/2}}\sum_{t=1}^{T-1}te_t.

Now we are ready to prove

\frac{1}{T^{3/2}}\sum te_t \to_d \sigma\left[W(1) - \int_0^1 W(r)\,dr\right]

Proof:

\frac{1}{T^{3/2}}\sum te_t = \frac{1}{T^{1/2}}\sum e_t - \frac{1}{T^{3/2}}\sum y_{t-1} \to_d \sigma W(1) - \sigma\int_0^1 W(r)\,dr.

Questions:

1. Consider

y_t = y_{t-1} + e_t, \quad u_t \sim iid(0, \sigma_u^2), \quad E(e_t u_s) = 0 \text{ for all } t, s.

Then derive the limiting distribution of

\frac{1}{T}\sum y_{t-1}u_t

2. Prove the following:

(a) T^{-5/2}\sum t\,y_{t-1} \to_d \sigma\int rW(r)\,dr

(b) T^{-3}\sum t\,y_{t-1}^2 \to_d \sigma^2\int rW(r)^2\,dr

6.1 Limiting Distribution of Unit Root Process I (No constant)

Consider the simple AR(1) regression without constant,

y_t = \rho y_{t-1} + e_t.

Then the OLS estimator is given by

\hat{\rho} = \rho + \frac{\sum y_{t-1}e_t}{\sum y_{t-1}^2}

When \rho = 1, we know

\frac{1}{T}\sum y_{t-1}e_t \to_d \frac{\sigma^2}{2}\left[W(1)^2 - 1\right], \qquad \frac{1}{T^2}\sum_{t=1}^{T}y_{t-1}^2 \to_d \sigma^2\int_0^1 W^2\,dr.

Hence we have

T(\hat{\rho} - 1) = \frac{\frac{1}{T}\sum y_{t-1}e_t}{\frac{1}{T^2}\sum y_{t-1}^2} \to_d \frac{\frac{1}{2}\left[W(1)^2 - 1\right]}{\int W^2\,dr} = \frac{\int W\,dW}{\int W^2\,dr}.

Consider the t-ratio given by

t_{\hat{\rho}} = \frac{\hat{\rho} - 1}{\left(\hat{\sigma}_T^2 / \sum y_{t-1}^2\right)^{1/2}}

where

\hat{\sigma}_T^2 = \frac{1}{T}\sum(y_t - \hat{\rho}y_{t-1})^2 \to_p \sigma^2

Hence we have

t_{\hat{\rho}} \to_d \frac{\frac{1}{2}\left[W(1)^2 - 1\right]}{\left(\int W^2\,dr\right)^{1/2}}

Note that the lower 2.5% (5.0%) and upper 2.5% (5.0%) critical values are -2.23 (-1.95) and 1.62 (1.28), which are very different from 1.96 (1.65).
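The non-normal critical values quoted above can be reproduced by Monte Carlo: simulate random walks under the null, run the no-constant regression, and take quantiles of the t-ratio. A sketch (the sample size and replication count are illustrative):

```python
import math
import random

# Monte Carlo distribution of the Dickey-Fuller t-ratio under rho = 1
# in the regression without a constant.
random.seed(6)
T, reps = 250, 5000
tstats = []
for _ in range(reps):
    y = [0.0]
    for _ in range(T):
        y.append(y[-1] + random.gauss(0.0, 1.0))   # random walk under the null
    num = sum(y[t - 1] * (y[t] - y[t - 1]) for t in range(1, T + 1))
    den = sum(y[t - 1] ** 2 for t in range(1, T + 1))
    rho_hat = 1.0 + num / den
    s2 = sum((y[t] - rho_hat * y[t - 1]) ** 2 for t in range(1, T + 1)) / T
    tstats.append((rho_hat - 1.0) / math.sqrt(s2 / den))
tstats.sort()
q05 = tstats[int(0.05 * reps)]        # lower 5% quantile, near -1.95
```

The lower 5% quantile should be close to -1.95 rather than the normal -1.65, and the whole distribution is shifted to the left of zero.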

6.2 Limiting Distribution of Unit Root Process II (constant)

Now we have

y_t = a + \rho y_{t-1} + e_t

When \rho = 1, a = 0. However, we don't know whether \rho = 1 or not. Under the null of a unit root, the OLS estimators are given by

\begin{bmatrix} \hat{a} \\ \hat{\rho} - 1 \end{bmatrix} = \begin{bmatrix} T & \sum y_{t-1} \\ \sum y_{t-1} & \sum y_{t-1}^2 \end{bmatrix}^{-1}\begin{bmatrix} \sum e_t \\ \sum y_{t-1}e_t \end{bmatrix}

Consider

\begin{bmatrix} \sqrt{T}\,\hat{a} \\ T(\hat{\rho} - 1) \end{bmatrix} = \begin{bmatrix} 1 & \frac{1}{T^{3/2}}\sum y_{t-1} \\ \frac{1}{T^{3/2}}\sum y_{t-1} & \frac{1}{T^2}\sum y_{t-1}^2 \end{bmatrix}^{-1}\begin{bmatrix} \frac{1}{\sqrt{T}}\sum e_t \\ \frac{1}{T}\sum y_{t-1}e_t \end{bmatrix}

\to_d \begin{bmatrix} 1 & \sigma\int W\,dr \\ \sigma\int W\,dr & \sigma^2\int W^2\,dr \end{bmatrix}^{-1}\begin{bmatrix} \sigma W(1) \\ \frac{\sigma^2}{2}\left[W(1)^2 - 1\right] \end{bmatrix}

Hence

T(\hat{\rho} - 1) \to_d \frac{\frac{1}{2}\left[W(1)^2 - 1\right] - W(1)\int W\,dr}{\int W^2\,dr - \left(\int W\,dr\right)^2} = \frac{\int\tilde{W}\,dW}{\int\tilde{W}^2\,dr}

Similarly the t-ratio statistic is given by

t_{\hat{\rho}} \to_d \frac{\frac{1}{2}\left[W(1)^2 - 1\right] - W(1)\int W\,dr}{\left\{\int W^2\,dr - \left(\int W\,dr\right)^2\right\}^{1/2}}

Exercise: Derive the limiting distribution for the trend case.

6.3 Unit Root Test

For AR(p), we have

\Delta y_t = a + \rho y_{t-1} + \sum_j\phi_j\Delta y_{t-j} + e_t

Note that

\frac{1}{T}\sum\Delta y_{t-j} = O_p\left(\frac{1}{\sqrt{T}}\right),

so that as T \to \infty the augmented terms go away. Hence the limiting distribution does not change at all.

7 Meaning of Nonstationarity

Let's find an economic meaning of nonstationarity.

1. No steady state. No static mean or average exists. A series wanders randomly around its true mean and never converges to it.

2. No equilibrium, since there is no steady state. One cannot forecast or predict its future value without considering other nonstationary variables.

3. Fast convergence rate.

7.1 Unit Root Test and Stationarity Test

Note that rejection of the null of a unit root does not imply that a series is stationary. To see this, let

x_t = \rho x_{t-1} + \varepsilon_t, \quad \varepsilon_t = \sqrt{t}\,e_t, \quad e_t \sim iid(0, \sigma^2).

Further let \rho = 0. Now x_t is not weakly stationary since its variance is time varying. However, at the same time, x_t does not follow a unit root process since \rho = 0. To see this, let us derive the limiting distribution of \hat{\rho}:

xt 1"t ^  = 2 xt 1 P and P

2 2 2 E xt 1"t = E "t 1"t = E tet 1et 2 X  X2 T  X  = 2 + O (T ) 2  so that 4 1 d  xt 1"t N 0; T ! 2 X  

19 2 2 2 2 p 2 T xt 1 = "t 1 = tet 1  + O (T ) ! 2 X X X Hence we have 1 xt 1"t d T p T (^ ) = 1 2 N (0; 2) = 2W (1) : T 2 xt 1 ! P Therefore, we can see that the convergence rateP is still T; but the limiting distribution is not a function of Brownian motion at all.

8 Unit Root Test Considering Finite Sample Bias

Consider the following recursive mean adjustment

\bar{y}_{t-1} = \frac{1}{t-1}\sum_{s=1}^{t-1}y_s

Under the null of a unit root, we have

y_t - \bar{y}_{t-1} = a + \rho(y_{t-1} - \bar{y}_{t-1}) + (\rho - 1)\bar{y}_{t-1} + e_t.

Since a = 0 and \rho = 1, we have

y_t - \bar{y}_{t-1} = \rho(y_{t-1} - \bar{y}_{t-1}) + e_t.

The limiting distribution of \hat{\rho}_{RD} is given by

T(\hat{\rho}_{RD} - 1) \to_d \left[\int_0^1\left(W(r) - \frac{1}{r}\int_0^r W(s)\,ds\right)^2 dr\right]^{-1}\int_0^1\left(W(r) - \frac{1}{r}\int_0^r W(s)\,ds\right)dW(r)

Finite sample performance is usually better than that of the ADF test.

Exercise Read “Uniform Asymptotic Normality in Stationary and Unit Root Autoregression” by Han, Phillips and Sul (2010) and construct X-di¤erencing Unit root test.

9 Weak Stationarity and Local to Unity

Joon Park (2007) and Phillips and Magdalinos (2007, JoE)

Consider the following DGP

y_t = \rho_n y_{t-1} + u_t, \quad t = 1, \ldots, n, \quad \text{where } \rho_n = 1 - \frac{c}{n^\alpha}, \; 0 < \alpha \le 1 \text{ and } c > 0

Then we have

\sqrt{n}(\hat{\rho}_n - \rho_n) \to_d N(0, 1 - \rho_n^2)

so that

\frac{\sqrt{n}(\hat{\rho}_n - \rho_n)}{\sqrt{1 - \rho_n^2}} \to_d N(0, 1).

Note that

1 - \rho_n^2 = 1 - \left(1 - \frac{c}{n^\alpha}\right)^2 = \frac{2c}{n^\alpha} - \frac{c^2}{n^{2\alpha}},

hence

\sqrt{1 - \rho_n^2} = \sqrt{\frac{2c}{n^\alpha}}\sqrt{1 - \frac{c}{2n^\alpha}},

and

\frac{\sqrt{n}}{\sqrt{1 - \rho_n^2}} = \frac{n^{1/2}n^{\alpha/2}}{\sqrt{2c}} + O(n^{-\alpha+1/2}).

Finally we have

n^{\frac{1+\alpha}{2}}(\hat{\rho}_n - \rho_n) \to_d N(0, 2c)

For the case of \alpha = 1, we call it local to unity, for which the limiting distribution is different (Phillips 1987, Biometrika; 1988, Econometrica). Consider the following simple DGP:

y_t = \rho_n y_{t-1} + u_t, \quad \rho_n = \exp\left(\frac{c}{T}\right) \simeq 1 + \frac{c}{T}

Now, define

J_c(r) = \int_0^r e^{(r-s)c}\,dW(s)

where J_c(r) is a Gaussian process which, for fixed r > 0, has the distribution

c c yt = nyt 1 + ut; n = exp 1 + T ' T   Now, de…ne r (r s)c J (r) = e dW (s) Z0 where J (r) is a which for …xed r > 0; has the distribution

1 e2rc 1 J (r) N 0; ;  2 c   and it call Ornstein-Uhlenbeck process. Alternatively we have

r (r s)c J (r) = W (r) + c e W (s) ds: Z0

The limiting distribution of \hat{\rho}_n is given by

n(\hat{\rho}_n - \rho_n) \to_d \left[\int J_c\,dW + \frac{1}{2}\left(1 - \frac{\sigma_u^2}{\sigma^2}\right)\right]\left[\int J_c^2\,dr\right]^{-1}

where \sigma^2 is the long-run variance of u_t. Now we have

n\rho_n = n + c,

so that

n(\hat{\rho}_n - 1) - c \to_d \left[\int J_c\,dW + \frac{1}{2}\left(1 - \frac{\sigma_u^2}{\sigma^2}\right)\right]\left[\int J_c^2\,dr\right]^{-1},

and letting \sigma_u^2 = \sigma^2 (for the AR(1) case),

n(\hat{\rho}_n - 1) \to_d c + \left(\int J_c\,dW\right)\left(\int J_c^2\,dr\right)^{-1} \quad \text{for } c < 0

See Phillips for the case of c \to -\infty.

Explosive Series

\rho_n = 1 + \frac{c}{n}, \quad \text{for } c > 0

As n \to \infty, \rho_n \to 1, but for fixed n, \rho_n > 1. Note that if \rho > 1 and y_0 = 0, then the limiting distribution (derived by White, 1958) is given by

\frac{\rho^n}{\rho^2 - 1}(\hat{\rho} - \rho) \to_d C \quad \text{as } n \to \infty

where C is a Cauchy distribution. From this, consider

\rho_n^2 - 1 = \frac{2c}{n} + \frac{c^2}{n^2}

so that

\frac{\rho_n^n}{\rho_n^2 - 1} = \frac{n\rho_n^n}{2c}\left(1 + O(n^{-1})\right).

Hence we have

\frac{n\rho_n^n}{2c}(\hat{\rho}_n - \rho_n) \to_d C.

Note: White considered the moment generating function first and converted it to the p.d.f. Read "Explosive Behavior in the 1990s Nasdaq: When Did Exuberance Escalate Asset Values?" by Peter C. B. Phillips, Yangru Wu and Jun Yu (2009).

10

10.1 Multivariate Integrated Process (Chap 18)

Let

y_t = y_{t-1} + u_t

where y_t is a vector of nonstationary processes. We further assume that

u_t = \sum_{s=0}^{\infty}\Psi_s\varepsilon_{t-s}

where \sum_{s=0}^{\infty}s\,\|\Psi_s\| < \infty. Let E(\varepsilon_t\varepsilon_t') = \Omega; then

\Gamma_s = E(u_t u_{t-s}') = \sum_{v=0}^{\infty}\Psi_{s+v}\,\Omega\,\Psi_v'.

Further define

\Omega = PP'

and

\Psi(1) = \Psi_0 + \Psi_1 + \cdots, \quad \Lambda = \Psi(1)P.

Then we have

T^{-1/2}\sum u_t \to_d \Lambda W(1)

T^{-1}\sum y_{t-1}u_t' \to_d \Lambda\left(\int W\,dW'\right)\Lambda' + \sum_{v=1}^{\infty}\Gamma_v

T^{-2}\sum y_{t-1}y_{t-1}' \to_d \Lambda\left(\int WW'\,dr\right)\Lambda'

We are now ready to analyze time series regressions with integrated processes. Consider

y_t = \gamma x_t + u_t, \quad u_t = u_{t-1} + \varepsilon_t

Then we have

\hat{\gamma} = \left(\sum x_t^2\right)^{-1}\left(\sum x_t y_t\right) \to_d\ ?

To find this out, we can choose one of two ways. The first way is to define the bivariate process of y_t and x_t; by doing this, we can know what the limit stands for. See Hamilton, pp. 558-559. The second way is just to consider the following results. Let

z_t = (x_t, u_t)',

then

\frac{1}{T^2}\sum z_t z_t' = \begin{bmatrix} \frac{1}{T^2}\sum x_t^2 & \frac{1}{T^2}\sum x_t u_t \\ \frac{1}{T^2}\sum x_t u_t & \frac{1}{T^2}\sum u_t^2 \end{bmatrix} \to_d \Lambda\left(\int WW'\,dr\right)\Lambda'

where \Lambda can be defined via

z_t = \Psi(L)\varepsilon_t, \quad \Lambda = \Psi(1)P, \quad \Omega = PP'.

Now consider the rate of convergence. Since u_t is I(1), \sum x_t u_t needs to be divided by T^2, and \sum x_t^2 as well. Hence the rate of convergence is 1: \hat{\gamma} is O_p(1) and does not converge in probability to a constant. What does it mean then?

10.2 Common Factor across Variables

Consider a process given by

y_t = \theta_t + u_t, \quad \theta_t = \theta_{t-1} + m_t, \quad u_t \sim iidN(0, 1).

Note that y_t is I(1). Next, consider a similar process which contains \theta_t:

x_t = \theta_t + \varepsilon_t, \quad \varepsilon_t \sim iidN(0, 1).

Then we have

y_t - x_t = u_t - \varepsilon_t = I(0).

In this case, we say y_t is cointegrated with x_t.

10.2.1 When the common factor is a linear trend

Part I. Before we proceed to the formal asymptotics for this case, we will enjoy the following simple trend regression case. Consider

y_t = bt + z_t, \quad x_t = t + s_t

where

z_t \sim iidN(0, 1), \quad s_t \sim iidN(0, 1), \quad E(z_t s_m) = 0 \text{ for all } t \text{ and } m.

Q1: If you are running

y_t = \gamma x_t + \varepsilon_t,

then identify \gamma and \varepsilon_t:

y_t = bt + z_t = b(t + s_t) + z_t - bs_t = bx_t + \varepsilon_t.

Q2: Is x_t correlated with \varepsilon_t?

E(x_t\varepsilon_t) = E(t + s_t)(z_t - bs_t) = -b \neq 0.

Q3: Is \hat{\gamma} consistent?

\hat{\gamma} = b + \frac{\sum x_t\varepsilon_t}{\sum x_t^2}.

Consider \sum x_t^2 first:

\frac{1}{T^3}\sum x_t^2 = \frac{1}{T^3}\left(\sum t^2 + 2\sum ts_t + \sum s_t^2\right) = \frac{1}{3} + O_p\left(\frac{1}{T^{3/2}}\right) + O_p\left(\frac{1}{T^2}\right)

Next,

\frac{1}{T^{3/2}}\sum x_t\varepsilon_t = \frac{1}{T^{3/2}}\sum(t\varepsilon_t + s_t\varepsilon_t) = O_p(1) + O_p\left(\frac{1}{T}\right)

Hence we have

T^{3/2}(\hat{\gamma} - b) = \frac{T^{-3/2}\sum x_t\varepsilon_t}{T^{-3}\sum x_t^2} \to_d N(0, V)

Part II. Now we have

x_t = a_t t = (1 + s_t)t = t + s_t t.

Q1: And if you are running

y_t = \gamma x_t + \varepsilon_t,

then identify \gamma and \varepsilon_t:

y_t = bt + z_t = b(t + s_t t) - bs_t t + z_t = bx_t + \varepsilon_t

Q2: Is x_t correlated with \varepsilon_t?

E(x_t\varepsilon_t) = E(t + s_t t)(z_t - bs_t t) = -bt^2 E(s_t^2) = -bt^2 \neq 0

Q3: Is \hat{\gamma} consistent?

\frac{1}{T^3}E\sum x_t^2 = \frac{1}{T^3}E\sum t^2(1 + s_t)^2 = \frac{1}{T^3}\sum t^2\left(1 + E(s_t^2) + 2E(s_t)\right) = \frac{2}{3} + O\left(\frac{1}{T}\right)

Next,

E\sum x_t\varepsilon_t = \,?

Derive it.

10.2.2 When the common factors are stochastic.

Consider the following processes again

y_t = \theta_t + u_t, \quad \theta_t = \theta_{t-1} + m_t, \quad u_t \sim iidN(0, 1), \quad m_t \sim iidN(0, \sigma^2).

We assume further that

E(m_t u_s) = 0 \text{ for all } t \text{ and } s.

Let's derive the limiting distribution of

\frac{1}{T}\sum\theta_t u_t.

First, its mean is zero. Next,

E\left(\sum\theta_t u_t\right)^2 = E\left(\theta_1^2 u_1^2 + \cdots + \theta_T^2 u_T^2\right) = \sigma^2(1 + 2 + \cdots + T) = \frac{\sigma^2 T^2}{2} + O(T)

Hence we have

\frac{1}{T}\sum\theta_t u_t \to_d N\left(0, \frac{\sigma^2}{2}\right).

Now consider

y_t = \theta_t + u_t, \quad x_t = \theta_t + e_t.

Then Q1: If you are running

y_t = \gamma x_t + \varepsilon_t,

then identify \gamma and \varepsilon_t:

y_t = \theta_t + u_t = (\theta_t + e_t) + u_t - e_t = x_t + \varepsilon_t.

Q2: Is x_t correlated with \varepsilon_t?

E(x_t\varepsilon_t) = E(\theta_t + e_t)(u_t - e_t) = -E(e_t^2) \neq 0.

Q3: Is \hat{\gamma} consistent?

\hat{\gamma} = \gamma + \frac{\sum x_t\varepsilon_t}{\sum x_t^2}.

Consider \sum x_t^2 first:

\frac{1}{T^2}\sum x_t^2 \to_d \sigma^2\int W^2\,dr

Next,

\frac{1}{T}\sum x_t\varepsilon_t \to_d N(0, V)

Find V. So then we have

T(\hat{\gamma} - \gamma) \to_d N(0, Q)

Find Q.

11 Cointegration Test

Consider a simple case

y_t = \gamma x_t + u_t, \quad u_t = \rho u_{t-1} + \varepsilon_t

where x_t is independent of u_t. Let

\hat{u}_t = y_t - \hat{\gamma}x_t

and run

\hat{u}_t = \rho\hat{u}_{t-1} + e_t

Then

Z = T(\hat{\rho} - 1) = \frac{\frac{1}{T}\sum\hat{u}_{t-1}e_t}{\frac{1}{T^2}\sum\hat{u}_{t-1}^2}

Note that

\hat{u}_t = y_t - x_t\frac{\sum x_t y_t}{\sum x_t^2}

Let

Q(r) = W_y(r) - \left(\int W_y W_x'\right)\left(\int W_x W_x'\right)^{-1}W_x(r)

Then we have

\frac{1}{T^2}\sum\hat{u}_{t-1}^2 \to_d \sigma^2\int_0^1 Q(r)^2\,dr

and

\frac{1}{T}\sum\hat{u}_{t-1}e_t \to_d \sigma^2\int_0^1 Q(r)\,dQ

Hence finally

Z \to_d \left(\int_0^1 Q(r)\,dQ\right)\left(\int_0^1 Q(r)^2\,dr\right)^{-1}.

Note that Q(r) depends on the value of \hat{\gamma}. Intuitively, with more regressors the variability of \hat{\gamma} will increase, so the variability of Q(r) is also increasing. That is, for the two-regressor case, we have

\hat{u}_t = y_t - \hat{\gamma}_1 x_{1t} - \hat{\gamma}_2 x_{2t} = u_t - (\hat{\gamma}_1 - \gamma_1)x_{1t} - (\hat{\gamma}_2 - \gamma_2)x_{2t}.

Therefore, the limiting distribution depends on the number of regressors. As more regressors enter, the critical value gets larger (in absolute value).
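A simulation sketch of the residual-based test above for a clearly cointegrated pair (the value of \beta, the error laws, and T are illustrative): regress y on x, then form Z = T(\hat{\rho} - 1) from the residual autoregression. Under cointegration Z diverges, so it falls far below any no-cointegration critical value.

```python
import random

# Engle-Granger style check: y_t = beta * x_t + u_t with x_t a random walk
# and u_t iid (so y and x are cointegrated), then a DF-type statistic on
# the OLS residuals.
random.seed(7)
T, beta = 500, 2.0
xw = [0.0]
for _ in range(T):
    xw.append(xw[-1] + random.gauss(0.0, 1.0))     # x_t: random walk
x = xw[1:]
u = [random.gauss(0.0, 1.0) for _ in range(T)]     # stationary error
y = [beta * x[t] + u[t] for t in range(T)]

bhat = sum(x[t] * y[t] for t in range(T)) / sum(x[t] ** 2 for t in range(T))
uhat = [y[t] - bhat * x[t] for t in range(T)]
num = sum(uhat[t - 1] * (uhat[t] - uhat[t - 1]) for t in range(1, T))
den = sum(uhat[t - 1] ** 2 for t in range(1, T))
Z = T * num / den          # T * (rho_hat - 1), hugely negative here
```

Note also the superconsistency of \hat{\gamma}: bhat sits very close to \beta even at moderate T.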

12 Error Correction Model (ECM)

Here I introduce the ECM in an intuitive way. Consider the case of cointegration first. Then we have

u_t = \rho u_{t-1} + e_t

or

\Delta u_t = (\rho - 1)u_{t-1} + e_t

or

\Delta y_t - \gamma\Delta x_t = (\rho - 1)(y_{t-1} - \gamma x_{t-1}) + e_t.

This regression model can be further decomposed into

\Delta y_t = \alpha_1(y_{t-1} - \gamma x_{t-1}) + e_{1t}

\Delta x_t = \alpha_2(y_{t-1} - \gamma x_{t-1}) + e_{2t}

with

\alpha_1 - \gamma\alpha_2 = \rho - 1.

Note that the lagged term is called the 'error correction' term, and so the model is called the error correction model.

If u_t follows an AR(p), then the general ECM is given by

\Delta y_t = \alpha_1(y_{t-1} - \gamma x_{t-1}) + \sum_{j=1}^{p-1}\phi_j^y\Delta y_{t-j} + \sum_{j=1}^{p-1}\phi_j^x\Delta x_{t-j} + e_{1t},

\Delta x_t = \alpha_2(y_{t-1} - \gamma x_{t-1}) + \sum_{j=1}^{p-1}\psi_j^y\Delta y_{t-j} + \sum_{j=1}^{p-1}\psi_j^x\Delta x_{t-j} + e_{2t}.

Now let's compare the no-cointegration case. If y_t is not cointegrated with x_t, we have

\Delta y_t = \sum_{j=1}^{p-1}\phi_j^y\Delta y_{t-j} + \sum_{j=1}^{p-1}\phi_j^x\Delta x_{t-j} + e_{1t},

\Delta x_t = \sum_{j=1}^{p-1}\psi_j^y\Delta y_{t-j} + \sum_{j=1}^{p-1}\psi_j^x\Delta x_{t-j} + e_{2t},

so that there is no error correction term in the VAR system.
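A minimal numerical illustration of the error correction term (treating \gamma as known for simplicity; \gamma, \rho, and T are illustrative choices): when y_t and x_t are cointegrated, the coefficient on the lagged error-correction term in the \Delta y equation estimates \rho - 1 < 0, so deviations from the long-run relation are corrected.

```python
import random

# Simulate y_t = gamma * x_t + u_t with x_t a random walk and u_t an AR(1),
# then regress dy_t on the error-correction term y_{t-1} - gamma * x_{t-1}.
random.seed(8)
T, gamma, rho = 2000, 1.0, 0.5
x, u = [0.0], [0.0]
for _ in range(T):
    x.append(x[-1] + random.gauss(0.0, 1.0))        # common I(1) component
    u.append(rho * u[-1] + random.gauss(0.0, 1.0))  # stationary deviation
y = [gamma * x[t] + u[t] for t in range(T + 1)]

ec = [y[t - 1] - gamma * x[t - 1] for t in range(1, T + 1)]  # = u_{t-1}
dy = [y[t] - y[t - 1] for t in range(1, T + 1)]
alpha1 = sum(ec[i] * dy[i] for i in range(T)) / sum(e * e for e in ec)
# alpha1 should be close to rho - 1 = -0.5.
```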

13 Midterm Exam

1. Find the AR order.

2. Test Granger causality (assume y and x are stationary). Use bootstrap critical values.

3. Test for a unit root.

4. Test for cointegration.

5. Run an ECM.
