Time Series Econometrics A Concise Course
Francis X. Diebold University of Pennsylvania
February 17, 2020
1 / 357 Copyright © 2013-2020, by Francis X. Diebold.
All rights reserved.
This work is freely available for your use, but be warned: it is highly preliminary, significantly incomplete, and rapidly evolving. It is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. (Briefly: I retain copyright, but you can use, copy and distribute non-commercially, so long as you give me attribution and do not modify. To view a copy of the license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.) In return I ask that you please cite the book whenever appropriate, as: "Diebold, F.X. (year here), Book Title Here, Department of Economics, University of Pennsylvania, http://www.ssc.upenn.edu/~fdiebold/Textbooks.html."
The graphic is by Peter Mills and was obtained from Wikimedia Commons.
2 / 357 Time Domain: The General Linear Process and its Approximation
3 / 357 The Environment
Time series Yt (doubly infinite)
Realization yt (again doubly infinite)
Sample path yt , t = 1, ..., T
4 / 357 Strict Stationarity
Joint cdfs for sets of observations depend only on displacement, not time.
5 / 357 Weak Stationarity
(Covariance stationarity, second-order stationarity, ...)
Eyt = µ, ∀t
γ(t, τ) = E(yt − Eyt )(yt−τ − Eyt−τ ) = γ(τ), ∀t
0 < γ(0) < ∞
6 / 357 Autocovariance Function
(a) symmetric γ(τ) = γ(−τ), ∀τ
(b) nonnegative definite: $a' \Sigma a \geq 0, \ \forall a$, where the Toeplitz matrix $\Sigma$ has $ij$-th element $\gamma(i - j)$
(c) bounded by the variance γ(0) ≥ |γ(τ)|, ∀τ
7 / 357 Autocovariance Generating Function
$g(z) = \sum_{\tau=-\infty}^{\infty} \gamma(\tau) z^{\tau}$
8 / 357 Autocorrelation Function
$\rho(\tau) = \dfrac{\gamma(\tau)}{\gamma(0)}$
9 / 357 White Noise
White noise: $\varepsilon_t \sim WN(\mu, \sigma^2)$ (serially uncorrelated)
Zero-mean white noise: $\varepsilon_t \sim WN(0, \sigma^2)$
Independent (strong) white noise: $\varepsilon_t \stackrel{iid}{\sim} (0, \sigma^2)$
Gaussian white noise: $\varepsilon_t \stackrel{iid}{\sim} N(0, \sigma^2)$
10 / 357 Unconditional Moment Structure of Strong White Noise
E(εt ) = 0
$var(\varepsilon_t) = \sigma^2$
11 / 357 Conditional Moment Structure of Strong White Noise
E(εt |Ωt−1) = 0
$var(\varepsilon_t | \Omega_{t-1}) = E[(\varepsilon_t - E(\varepsilon_t | \Omega_{t-1}))^2 | \Omega_{t-1}] = \sigma^2$
where
$\Omega_{t-1} = \{\varepsilon_{t-1}, \varepsilon_{t-2}, ...\}$
12 / 357 Autocorrelation Structure of Strong White Noise
$\gamma(\tau) = \begin{cases} \sigma^2, & \tau = 0 \\ 0, & \tau \geq 1 \end{cases}$
$\rho(\tau) = \begin{cases} 1, & \tau = 0 \\ 0, & \tau \geq 1 \end{cases}$
13 / 357 An Aside on Treatment of the Mean
In theoretical work we assume a zero mean, µ = 0.
This reduces notational clutter and is without loss of generality.
(Think of yt as having been centered around its mean, µ, and note that yt − µ has zero mean by construction.)
(In empirical work we allow explicitly for a non-zero mean, either by centering the data around the sample mean or by including an intercept.)
14 / 357 The Wold Decomposition
Under regularity conditions, every covariance-stationary process {yt } can be written as:
$y_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}$
where:
b0 = 1
$\sum_{i=0}^{\infty} b_i^2 < \infty$
$\varepsilon_t = [y_t - P(y_t | y_{t-1}, y_{t-2}, ...)] \sim WN(0, \sigma^2)$
15 / 357 The General Linear Process
$y_t = B(L)\varepsilon_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}$
$\varepsilon_t \sim WN(0, \sigma^2)$
$b_0 = 1$
$\sum_{i=0}^{\infty} b_i^2 < \infty$
16 / 357 Unconditional Moment Structure (Assuming Strong WN Innovations)
$E(y_t) = E\left( \sum_{i=0}^{\infty} b_i \varepsilon_{t-i} \right) = \sum_{i=0}^{\infty} b_i E(\varepsilon_{t-i}) = \sum_{i=0}^{\infty} b_i \cdot 0 = 0$
$var(y_t) = var\left( \sum_{i=0}^{\infty} b_i \varepsilon_{t-i} \right) = \sum_{i=0}^{\infty} b_i^2 \, var(\varepsilon_{t-i}) = \sigma^2 \sum_{i=0}^{\infty} b_i^2$
(Do these calculations use the strong WN assumption? If so, how?)
17 / 357 Conditional Moment Structure (Assuming Strong WN Innovations)
$E(y_t | \Omega_{t-1}) = E(\varepsilon_t | \Omega_{t-1}) + b_1 E(\varepsilon_{t-1} | \Omega_{t-1}) + b_2 E(\varepsilon_{t-2} | \Omega_{t-1}) + ...$
($\Omega_{t-1} = \{\varepsilon_{t-1}, \varepsilon_{t-2}, ...\}$)
$= 0 + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + ... = \sum_{i=1}^{\infty} b_i \varepsilon_{t-i}$
$var(y_t | \Omega_{t-1}) = E[(y_t - E(y_t | \Omega_{t-1}))^2 | \Omega_{t-1}]$
$= E(\varepsilon_t^2 | \Omega_{t-1}) = E(\varepsilon_t^2) = \sigma^2$
(Do these calculations use the strong WN assumption? If so, how?)
18 / 357 Autocovariance Structure
$\gamma(\tau) = E\left[ \left( \sum_{i=-\infty}^{\infty} b_i \varepsilon_{t-i} \right) \left( \sum_{h=-\infty}^{\infty} b_h \varepsilon_{t-\tau-h} \right) \right]$
$= \sigma^2 \sum_{i=-\infty}^{\infty} b_i b_{i-\tau}$
(where $b_i \equiv 0$ if $i < 0$)
$g(z) = \sigma^2 B(z) B(z^{-1})$
19 / 357 Approximating the Wold Representation, I: Finite-Ordered Autoregressions
AR(p) process
(Stochastic difference equation)
We now study AR(1) (heavy detail)
21 / 357 Approximating the Wold Representation II: Finite-Ordered Moving Average Processes
MA(q) process (Obvious truncation)
We now study MA(1) (light detail)
21 / 357 Wiener-Kolmogorov Prediction
yt = εt + b1 εt−1 + ...
yT +h = εT +h + b1 εT +h−1 + ... + bhεT + bh+1εT −1 + ...
Project on ΩT = {εT , εT −1, ...} to get:
yT +h,T = bh εT + bh+1 εT −1 + ...
Note that the projection is on the infinite past
22 / 357 Wiener-Kolmogorov Prediction Error
$e_{T+h,T} = y_{T+h} - y_{T+h,T} = \sum_{i=0}^{h-1} b_i \varepsilon_{T+h-i}$
(An MA(h − 1) process!)
$E(e_{T+h,T}) = 0$
$var(e_{T+h,T}) = \sigma^2 \sum_{i=0}^{h-1} b_i^2$
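A minimal Python sketch (not part of the original slides) of the Wiener-Kolmogorov point forecast and error variance, assuming the Wold coefficients have been truncated at some finite order and the innovations are available; the function name and numbers are hypothetical.

```python
import numpy as np

def wk_forecast(b, eps, h, sigma2):
    """h-step W-K forecast and error variance from truncated Wold coefficients.

    b      : b_0, b_1, ..., b_M (b_0 = 1), truncated at M
    eps    : innovations ..., eps_{T-1}, eps_T (most recent last)
    h      : forecast horizon
    sigma2 : innovation variance
    """
    b = np.asarray(b, dtype=float)
    eps_rev = np.asarray(eps, dtype=float)[::-1]   # eps_T, eps_{T-1}, ...
    coeffs = b[h:]                                 # b_h, b_{h+1}, ...
    n = min(len(coeffs), len(eps_rev))
    point = np.dot(coeffs[:n], eps_rev[:n])        # b_h*eps_T + b_{h+1}*eps_{T-1} + ...
    err_var = sigma2 * np.sum(b[:h] ** 2)          # sigma^2 * sum_{i=0}^{h-1} b_i^2
    return point, err_var

# Example: MA(1) with theta = 0.5
b = np.array([1.0, 0.5])
eps = np.random.default_rng(0).normal(size=200)
print(wk_forecast(b, eps, h=1, sigma2=1.0))
print(wk_forecast(b, eps, h=2, sigma2=1.0))   # beyond the MA order: forecast 0, variance sigma^2*(1 + theta^2)
```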
23 / 357 Wold’s Chain Rule for Autoregressions
Consider an AR(1) process: yt = φyt−1 + εt
History: $\{y_t\}_{t=1}^{T}$
Immediately,
$y_{T+1,T} = \phi y_T$
$y_{T+2,T} = \phi y_{T+1,T} = \phi^2 y_T$
$\vdots$
$y_{T+h,T} = \phi y_{T+h-1,T} = \phi^h y_T$
Extension to AR(p) and AR(∞) is immediate.
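A small sketch (assumed code, not from the slides) of the chain rule applied to an AR(p): unknown future values are replaced by their own forecasts, step by step.

```python
import numpy as np

def ar_chain_rule_forecasts(y_hist, phi, H):
    """Iterate y_{T+h,T} = phi_1 y_{T+h-1,T} + ... + phi_p y_{T+h-p,T}."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    path = list(y_hist[-p:])                     # last p observations
    fcsts = []
    for _ in range(H):
        yhat = np.dot(phi, path[::-1][:p])       # phi_1*y_{t-1} + ... + phi_p*y_{t-p}
        fcsts.append(yhat)
        path.append(yhat)                        # forecasts feed the next step
    return np.array(fcsts)

# AR(1) example: the forecasts are phi^h * y_T
print(ar_chain_rule_forecasts([2.0], phi=[0.9], H=5))
```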
24 / 357 Multivariate (Bivariate): Define $y_t = (y_{1t}, y_{2t})'$ and $\mu_t = (\mu_{1t}, \mu_{2t})'$
yt is covariance stationary if:
$E(y_t) = \mu_t = \mu, \ \forall t$, where $\mu = (\mu_1, \mu_2)'$ is a vector of constants ("mean does not depend on time")
$\Gamma_{y_1 y_2}(t, \tau) = E(y_t - \mu_t)(y_{t+\tau} - \mu_{t+\tau})' = \begin{pmatrix} \gamma_{11}(\tau) & \gamma_{12}(\tau) \\ \gamma_{21}(\tau) & \gamma_{22}(\tau) \end{pmatrix}, \ \tau = 0, 1, 2, ...$ ("autocovariance depends only on displacement, not on time")
$var(y_1) < \infty$, $var(y_2) < \infty$ ("finite variance")
25 / 357 Cross Covariances
Cross covariances not symmetric in τ:
$\gamma_{12}(\tau) \neq \gamma_{12}(-\tau)$
Instead:
$\gamma_{12}(\tau) = \gamma_{21}(-\tau)$
$\Gamma_{y_1 y_2}(\tau) = \Gamma'_{y_1 y_2}(-\tau), \ \tau = 0, 1, 2, ...$
Covariance-generating function:
$G_{y_1 y_2}(z) = \sum_{\tau=-\infty}^{\infty} \Gamma_{y_1 y_2}(\tau) z^{\tau}$
26 / 357 Cross Correlations
$R_{y_1 y_2}(\tau) = D^{-1} \Gamma_{y_1 y_2}(\tau) D^{-1}, \ \tau = 0, 1, 2, ...$
$D = \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix}$
27 / 357 The Multivariate General Linear Process
$y_t = B(L)\varepsilon_t = \sum_{i=0}^{\infty} B_i \varepsilon_{t-i}$
$E(\varepsilon_t \varepsilon_s') = \begin{cases} \Sigma & \text{if } t = s \\ 0 & \text{otherwise} \end{cases}$
$B_0 = I$, $\quad \sum_{i=0}^{\infty} \|B_i\|^2 < \infty$
Bivariate case: $\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} B_{11}(L) & B_{12}(L) \\ B_{21}(L) & B_{22}(L) \end{pmatrix} \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$
28 / 357 Autocovariance Structure
$\Gamma_{y_1 y_2}(\tau) = \sum_{i=-\infty}^{\infty} B_i \Sigma B'_{i-\tau}$
(where $B_i \equiv 0$ if $i < 0$)
$G_y(z) = B(z) \Sigma B'(z^{-1})$
29 / 357 Wiener-Kolmogorov Prediction
yt = εt + B1εt−1 + B2εt−2 + ...
yT +h = εT +h + B1εT +h−1 + B2εT +h−2 + ...
Project on $\Omega_T = \{\varepsilon_T, \varepsilon_{T-1}, ...\}$ to get:
$y_{T+h,T} = B_h \varepsilon_T + B_{h+1} \varepsilon_{T-1} + ...$
30 / 357 Wiener-Kolmogorov Prediction Error
$\varepsilon_{T+h,T} = y_{T+h} - y_{T+h,T} = \sum_{i=0}^{h-1} B_i \varepsilon_{T+h-i}$
$E[\varepsilon_{T+h,T}] = 0$
$E[\varepsilon_{T+h,T} \varepsilon'_{T+h,T}] = \sum_{i=0}^{h-1} B_i \Sigma B_i'$
31 / 357 Vector Autoregressions (VAR’s)
N-variable VAR of order p:
Φ(L)yt = εt
εt ∼ WN(0, Σ) where:
$\Phi(L) = I - \Phi_1 L - ... - \Phi_p L^p$
32 / 357 Bivariate VAR(1) in “Long Form”
$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$
$\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix} \sim WN\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right)$
– Two sources of cross-variable interaction.
33 / 357 MA Representation of a VAR
Φ(L)yt = εt
$y_t = \Phi^{-1}(L)\varepsilon_t = \Theta(L)\varepsilon_t$
where:
$\Theta(L) = I + \Theta_1 L + \Theta_2 L^2 + ...$
34 / 357 MA Representation of Bivariate VAR(1) in Long Form
$\left[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} L \right] \begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$
$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix} + \begin{pmatrix} \theta_{11}^1 & \theta_{12}^1 \\ \theta_{21}^1 & \theta_{22}^1 \end{pmatrix} \begin{pmatrix} \varepsilon_{1,t-1} \\ \varepsilon_{2,t-1} \end{pmatrix} + ...$
35 / 357 Understanding VAR’s: Granger-Sims Causality
Is the history of yj useful for predicting yi , over and above the history of yi ?
– Granger non-causality corresponds to exclusion restrictions
– In the simple 2-Variable VAR(1) example,
$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix},$
y2 does not Granger cause y1 iff φ12 = 0
36 / 357 Understanding VAR’s: Impulse Response Functions (IRF’s)
$(I - \Phi_1 L - ... - \Phi_p L^p) y_t = \varepsilon_t$
εt ∼ WN(0, Σ)
The impulse-response question: How is yit dynamically affected by a shock to yjt (alone)?
(N × N matrix of IRF graphs (over steps ahead))
Problem: Σ generally not diagonal, so how to shock j alone?
[Graphic: Time-series plots of the four U.S. macro variables (Y, P, R, M), 1959-2001]
Graphic: IRF Matrix for 4-Variable U.S. Macro VAR
[Graphic: 4 × 4 matrix of impulse-response plots over steps ahead; rows give the shocked variable (Y, P, R, M) and columns the responding variable (Y, P, R, M)]
Figure 1.1: VAR evidence on US data
38 / 357 Understanding VAR's: Variance Decompositions (VD's)
$(I - \Phi_1 L - ... - \Phi_p L^p) y_t = \varepsilon_t$
εt ∼ WN(0, Σ)
The variance decomposition question: How much of the h-step ahead (optimal) prediction-error variance of yi is due to shocks to variable j?
(N × N matrix of VD graphs (over h). Or pick an h and examine the N × N matrix of VD numbers.)
Problem: Σ generally not diagonal, which makes things tricky, as the variance of a sum of innovations is not the sum of the variances in that case.
39 / 357 Orthogonalizing VAR’s by Cholesky Factorization (A Classic Identification Scheme)
Original:
$(I - \Phi_1 L - ... - \Phi_p L^p) y_t = \varepsilon_t, \quad \varepsilon_t \sim WN(0, \Sigma)$
Equivalently:
$(I - \Phi_1 L - ... - \Phi_p L^p) y_t = P v_t, \quad v_t \sim WN(0, I)$
or
$(P^{-1} - [P^{-1}\Phi_1] L - ... - [P^{-1}\Phi_p] L^p) y_t = v_t, \quad v_t \sim WN(0, I)$
where $\Sigma = P P'$, for lower-triangular $P$ (Cholesky factorization)
Now we can proceed with IRF’s and VD’s.
40 / 357 IRF’s and VD’s from the Orthogonalized VAR
IRF comes from the orthogonalized moving-average representation:
$y_t = (I + \Theta_1 L + \Theta_2 L^2 + ...) P v_t$
$= (P + \Theta_1 P L + \Theta_2 P L^2 + ...) v_t$
$IRF_{ij}$ is $\{P_{ij}, (\Theta_1 P)_{ij}, (\Theta_2 P)_{ij}, ...\}$
VDij comes similarly from the orthogonalized moving-average representation.
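A minimal Python sketch (not from the slides) of Cholesky-orthogonalized IRFs: compute the MA matrices $\Theta_h$ recursively from the VAR coefficients, then post-multiply by the lower-triangular $P$. The coefficient values are hypothetical.

```python
import numpy as np

def orthogonalized_irfs(Phi_list, Sigma, H):
    """IRF_h = Theta_h @ P, where Sigma = P P' (lower-triangular Cholesky)."""
    N = Sigma.shape[0]
    p = len(Phi_list)
    P = np.linalg.cholesky(Sigma)
    Theta = [np.eye(N)]
    for h in range(1, H + 1):                     # MA recursion: Theta_h = sum_j Phi_j Theta_{h-j}
        Th = np.zeros((N, N))
        for j in range(1, min(h, p) + 1):
            Th += Phi_list[j - 1] @ Theta[h - j]
        Theta.append(Th)
    return np.array([Th @ P for Th in Theta])     # [h, i, j] = response of y_i to shock j at step h

# Bivariate VAR(1) example with hypothetical coefficients
Phi1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
irf = orthogonalized_irfs([Phi1], Sigma, H=10)
print(irf[0])    # impact responses equal P
```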
41 / 357 Bivariate IRF Example (e.g., IRF12)
yt = Pvt + Θ1Pvt−1 + Θ2Pvt−2 + ...
vt ∼ WN(0, I )
yt = C0vt + C1vt−1 + C2vt−2 + ...
$\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} c_{11}^0 & c_{12}^0 \\ c_{21}^0 & c_{22}^0 \end{pmatrix} \begin{pmatrix} v_{1t} \\ v_{2t} \end{pmatrix} + \begin{pmatrix} c_{11}^1 & c_{12}^1 \\ c_{21}^1 & c_{22}^1 \end{pmatrix} \begin{pmatrix} v_{1,t-1} \\ v_{2,t-1} \end{pmatrix} + ...$
$IRF_{12} = \{c_{12}^0, c_{12}^1, c_{12}^2, ...\}$
(Q: What is $c_{12}^0$?)
42 / 357 Bivariate VD Example (e.g., VD12 for h = 2)
$\varepsilon_{t+2,t} = C_0 v_{t+2} + C_1 v_{t+1}$
$v_t \sim WN(0, I)$
$\begin{pmatrix} \varepsilon_{t+2,t}^1 \\ \varepsilon_{t+2,t}^2 \end{pmatrix} = \begin{pmatrix} c_{11}^0 & c_{12}^0 \\ c_{21}^0 & c_{22}^0 \end{pmatrix} \begin{pmatrix} v_{1,t+2} \\ v_{2,t+2} \end{pmatrix} + \begin{pmatrix} c_{11}^1 & c_{12}^1 \\ c_{21}^1 & c_{22}^1 \end{pmatrix} \begin{pmatrix} v_{1,t+1} \\ v_{2,t+1} \end{pmatrix}$
$\varepsilon_{t+2,t}^1 = c_{11}^0 v_{1,t+2} + c_{12}^0 v_{2,t+2} + c_{11}^1 v_{1,t+1} + c_{12}^1 v_{2,t+1}$
$var(\varepsilon_{t+2,t}^1) = (c_{11}^0)^2 + (c_{12}^0)^2 + (c_{11}^1)^2 + (c_{12}^1)^2$
Part coming from $v_2$: $(c_{12}^0)^2 + (c_{12}^1)^2$
$VD_{12}(2) = \dfrac{(c_{12}^0)^2 + (c_{12}^1)^2}{(c_{11}^0)^2 + (c_{12}^0)^2 + (c_{11}^1)^2 + (c_{12}^1)^2}$
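A short sketch (assumed code) of the same calculation for general N and h, using orthogonalized MA matrices $C_s = \Theta_s P$ such as those returned by the IRF sketch above.

```python
import numpy as np

def variance_decomposition(C, h):
    """VD_ij(h): share of variable i's h-step prediction-error variance due to shock j."""
    C = np.asarray(C)[:h]                       # keep C_0, ..., C_{h-1}
    contrib = np.sum(C ** 2, axis=0)            # [i, j] = sum_s (c^s_ij)^2
    total = contrib.sum(axis=1, keepdims=True)  # total h-step error variance of each variable
    return contrib / total                      # rows sum to one

# Using the orthogonalized IRF array 'irf' from the earlier sketch:
# print(variance_decomposition(irf, h=2))
```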
43 / 357 Frequency Domain
“Spectral Analysis”
44 / 357 Remember...
$z = a + bi$ (rectangular)
$z = re^{i\omega} = r(\cos\omega + i\sin\omega)$ (polar)
$\cos(\omega) = \dfrac{e^{i\omega} + e^{-i\omega}}{2}$
$\sin(\omega) = \dfrac{e^{i\omega} - e^{-i\omega}}{2i}$
$P = \dfrac{2\pi}{\omega}$ (period)
$z\bar{z} = |z|^2$
45 / 357 Recall the General Linear Process
$y_t = B(L)\varepsilon_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}$
Autocovariance generating function:
$g(z) = \sum_{\tau=-\infty}^{\infty} \gamma(\tau) z^{\tau} = \sigma^2 B(z) B(z^{-1})$
γ(τ) and g(z) are a z-transform pair
46 / 357 Spectrum
Evaluate g(z) on the unit circle, z = e−iω :
$g(e^{-i\omega}) = \sum_{\tau=-\infty}^{\infty} \gamma(\tau) e^{-i\omega\tau}, \quad -\pi < \omega < \pi$
$= \sigma^2 B(e^{i\omega}) B(e^{-i\omega})$
$= \sigma^2 |B(e^{i\omega})|^2$
47 / 357 Spectrum
Trigonometric form:
$g(\omega) = \sum_{\tau=-\infty}^{\infty} \gamma(\tau) e^{-i\omega\tau}$
$= \gamma(0) + \sum_{\tau=1}^{\infty} \gamma(\tau) \left( e^{i\omega\tau} + e^{-i\omega\tau} \right)$
$= \gamma(0) + 2 \sum_{\tau=1}^{\infty} \gamma(\tau) \cos(\omega\tau)$
48 / 357 Spectral Density Function
$f(\omega) = \dfrac{1}{2\pi} g(\omega)$
$f(\omega) = \dfrac{1}{2\pi} \sum_{\tau=-\infty}^{\infty} \gamma(\tau) e^{-i\omega\tau} \quad (-\pi < \omega < \pi)$
$= \dfrac{1}{2\pi} \gamma(0) + \dfrac{1}{\pi} \sum_{\tau=1}^{\infty} \gamma(\tau) \cos(\omega\tau)$
$= \dfrac{\sigma^2}{2\pi} B(e^{i\omega}) B(e^{-i\omega})$
$= \dfrac{\sigma^2}{2\pi} |B(e^{i\omega})|^2$
49 / 357 Properties of Spectrum and Spectral Density
1. symmetric around ω = 0
2. real-valued
3. 2π-periodic
4. nonnegative
50 / 357 A Fourier Transform Pair
$g(\omega) = \sum_{\tau=-\infty}^{\infty} \gamma(\tau) e^{-i\omega\tau}$
$\gamma(\tau) = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} g(\omega) e^{i\omega\tau} \, d\omega$
51 / 357 A Variance Decomposition by Frequency
$\gamma(\tau) = \dfrac{1}{2\pi} \int_{-\pi}^{\pi} g(\omega) e^{i\omega\tau} \, d\omega = \int_{-\pi}^{\pi} f(\omega) e^{i\omega\tau} \, d\omega$
Hence $\gamma(0) = \int_{-\pi}^{\pi} f(\omega) \, d\omega$
52 / 357 White Noise Spectral Density
yt = εt
$\varepsilon_t \sim WN(0, \sigma^2)$
$f(\omega) = \dfrac{\sigma^2}{2\pi} B(e^{i\omega}) B(e^{-i\omega}) = \dfrac{\sigma^2}{2\pi}$
53 / 357 AR(1) Spectral Density
yt = φyt−1 + εt
$\varepsilon_t \sim WN(0, \sigma^2)$
$f(\omega) = \dfrac{\sigma^2}{2\pi} B(e^{i\omega}) B(e^{-i\omega}) = \dfrac{\sigma^2}{2\pi} \dfrac{1}{(1 - \phi e^{i\omega})(1 - \phi e^{-i\omega})} = \dfrac{\sigma^2}{2\pi} \dfrac{1}{1 - 2\phi\cos(\omega) + \phi^2}$
How does shape depend on φ? Where are the peaks?
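A quick numerical check (assumed code, hypothetical parameter values) of the AR(1) spectral density formula above: for $\phi > 0$ the mass piles up at low frequencies, for $\phi < 0$ at high frequencies.

```python
import numpy as np

def ar1_spectral_density(omega, phi, sigma2=1.0):
    """f(omega) = sigma^2 / (2*pi*(1 - 2*phi*cos(omega) + phi^2))."""
    return sigma2 / (2 * np.pi * (1 - 2 * phi * np.cos(omega) + phi ** 2))

ends = np.array([0.0, np.pi])
print(ar1_spectral_density(ends, phi=0.9))    # large at omega = 0, small at omega = pi
print(ar1_spectral_density(ends, phi=-0.9))   # the reverse
```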
54 / 357 Figure: Granger’s Typical Spectral Shape of an Economic Variable
55 / 357 Robust Variance Estimation
$\bar{y} = \dfrac{1}{T} \sum_{t=1}^{T} y_t$
$var(\bar{y}) = \dfrac{1}{T^2} \sum_{s=1}^{T} \sum_{t=1}^{T} \gamma(t - s)$ ("Add row sums")
$= \dfrac{1}{T} \sum_{\tau=-(T-1)}^{T-1} \left( 1 - \dfrac{|\tau|}{T} \right) \gamma(\tau)$
(“Add diagonal sums,” using change of variable τ = t − s)
56 / 357 Robust Variance Estimation, Continued
$\Rightarrow \sqrt{T}(\bar{y} - \mu) \sim \left( 0, \ \sum_{\tau=-(T-1)}^{T-1} \left( 1 - \dfrac{|\tau|}{T} \right) \gamma(\tau) \right)$
$\sqrt{T}(\bar{y} - \mu) \overset{d}{\to} N\left( 0, \ \sum_{\tau=-\infty}^{\infty} \gamma(\tau) \right)$
$\sqrt{T}(\bar{y} - \mu) \overset{d}{\to} N(0, \ g(0))$
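A minimal sketch (not from the slides) of a Bartlett-weighted estimate of the long-run variance $g(0) = \sum_\tau \gamma(\tau)$, using the triangular lag window from the following slides; the truncation lag and data are hypothetical.

```python
import numpy as np

def long_run_variance(y, M):
    """Estimate g(0) = sum_tau gamma(tau) with Bartlett weights 1 - |tau|/M."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    yc = y - y.mean()
    lrv = np.dot(yc, yc) / T                          # gamma_hat(0)
    for tau in range(1, M + 1):
        gamma_tau = np.dot(yc[tau:], yc[:T - tau]) / T
        lrv += 2 * (1 - tau / M) * gamma_tau          # weighted gamma_hat(tau), tau and -tau
    return lrv

rng = np.random.default_rng(1)
y = rng.normal(size=1000)
print(long_run_variance(y, M=12) / len(y))            # approximate var(ybar)
```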
57 / 357 Estimation: Sample Spectral Density
$\hat{f}(\omega) = \dfrac{1}{2\pi} \sum_{\tau=-(T-1)}^{T-1} \hat{\gamma}(\tau) e^{-i\omega\tau},$
where
$\hat{\gamma}(\tau) = \dfrac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})(y_{t-\tau} - \bar{y})$
(Use frequencies $\omega_j = \dfrac{2\pi j}{T}$, $j = 0, 1, 2, ..., \dfrac{T}{2}$)
– Inconsistent, unfortunately
58 / 357 Historical and Computational Note: The FFT and Periodogram
$\hat{f}(\omega) = \dfrac{1}{2\pi} \sum_{\tau=-(T-1)}^{T-1} \hat{\gamma}(\tau) e^{-i\omega\tau}$
$= \left( \dfrac{1}{\sqrt{2\pi T}} \sum_{t=1}^{T} y_t e^{-i\omega t} \right) \left( \dfrac{1}{\sqrt{2\pi T}} \sum_{t=1}^{T} y_t e^{i\omega t} \right)$ = (FFT) · (FFT)
$= \dfrac{1}{4\pi} I(\omega)$ (I is the "periodogram")
– Again: Inconsistent, unfortunately
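A small sketch (assumed code) computing the sample spectral density at the Fourier frequencies directly as $|\text{FFT}|^2/(2\pi T)$ of the demeaned data, which matches the product-of-FFTs representation above (the slide's $I(\omega)/4\pi$ uses its own scaling of the periodogram).

```python
import numpy as np

def sample_spectral_density(y):
    """Sample spectral density at omega_j = 2*pi*j/T, j = 0, ..., T/2."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    dft = np.fft.rfft(y - y.mean())                  # FFT of demeaned data
    fhat = np.abs(dft) ** 2 / (2 * np.pi * T)
    omegas = 2 * np.pi * np.arange(len(dft)) / T
    return omegas, fhat

rng = np.random.default_rng(2)
omegas, fhat = sample_spectral_density(rng.normal(size=512))
print(fhat.mean())    # for unit-variance white noise, roughly 1/(2*pi) on average
```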
59 / 357 Properties of the Sample Spectral Density
Under conditions:
– $\hat{f}(\omega_j)$ is asymptotically unbiased
– But $var(\hat{f}(\omega_j))$ does not converge to 0 (d.f. don't accumulate)
– Hence $\hat{f}(\omega_j)$ is inconsistent. What's true is that:
$\dfrac{2\hat{f}(\omega_j)}{f(\omega_j)} \overset{d}{\to} \chi^2_2,$
where the $\chi^2_2$ random variables are uncorrelated across frequencies $\omega_j = \dfrac{2\pi j}{T}$, $j = 0, 1, 2, ..., \dfrac{T}{2}$
60 / 357 Consistent (“Lag Window”) Spectral Estimation
$\hat{f}(\omega) = \dfrac{1}{2\pi} \sum_{\tau=-(T-1)}^{T-1} \hat{\gamma}(\tau) e^{-i\omega\tau}$
$f^*(\omega) = \dfrac{1}{2\pi} \sum_{\tau=-(T-1)}^{T-1} \lambda(\tau) \hat{\gamma}(\tau) e^{-i\omega\tau}$ (Weight the sample autocovariances.)
Common "lag windows" with "truncation lag" $M_T$:
$\lambda(\tau) = 1$ for $|\tau| \leq M_T$, and 0 otherwise (rectangular, or uniform)
$\lambda(\tau) = 1 - \dfrac{|\tau|}{M_T}$ for $|\tau| \leq M_T$, and 0 otherwise (triangular, or Bartlett, or Newey-West)
Consistency: $M_T \to \infty$ and $\dfrac{M_T}{T} \to 0$ as $T \to \infty$
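A minimal sketch (not from the slides) of the lag-window estimator $f^*(\omega)$ with a Bartlett window; truncation lag and data are hypothetical.

```python
import numpy as np

def lag_window_spectrum(y, M, omegas):
    """Bartlett lag-window spectral estimate at the frequencies in omegas."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    yc = y - y.mean()
    gammas = np.array([np.dot(yc[tau:], yc[:T - tau]) / T for tau in range(M + 1)])
    f = np.full(len(omegas), gammas[0] / (2 * np.pi))
    for tau in range(1, M + 1):
        lam = 1 - tau / M                                   # Bartlett weight
        f += 2 * lam * gammas[tau] * np.cos(omegas * tau) / (2 * np.pi)
    return f

omegas = np.linspace(0, np.pi, 100)
rng = np.random.default_rng(3)
print(lag_window_spectrum(rng.normal(size=2000), M=20, omegas=omegas)[:3])
```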
61 / 357 Consistent (“Autoregressive”) Spectral Estimation
“Model-based estimation”
Fit AR(pT ) model (using AIC, say)
Calculate spectrum of the fitted model at the fitted parameters
Consistency: $p_T \to \infty$ and $\dfrac{p_T}{T} \to 0$ as $T \to \infty$
62 / 357 Multivariate Frequency Domain
Covariance-generating function:
$G_{y_1 y_2}(z) = \sum_{\tau=-\infty}^{\infty} \Gamma_{y_1 y_2}(\tau) z^{\tau}$
Spectral density function:
$F_{y_1 y_2}(\omega) = \dfrac{1}{2\pi} G_{y_1 y_2}(e^{-i\omega}) = \dfrac{1}{2\pi} \sum_{\tau=-\infty}^{\infty} \Gamma_{y_1 y_2}(\tau) e^{-i\omega\tau}, \quad -\pi < \omega < \pi$ (Complex-valued)
63 / 357 Consistent Multivariate Spectral Estimation
Spectral density matrix:
$F_{y_1 y_2}(\omega) = \dfrac{1}{2\pi} \sum_{\tau=-\infty}^{\infty} \Gamma_{y_1 y_2}(\tau) e^{-i\omega\tau}, \quad -\pi < \omega < \pi$
Consistent (lag window) estimator:
$F^*_{y_1 y_2}(\omega) = \dfrac{1}{2\pi} \sum_{\tau=-(T-1)}^{T-1} \lambda(\tau) \hat{\Gamma}_{y_1 y_2}(\tau) e^{-i\omega\tau}, \quad -\pi < \omega < \pi$
Or do autoregressive (VAR) spectral estimation.
64 / 357 Co-Spectrum and Quadrature Spectrum
$F_{y_1 y_2}(\omega) = C_{y_1 y_2}(\omega) + i Q_{y_1 y_2}(\omega)$
$C_{y_1 y_2}(\omega) = \dfrac{1}{2\pi} \sum_{\tau=-\infty}^{\infty} \Gamma_{y_1 y_2}(\tau) \cos(\omega\tau)$
$Q_{y_1 y_2}(\omega) = \dfrac{-1}{2\pi} \sum_{\tau=-\infty}^{\infty} \Gamma_{y_1 y_2}(\tau) \sin(\omega\tau)$
65 / 357 Cross Spectrum
$f_{y_1 y_2}(\omega) = ga_{y_1 y_2}(\omega) \exp(i \, ph_{y_1 y_2}(\omega))$ (generic cross spectrum)
$ga_{y_1 y_2}(\omega) = \left[ c^2_{y_1 y_2}(\omega) + q^2_{y_1 y_2}(\omega) \right]^{1/2}$ (gain)
$ph_{y_1 y_2}(\omega) = \arctan\left( \dfrac{q_{y_1 y_2}(\omega)}{c_{y_1 y_2}(\omega)} \right)$ (phase shift in radians)
(Phase shift in time units is $\dfrac{ph(\omega)}{\omega}$)
$coh_{y_1 y_2}(\omega) = \dfrac{|f_{y_1 y_2}(\omega)|^2}{f_{y_1 y_1}(\omega) f_{y_2 y_2}(\omega)}$ (coherence)
Squared correlation decomposed by frequency
66 / 357 Useful Spectral Results for Filter Analysis
If y1t = B(L) y2t , then:
$f_{y_1 y_1}(\omega) = |B(e^{-i\omega})|^2 f_{y_2 y_2}(\omega)$
$f_{y_1 y_2}(\omega) = B(e^{-i\omega}) f_{y_2 y_2}(\omega)$
$B(e^{-i\omega})$ is the filter's frequency response function
If $y_{1t} = A(L) B(L) y_{2t}$, then:
$f_{y_1 y_1}(\omega) = |A(e^{-i\omega})|^2 |B(e^{-i\omega})|^2 f_{y_2 y_2}(\omega)$
$f_{y_1 y_2}(\omega) = A(e^{-i\omega}) B(e^{-i\omega}) f_{y_2 y_2}(\omega)$
If $y = \sum_{i=1}^{N} y_i$, and the $y_i$ are independent, then:
$f_y(\omega) = \sum_{i=1}^{N} f_{y_i}(\omega)$
67 / 357 Nuances...
Note that
$B(e^{-i\omega}) = \dfrac{f_{y_1 y_2}(\omega)}{f_{y_2 y_2}(\omega)}$
$\Rightarrow B(e^{-i\omega}) = \dfrac{ga_{y_1 y_2}(\omega) e^{i \, ph_{y_1 y_2}(\omega)}}{f_{y_2 y_2}(\omega)} = \dfrac{ga_{y_1 y_2}(\omega)}{f_{y_2 y_2}(\omega)} e^{i \, ph_{y_1 y_2}(\omega)}$
Gains of $f_{y_1 y_2}(\omega)$ and $B(e^{-i\omega})$ are closely related.
Phases are the same.
68 / 357 Filter Analysis Example I: A Simple (but Very Important) High-Pass Filter
y1t = (1 − L)y2t = y2t − y2,t−1
B(L) = (1 − L) =⇒ B(e−iω) = 1 − e−iω
Hence the filter gain is:
$|B(e^{-i\omega})| = |1 - e^{-i\omega}| = \sqrt{2(1 - \cos(\omega))}$
69 / 357 Gain of the First-Difference Filter, B(L) = 1 − L
How would the gain look if the filter were B(L) = 1 + L rather than B(L) = 1 − L?
70 / 357 Filter Analysis Example II
y2t = .9y2,t−1 + ηt
ηt ∼ WN(0, 1)
y1t = .5y2,t−1 + εt
εt ∼ WN(0, 1)
where ηt and εt are orthogonal at all leads and lags
71 / 357 Spectral Density of y2
$y_{2t} = \dfrac{1}{1 - .9L} \eta_t$
$\Rightarrow f_{y_2 y_2}(\omega) = \dfrac{1}{2\pi} \dfrac{1}{(1 - .9 e^{-i\omega})(1 - .9 e^{i\omega})} = \dfrac{1}{2\pi} \dfrac{1}{1 - 2(.9)\cos(\omega) + (.9)^2} = \dfrac{1}{11.37 - 11.30\cos(\omega)}$
Shape?
72 / 357 Spectral Density of y1
y1t = 0.5Ly2t + εt
(So B(L) = .5L and B(e−iω) = 0.5e−iω)
$\Rightarrow f_{y_1 y_1}(\omega) = |0.5 e^{-i\omega}|^2 f_{y_2 y_2}(\omega) + \dfrac{1}{2\pi} = 0.25 f_{y_2 y_2}(\omega) + \dfrac{1}{2\pi} = \dfrac{0.25}{11.37 - 11.30\cos(\omega)} + \dfrac{1}{2\pi}$
Shape?
73 / 357 Cross Spectrum Gain and Phase
B(L) = .5L
B(e−iω) = .5e−iω
$f_{y_1 y_2}(\omega) = B(e^{-i\omega}) f_{y_2 y_2}(\omega) = .5 e^{-i\omega} f_{y_2 y_2}(\omega) = (.5 f_{y_2 y_2}(\omega)) e^{-i\omega}$
$ga_{y_1 y_2}(\omega) = .5 f_{y_2 y_2}(\omega) = \dfrac{.5}{11.37 - 11.30\cos(\omega)}$
Phy1y2 (ω) = −ω
(In time units, Phy1y2 (ω) = −1, so y1 leads y2 by -1)
74 / 357 Cross Spectrum Coherence
$Coh_{y_1 y_2}(\omega) = \dfrac{|f_{y_1 y_2}(\omega)|^2}{f_{y_2 y_2}(\omega) f_{y_1 y_1}(\omega)} = \dfrac{.25 f^2_{y_2 y_2}(\omega)}{f_{y_2 y_2}(\omega) f_{y_1 y_1}(\omega)} = \dfrac{.25 f_{y_2 y_2}(\omega)}{f_{y_1 y_1}(\omega)}$
$= \dfrac{.25 \, \frac{1}{2\pi} \frac{1}{1 - 2(.9)\cos(\omega) + .9^2}}{.25 \, \frac{1}{2\pi} \frac{1}{1 - 2(.9)\cos(\omega) + .9^2} + \frac{1}{2\pi}} = \dfrac{1}{8.24 - 7.20\cos(\omega)}$
Shape?
75 / 357 Filter Analysis Example III: Kuznets’ Filters
Low-frequency fluctuations in aggregate real output growth. Low-frequency “Kuznets cycle”: 20-year period
Filter 1 (moving average):
$y_{1t} = \dfrac{1}{5} \sum_{j=-2}^{2} y_{2,t-j}$
$\Rightarrow B_1(e^{-i\omega}) = \dfrac{1}{5} \sum_{j=-2}^{2} e^{-i\omega j} = \dfrac{\sin(5\omega/2)}{5\sin(\omega/2)}$
Hence the filter gain is:
$|B_1(e^{-i\omega})| = \left| \dfrac{\sin(5\omega/2)}{5\sin(\omega/2)} \right|$
76 / 357 Kuznets’ Filters, Continued
Figure: Gain of Kuznets’ Filter 1
77 / 357 Kuznets’ Filters, Continued
Filter 2 (fancy difference):
y3t = y2,t+5 − y2,t−5
$\Rightarrow B_2(e^{-i\omega}) = e^{i5\omega} - e^{-i5\omega} = 2i\sin(5\omega)$
Hence the filter gain is:
$|B_2(e^{-i\omega})| = |2\sin(5\omega)|$
78 / 357 Kuznets’ Filters, Continued
Figure: Gain of Kuznets’ Filter 2
79 / 357 Kuznets’ Filters, Continued
Composite gain:
$\left| B_1(e^{-i\omega}) B_2(e^{-i\omega}) \right| = \left| \dfrac{\sin(5\omega/2)}{5\sin(\omega/2)} \right| \cdot |2\sin(5\omega)|$
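A small sketch (assumed code) evaluating the composite gain numerically; the grid and the printed "peak period" are illustrative only.

```python
import numpy as np

def kuznets_composite_gain(omega):
    """|B1(e^{-iw})| * |B2(e^{-iw})| for the 5-term MA and the y_{t+5} - y_{t-5} difference."""
    gain1 = np.abs(np.sin(5 * omega / 2) / (5 * np.sin(omega / 2)))
    gain2 = np.abs(2 * np.sin(5 * omega))
    return gain1 * gain2

omega = np.linspace(0.01, np.pi, 500)        # avoid omega = 0 (0/0 in gain1)
g = kuznets_composite_gain(omega)
peak = omega[np.argmax(g)]
print(peak, 2 * np.pi / peak)                # peak frequency and the implied period in years
```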
80 / 357 Kuznets’ Filters, Continued
Figure: Composite Gain of Kuznets’ two Filters
81 / 357 Gains All Together for Comparison
[Graphics: Gain of Kuznets' Filter 1 (MA filter), gain of Kuznets' Filter 2 (difference filter), and the composite gain, each plotted against frequency; the composite gain peaks at periods of roughly 15-20 years]
82 / 357 So The Kuznets Procedure may Induce a Spurious “Long-Wave” Cycle
Let’s apply the Kuznets filter to WN:
$\varepsilon_t \sim WN(0, 1)$, $\quad f_\varepsilon(\omega) = \dfrac{1}{2\pi}$
$\tilde{\varepsilon}_t = B_{Kuznets}(L)\varepsilon_t$, $\quad f_{\tilde{\varepsilon}}(\omega) = |B_{Kuznets}(e^{-i\omega})|^2 f_\varepsilon(\omega)$
83 / 357 Compare the Two Spectra!
[Graphics: White noise spectral density (flat at $1/2\pi$) vs. filtered white noise spectral density, which shows a spurious peak at periods of roughly 15-20]
84 / 357 Markovian Structure, State Space, and the Kalman Filter
85 / 357 Part I: Markov Processes
86 / 357 Discrete-State, Discrete-Time Stochastic Process
{yt }, t = 0, 1, 2,...
Possible values ("states") of $y_t$: 1, 2, 3, ...
First-order homogeneous Markov process:
Prob(yt+1 = j|yt = i, yt−1 = it−1,..., y0 = i0)
= Prob(yt+1 = j|yt = i) = pij
87 / 357 Transition Probability Matrix P
1-step transition probabilities:
[rows: time t, state i; columns: time (t + 1), state j]
$P \equiv \begin{pmatrix} p_{11} & p_{12} & \cdots \\ p_{21} & p_{22} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$
$p_{ij} \geq 0, \quad \sum_{j=1}^{\infty} p_{ij} = 1$
88 / 357 Chapman-Kolmogorov
m-step transition probabilities:
$p_{ij}^{(m)} = Prob(y_{t+m} = j \,|\, y_t = i)$
Let $P^{(m)} \equiv \left[ p_{ij}^{(m)} \right]$.
Chapman-Kolmogorov theorem:
P(m+n) = P(m)P(n)
Corollary: P(m) = Pm
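A quick numerical check (assumed code, hypothetical transition probabilities) of the corollary that m-step transition matrices are just matrix powers.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])                 # hypothetical two-state chain

P2 = np.linalg.matrix_power(P, 2)          # two-step transition probabilities
P3 = np.linalg.matrix_power(P, 3)
print(np.allclose(P2 @ P, P3))             # Chapman-Kolmogorov: P^(2) P^(1) = P^(3)
print(P3.sum(axis=1))                      # rows still sum to one
```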
89 / 357 Lots of Definitions...
State j is accessible from state i if $p_{ij}^{(n)} > 0$ for some n.
Two states i and j communicate (or are in the same class) if each is accessible from the other. We write i ↔ j.
A Markov process is irreducible if there exists only one class (i.e., all states communicate).
State i has period d if $p_{ii}^{(n)} = 0$ for all n not divisible by d, and d is the greatest integer with that property. (That is, a return to state i can only occur in multiples of d steps.) A state with period 1 is called an aperiodic state.
A Markov process all of whose states are aperiodic is called an aperiodic Markov process.
90 / 357 ...And More Definitions
The first-transition probability is the probability that, starting in i, the first transition to j occurs after n transitions:
$f_{ij}^{(n)} = Prob(y_n = j, \ y_k \neq j, \ k = 1, ..., n-1 \,|\, y_0 = i)$
Denote the eventual transition probability from i to j by $f_{ij} = \sum_{n=1}^{\infty} f_{ij}^{(n)}$.
State j is recurrent if $f_{jj} = 1$ and transient otherwise.
Denote the expected number of transitions needed to return to recurrent state j by $\mu_{jj} = \sum_{n=1}^{\infty} n f_{jj}^{(n)}$.
A recurrent state j is: positive recurrent if $\mu_{jj} < \infty$, null recurrent if $\mu_{jj} = \infty$.
91 / 357 And One More Definition
The row vector π is called the stationary distribution for P if:
πP = π.
The stationary distribution is also called the steady-state distribution.
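A small sketch (not from the slides) of how the stationary distribution can be computed numerically, by taking the left eigenvector of P associated with eigenvalue 1; the transition probabilities are hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 via the left eigenvector for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))       # eigenvalue closest to 1
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = stationary_distribution(P)
print(pi)          # [0.8, 0.2]
print(pi @ P)      # equals pi
```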
92 / 357 Theorem (Finally!)
Theorem: Consider an irreducible, aperiodic Markov process.
Then either:
(1) All states are transient or all states are null recurrent:
$p_{ij}^{(n)} \to 0$ as $n \to \infty$, $\forall i, j$. No stationary distribution.
or
(2) All states are positive recurrent:
$p_{ij}^{(n)} \to \pi_j$ as $n \to \infty$, $\forall i, j$. $\{\pi_j, j = 1, 2, 3, ...\}$ is the unique stationary distribution, and $\pi$ is any row of $\lim_{n \to \infty} P^n$.
93 / 357 Example
Consider a Markov process with transition probability matrix:
$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$
Call the states 1 and 2.
We will verify many of our claims, and we will calculate the steady-state distribution.
94 / 357 Example, Continued
(a) Valid Transition Probability Matrix
$p_{ij} \geq 0 \ \forall i, j, \quad \sum_{j=1}^{2} p_{1j} = 1, \quad \sum_{j=1}^{2} p_{2j} = 1$
(b) Chapman-Kolmogorov Theorem (for $P^{(2)}$)
$P^{(2)} = P \cdot P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
95 / 357 Example, Continued
(c) Communication and Reducibility
Clearly, 1 ↔ 2, so P is irreducible.
(d) Periodicity
State 1: d(1) = 2 State 2: d(2) = 2
(e) First and Eventual Transition Probabilities
$f_{12}^{(1)} = 1, \ f_{12}^{(n)} = 0 \ \forall n > 1 \ \Rightarrow \ f_{12} = 1$
$f_{21}^{(1)} = 1, \ f_{21}^{(n)} = 0 \ \forall n > 1 \ \Rightarrow \ f_{21} = 1$
96 / 357 Example, Continued
(f) Recurrence
Because f21 = f12 = 1, both states 1 and 2 are recurrent.
Moreover,
$\mu_{11} = \sum_{n=1}^{\infty} n f_{11}^{(n)} = 2 < \infty$ (and similarly $\mu_{22} = 2 < \infty$)
Hence states 1 and 2 are positive recurrent.
97 / 357 Example, Continued
(g) Stationary Distribution
We will guess and verify.
Let π1 = .5, π2 = .5 and check πP = π:
$(.5, .5) \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = (.5, .5).$
Hence the stationary probabilities are 0.5 and 0.5.
Note that in this example we can not get the stationary probabilities by taking $\lim_{n \to \infty} P^n$. Why?
98 / 357 Illustrations/Variations/Extensions: Regime-Switching Models
$P = \begin{pmatrix} p_{11} & 1 - p_{11} \\ 1 - p_{22} & p_{22} \end{pmatrix}$
$s_t \sim P$
$y_t = c_{s_t} + \phi_{s_t} y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim iid \ N(0, \sigma^2_{s_t})$
“Markov switching,” or “hidden Markov,” model
99 / 357 Illustrations/Variations/Extensions: Heterogeneous Markov Processes
$P_t = \begin{pmatrix} p_{11,t} & p_{12,t} & \cdots \\ p_{21,t} & p_{22,t} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$
e.g., Regime switching with time-varying transition probabilities:
$s_t \sim P_t$
$y_t = c_{s_t} + \phi_{s_t} y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim iid \ N(0, \sigma^2_{s_t})$
Business cycle duration dependence: $p_{ij,t} = g_{ij}(t)$
Credit migration over the cycle: $p_{ij,t} = g_{ij}(cycle_t)$
General covariates: $p_{ij,t} = g_{ij}(x_t)$
100 / 357 Illustrations/Variations/Extensions: Constructing Markov Processes with Useful Stationary Distributions
– Markov Chain Monte Carlo (e.g., Gibbs sampling): Construct a Markov process from whose steady-state distribution we want to sample.
– Global Optimization (e.g., simulated annealing): Construct a Markov process the support of whose steady-state distribution is the set of global optima of a function we want to maximize.
101 / 357 Illustrations/Variations/Extensions: Continuous-State Markov Processes AR(1)
αt = T αt−1 + ηt
ηt ∼ WN
102 / 357 Illustrations/Variations/Extensions: Continuous-State Markov Processes State-Space System
αt = T αt−1 + ηt
yt = Zαt + ζt
ηt ∼ WN, ζt ∼ WN
We will now proceed to study state-space systems in significant depth.
103 / 357 Part II: State Space
104 / 357 Transition Equation
$\alpha_t = T \alpha_{t-1} + R \eta_t$, with $\alpha_t$: $m \times 1$, $T$: $m \times m$, $R$: $m \times g$, $\eta_t$: $g \times 1$
t = 1, 2, ..., T
105 / 357 Measurement Equation
$y_t = Z \alpha_t + \Gamma w_t + \zeta_t$, with $y_t$: $1 \times 1$, $Z$: $1 \times m$, $\Gamma$: $1 \times L$, $w_t$: $L \times 1$, $\zeta_t$: $1 \times 1$
(This is for univariate y. We’ll do multivariate shortly.)
t = 1, 2, ..., T
106 / 357 (Important) Details
$\begin{pmatrix} \eta_t \\ \zeta_t \end{pmatrix} \sim WN\left( 0, \ \mathrm{diag}(Q, h) \right)$, with $Q$: $g \times g$ and $h$: $1 \times 1$
$E(\alpha_0 \eta_t') = 0_{m \times g}$
$E(\alpha_0 \zeta_t) = 0_{m \times 1}$
107 / 357 All Together Now
$\alpha_t = T \alpha_{t-1} + R \eta_t$
$y_t = Z \alpha_t + \Gamma w_t + \zeta_t$
$\begin{pmatrix} \eta_t \\ \zeta_t \end{pmatrix} \sim WN\left( 0, \ \mathrm{diag}(Q, h) \right)$
$E(\alpha_0 \zeta_t) = 0_{m \times 1}, \quad E(\alpha_0 \eta_t') = 0_{m \times g}$
(Covariance stationary case: All eigenvalues of T inside |z| = 1)
108 / 357 Our Assumptions Balance Generality vs. Tedium
– Could allow time-varying system matrices
– Could allow exogenous variables in measurement equation
– Could allow correlated measurement and transition disturbances
– Could allow for non-linear structure
109 / 357 State Space Representations Are Not Unique
Transform by the nonsingular matrix B.
The original system is:
$\alpha_t = T \alpha_{t-1} + R \eta_t$
$y_t = Z \alpha_t + \zeta_t$
110 / 357 State Space Representations Are Not Unique: Step 1
Rewrite the system in two steps
First, write it as:
$\alpha_t = T B^{-1} B \alpha_{t-1} + R \eta_t$
$y_t = Z B^{-1} B \alpha_t + \zeta_t$
111 / 357 State Space Representations Are Not Unique: Step 2
Second, premultiply the transition equation by B to yield:
$(B \alpha_t) = (B T B^{-1}) (B \alpha_{t-1}) + (B R) \eta_t$
$y_t = (Z B^{-1}) (B \alpha_t) + \zeta_t$
(Equivalent State Space Representation)
112 / 357 AR(1) State Space Representation
yt = φ yt−1 + ηt
$\eta_t \sim WN(0, \sigma^2_\eta)$
Already in state space form!
αt = φ αt−1 + ηt
yt = αt
$(T = \phi, \ R = 1, \ Z = 1, \ Q = \sigma^2_\eta, \ h = 0)$
113 / 357 MA(1) in State Space Form
$y_t = \eta_t + \theta \eta_{t-1}$
$\eta_t \sim WN(0, \sigma^2_\eta)$
$\begin{pmatrix} \alpha_{1t} \\ \alpha_{2t} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \alpha_{1,t-1} \\ \alpha_{2,t-1} \end{pmatrix} + \begin{pmatrix} 1 \\ \theta \end{pmatrix} \eta_t$
$y_t = (1, 0) \alpha_t = \alpha_{1t}$
114 / 357 MA(1) in State Space Form
Why? Recursive substitution from the bottom up yields:
$\alpha_t = \begin{pmatrix} y_t \\ \theta \eta_t \end{pmatrix}$
115 / 357 MA(q) in State Space Form
$y_t = \eta_t + \theta_1 \eta_{t-1} + ... + \theta_q \eta_{t-q}$
$\eta_t \sim WN(0, \sigma^2_\eta)$
$\begin{pmatrix} \alpha_{1t} \\ \alpha_{2t} \\ \vdots \\ \alpha_{q+1,t} \end{pmatrix} = \begin{pmatrix} 0_{q \times 1} & I_q \\ 0 & 0_{1 \times q} \end{pmatrix} \begin{pmatrix} \alpha_{1,t-1} \\ \alpha_{2,t-1} \\ \vdots \\ \alpha_{q+1,t-1} \end{pmatrix} + \begin{pmatrix} 1 \\ \theta_1 \\ \vdots \\ \theta_q \end{pmatrix} \eta_t$
$y_t = (1, 0, ..., 0) \alpha_t = \alpha_{1t}$
116 / 357 MA(q) in State Space Form
Recursive substitution from the bottom up yields:
$\alpha_t \equiv \begin{pmatrix} \theta_q \eta_{t-q} + ... + \theta_1 \eta_{t-1} + \eta_t \\ \vdots \\ \theta_q \eta_{t-1} + \theta_{q-1} \eta_t \\ \theta_q \eta_t \end{pmatrix} = \begin{pmatrix} y_t \\ \vdots \\ \theta_q \eta_{t-1} + \theta_{q-1} \eta_t \\ \theta_q \eta_t \end{pmatrix}$
117 / 357 AR(p) in State Space Form
$y_t = \phi_1 y_{t-1} + ... + \phi_p y_{t-p} + \eta_t$
$\eta_t \sim WN(0, \sigma^2_\eta)$
$\alpha_t = \begin{pmatrix} \phi_1 & & & \\ \phi_2 & & I_{p-1} & \\ \vdots & & & \\ \phi_p & 0 & \cdots & 0 \end{pmatrix} \alpha_{t-1} + \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \eta_t$
$y_t = (1, 0, ..., 0) \alpha_t = \alpha_{1t}$
118 / 357 AR(p) in State Space Form
Recursive substitution from the bottom up yields:
$\alpha_t = \begin{pmatrix} \phi_1 \alpha_{1,t-1} + ... + \phi_p \alpha_{1,t-p} + \eta_t \\ \vdots \\ \phi_{p-1} \alpha_{1,t-1} + \phi_p \alpha_{1,t-2} \\ \phi_p \alpha_{1,t-1} \end{pmatrix} = \begin{pmatrix} y_t \\ \vdots \\ \phi_{p-1} y_{t-1} + \phi_p y_{t-2} \\ \phi_p y_{t-1} \end{pmatrix}$
119 / 357 ARMA(p,q) in State Space Form
yt = φ1yt−1 + ... + φpyt−p + ηt + θ1ηt−1 + ... + θqηt−q
$\eta_t \sim WN(0, \sigma^2_\eta)$
Let m = max(p, q + 1) and write as ARMA(m, m − 1):
(φ1, φ2, ..., φm) = (φ1, ..., φp, 0, ..., 0)
(θ1, θ2, ..., θm−1) = (θ1, ..., θq, 0, ..., 0)
120 / 357 ARMA(p,q) in State Space Form
$\alpha_t = \begin{pmatrix} \phi_1 & & & \\ \phi_2 & & I_{m-1} & \\ \vdots & & & \\ \phi_m & 0 & \cdots & 0 \end{pmatrix} \alpha_{t-1} + \begin{pmatrix} 1 \\ \theta_1 \\ \vdots \\ \theta_{m-1} \end{pmatrix} \eta_t$
$y_t = (1, 0, ..., 0) \alpha_t$
121 / 357 ARMA(p,q) in State Space Form
Recursive substitution from the bottom up yields:
$\alpha_t = \begin{pmatrix} \phi_1 \alpha_{1,t-1} + ... + \phi_p \alpha_{1,t-p} + \eta_t + \theta_1 \eta_{t-1} + ... + \theta_q \eta_{t-q} \\ \vdots \\ \phi_{m-1} \alpha_{1,t-1} + \alpha_{m,t-1} + \theta_{m-2} \eta_t \\ \phi_m \alpha_{1,t-1} + \theta_{m-1} \eta_t \end{pmatrix} = \begin{pmatrix} y_t \\ \vdots \\ \phi_{m-1} y_{t-1} + \phi_m y_{t-2} + \theta_{m-1} \eta_{t-1} + \theta_{m-2} \eta_t \\ \phi_m y_{t-1} + \theta_{m-1} \eta_t \end{pmatrix}$
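A minimal sketch (assumed code, hypothetical function name and parameter values) that builds the $T$, $R$, $Z$ system matrices of the ARMA(p,q) state-space form above, with $m = \max(p, q+1)$.

```python
import numpy as np

def arma_state_space(phi, theta):
    """Construct T, R, Z for the ARMA(p,q) companion-form state space."""
    phi, theta = list(phi), list(theta)
    m = max(len(phi), len(theta) + 1)
    phi = phi + [0.0] * (m - len(phi))             # pad to (phi_1, ..., phi_m)
    theta = theta + [0.0] * (m - 1 - len(theta))   # pad to (theta_1, ..., theta_{m-1})
    T = np.zeros((m, m))
    T[:, 0] = phi                                  # first column holds the AR coefficients
    T[:-1, 1:] = np.eye(m - 1)                     # I_{m-1} block shifts the state
    R = np.array([1.0] + theta).reshape(-1, 1)     # (1, theta_1, ..., theta_{m-1})'
    Z = np.zeros((1, m)); Z[0, 0] = 1.0            # y_t = first state element
    return T, R, Z

T, R, Z = arma_state_space(phi=[0.7], theta=[0.4])   # ARMA(1,1) example
print(T); print(R.ravel()); print(Z)
```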
122 / 357 Multivariate State Space (Same framework, N > 1 observables)
$\alpha_t = T \alpha_{t-1} + R \eta_t$, with $\alpha_t$: $m \times 1$, $T$: $m \times m$, $R$: $m \times g$, $\eta_t$: $g \times 1$
$y_t = Z \alpha_t + \zeta_t$, with $y_t$: $N \times 1$ and $Z$: $N \times m$
$\begin{pmatrix} \eta_t \\ \zeta_t \end{pmatrix} \sim WN\left( 0, \ \mathrm{diag}(Q, H) \right)$, with $Q$: $g \times g$ and $H$: $N \times N$
$E(\alpha_0 \eta_t') = 0_{m \times g}, \quad E(\alpha_0 \zeta_t') = 0_{m \times N}$
123 / 357 N-Variable VAR(p)
$y_t = \Phi_1 y_{t-1} + ... + \Phi_p y_{t-p} + \eta_t$, with $y_t$: $N \times 1$ and each $\Phi_i$: $N \times N$
ηt ∼ WN(0, Σ)
124 / 357 State Space Representation
$\alpha_t = \begin{pmatrix} \Phi_1 & & & \\ \Phi_2 & & I_{N(p-1)} & \\ \vdots & & & \\ \Phi_p & 0 & \cdots & 0 \end{pmatrix} \alpha_{t-1} + \begin{pmatrix} I_N \\ 0_{N \times N} \\ \vdots \\ 0_{N \times N} \end{pmatrix} \eta_t$, with $\alpha_t$: $Np \times 1$
$y_t = (I_N, 0_N, ..., 0_N) \alpha_t$, with $y_t$: $N \times 1$
125 / 357 N-Variable VARMA(p,q)
$y_t = \Phi_1 y_{t-1} + ... + \Phi_p y_{t-p} + \eta_t + \Theta_1 \eta_{t-1} + ... + \Theta_q \eta_{t-q}$
ηt ∼ WN(0, Σ)
126 / 357 N-Variable VARMA(p,q)
$\alpha_t = \begin{pmatrix} \Phi_1 & & \\ \Phi_2 & I_{N(m-1)} & \\ \vdots & & \\ \Phi_m & 0_{N \times N(m-1)} & \end{pmatrix} \alpha_{t-1} + \begin{pmatrix} I \\ \Theta_1 \\ \vdots \\ \Theta_{m-1} \end{pmatrix} \eta_t$
$y_t = (I, 0, ..., 0) \alpha_t = \alpha_{1t}$
where m = max(p, q + 1)
127 / 357 Linear Regression
Transition:
αt = αt−1
Measurement:
$y_t = x_t' \alpha_t + \zeta_t$
$(T = I, \ R = 0, \ Z_t = x_t', \ H = \sigma^2_\zeta)$
Note the time-varying system matrix.
128 / 357 Linear Regression with Time-Varying Coefficients
Transition:
αt = φ αt−1 + ηt
Measurement:
$y_t = x_t' \alpha_t + \zeta_t$
$(T = \phi, \ R = I, \ Q = cov(\eta_t), \ Z_t = x_t', \ H = \sigma^2_\zeta)$
– Gradual evolution of tastes, technologies and institutions – Lucas critique – Stationary or non-stationary
129 / 357 Linear Regression with ARMA(p,q) Disturbances
$y_t = \beta x_t + u_t$
$u_t = \phi_1 u_{t-1} + ... + \phi_p u_{t-p} + \eta_t + \theta_1 \eta_{t-1} + ... + \theta_q \eta_{t-q}$
$\alpha_t = \begin{pmatrix} \phi_1 & & & \\ \phi_2 & & I_{m-1} & \\ \vdots & & & \\ \phi_m & 0 & \cdots & 0 \end{pmatrix} \alpha_{t-1} + \begin{pmatrix} 1 \\ \theta_1 \\ \vdots \\ \theta_{m-1} \end{pmatrix} \eta_t$
$y_t = (1, 0, ..., 0) \alpha_t + \beta x_t$
where m = max(p, q + 1)
130 / 357 Signal + Noise Model “Unobserved Components”
yt = xt + ζt
$x_t = \phi x_{t-1} + \eta_t$
$\begin{pmatrix} \zeta_t \\ \eta_t \end{pmatrix} \sim WN\left( 0, \begin{pmatrix} \sigma^2_\zeta & 0 \\ 0 & \sigma^2_\eta \end{pmatrix} \right)$
$(\alpha_t = x_t, \ T = \phi, \ R = 1, \ Z = 1, \ Q = \sigma^2_\eta, \ H = \sigma^2_\zeta)$
131 / 357 Cycle + Seasonal + Noise
yt = ct + st + ζt
ct = φ ct−1 + ηct
st = γ st−4 + ηst
132 / 357 Cycle + Seasonal + Noise
Transition equations for the cycle and seasonal:
$\alpha_{c,t} = \phi \, \alpha_{c,t-1} + \eta_{ct}$
$\alpha_{s,t} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \gamma & 0 & 0 & 0 \end{pmatrix} \alpha_{s,t-1} + \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \eta_{st}$
133 / 357 Cycle + Seasonal + Noise
Stacking transition equations gives the grand transition equation:
$\begin{pmatrix} \alpha_{s,t} \\ \alpha_{c,t} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ \gamma & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \phi \end{pmatrix} \begin{pmatrix} \alpha_{s,t-1} \\ \alpha_{c,t-1} \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \eta_{st} \\ \eta_{ct} \end{pmatrix}$
Finally, the measurement equation is: $y_t = (1, 0, 0, 0, 1) \begin{pmatrix} \alpha_{s,t} \\ \alpha_{c,t} \end{pmatrix} + \zeta_t$
134 / 357 Dynamic Factor Model – Single AR(1) Factor
(White noise idiosyncratic factors uncorrelated with each other and uncorrelated with the factor at all leads and lags...)
$\begin{pmatrix} y_{1t} \\ \vdots \\ y_{Nt} \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \end{pmatrix} f_t + \begin{pmatrix} \zeta_{1t} \\ \vdots \\ \zeta_{Nt} \end{pmatrix}$
ft = φft−1 + ηt
Already in state-space form!
135 / 357 Dynamic Factor Model – Single ARMA(p,q) Factor
$\begin{pmatrix} y_{1t} \\ \vdots \\ y_{Nt} \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \end{pmatrix} f_t + \begin{pmatrix} \zeta_{1t} \\ \vdots \\ \zeta_{Nt} \end{pmatrix}$
Φ(L) ft = Θ(L) ηt
136 / 357 Dynamic Factor Model – Single ARMA(p,q) Factor
State vector for f is state vector for system.
System transition:
$\alpha_t = \begin{pmatrix} \phi_1 & & & \\ \phi_2 & & I_{m-1} & \\ \vdots & & & \\ \phi_m & 0 & \cdots & 0 \end{pmatrix} \alpha_{t-1} + \begin{pmatrix} 1 \\ \theta_1 \\ \vdots \\ \theta_{m-1} \end{pmatrix} \eta_t$
137 / 357 Dynamic Factor Model – Single ARMA(p,q) Factor
System measurement:
$\begin{pmatrix} y_{1t} \\ \vdots \\ y_{Nt} \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \end{pmatrix} (1, 0, ..., 0) \, \alpha_t + \begin{pmatrix} \zeta_{1t} \\ \vdots \\ \zeta_{Nt} \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ \lambda_N & 0 & \cdots & 0 \end{pmatrix} \alpha_t + \begin{pmatrix} \zeta_{1t} \\ \vdots \\ \zeta_{Nt} \end{pmatrix}$
138 / 357 Part III: The Kalman Filter
139 / 357 State Space Representation
$\alpha_t = T \alpha_{t-1} + R \eta_t$
$y_t = Z \alpha_t + \zeta_t$
$\begin{pmatrix} \eta_t \\ \zeta_t \end{pmatrix} \sim WN\left( 0, \ \mathrm{diag}(Q, H) \right)$
$E(\alpha_0 \eta_t') = 0_{m \times g}, \quad E(\alpha_0 \zeta_t') = 0_{m \times N}$
140 / 357 The Filtering “Thought Experiment”
– Prediction and updating
– "Online", "ex ante", using only the real-time available data $\tilde{y}_t$ to extract $\alpha_t$ and predict $\alpha_{t+1}$, where $\tilde{y}_t = \{y_1, ..., y_t\}$
141 / 357 Statement of the Kalman Filter
I. Initial state estimate and MSE
a0 = E(α0)
$P_0 = E(\alpha_0 - a_0)(\alpha_0 - a_0)'$
142 / 357 Statement of the Kalman Filter
II. Prediction Recursions
$a_{t/t-1} = T a_{t-1}$
$P_{t/t-1} = T P_{t-1} T' + R Q R'$
III. Updating Recursions
$a_t = a_{t/t-1} + P_{t/t-1} Z' F_t^{-1} (y_t - Z a_{t/t-1})$ (where $F_t = Z P_{t/t-1} Z' + H$)
$P_t = P_{t/t-1} - P_{t/t-1} Z' F_t^{-1} Z P_{t/t-1}$
t = 1, ..., T
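A minimal sketch (not from the slides) of the prediction and updating recursions above, applied to a hypothetical signal-plus-noise example; matrices and parameter values are illustrative only.

```python
import numpy as np

def kalman_filter(y, T, R, Z, Q, H, a0, P0):
    """Run the prediction/updating recursions; y has shape (n_obs, N)."""
    a, P = np.asarray(a0, float), np.asarray(P0, float)
    filtered = []
    for yt in np.atleast_2d(y):
        # Prediction
        a_pred = T @ a
        P_pred = T @ P @ T.T + R @ Q @ R.T
        # Updating (gain K_t = P_{t/t-1} Z' F_t^{-1})
        F = Z @ P_pred @ Z.T + H
        K = P_pred @ Z.T @ np.linalg.inv(F)
        a = a_pred + K @ (yt - Z @ a_pred)
        P = P_pred - K @ Z @ P_pred
        filtered.append(a.copy())
    return np.array(filtered)

# Hypothetical local-level example: alpha_t = alpha_{t-1} + eta_t, y_t = alpha_t + zeta_t
rng = np.random.default_rng(4)
alpha = np.cumsum(rng.normal(scale=0.5, size=100))
y = (alpha + rng.normal(size=100)).reshape(-1, 1)
a_filt = kalman_filter(y, T=np.eye(1), R=np.eye(1), Z=np.eye(1),
                       Q=np.array([[0.25]]), H=np.array([[1.0]]),
                       a0=np.zeros(1), P0=np.eye(1) * 10)
print(a_filt[-5:].ravel())    # filtered state estimates track the latent level
```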
143 / 357 State-Space Representation in Density Form (Assuming Normality)
$\alpha_t | \alpha_{t-1} \sim N(T \alpha_{t-1}, \ RQR')$
yt |αt ∼ N(Zαt , H)
144 / 357 Kalman Filter in Density Form (Assuming Normality)
Initialize at a0, P0
State prediction: $\alpha_t | \tilde{y}_{t-1} \sim N(a_{t/t-1}, P_{t/t-1})$, with $a_{t/t-1} = T a_{t-1}$ and $P_{t/t-1} = T P_{t-1} T' + RQR'$
Update: $\alpha_t | \tilde{y}_t \sim N(a_t, P_t)$, with $a_t = a_{t/t-1} + K_t (y_t - Z a_{t/t-1})$ and $P_t = P_{t/t-1} - K_t Z P_{t/t-1}$
where $\tilde{y}_t = \{y_1, ..., y_t\}$
145 / 357 Useful Result 1: Conditional Expectation is MVUE Extraction Under Normality
Suppose that
$\begin{pmatrix} x \\ y \end{pmatrix} \sim N(\mu, \Sigma)$
where x is unobserved and y is observed.
Then
$E(x|y) = \underset{\hat{x}(y)}{\mathrm{argmin}} \iint (x - \hat{x}(y))^2 f(x, y) \, dx \, dy,$
where $\hat{x}(y)$ is any unbiased extraction of x based on y
146 / 357 Useful Result 2: Properties of the Multivariate Normal
$\begin{pmatrix} x \\ y \end{pmatrix} \sim N(\mu, \Sigma)$, where $\mu = (\mu_x, \mu_y)'$ and $\Sigma = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}$