Lecture Notes

Lars Peter Hansen

October 8, 2007

Contents

1 Approximation Results
  1.1 One Way to Build a Stochastic Process
  1.2 Stationary Stochastic Process
  1.3 Invariant Events and the Law of Large Numbers
  1.4 Another Way to Build a Stochastic Process
      1.4.1 Stationarity
      1.4.2 Limiting behavior
      1.4.3 Ergodicity
  1.5 Building Nonstationary Processes
  1.6 Martingale Approximation

Chapter 1

Approximation Results

1.1 One Way to Build a Stochastic Process

• Consider a probability space (Ω, F, Pr) where Ω is a set of sample points, F is an event collection (a sigma algebra), and Pr assigns probabilities to events.

• Introduce a function S : Ω → Ω such that for any event Λ,
$$S^{-1}(\Lambda) = \{\omega \in \Omega : S(\omega) \in \Lambda\}$$
is an event.

• Introduce a (Borel measurable) measurement function X : Ω → R^n. X is a random vector.

• Construct a stochastic process {X_t : t = 1, 2, ...} via the formula:
$$X_t(\omega) = X[S^t(\omega)]$$
or
$$X_t = X \circ S^t.$$

Example 1.1.1. Let Ω be a collection of infinite sequences of real numbers. Specifically, ω = (r_0, r_1, ...), S(ω) = (r_1, r_2, ...) and X(ω) = r_0. Then X_t(ω) = r_t.
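As a concrete illustration of this construction, the sketch below (in Python, with an iid Gaussian draw standing in for a generic sample point; the function names are illustrative choices, not from the notes) applies the shift transformation of Example 1.1.1 repeatedly and reads off X_t(ω) = X[S^t(ω)].

    import numpy as np

    def S(omega):
        # the shift transformation: (r0, r1, r2, ...) -> (r1, r2, ...)
        return omega[1:]

    def X(omega):
        # the measurement function: report the first coordinate
        return omega[0]

    # one sample point omega, truncated to a long finite segment (iid N(0,1) draws)
    rng = np.random.default_rng(0)
    omega = rng.standard_normal(1000)

    # X_t(omega) = X[S^t(omega)]; with this choice X_t(omega) = r_t
    path = []
    current = omega
    for t in range(10):
        path.append(X(current))
        current = S(current)

    print(path)        # the first ten coordinates of omega
    print(omega[:10])  # identical, by construction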

1.2 Stationary Stochastic Process

Definition 1.2.1. The transformation S is measure-preserving if:

$$Pr(\Lambda) = Pr\{S^{-1}(\Lambda)\} \quad \text{for all } \Lambda \in \mathcal{F}.$$

Proposition 1.2.2. When S is measure-preserving, the process {X_t : t = 1, 2, ...} has identical distributions for every t. That is, the distribution function for X_t is the same for all t.

• Given X, form the vector
$$X^* = \begin{bmatrix} X \\ X_1 \\ \vdots \\ X_\ell \end{bmatrix}.$$
Apply Proposition 1.2.2 to X^* to conclude that the joint distribution function for (X_t, X_{t+1}, ..., X_{t+ℓ}) is independent of t for t = 1, 2, .... The fact that this holds for any choice of ℓ is equivalent to the statement that the process {X_t} is stationary. (Some people use the term strict stationarity for this property.)

1.3 Invariant Events and the Law of Large Numbers

Definition 1.3.1. An event Λ is invariant if Λ = S^{-1}(Λ).

Let J denote the collection of invariant events. (Like F, this event collection is also a sigma algebra.) We are interested in E(X|J). If the invariant events are unions of members of a finite partition {Λ_j} (along with the null set), then for ω ∈ Λ_j,
$$E(X|\mathcal{J})(\omega) = \frac{\int_{\Lambda_j} X \, dPr}{Pr(\Lambda_j)}.$$

The conditional expectation is constant within each set of the partition and varies across the sets.

There is an alternative way to think of this conditional expectation. Let H be an n-dimensional measurement function such that
$$H_t(\omega) = H[S^t(\omega)]$$
is time invariant (does not depend on calendar time). Let $\mathcal{H}$ denote the collection of all such random vectors (measurement functions) and solve the following least squares problem:
$$\min_{H \in \mathcal{H}} E\left[|X - H|^2\right],$$
where we now assume that E|X|^2 < ∞. The solution to the least squares problem is E(X|J). This approach does not require a finite partition, but it adds a second moment restriction. In fact there are more general measure-theoretic ways to construct this expectation. Provided that E|X| < ∞, E(X|J) is the essentially unique random variable that for any invariant event Λ satisfies:

$$E\left([X - E(X|\mathcal{J})] \mathbf{1}_\Lambda\right) = 0,$$
where 1_Λ is the indicator function equal to one on the set Λ and zero otherwise.

Theorem 1.3.2. (Birkhoff) Suppose that S is measure preserving.

i) For any X such that E|X| < ∞,
$$\frac{1}{T} \sum_{t=1}^{T} X_t \to E(X|\mathcal{J})$$
with probability one;

ii) for any X such that E|X|^2 < ∞,
$$E\left[\left(\frac{1}{T} \sum_{t=1}^{T} X_t - E(X|\mathcal{J})\right)^2\right] \to 0.$$

Definition 1.3.3. The transformation S is ergodic if all of the invariant events have probability zero or one.

Lemma 1.3.4. Suppose that S is ergodic. Then EX = E(X|J).
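To see Theorem 1.3.2 and Lemma 1.3.4 at work, the sketch below simulates an assumed two-state ergodic Markov chain (the transition probabilities and the measurement f are illustrative choices, not from the notes) and checks that the time average of f(X_t) approaches the expectation of f under the stationary distribution.

    import numpy as np

    # transition matrix of a two-state ergodic chain (illustrative numbers)
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])

    # stationary distribution: the left eigenvector of P for the eigenvalue 1
    eigvals, eigvecs = np.linalg.eig(P.T)
    q = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    q = q / q.sum()

    f = np.array([1.0, 5.0])   # a measurement on the state space {0, 1}

    rng = np.random.default_rng(0)
    T, x, total = 100_000, 0, 0.0
    for _ in range(T):
        total += f[x]
        x = rng.choice(2, p=P[x])   # draw the next state from row x of P

    print("time average        :", total / T)
    print("stationary mean of f:", q @ f)   # the two numbers should be close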

1.4 Another Way to Build a Stochastic Process

We may start by specifying a collection of joint distributions. Instead of specifying
$$X_\ell^* = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_\ell \end{bmatrix},$$
specify a joint probability distribution $\widehat{Pr}_\ell$. Check that the distribution $\widehat{Pr}_{\ell+1}$ is consistent with $\widehat{Pr}_\ell$ because both assign probabilities to the events
$$Pr\{X_\ell^* \in B\}$$
for (Borel) sets B. Then there exists a space (Ω, F, Pr) and a stochastic process {X_t : t = 1, 2, ...} as in our previous construction; this is the Kolmogorov Extension Theorem.

An important application is the construction of Markov processes. Consider a state space E and a transition density T(x*|x) relative to a measure dλ. The conditional probabilities of X_{t+1} given X_t, X_{t-1}, ..., X_0 are given by T(x*|x)dλ(x*) when X_t = x. There is an associated conditional expectation operator. Let f : E → R. For f bounded define:
$$T f(x) = E\left[f(X_{t+1}) \mid X_t = x\right] = \int f(x^*) T(x^*|x) \, d\lambda(x^*).$$

Once we include a marginal distribution q_0 over E, we have constructed all of the joint distributions. Iterating on T forms expectations over longer time periods (Law of Iterated Expectations):

$$T^j f(x) = E\left[f(X_{t+j}) \mid X_t = x\right].$$
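On a finite state space the operator T is simply the transition matrix applied to the vector of function values, and T^j is a matrix power. The sketch below uses an assumed three-state chain (the numbers are illustrative, not from the notes) to confirm that iterating T reproduces the j-step conditional expectation.

    import numpy as np

    # rows of P hold the transition density T(.|x) on E = {0, 1, 2}
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.3, 0.3, 0.4]])

    f = np.array([1.0, -2.0, 4.0])   # a bounded function on E

    def T(g):
        # (Tg)(x) = sum over x* of T(x*|x) g(x*), i.e. row x of P times g
        return P @ g

    j = 5
    g = f.copy()
    for _ in range(j):
        g = T(g)                     # T^j f by repeated application

    Pj = np.linalg.matrix_power(P, j)
    print(g)                         # T^j f
    print(Pj @ f)                    # E[f(X_{t+j}) | X_t = x], state by state; identical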

1.4.1 Stationarity

Definition 1.4.1. A stationary density q for a Markov process is a nonnegative q with ∫ q(x)dλ(x) = 1 that satisfies
$$\int T(x^*|x) q(x) \, d\lambda(x) = q(x^*).$$

Example 1.4.2. Suppose that

$$T(x^*|x) q(x) = T(x|x^*) q(x^*)$$
for some nonnegative q for which ∫ q(x)dλ(x) = 1. Note that

$$\int T(x^*|x) q(x) \, d\lambda(x) = \int T(x|x^*) q(x^*) \, d\lambda(x) = q(x^*),$$
where the second equality follows because ∫ T(x|x*)dλ(x) = 1.

Thus q is a stationary density.
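The sketch below illustrates Example 1.4.2 on a finite state space. It builds a transition matrix that satisfies the detailed balance condition T(x*|x)q(x) = T(x|x*)q(x*) for an assumed density q (using a symmetric proposal with Metropolis-style acceptance, a construction of mine rather than of the notes) and verifies numerically that q is then stationary.

    import numpy as np

    # target density q on a three-point state space (illustrative numbers)
    q = np.array([0.5, 0.3, 0.2])

    # symmetric proposal plus Metropolis acceptance delivers detailed balance
    proposal = np.full((3, 3), 1.0 / 3.0)
    P = np.zeros((3, 3))
    for x in range(3):
        for xstar in range(3):
            if xstar != x:
                P[x, xstar] = proposal[x, xstar] * min(1.0, q[xstar] / q[x])
        P[x, x] = 1.0 - P[x].sum()

    # detailed balance: q(x) T(x*|x) equals q(x*) T(x|x*) for every pair
    M = q[:, None] * P
    print(np.allclose(M, M.T))   # True

    # hence q is stationary: integrating T(x*|x) q(x) over x reproduces q
    print(q @ P)
    print(q)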

When the Markov process is initialized according to a stationary den- sity, we may build the process {Xt : t = 1, 2, ...} with a measure-preserving transformation S under our first construction.

1.4.2 Limiting behavior

We are interested in situations when

$$T^j f(x) \to r$$
for some constant r. Let q be a stationary density. Then it is necessarily true that
$$\int T^j f(x) q(x) \, d\lambda(x) = \int f(x) q(x) \, d\lambda(x)$$
for all j. Thus
$$r = \int f(x) q(x) \, d\lambda(x).$$

This may seem peculiar because so far we have not assumed that the stationary density is unique, but we did presume that the limit point is a number and not a random variable. Apparently, if there are multiple stationary distributions, we must be able to find functions f for which this limit is not constant, and indeed this is the case. In fact more can be said: we may find functions for which Tf = f other than the unit function, as the sketch below illustrates.
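The sketch below makes this point with an assumed reducible four-state chain made of two closed blocks (illustrative numbers, not from the notes). It exhibits two distinct stationary densities and a nonconstant function f with Tf = f, so that T^j f cannot settle down to a single number r.

    import numpy as np

    # block-diagonal transition matrix: states {0,1} and {2,3} never communicate
    P = np.array([[0.7, 0.3, 0.0, 0.0],
                  [0.4, 0.6, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.5],
                  [0.0, 0.0, 0.2, 0.8]])

    # two distinct stationary densities, one concentrated on each block
    q1 = np.array([4/7, 3/7, 0.0, 0.0])
    q2 = np.array([0.0, 0.0, 2/7, 5/7])
    print(np.allclose(q1 @ P, q1), np.allclose(q2 @ P, q2))   # True True

    # a nonconstant solution of Tf = f: the indicator of the first block
    f = np.array([1.0, 1.0, 0.0, 0.0])
    print(np.allclose(P @ f, f))                              # True

    # so T^j f stays at 1 on the first block and at 0 on the second block:
    # the limit depends on the initial state and is not a single number.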

1.4.3 Ergodicity

Associated with this stationary density, form the space of functions L^2 given by
$$L^2 = \left\{ f : E \to \mathbb{R}^n : \int |f(x)|^2 q(x) \, d\lambda(x) < \infty \right\}.$$

Then it may be shown that T : L^2 → L^2.

Lemma 1.4.3. Suppose that Tf = f for some f ∈ L^2. Then f(X_t) is constant over time with probability one.

Proof.
$$E\left[f(X_{t+1}) \cdot f(X_t)\right] = \int (Tf) \cdot f \, q \, d\lambda = \int |f|^2 q \, d\lambda = E\left[|f(X_t)|^2\right].$$
Thus
$$E\left[|f(X_{t+1}) - f(X_t)|^2\right] = E\left[|f(X_{t+1})|^2\right] + E\left[|f(X_t)|^2\right] - 2E\left[f(X_{t+1}) \cdot f(X_t)\right] = 0$$
since q is a stationary density, so that E|f(X_{t+1})|^2 = E|f(X_t)|^2.

When the only solution to the eigenvalue equation
$$Tf = f$$
is a constant function (with q measure one), then we may build the process {X_t : t = 1, 2, ...} using a transformation S that is measure preserving and ergodic.¹

1.5 Building Nonstationary Processes

For economic applications, it is too limiting to consider only models that are stationary. Instead we are interested in processes that display stochastic counterparts to geometric growth or arithmetic growth in logarithms. Let {X_t} be a stationary Markov process.

¹ This notion of ergodicity is relative to a measure, in this case a stationary distribution for the Markov process. When there are multiple stationary distributions, a constant solution to the eigenvalue problem may be the only one that works for one such distribution while nonconstant solutions can exist for other stationary distributions.

Definition 1.5.1. If a process {Yt : t = 0, 1, ...} can be represented as:

$$Y_{t+1} - Y_t = \kappa(X_{t+1}, X_t),$$
or equivalently
$$Y_{t+1} = Y_0 + \sum_{j=1}^{t+1} \kappa(X_j, X_{j-1}),$$
then it is said to be additive.

A linear combination of two additive processes {Y_t^[1]} and {Y_t^[2]} is an additive process.

Example 1.5.2. Suppose that
$$X_{t+1} = A X_t + B W_{t+1}$$
where {W_{t+1} : t = 1, 2, ...} is an iid sequence of multivariate normally distributed random vectors and B has full column rank. Premultiply by B' and obtain:
$$B' X_{t+1} - B' A X_t = B' B W_{t+1}.$$
Then
$$W_{t+1} = (B'B)^{-1}\left(B' X_{t+1} - B' A X_t\right).$$
Form
$$\kappa(X_{t+1}, X_t) = \mu(X_t) + \sigma(X_t) W_{t+1}.$$
Then μ(X_t) is the conditional mean of Y_{t+1} - Y_t and |σ(X_t)|^2 is the conditional variance. {Y_t : t = 0, 1, ...} is a martingale if μ(X_t) = 0. Since σ depends on the Markov state, this is referred to as a stochastic volatility model.
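A minimal simulation sketch of Example 1.5.2 in the scalar case, with assumed parameter values and assumed functional forms for μ and σ (none of them from the notes): it generates the Markov state, cumulates the increments κ(X_{t+1}, X_t) = μ(X_t) + σ(X_t)W_{t+1} to obtain the additive process Y, and notes that setting μ to zero would make Y a martingale.

    import numpy as np

    rng = np.random.default_rng(0)
    A, B = 0.8, 1.0                              # |A| < 1 keeps the state stationary
    mu    = lambda x: 0.05 + 0.1 * x             # conditional mean of Y_{t+1} - Y_t
    sigma = lambda x: np.sqrt(0.2 + 0.1 * x**2)  # state-dependent (stochastic) volatility

    T = 1_000
    X = np.zeros(T + 1)
    Y = np.zeros(T + 1)
    for t in range(T):
        W = rng.standard_normal()
        X[t + 1] = A * X[t] + B * W                     # the Markov state
        Y[t + 1] = Y[t] + mu(X[t]) + sigma(X[t]) * W    # increment kappa(X_{t+1}, X_t)

    # with mu identically zero, E[Y_{t+1} - Y_t | X_t] = 0 and Y is a martingale
    print(Y[-1])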

In what follows, let F_t be the information set (sigma algebra) generated by X_0, X_1, ..., X_t.

1.6 Martingale Approximation

In what follows, we use the following subspace of L^2:
$$Z = \left\{ f \in L^2 : \int f \, q \, d\lambda = 0 \right\}.$$

Thus functions in Z have mean zero under the stationary distribution. Define the norm $\|f\| = \left(\int |f|^2 q \, d\lambda\right)^{1/2}$ on L^2 and hence on Z.

Definition 1.6.1. The conditional expectation operator T is a strong contraction (on Z) if there exists a ρ with 0 < ρ < 1 such that
$$\|Tf\| \le \rho \|f\|$$
for all f ∈ Z.²

Example 1.6.2. Suppose that f ∈ Z and

$$Y_{t+1} - Y_t = f(X_t).$$
Thus κ(x*, x) = f(x). Compute

$$\sum_{j=0}^{\infty} E\left[f(X_{t+j}) \mid X_t = x\right] = \sum_{j=0}^{\infty} T^j f(x) = (I - T)^{-1} f(x) = g(x),$$
provided that the infinite sum is finite. A sufficient condition for this property is that T be a strong contraction, but the condition holds more generally. Let

$$\kappa^*(x^*, x) = g(x^*) - g(x) + f(x) = g(x^*) - Tg(x).$$
Thus
$$E\left[\kappa^*(X_{t+1}, X_t) \mid X_t\right] = 0,$$
and

$$Y_{t+1} = \sum_{j=1}^{t+1} f(X_{j-1}) = \sum_{j=1}^{t+1} \kappa^*(X_j, X_{j-1}) - g(X_{t+1}) + g(X_0)$$
(normalizing Y_0 = 0), and the sum of the κ* terms is a martingale.
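For a finite-state chain the construction in Example 1.6.2 reduces to a small linear algebra exercise. The sketch below uses an assumed three-state chain (illustrative numbers): it computes g = (I - T)^{-1} f for a mean-zero f by summing the series, forms κ*(x*, x) = g(x*) - Tg(x), and checks that κ* has conditional mean zero, which is what makes the cumulated κ* terms a martingale.

    import numpy as np

    # rows of P hold the transition density T(.|x) (assumed three-state chain)
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.3, 0.3, 0.4]])

    # stationary density q (left eigenvector of P for the eigenvalue 1)
    eigvals, eigvecs = np.linalg.eig(P.T)
    q = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    q = q / q.sum()

    # a function with mean zero under q, so that f lies in Z
    f = np.array([1.0, -2.0, 4.0])
    f = f - q @ f

    # g = sum_{j>=0} T^j f; the terms decay geometrically for this chain
    g, term = np.zeros(3), f.copy()
    for _ in range(500):
        g += term
        term = P @ term
    print(np.allclose(g - P @ g, f))   # g solves (I - T) g = f

    # kappa*(x*, x) = g(x*) - (Tg)(x); its conditional mean given x is zero
    Tg = P @ g
    kappa_star = g[None, :] - Tg[:, None]                    # rows x, columns x*
    print(np.allclose((P * kappa_star).sum(axis=1), 0.0))    # True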

Next we consider a way to combine these results to produce a more general martingale decomposition.

i) Compute the conditional expectation of the growth rate:
$$E\left[\kappa(X_{t+1}, X_t) \mid X_t = x\right] = \bar{f}(x)$$
and form
$$\bar{\kappa}(X_{t+1}, X_t) = \kappa(X_{t+1}, X_t) - \bar{f}(X_t).$$

² When this property is satisfied the underlying process is said to be ρ-mixing.

ii) Remove the mean:
$$f(x) = \bar{f}(x) - \int \bar{f} q \, d\lambda.$$

iii) Decompose:

$$Y_{t+1} = (t+1) \int \bar{f} q \, d\lambda + \left[\sum_{j=1}^{t+1} \kappa^*(X_j, X_{j-1})\right] + \left[\sum_{j=1}^{t+1} \bar{\kappa}(X_j, X_{j-1})\right] - g(X_{t+1}) + g(X_0).$$

Thus we have shown

Proposition 1.6.3. Suppose that {Y_t : t = 0, 1, ...} is an additive process, T is a strong contraction on Z and E[κ(X_{t+1}, X_t)^2] < ∞. Then

$$Y_{t+1} = (t+1)\nu + \sum_{j=1}^{t+1} \hat{\kappa}(X_j, X_{j-1}) - g(X_{t+1}) + g(X_0),$$
where ν = ∫ f̄ q dλ, κ̂ = κ* + κ̄, and
$$E\left[\hat{\kappa}(X_{t+1}, X_t) \mid X_t\right] = 0.$$

Observations:

• Two additive processes {Y_t^[1]} and {Y_t^[2]} are co-integrated if there exists a linear combination of them for which the resulting time trend and martingale are zero. See Engle and Granger (1987) for a discussion of cointegration.

• The components in the decomposition are each additive processes.

• Suppose that μ(x) = Hx + ν and σ(x) = G, and suppose that A has stable eigenvalues. Then κ̄(X_{t+1}, X_t) = GW_{t+1} and κ*(X_{t+1}, X_t) = H(I - A)^{-1}BW_{t+1}. Blanchard and Quah (1989) and Fisher (JPE) identify technology shocks via
$$[G + H(I - A)^{-1} B] W_{t+1}:$$
only supply shocks or technology shocks have long run consequences for output. Similarly, for Beveridge and Nelson (1981), [G + H(I - A)^{-1}B]W_{t+1} is the permanent shock in a permanent-transitory decomposition. (A numerical sketch of this loading follows these observations.)
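A numerical sketch of the loading in the last observation, with assumed matrices A, B, H, G (purely illustrative, not from the notes): it forms G + H(I - A)^{-1}B directly and checks that the same object is the cumulative impulse response of the increments Y_{t+1} - Y_t to the shock W_{t+1}.

    import numpy as np

    # assumed parameters for X_{t+1} = A X_t + B W_{t+1}, mu(x) = H x + nu, sigma(x) = G
    A = np.array([[0.7, 0.2],
                  [0.0, 0.5]])       # stable eigenvalues (0.7 and 0.5)
    B = np.eye(2)
    H = np.array([[1.0, -0.5]])
    G = np.array([[0.3, 0.1]])

    # loading of the martingale increment on the shock W_{t+1}
    loading = G + H @ np.linalg.solve(np.eye(2) - A, B)

    # the same object as a cumulative impulse response: impact G, then H A^(k-1) B
    cumulative = G.copy()
    Ak = np.eye(2)
    for _ in range(200):
        cumulative = cumulative + H @ Ak @ B
        Ak = Ak @ A

    print(loading)
    print(cumulative)   # agrees with the loading up to truncation error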

Billingsley (1961) shows that an additive martingale process {Y_t : t = 0, 1, ...} with stationary ergodic increments obeys a central limit theorem:

$$\frac{1}{\sqrt{t}} Y_t \Longrightarrow N\left(0, E\left[(Y_{t+1} - Y_t)^2\right]\right).$$
This central limit theorem is in a sense business as usual except that the terms in the sum (the increments in the additive process) are martingale differences rather than iid. The martingale difference property guarantees that

$$E\left[Y_{t+1} - Y_t \mid \mathcal{F}_t\right] = 0.$$

Gordin (1969) extends this result to allow for correlated increments. Gordin’s approach can be seen as an application of Proposition 1.6.3. Under the assumptions of this proposition:

$$\frac{1}{\sqrt{t}} Y_t \Longrightarrow N(0, \sigma^2).$$
This may also look like business as usual but it is not. The variance used in the central limit approximation is a long-run notion of a variance, or equivalently it is the variance of the martingale difference from the martingale approximation. That is,

$$\sigma^2 = \lim_{t \to \infty} \frac{1}{t} \operatorname{var}(Y_t) = E\left[\hat{\kappa}(X_j, X_{j-1})^2\right].$$
Temporal dependence in the additive process matters!
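The sketch below makes the role of temporal dependence concrete for an assumed two-state chain with Y_{t+1} - Y_t = f(X_t) (so that κ̄ = 0 and κ̂ = κ*); the numbers are illustrative, not from the notes. It compares the naive per-period variance E[f(X_t)^2] with the long-run variance E[κ̂^2] from the martingale approximation and with a simulated variance of Y_t/√t; the simulation tracks the long-run variance, not the naive one.

    import numpy as np

    P = np.array([[0.9, 0.1],
                  [0.1, 0.9]])       # a persistent two-state chain
    eigvals, eigvecs = np.linalg.eig(P.T)
    q = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    q = q / q.sum()

    f = np.array([1.0, -1.0])
    f = f - q @ f                     # mean zero under q

    # martingale approximation: g = sum_j T^j f and kappa_hat(x*, x) = g(x*) - (Tg)(x)
    g, term = np.zeros(2), f.copy()
    for _ in range(500):
        g += term
        term = P @ term
    kappa_hat = g[None, :] - (P @ g)[:, None]          # rows x, columns x*

    naive_var   = q @ f**2                             # E[f(X_t)^2]
    longrun_var = q @ (P * kappa_hat**2).sum(axis=1)   # E[kappa_hat^2]

    # simulate var(Y_t)/t for comparison
    rng = np.random.default_rng(1)
    t, reps, draws = 1_000, 300, []
    for _ in range(reps):
        x, y = 0, 0.0
        for _ in range(t):
            y += f[x]
            x = rng.choice(2, p=P[x])
        draws.append(y)

    print(naive_var, longrun_var, np.var(draws) / t)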

Corollary 1.6.4. (Gordin) Under the assumptions of Proposition 1.6.3,

$$\frac{1}{\sqrt{t}} Y_t \Longrightarrow N(0, \sigma^2)$$

where σ^2 = E[κ̂(X_j, X_{j-1})^2]. The martingale approximation has two important uses: a) it identifies shocks with long run consequences; b) it gives us a central limit approximation. It is the former use that will interest us.

Bibliography

Beveridge, S. and C. R. Nelson. 1981. A New Approach to the Decomposition of Economic Time Series into Permanent and Transitory Components with Particular Attention to the Measurement of the ‘Business Cycle’. Journal of Monetary Economics 7:151–174.

Billingsley, P. 1961. The Lindeberg-Levy Theorem for Martingales. American Mathematical Monthly 12.

Blanchard, O. J. and D. Quah. 1989. The Dynamic Effects of Aggregate Demand and Supply Disturbances. American Economic Review 79:655–673.

Engle, R. and C. W. J. Granger. 1987. Co-integration and Error Correction: Representation, Estimation and Testing. Econometrica 55:251–276.

Gordin, M. I. 1969. The Central Limit Theorem for Stationary Processes. Soviet Mathematics Doklady 10:1174–1176.
