Lecture Notes
Lars Peter Hansen
October 8, 2007

Contents

1 Approximation Results
  1.1 One way to Build a Stochastic Process
  1.2 Stationary Stochastic Process
  1.3 Invariant Events and the Law of Large Numbers
  1.4 Another Way to Build a Stochastic Process
    1.4.1 Stationarity
    1.4.2 Limiting behavior
    1.4.3 Ergodicity
  1.5 Building Nonstationary Processes
  1.6 Martingale Approximation

Chapter 1 Approximation Results

1.1 One way to Build a Stochastic Process

• Consider a probability space $(\Omega, \mathcal{F}, Pr)$ where $\Omega$ is a set of sample points, $\mathcal{F}$ is an event collection (sigma algebra) and $Pr$ assigns probabilities to events.

• Introduce a function $S : \Omega \to \Omega$ such that for any event $\Lambda$,
$$S^{-1}(\Lambda) = \{\omega \in \Omega : S(\omega) \in \Lambda\}$$
is an event.

• Introduce a (Borel measurable) measurement function $X : \Omega \to \mathbb{R}^n$. $X$ is a random vector.

• Construct a stochastic process $\{X_t : t = 1, 2, \ldots\}$ via the formula:
$$X_t(\omega) = X[S^t(\omega)]$$
or
$$X_t = X \circ S^t.$$

Example 1.1.1. Let $\Omega$ be a collection of infinite sequences of real numbers. Specifically, $\omega = (r_0, r_1, \ldots)$, $S(\omega) = (r_1, r_2, \ldots)$ and $X(\omega) = r_0$. Then $X_t(\omega) = r_t$.

1.2 Stationary Stochastic Process

Definition 1.2.1. The transformation $S$ is measure-preserving if:
$$Pr(\Lambda) = Pr\{S^{-1}(\Lambda)\}$$
for all $\Lambda \in \mathcal{F}$.

Proposition 1.2.2. When $S$ is measure-preserving, the process $\{X_t : t = 1, 2, \ldots\}$ has identical distributions for every $t$. That is, the distribution function for $X_t$ is the same for all $t$.

• Given $X$, form a vector
$$X^* = \begin{bmatrix} X_1 \\ \vdots \\ X_\ell \end{bmatrix}.$$
Apply Proposition 1.2.2 to $X^*$ to conclude that the joint distribution function for $(X_t, X_{t+1}, \ldots, X_{t+\ell})$ is independent of $t$ for $t = 1, 2, \ldots$. The fact that this holds for any choice of $\ell$ is equivalent to the statement that the process $\{X_t\}$ is stationary. (Some people use the term strict stationarity for this property.)

1.3 Invariant Events and the Law of Large Numbers

Definition 1.3.1. An event $\Lambda$ is invariant if $\Lambda = S^{-1}(\Lambda)$.
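Definition 1.3.1 can be made concrete on a finite sample space. The sketch below is only an illustration: the five-point set `omega`, the map `S` (a permutation with two cycles) and the helper names `preimage` and `invariant_events` are all assumptions introduced here, not part of the notes. It enumerates every subset and keeps those that coincide with their own preimage under `S`.

```python
from itertools import combinations

def preimage(S, event, omega):
    """Return S^{-1}(event) = {w in omega : S(w) in event}."""
    return frozenset(w for w in omega if S[w] in event)

def invariant_events(S, omega):
    """Enumerate every subset of omega that equals its own preimage under S."""
    points = sorted(omega)
    found = []
    for size in range(len(points) + 1):
        for subset in combinations(points, size):
            event = frozenset(subset)
            if preimage(S, event, omega) == event:
                found.append(event)
    return found

# Hypothetical example: S is a permutation with cycles {0, 1} and {2, 3, 4}.
omega = {0, 1, 2, 3, 4}
S = {0: 1, 1: 0, 2: 3, 3: 4, 4: 2}

events = invariant_events(S, omega)
for event in events:
    print(sorted(event))
```

For this particular permutation the invariant events are exactly the unions of its cycles: the empty set, {0, 1}, {2, 3, 4} and the whole space.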
Let $\mathcal{J}$ denote the collection of invariant events. (Like $\mathcal{F}$, this event collection is also a sigma algebra.) We are interested in $E(X|\mathcal{J})$. If the invariant events are unions of the members of a finite partition $\{\Lambda_j\}$ (along with the null set), then
$$E(X|\mathcal{J})(\omega) = \frac{\int_{\Lambda_j} X \, dPr}{Pr(\Lambda_j)} \quad \text{for } \omega \in \Lambda_j.$$
The conditional expectation is constant within each member of the partition and varies across members.

There is an alternative way to think of this conditional expectation. Let $H$ be an $n$-dimensional measurement function such that
$$H_t(\omega) = H[S^t(\omega)]$$
is time invariant (does not depend on calendar time). Let $\mathcal{H}$ denote the collection of all such random vectors or measurement functions and solve the following least squares problem:
$$\min_{H \in \mathcal{H}} E\left[|X - H|^2\right]$$
where we now assume that $E|X|^2 < \infty$. The solution to the least squares problem is $E(X|\mathcal{J})$. This approach does not require a finite partition, but it adds a second moment restriction.

In fact there are more general measure-theoretic ways to construct this expectation. Provided that $E|X| < \infty$, $E(X|\mathcal{J})$ is the essentially unique random variable, measurable with respect to $\mathcal{J}$, that for any invariant event $\Lambda$ satisfies:
$$E\left([X - E(X|\mathcal{J})] 1_\Lambda\right) = 0$$
where $1_\Lambda$ is the indicator function equal to one on the set $\Lambda$ and zero otherwise.

Theorem 1.3.2. (Birkhoff) Suppose that $S$ is measure-preserving.
i) For any $X$ such that $E|X| < \infty$,
$$\frac{1}{T} \sum_{t=1}^{T} X_t \to E(X|\mathcal{J})$$
with probability one;
ii) for any $X$ such that $E|X|^2 < \infty$,
$$E\left[\left|\frac{1}{T} \sum_{t=1}^{T} X_t - E(X|\mathcal{J})\right|^2\right] \to 0.$$

Definition 1.3.3. The transformation $S$ is ergodic if all of the invariant events have probability zero or one.

Lemma 1.3.4. Suppose that $S$ is ergodic. Then $EX = E(X|\mathcal{J})$.

1.4 Another Way to Build a Stochastic Process

We may start by specifying a collection of joint distributions. Instead of specifying:
$$X^*_\ell = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_\ell \end{bmatrix},$$
specify a joint probability distribution $\widehat{Pr}_\ell$.
Check that the distribution $\widehat{Pr}_{\ell+1}$ is consistent with $\widehat{Pr}_\ell$, because both assign probabilities to the events
$$\{X^*_\ell \in B\}$$
for (Borel) sets $B$. Then there exists a space $(\Omega, \mathcal{F}, Pr)$ and a stochastic process $\{X_t : t = 1, 2, \ldots\}$ as in our previous construction. This is the Kolmogorov Extension Theorem.

An important application is the construction of Markov processes. Consider a state space $\mathcal{E}$ and a transition density $T(x^*|x)$ relative to a measure $d\lambda$. The conditional probabilities of $X_{t+1}$ given $X_t, X_{t-1}, \ldots, X_0$ are given by:
$$T(x^*|x) \, d\lambda(x^*)$$
when $X_t = x$. There is an associated conditional expectation function. Let $f : \mathcal{E} \to \mathbb{R}$. For $f$ bounded define:
$$Tf(x) = E[f(X_{t+1}) | X_t = x] = \int f(x^*) T(x^*|x) \, d\lambda(x^*).$$
Include a marginal $q_0$ over $\mathcal{E}$ for $X_0$; we have then constructed all of the joint distributions. Iterating on $T$ forms expectations over longer time periods (Law of Iterated Expectations):
$$T^j f(x) = E[f(X_{t+j}) | X_t = x].$$

1.4.1 Stationarity

Definition 1.4.1. A stationary density $q$ for a Markov process is a nonnegative function with $\int q(x) \, d\lambda(x) = 1$ that satisfies
$$\int T(x^*|x) q(x) \, d\lambda(x) = q(x^*).$$

Example 1.4.2. Suppose that
$$T(x^*|x) q(x) = T(x|x^*) q(x^*)$$
for some nonnegative $q$ for which $\int q(x) \, d\lambda(x) = 1$. Note that
$$\int T(x^*|x) q(x) \, d\lambda(x) = \int T(x|x^*) q(x^*) \, d\lambda(x) = q(x^*),$$
where the last equality uses $\int T(x|x^*) \, d\lambda(x) = 1$. Thus $q$ is a stationary density.

When the Markov process is initialized according to a stationary density, we may build the process $\{X_t : t = 1, 2, \ldots\}$ with a measure-preserving transformation $S$ under our first construction.

1.4.2 Limiting behavior

We are interested in situations when
$$T^j f(x) \to r$$
for some number $r$. Let $q$ be a stationary density. Then it is necessarily true that
$$\int T^j f(x) q(x) \, d\lambda(x) = \int f(x) q(x) \, d\lambda(x)$$
for all $j$. Thus
$$r = \int f(x) q(x) \, d\lambda(x).$$
This may seem peculiar because so far we have not assumed that the stationary density is unique, but we did presume that the limit point is a number and not a random variable.
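The limiting argument can be checked numerically on a finite state space, where $\lambda$ is counting measure and the transition density becomes a transition matrix. The following is a minimal sketch under stated assumptions: the two-state matrix `P`, the function `f` and the helper names `apply_T` and `q_average` are hypothetical choices made for illustration, with `P[x][x_star]` playing the role of $T(x^*|x)$.

```python
def apply_T(P, f):
    """One-step conditional expectation: (Tf)(x) = sum over x* of f(x*) P[x][x*]."""
    return [sum(p * fx for p, fx in zip(row, f)) for row in P]

def q_average(f, q):
    """Integral of f against the density q under counting measure."""
    return sum(fx * qx for fx, qx in zip(f, q))

# Hypothetical two-state chain; P[x][x_star] plays the role of T(x*|x).
P = [[0.9, 0.1],
     [0.2, 0.8]]
q = [2 / 3, 1 / 3]   # stationary for this P: sum over x of P[x][x*] q[x] = q[x*]
f = [1.0, 4.0]

r = q_average(f, q)  # candidate limit: the q-weighted average of f
g = f
averages = []
for j in range(200):
    g = apply_T(P, g)                 # g now holds T^(j+1) f
    averages.append(q_average(g, q))  # q-weighted average of T^(j+1) f

print(r)  # q-weighted average of f
print(g)  # T^200 f, approximately constant across the two states
```

In this example the q-weighted average of $T^j f$ stays fixed at every iteration, and $T^j f$ itself flattens out to that same number in both states.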
Apparently, if there are multiple stationary distributions, we must be able to find functions $f$ for which this limit is not constant, and indeed this is the case. In fact more can be said. We may find functions other than the unit function for which $Tf = f$.

1.4.3 Ergodicity

Associated with this stationary density, form the space of functions $L^2$ given by:
$$L^2 = \left\{ f : \mathcal{E} \to \mathbb{R} : \int f(x)^2 q(x) \, d\lambda(x) < \infty \right\}.$$
Then it may be shown that $T : L^2 \to L^2$.

Lemma 1.4.3. Suppose that $Tf = f$ for some $f \in L^2$. Then $f(X_t)$ is constant over time with probability one.

Proof.
$$E[f(X_{t+1}) f(X_t)] = \int (Tf) f q \, d\lambda = \int f^2 q \, d\lambda = E\left[f(X_t)^2\right].$$
Thus
$$E\left([f(X_{t+1}) - f(X_t)]^2\right) = E\left[f(X_{t+1})^2\right] - 2 E[f(X_{t+1}) f(X_t)] + E\left[f(X_t)^2\right] = 0$$
since $q$ is a stationary density, which implies $E[f(X_{t+1})^2] = E[f(X_t)^2]$.

When the only solution to the eigenvalue equation $Tf = f$ is a constant function (with $q$ measure one), then we may build the process $\{X_t : t = 1, 2, \ldots\}$ using a transformation $S$ that is measure-preserving and ergodic. [1]

[1] This notion of ergodicity is relative to a measure, in this case a stationary distribution for the Markov process. When there are multiple stationary distributions, a constant solution to the eigenvalue problem may be the only one that works for one such distribution, while nonconstant solutions can exist for other stationary distributions.

1.5 Building Nonstationary Processes

For economic applications, it is too limiting to consider only time series models that are stationary. Instead we are interested in processes that display stochastic counterparts to geometric growth or arithmetic growth in logarithms. Let $\{X_t\}$ be a stationary Markov process.

Definition 1.5.1. If a process $\{Y_t : t = 0, 1, \ldots\}$ can be represented as:
$$Y_{t+1} - Y_t = \kappa(X_{t+1}, X_t),$$
or equivalently
$$Y_{t+1} = Y_0 + \sum_{j=1}^{t+1} \kappa(X_j, X_{j-1}),$$
then it is said to be additive.

A linear combination of two additive processes $\{Y^{[1]}_{t+1}\}$ and $\{Y^{[2]}_{t+1}\}$ is an additive process.

Example 1.5.2.
$$X_{t+1} = A X_t + B W_{t+1}$$
where $\{W_{t+1} : t = 1, 2, \ldots\}$ is an iid sequence of multivariate normally distributed random vectors and $B$ has full column rank. Premultiply by $B'$ and obtain:
$$B' X_{t+1} - B' A X_t = B' B W_{t+1}.$$
Then
$$W_{t+1} = (B'B)^{-1} (B' X_{t+1} - B' A X_t).$$
Form
$$\kappa(X_{t+1}, X_t) = \mu(X_t) + \sigma(X_t) W_{t+1}.$$
Then $\mu(X_t)$ is the conditional mean of $Y_{t+1} - Y_t$ and $|\sigma(X_t)|^2$ is the conditional variance. $\{Y_t : t = 0, 1, \ldots\}$ is a martingale if $\mu(X_t) = 0$. Since $\sigma$ depends on the Markov state, this is referred to as a stochastic volatility model.
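A scalar version of Example 1.5.2 can be simulated to see that the increment of $Y$ really is a function of $(X_{t+1}, X_t)$ alone. This is a sketch under assumptions that are not in the notes: the coefficients `A` and `B`, the functional forms of `mu` and `sigma`, and the choice of standard normal shocks (unit variance, so that the squared volatility loading is the conditional variance) are all hypothetical.

```python
import random

A, B = 0.8, 0.5  # hypothetical scalar coefficients; B != 0 mirrors full column rank

def mu(x):
    """Hypothetical conditional mean of Y_{t+1} - Y_t given X_t = x."""
    return 0.01 + 0.1 * x

def sigma(x):
    """Hypothetical state-dependent volatility loading."""
    return 0.2 * (1.0 + x * x) ** 0.5

def kappa(x_next, x):
    """Increment kappa(X_{t+1}, X_t) = mu(X_t) + sigma(X_t) W_{t+1}."""
    w_next = (x_next - A * x) / B  # scalar form of (B'B)^{-1}(B'X_{t+1} - B'A X_t)
    return mu(x) + sigma(x) * w_next

def simulate(T, seed=0):
    """Simulate X (started from its stationary law) and the additive process Y."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, B / (1.0 - A * A) ** 0.5)  # stationary draw for the AR(1)
    y = 0.0                                       # Y_0 = 0 by assumption
    xs, ys, ws = [x], [y], []
    for _ in range(T):
        w = rng.gauss(0.0, 1.0)   # standard normal shock W_{t+1}
        x_next = A * x + B * w
        y += kappa(x_next, x)     # uses only (X_{t+1}, X_t), never w directly
        x = x_next
        xs.append(x)
        ys.append(y)
        ws.append(w)
    return xs, ys, ws

xs, ys, ws = simulate(500)
print(ys[-1])  # terminal value of the additive process for this seed
```

Setting `mu` to return zero would make the simulated $\{Y_t\}$ a martingale, in line with the statement in the example.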