HETEROGENEITY IN QUANTITATIVE MACROECONOMICS @ TSE, OCTOBER 17, 2016. BASICS. SANG YOON (TIM) LEE

Very simple notes (references still need to be added). They are NOT meant to substitute for a real course in stochastic calculus; they just list heuristic derivations of the tools most often used in economics. Ito calculus is a lot more than dealing with Poisson jumps and Wiener processes. Some abuses of notation are included without clarification.

1. Stochastic Processes

A stochastic process is a collection of random variables (measurable functions)

{X_t : t ∈ T}, X_t : Ω → S, ordered in t (time), along with a measurable space (S, Σ). The triple (Ω, F, P) denotes, respectively, the sample space (the set of all possible histories), the σ-algebra that contains all possible sets (Borel sets) of histories induced by Ω, and the probability measure over F. The space (S, Σ) contains the range of the function X_t : Ω → S and its corresponding σ-algebra. For example, for most of our applications X_t ∈ R or R_+. If X_t is measurable, any process induces a probability measure P_t that we can construct using the original P. This calls for the notion of a filtration: a weakly increasing collection of sub-σ-algebras of F, {F_t, t ∈ T}, s.t. for all s < t ∈ T,

Fs ⊂ Ft ⊂ F.

The process X is adapted to the filtration {Ft}t∈T if Xt is Ft-measurable. This just means that for any Xt, I can compute the probability only using Ft and not all of F. Hence, a well defined stochastic process is always adapted to its natural filtration

F_t = σ( {X_s^{-1}(A) : s ≤ t, A ∈ Σ} ).

This just means that for any history of X_t up to time t, all possibly realizable trajectories can be mapped back into a subset of F_t, so that I can compute its probability for all points up to time t. This generates an induced probability measure over X.

EXAMPLE 1 Let Ω = [0, 1]^∞. Then any ω ∈ Ω is just a coordinate on the infinite-dimensional unit cube. If we let X_t : Ω → S denote the t-th coordinate, S is just the unit interval [0, 1]. If we construct, say, the probability measure so that P = P_1 × · · · × P_∞, where each P_t is the uniform distribution, then X_t is i.i.d. uniform.
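The example can be sketched numerically (a minimal simulation, assuming numpy; truncating the infinite cube at T coordinates):

```python
import numpy as np

rng = np.random.default_rng(0)

# One draw omega is a point in [0,1]^T (truncating the infinite cube at T coordinates);
# X_t(omega) just reads off the t-th coordinate.
T = 5
omega = rng.uniform(0.0, 1.0, size=T)  # a single "history"

def X(t, omega):
    """The t-th coordinate map, X_t : Omega -> [0,1]."""
    return omega[t]

print([round(X(t, omega), 3) for t in range(T)])
```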

2. Poisson (Jump) Process

Let N_t be equal to the number of “hits" up to time t. The (adapted) state space is R^[0,t] and the range is the set of all right-continuous paths that increase by 1. Now define

the probability measure over ω as Poisson:

P{N_t − N_s = n} = [λ(t − s)]^n / n! · exp(−λ(t − s)),

where λ is the rate of arrival. This is what is usually called the Poisson process. (Not to be confused with what we use more often in economics: X_t is a compound Poisson process (CPP) if it changes to some value at rate λ_t, studied below. In fact this is a new random variable in which X_t changes to some value whenever N_t jumps, and you could redefine the probability space to the histories of N_t rather than R^[0,t], i.e. the set of all right-continuous paths that increase by 1.) More typically, the Poisson process is defined as a counting process:

DEFINITION 1 A continuous-time stochastic process N_t is Poisson if

1. N_t is a counting process:

(a) N_t lives in (Z_+, 2^{Z_+}) for all t ≥ 0,

(b) N_s ≤ N_t for all s ≤ t,

(c) lim_{s↓t} N_s ≤ lim_{s↑t} N_s + 1 for all t ≥ 0; that is, no two hits can happen simultaneously.

2. N_0 = 0 a.s.,

3. N_t is a stochastic process with stationary, independent increments.

The two definitions are equivalent; there are many other definitions as well, but I refer you to the internet. It is easier to show that the earlier definition implies the counting-process properties; by definition, increments are independent. The probability of getting 0, 1, or 2 or more hits in a time interval dt > 0 is

P(N_{t+dt} − N_t = 0) = exp(−λdt) = 1 − λdt + o(dt)
P(N_{t+dt} − N_t = 1) = λdt · exp(−λdt) = λdt − λ²dt² + o(dt) ≈ λdt
P(N_{t+dt} − N_t ≥ 2) = (λdt)² · e^{−λdt}/2 + o(dt) = o(dt).

Clearly, the actual probability that something happens at any given instant t (and hence in (t, t + dt] as dt → 0, since the increments are independent) is 0. Conversely, one way to make sense of the counting process is to realize that stationarity implies

E[N(T)]/T = lim_{T→∞} N(T)/T = λ,

and instead of sending T to infinity, send the number of intervals of length dt in (0, T] to infinity to get that the expected number of hits in any interval of length dt is λdt:

E[dN_t] = E[N(dt)] = λdt
= 0 · P(N(dt) = 0) + 1 · P(N(dt) = 1) + Σ_{n=2}^∞ n · P(N(dt) = n)
= P(N(dt) = 1) + o(dt),

since two hits cannot occur at the same time. This is important later when we derive the stochastic HJB equation.
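These increment probabilities are easy to check by simulation (a sketch assuming numpy; λ and dt are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)

lam, dt, n_sims = 2.0, 0.01, 200_000

# Increments over an interval of length dt are Poisson(lam*dt),
# regardless of t (stationary increments).
increments = rng.poisson(lam * dt, size=n_sims)

p0 = np.mean(increments == 0)  # ~ exp(-lam*dt) ~ 1 - lam*dt
p1 = np.mean(increments == 1)  # ~ lam*dt
p2 = np.mean(increments >= 2)  # o(dt): negligible

print(p0, p1, p2, increments.mean())  # the last number ~ E[dN] = lam*dt
```

With λdt = 0.02, the ≥2 frequency is of order (λdt)²/2 ≈ 0.0002, illustrating why it drops out at first order.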

2.1 Compound Poisson Process

Now define a new random variable over the underlying Poisson process: let X_t be a r.v. that equals γ_a if N_t is even and γ_b if N_t is odd. Heuristically, starting from X_t = γ_a,

E[dX_t] = 0 · exp(−λdt) + (γ_b − γ_a) · λdt · exp(−λdt) + 0 · (λdt)² · exp(−λdt)/2 + · · ·

E[Ẋ_t] = lim_{dt→0} E[dX_t]/dt = lim_{dt→0} (γ_b − γ_a) · λ · exp(−λdt) = λ(γ_b − γ_a).

More generally, let {Z_k}_{k≥1} be an i.i.d. ordered sequence of random variables with measure G_z(z), independent of the Poisson process N_t. Let X_t be a continuous-time stochastic process that is a function of (N_t, Z_k), and define

X_t = Σ_{k=1}^{N_t} Z_k.

Then

E[X_t] = Σ_{n=1}^∞ E[Σ_{k=1}^n Z_k | N_t = n] · P(N_t = n) = Σ_{n=1}^∞ nμ_Z · e^{−λt}(λt)^n/n!
= μ_Z λt · Σ_{n=1}^∞ e^{−λt}(λt)^{n−1}/(n − 1)! = μ_Z · λt,

and

X_t = ∫₀ᵗ Z_{N_s} dN_s, i.e. dX_t = Z_{N_t} dN_t,

assuming X_0 = 0.
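The formula E[X_t] = μ_Z · λt can be checked by simulating the CPP directly (a sketch assuming numpy; the normal jump distribution is an arbitrary stand-in for G_z):

```python
import numpy as np

rng = np.random.default_rng(2)

lam, t, mu_z, n_paths = 1.5, 2.0, 0.7, 20_000

# X_t = sum_{k=1}^{N_t} Z_k, with N_t ~ Poisson(lam*t) and Z_k i.i.d.;
# Z_k ~ N(mu_z, 1) is an arbitrary choice for G_z.
N = rng.poisson(lam * t, size=n_paths)
X = np.array([rng.normal(mu_z, 1.0, size=n).sum() for n in N])

print(X.mean())  # should be close to mu_z * lam * t = 2.1
```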

2.2 Stochastic Integral with Poisson

First consider a function f (Nt). The integral is easy to write as

f(N_t) − f(0) = Σ_{k=1}^{N_t} [f(k) − f(k − 1)] = ∫₀ᵗ [f(1 + N_{s−}) − f(N_{s−})] dN_s
= ∫₀ᵗ [f(N_s) − f(N_s − 1)] dN_s = ∫₀ᵗ [f(N_s) − f(N_{s−})] dN_s,

where N_{s−} is the left limit of the Poisson process, and only one jump occurs in any dt by definition (or construction) of the Poisson process. For the compound process, recall that the waiting time for the kth hit of the Poisson process, T_k, is also a random variable s.t. the event {T_k > t} ⇔ {N_t ≤ k − 1}; in particular this means that the increments T_k − T_{k−1} are i.i.d. by definition. For k = 1, the waiting time follows an exponential distribution. For k > 1,

P(T_k > t) = ∫_t^∞ λ · e^{−λs}(λs)^{k−1}/(k − 1)! ds,  (1)

since

P(T_k > t) = P(T_k > t ≥ T_{k−1}) + P(T_{k−1} > t)
= P(N_t = k − 1) + ∫_t^∞ λ · e^{−λs}(λs)^{k−2}/(k − 2)! ds
= e^{−λt}(λt)^{k−1}/(k − 1)! + ∫_t^∞ λ · e^{−λs}(λs)^{k−2}/(k − 2)! ds,

and integration by parts leads to (1). Using waiting times, the stochastic integral of a function of a compound Poisson process can be written

f(Y_t) − f(0) = Σ_{k=1}^{N_t} [f(Y_{T_k−} + Z_k) − f(Y_{T_k−})] = ∫₀ᵗ [f(Y_{s−} + Z_{N_s}) − f(Y_{s−})] dN_s
= ∫₀ᵗ [f(Y_s) − f(Y_{s−})] dN_s
= ∫₀ᵗ [f(Y_s) − f(Y_{s−})] (dN_s − λds) + λ ∫₀ᵗ [f(Y_s) − f(Y_{s−})] ds.
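The waiting-time characterization is also easy to verify numerically: T_k is a sum of k i.i.d. exponentials, and its tail should match the Erlang integral (1), which equals P(N_t ≤ k − 1). A sketch assuming numpy:

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(3)

lam, k, n = 2.0, 4, 100_000

# T_k is a sum of k i.i.d. Exp(lam) inter-arrival times.
gaps = rng.exponential(1.0 / lam, size=(n, k))
Tk = gaps.sum(axis=1)

t = 1.0
# Erlang tail, i.e. P(N_t <= k-1) = sum_{j=0}^{k-1} e^{-lam t}(lam t)^j / j!
tail = sum(exp(-lam * t) * (lam * t) ** j / factorial(j) for j in range(k))

print(np.mean(Tk > t), tail)  # the two numbers should agree closely
```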

3. Wiener Process (Brownian Motion)

DEFINITION 2 A Wiener process is defined by four properties:

1. W_0 = 0 a.s.

2. Independent increments: W_t − W_s is independent of F_s for all s ≤ t

3. Normality: W_t − W_s ∼ N(0, t − s)

4. W_t is continuous a.s.

We could spend the whole semester just talking about this, which we won't. Basically, think of Brownian motion as a random walk in continuous time: the best predictor of dX_t is 0, with Gaussian errors. So clearly, W_t is a particular type of a martingale (E[W_t|F_s] = W_s a.s., for all 0 ≤ s < t < ∞).

Most commonly you will encounter a Brownian motion with drift, a geometric Brownian motion, or a generic Ito (diffusion) process:

dXt = µdt + σdWt, dXt = µXtdt + σXtdWt, dXt = µ(Xt)dt + σ(Xt)dWt

the geometric Brownian motion simply gives

dX_t/X_t = μdt + σdW_t,

so it is just a Brownian motion with drift in percentage points. (In log-points, to be exact, Ito's Lemma below gives d log X_t = (μ − σ²/2)dt + σdW_t, so the log is a Brownian motion with drift μ − σ²/2.) In the Ito process, the instantaneous drift and variance depend on the current value of X_t, which is related to the version of Ito's Lemma that we will look at below. Before we move along, note that both the Poisson process and Brownian motion are Markov processes, but while the Brownian motion has a continuous time path a.s., the Poisson process has a discontinuous time path a.s. Also, the Poisson process was not a martingale, but the compensated process dN_t − λdt was. It will be useful to know the quadratic variation of the Brownian motion: we will use a particular formulation that exploits the CLT in discrete time:

⟨W⟩_t ≡ E[W_t²] = lim_{n→∞} Σ_{i=1}^{2^n} [Δ_i W_t]²,  Δ_i W_t ≡ W_{t_i^n} − W_{t_{i−1}^n},

where t_i^n ≡ it/2^n. This makes the increments over adjacent t_i^n have variance t/2^n, so

Z_i ≡ √(2^n) · [Δ_i W_t] ∼ N(0, t).

That is, all Z_i are normal with variance t. Since

Σ_{i=1}^{2^n} [Δ_i W_t]² = (1/2^n) Σ_{i=1}^{2^n} Z_i²,

the term converges to t a.s. by the SLLN:

⟨W⟩_t ≡ E[W_t²] = t.

Hence the quadratic variation of a Brownian motion is equal to t. This is an important notion that will help us understand Ito. Conversely, suppose Z is a random walk s.t. each increment ΔZ equals ±Δh with probabilities (p, 1 − p). That is, ΔZ is Bernoulli. So E[ΔZ] = Δh(2p − 1) and V[ΔZ] = 4p(1 − p)(Δh)². Now we repeat this process n times; this is a binomial process and we can write

E[Δ_n Z] = nΔh(2p − 1) = TΔh(2p − 1)/Δt ≡ E[Δ_T Z]
V[Δ_n Z] = n(Δh)² · 4p(1 − p) = T(Δh)² · 4p(1 − p)/Δt ≡ V[Δ_T Z],

where all we have done is to consider that the n trials happened in a time interval T with time increments Δt = T/n. If we want this process to converge to a Wiener process as Δt → 0, n → ∞, we just choose Δh and p so that

Δh(2p − 1)/Δt = μ,  (Δh)² · 4p(1 − p)/Δt = σ²

⇒ Δh = σ√Δt · √(1 + (μ/σ)²Δt) = σ√Δt as Δt → 0,
  p = (1/2) · [1 ± (μ/σ)√Δt / √(1 + (μ/σ)²Δt)].

For the standard BM W_t, μ = 0 and σ = 1, so Δh = √Δt, p = 1/2. So BM can be viewed as the limit of the sum of i.i.d. Bernoulli r.v.'s taking values ±1 with probability 1/2 each:

W_T = lim_{Δt→0} [Δ_T Z] = lim_{n→∞} [Δ_n Z],

which converges to N(0, T) since Δh = √Δt and Δt = T/n.
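The binomial construction above is easy to simulate (a sketch assuming numpy; for the standard BM, Δh = √Δt and p = 1/2). Note that with Bernoulli steps the realized quadratic variation is exactly T in every sample, since each squared step is Δt:

```python
import numpy as np

rng = np.random.default_rng(4)

T, n, n_paths = 1.0, 256, 20_000
dt = T / n
dh = np.sqrt(dt)  # step size for the standard BM limit (mu = 0, sigma = 1, p = 1/2)

# Sum of n Bernoulli(+/- dh) steps approximates W_T ~ N(0, T).
steps = rng.choice([-dh, dh], size=(n_paths, n))
WT = steps.sum(axis=1)

print(WT.mean(), WT.var())  # ~ 0 and ~ T = 1

# Realized quadratic variation: each squared step is dt, so the sum is exactly T.
print((steps ** 2).sum(axis=1)[0])
```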

4. Stochastic Integral with BM and CPP

The Ito stochastic integral is defined on a semi-martingale (basically, a random walk plus a process with finite variation), where the underlying martingale component has finite quadratic variation. That is,

Xt = X0 + Bt + Mt ⇒ dXt = dBt + dMt,

where M_t is a martingale (E[M_t|F_s] = M_s for s ≤ t), B_t is adapted to F_t and has finite variation, and ⟨M⟩_t < ∞. For example, if M_t is BM, ⟨M⟩_t = t. Similarly, if M_t is the compensated Poisson process N_t − λt, its (pathwise) quadratic variation is N_t.

THEOREM 1 (ITO'S LEMMA FOR CONTINUOUS MARTINGALES) If X_t is continuous and f is a twice continuously differentiable function, the stochastic integral of f(X_t) is

f(X_t) = f(X_0) + ∫₀ᵗ f′(X_s)dB_s + ∫₀ᵗ f′(X_s)dM_s + (1/2)∫₀ᵗ f″(X_s)d⟨M⟩_s

or df(X_t) = f′(X_t)dB_t + f′(X_t)dM_t + (1/2)f″(X_t)d⟨M⟩_t.

Note that this version of Ito does not apply to Poisson. No proof given, but the intuition is that Ito extends Riemann integrals to stochastic increments:

∫₀ᵗ f(X_s)dX_s = lim_{n→∞} Σ_{i=1}^n f(X_{t_{i−1}}) · (X_{t_i} − X_{t_{i−1}}),

where Π_n is an n-partition of [0, t]. It is important that the point of approximation for each interval is taken from the left. Also importantly, the stochastic integral itself is not a deterministic concept: it is the martingale whose quadratic variation equals the expectation of the squared integrand integrated against d⟨X⟩_s.

Formally, note that any function of M_t is simply a process Y_t that is adapted to F_t, the filtration of the martingale. Let M_t be square-integrable in the sense that

⟨M⟩_t = E[M_t²] < ∞.

Let Y^(k) be a simple process; that is, for an infinitely fine partition {t_i}_{i=0}^∞ of [0, t] and a countably infinite sequence of random variables {ζ_i^(k)}_{i=0}^∞,

Y_t^(k) = ζ_0^(k) · 1{t = 0} + Σ_{i=1}^∞ ζ_{i−1}^(k) · 1{t ∈ (t_{i−1}, t_i]}.

One definition of the stochastic integral of Yt over Mt is the (unique) square integrable martingale It(Y) s.t.

DEFINITION 3 (HEURISTIC DEFINITION OF STOCHASTIC INTEGRAL) For all sequences of simple processes Y^(k) converging to Y in mean square,

lim_{k→∞} ‖Y^(k) − Y‖² = lim_{k→∞} E[Y^(k) − Y]² = 0,

I(Y) is the martingale s.t.

lim_{k→∞} ‖I(Y^(k)) − I(Y)‖² = lim_{k→∞} E[I(Y^(k)) − I(Y)]² = 0,

where for each Y^(k),

I_t(Y^(k)) = Σ_{i=1}^∞ ζ_{i−1}^(k) · (M_{t_i} − M_{t_{i−1}}).

This is just a complicated way of saying the stochastic integral is a Riemann-Stieltjes integral where the measure of integration is stochastic (so we need Lebesgue integration). When it works and when it doesn't, and why it's unique, we won't worry about. Perhaps the most important property of the stochastic integral defined as such is that

E[I_t(Y)|F_s] = I_s(Y)  and  E[I_t(Y)²] = E[∫₀ᵗ Y_s² d⟨M⟩_s],

where the first part just means it is a martingale, and the second that the square can be taken inside the integral (the Ito isometry). The problem is when M_{t_i} − M_{t_{i−1}} blows up, but Ito tells us that as long as ⟨M⟩_t is bounded we can define an integral.

[Graphical Representation of Riemann-Stieltjes and Ito]
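In lieu of the graph, a numerical illustration of why the left endpoint matters: by Ito, ∫₀ᵗ W_s dW_s = (W_t² − t)/2, whereas right-endpoint sums converge to (W_t² + t)/2 instead. A sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(5)

t, n = 1.0, 100_000
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])  # W[i] = W at time i*dt, W[0] = 0

left_sum = np.sum(W[:-1] * dW)   # Ito: integrand evaluated at the LEFT endpoint
right_sum = np.sum(W[1:] * dW)   # right endpoint: NOT the Ito integral

print(left_sum, (W[-1] ** 2 - t) / 2)   # these agree (Ito's lemma)
print(right_sum, (W[-1] ** 2 + t) / 2)  # these agree instead
```

The gap between the two sums is exactly the realized quadratic variation, which is why the choice of endpoint matters here but not in ordinary Riemann-Stieltjes integration.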

Although the above doesn't apply to Poisson, we already know how to write the integral for CPP. Consider the general jump diffusion process (which is all we're going to deal with, really)

dX_t = μ(t, X_t)dt + σ(t, X_t)dW_t + dY_t,  (2)

with the three terms playing the roles of dB_t, dM_t, and the jumps, respectively,

where W_t and Y_t are independent Wiener and compound Poisson processes. Then since ⟨W⟩_t = t, we have

f(X_t) − f(X_0) = ∫₀ᵗ μ(s, X_s)f′(X_s)ds + ∫₀ᵗ σ(s, X_s)f′(X_s)dW_s + (1/2)∫₀ᵗ σ²(s, X_s)f″(X_s)ds
+ ∫₀ᵗ [f(X_s) − f(X_{s−})]dN_s

or df(X_t) = μ(t, X_t)f′(X_t)dt + σ(t, X_t)f′(X_t)dW_t + (1/2)σ²(t, X_t)f″(X_t)dt
+ [f(X_t) − f(X_{t−})]dN_t.
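As a quick sanity check on the jump-diffusion formula: applying it with f(x) = x and taking expectations kills the dW_t and compensated-jump terms, so E[X_T] = (μ + λμ_Z)T when μ and σ are constant. A simulation sketch assuming numpy (drawing one jump size per time step is a small-dt approximation):

```python
import numpy as np

rng = np.random.default_rng(6)

mu, sigma, lam, mu_z = 0.5, 0.3, 2.0, 0.25
T, n, n_paths = 1.0, 500, 20_000
dt = T / n

X = np.zeros(n_paths)
for _ in range(n):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    dN = rng.poisson(lam * dt, size=n_paths)  # jumps this step (essentially 0 or 1)
    Z = rng.normal(mu_z, 0.1, size=n_paths)   # jump sizes
    X += mu * dt + sigma * dW + Z * dN

# The dW and compensated-jump terms vanish in expectation, so
# E[X_T] = (mu + lam * mu_z) * T = 1.0 with these parameters.
print(X.mean())
```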

Note of caution: you cannot just write λ[f(X_t) − f(X_{t−})]dt instead of [f(X_t) − f(X_{t−})]dN_t there, since dN_t − λdt is only 0 in expectation. Without proving it (again!), we can also let λ vary with time and state, in which case the non-homogeneous Poisson process satisfies

E[N_t − ∫₀ᵗ λ(s, X_s)ds] = 0 = E_t[dN_t − λ(t, X_t)dt].

Intuitively, we can always reset the underlying Poisson process following any hit until the next hit arrives, during which the process remains “homogeneous."
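This resetting intuition is also how a non-homogeneous Poisson process is simulated in practice, via thinning: draw candidate arrivals from a homogeneous process with rate λ̄ ≥ λ(t) and accept each with probability λ(t)/λ̄. A sketch assuming numpy, with a hypothetical rate function:

```python
import numpy as np

rng = np.random.default_rng(7)

T = 2.0
lam_bar = 2.0                          # upper bound on the rate
lam = lambda t: 1.0 + np.sin(t) ** 2   # hypothetical rate, lam(t) <= lam_bar

def thinned_arrivals():
    """Arrival times on (0, T] by thinning a homogeneous rate-lam_bar process."""
    t, arrivals = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_bar)   # candidate arrival
        if t > T:
            return arrivals
        if rng.uniform() < lam(t) / lam_bar:  # accept with prob lam(t)/lam_bar
            arrivals.append(t)

counts = [len(thinned_arrivals()) for _ in range(20_000)]
# E[N_T] = integral of lam(s) over (0, T] = T + T/2 - sin(2T)/4
print(np.mean(counts))
```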

5. Stochastic HJB

Let xt denote realizations from the stochastic process Xt that follows the jump diffusion process (2). Now consider the stochastic control problem

v(t, x_t, a_t) = max_{(c_s)} E_t [ ∫_t^T U(s, X_s, a_s, c_s)ds ]

s.t. das = f (s, xs, as, cs)ds

where T can be finite or infinite, and I have suppressed the scrap value Z(T, a_T, X_T), which may or may not be there. Except for x_t, all that has changed from the deterministic control problem is that we added an expectation operator over the objective. But let us generalize the jumps a bit. Let Y_t be associated with a non-homogeneous Poisson process with time- and state-dependent arrival rates λ(t, x_t), and also have jumps Z_k that are drawn from a time- and state-dependent measure G_z(z; t, x_t). For example, X_t can be a random wage or dividend process, or an interest rate process (in which case things can be simplified, since it would multiply the state a_s; likewise if we wanted a stochastic discount rate, in which case it would show up multiplicatively in U).

Without going into the details, we can use similar methods as in deterministic control (stochastic versions of Taylor expansion, verification theorem) to show that the following heuristic method works:

0 ≥ U(t, x_t, a_t, c_t)dt + E_t[dV(t, x_t, a_t)],

with equality at the optimal control, where by Ito we have

dV = [V_t + V_a f(t, x_t, a_t, c_t) + (V_xx/2)σ²(t, x_t, a_t)]dt + crap

+ λ(t, x_t)[E_t V(t, x_t + Z_k, a_t) − V(t, x_t, a_t)]dt,

where we have set X_{t−} = x_t a.s., since it is already realized, and have allowed μ, σ to also depend on a. The crap term is

E_t[crap] = 0

= E_t[σ(t, x_t, a_t)V_x dW_t] + E_t{[V(t, x_t + Z_k, a) − V(t, x_t, a)] · [dN_t − λ(t, x_t)dt]},

the latter since Z_k is independent of N_t. So we get the HJB equation

−V_t(t, x, a) = H*(t, x, a, V_a(t, x, a)) + (V_xx/2)σ²(t, x, a)
+ λ(t, x)[∫ V(t, x + z, a)dG_z(z; t, x) − V(t, x, a)],

so when U = e^{−ρt}u, we can multiply the whole system by e^{ρt} and define v(t, x, a) ≡ e^{ρt}V(t, x, a) to obtain

ρv(t, x, a) = v_t(t, x, a) + Ĥ*(t, x, a, v_a(t, x, a)) + (v_xx/2)σ²(t, x, a)
+ λ(t, x)[∫ v(t, x + z, a)dG_z(z; t, x) − v(t, x, a)].

Note that the only time the expectation operator comes in is for the (possibly) stochastic r.v. Z_k; everything else is adapted to F_t. That is, for the HJB, there are no longer any expectations taken over X_t; all of that is washed out in continuous time with martingales. Since the Hamiltonians are the same as in the deterministic case, it follows that the f.o.c. holds deterministically in continuous time, that is, u_c(t, x, a, c) + v_a(t, x, a) · f_c(t, x, a, c) = 0 (no expectations over v or v_a!).

6. Fokker-Planck (Kolmogorov Forward) Equation

The last tool that will be relevant for our purposes is the KFE. Given a solution to v(t, x, a), we want to understand the evolution of p(t, x, a), the population p.d.f. over (x, a) at time t. KFE gives us a (partial) differential equation that does exactly this. Formally, KFE tells

you: suppose at time t, you know P{(x, a) ∈ B} for all measurable sets B. How does P(B) evolve going forward in the filtration (for the same set B)? For example, in the savings problem a solution to v admits optimal policy functions c*(t, x, a) and associated (changes in) assets ȧ*(t, x, a). Now suppose at time t, the p.d.f. is represented as p(t, x, a). KFE tells us how the distribution evolves going forward. (Conversely, Feynman-Kac, or the Kolmogorov Backward Equation, tells us how we would have gotten to p(t, x, a) going backward; but we are not so interested in this.) To compare with discrete time, it is as if we are simulating a distribution of individuals starting from some given initial distribution. The following is a version of Fokker-Planck:

THEOREM 2 (FOKKER-PLANCK-KOLMOGOROV) Let Xt be a stochastic process as in (2), where Yt is a CPP with jumps Zk ∼ Gz(t, Xt) associated with a non-homogeneous Poisson process with rate λ(t, Xt). Let p(t, x) denote the p.d.f. of x at time t. Then for all x ∈ (xmin, xmax) (the interior of possibly realizable states),

∂p(t, x)/∂t = −(∂/∂x)[μ(t, x)p(t, x)] + (1/2)(∂²/∂x²)[σ²(t, x)p(t, x)]
− λ(t, x)p(t, x) + ∫ g_z(x − x′; t, x′)λ(t, x′)p(t, x′)dx′.

Proof (heuristic): The trick is to use a function that is differentiable, so we can apply Ito. For any x ∈ S (the range of Xt), approximate the probability of the event by the expectation of a smooth function:

P(X_t ≤ x) = ∫^x p(t, x′)dx′ = E[χ(X_t ≤ x)] = ∫ χ(x′ ≤ x)p(t, x′)dx′.

With some abuse of notation, we will assume that the indicator is already smoothed (it does not matter how it is smoothed; convolving it with any mollifier will do). Using Ito, we obtain (the derivative is w.r.t. time):

dP(X_t ≤ x) = E[χ′(X_t ≤ x)μ(t, X_t) + (1/2)χ″(X_t ≤ x)σ²(t, X_t)]dt + crap  (3)
+ E{λ(t, X_t)[χ(X_{t−} + Z_k ≤ x) − χ(X_{t−} ≤ x)]}dt,

where crap is again 0 in expectations. Note that the derivatives of χ are w.r.t. X_t, not x. First look at the Poisson part. To compute the expectation over X_t, we denote the variable of integration by X_t = x′, and remember that X_t = X_{t−} = x′ a.s.:

⇒ ∫ λ(t, x′) [∫_z χ(x′ + z ≤ x)dG(z; t, x′) − χ(x′ ≤ x)] p(t, x′)dx′
= ∫ λ(t, x′)G_z(x − x′; t, x′)p(t, x′)dx′ − ∫^x λ(t, x′)p(t, x′)dx′.

Note that this is “as if" we were looking “backward," not “forward” like we did in the HJB. This is because we are looking at all the Xt when they are still random variables, not

a realized point like in the HJB. (The density functions are deterministic in x′, not random like X_t.) For the diffusion part, we can integrate by parts:

∫ χ′(x′ ≤ x)μ(t, x′)p(t, x′)dx′ = B₁(x) − ∫^x (∂/∂x′)[μ(t, x′)p(t, x′)]dx′,

where B₁ is a term determined by boundary conditions:

B1(x) ≡ χ(xmax ≤ x)µ(t, xmax)p(t, xmax) − µ(t, xmin)p(t, xmin).

And likewise

∫ χ″(x′ ≤ x)σ²(t, x′)p(t, x′)dx′ = B₂(x) − ∫ χ′(x′ ≤ x)(∂/∂x′)[σ²(t, x′)p(t, x′)]dx′
= B₂(x) − B₃(x) + ∫^x (∂²/∂x′²)[σ²(t, x′)p(t, x′)]dx′,

where the B_j are determined by boundary conditions:

B₂(x) ≡ χ′(x_max ≤ x)σ²(t, x_max)p(t, x_max) − χ′(x_min ≤ x)σ²(t, x_min)p(t, x_min)
B₃(x) ≡ χ(x_max ≤ x)(∂/∂x)[σ²(t, x_max)p(t, x_max)] − (∂/∂x)[σ²(t, x_min)p(t, x_min)].

So we have obtained that (3) becomes

dP(t, x) = ∫^x [−(∂/∂x′)(μ(t, x′)p(t, x′)) + (1/2)(∂²/∂x′²)(σ²(t, x′)p(t, x′)) − λ(t, x′)p(t, x′)]dx′  (4)
+ ∫ λ(t, x′)G_z(x − x′; t, x′)p(t, x′)dx′ + B₁(x) + (1/2)[B₂(x) − B₃(x)].

Note that B_j′(x) = 0 except at the boundaries. So taking the derivative w.r.t. x on both sides of (4), we obtain the formula in the theorem. □ Of course in our economics problems, typically we want p(t, x, a), not p(t, x). But most problems will assume a law of motion s.t. x is subsumed in a, as we will see later.
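As a closing check tying the KFE to simulation: for an Ornstein-Uhlenbeck process dX = −θX dt + σdW (no jumps), setting ∂p/∂t = 0 in the theorem gives 0 = (∂/∂x)[θx p] + (σ²/2)p″, solved by the stationary density N(0, σ²/(2θ)). A Monte Carlo sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(8)

theta, sigma = 1.0, 0.5
T, n, n_paths = 10.0, 2000, 20_000
dt = T / n

# Euler-Maruyama simulation of dX = -theta*X dt + sigma dW, started at 0.
X = np.zeros(n_paths)
for _ in range(n):
    X += -theta * X * dt + sigma * rng.normal(0.0, np.sqrt(dt), size=n_paths)

# Stationary KFE solution is p = N(0, sigma^2/(2*theta));
# that variance is 0.125 with these parameters.
print(X.var())
```

By T = 10 the transient has died out (the process mean-reverts at rate θ), so the cross-sectional variance of the simulated paths should sit at the stationary value.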
