<<

MARKOV PROCESSES: THEORY AND EXAMPLES

JAN SWART AND ANITA WINTER

Date: April 10, 2013.

Contents

1. Stochastic processes
   1.1. Random variables
   1.2. Stochastic processes
   1.3. Cadlag sample paths
   1.4. Compactification of Polish spaces
2. Markov processes
   2.1. The Markov property
   2.2. Transition probabilities
   2.3. Transition functions and Markov semigroups
   2.4. Forward and backward equations
3. Feller semigroups
   3.1. Weak convergence
   3.2. Continuous kernels and Feller semigroups
   3.3. Banach calculus
   3.4. Semigroups and generators
   3.5. Dissipativity and the maximum principle
   3.6. Hille-Yosida: different formulations
   3.7. Dissipative operators
   3.8. Resolvents
   3.9. Hille-Yosida: proofs
4. Feller processes
   4.1. Markov processes
   4.2. Jump processes
   4.3. Feller processes with compact state space
   4.4. Feller processes with locally compact state space
5. Harmonic functions and martingales
   5.1. Harmonic functions
   5.2. Filtrations
   5.3. Martingales
   5.4. Stopping times
   5.5. Applications
   5.6. Non-explosion
6. Convergence of Markov processes
   6.1. Convergence in path space
   6.2. Proof of the main result (Theorem 4.2)
7. Strong Markov property
References

1. Stochastic processes

In this section we recall some basic definitions and facts on topologies and stochastic processes (Subsections 1.1 and 1.2). Subsection 1.3 is devoted to the study of the space of paths which are continuous from the right and have limits from the left. Finally, for the sake of completeness, we collect facts on compactifications in Subsection 1.4. These will only find applications in later sections.

1.1. Random variables. Probability theory is the theory of random variables, i.e., quantities whose value is determined by chance. Mathematically speaking, a random variable is a measurable map X : Ω → E, where (Ω, F, P) is a probability space and (E, E) is a measurable space. The probability measure

P_X = L(X) := P ∘ X^{-1}

on (E, E) is called the law of X and is usually the only object that we are really interested in.¹ If (X_t)_{t∈T} is a family of random variables, taking values in measurable spaces (E_t, E_t)_{t∈T}, then we can view (X_t)_{t∈T} as a single random variable, taking values in the product space ∏_{t∈T} E_t equipped with the product-σ-field ∏_{t∈T} E_t.² The law P_{(X_t)_{t∈T}} = L((X_t)_{t∈T}) of this random variable is called the joint law of the random variables (X_t)_{t∈T}.

In practice, we usually need a bit more structure on the spaces that our random variables take values in. For our purposes, it will be sufficient to consider random variables taking values in Polish spaces. Recall that a topology O on a space E is a collection of subsets of E, called open sets, such that:

(1) E, ∅ ∈ O.
(2) O_t ∈ O ∀ t ∈ T implies ∪_{t∈T} O_t ∈ O.
(3) O_1, O_2 ∈ O implies O_1 ∩ O_2 ∈ O.

A topology is metrizable if there exists a metric d on E such that the open sets in this topology are the sets O with the property that ∀ x ∈ O ∃ ε > 0 s.t. B_ε(x) ⊂ O, where B_ε(x) := {y ∈ E : d(x, y) < ε} is the open ball around x with radius ε. Two metrics are called equivalent if they define the same topology. Concepts such as convergence, continuity, and compactness depend only on the topology, but completeness depends on the choice of the metric.³

¹At this point, one may wonder why probabilists speak of random variables at all and do not immediately focus on the probability measures that are their laws, if that is what they are really after. The reason is mainly a matter of convenient notation. If µ = L(X) is the law of a real-valued random variable X, then what is the law of X²? In terms of random variables, this is simply L(X²). In terms of probability measures, this is the image of the probability measure µ under the map x ↦ x², i.e., the measure µ ∘ f^{-1} where f : R → R is defined as f(x) = x², an unpleasantly long mouthful.

²Recall that ∏_{t∈T} E_t := {(x_t)_{t∈T} : x_t ∈ E_t ∀ t ∈ T}. The coordinate projections π_t : ∏_{s∈T} E_s → E_t are defined by π_t((x_s)_{s∈T}) := x_t, t ∈ T. By definition, the product-σ-field ∏_{t∈T} E_t is the σ-field on ∏_{t∈T} E_t that is generated by the coordinate projections, i.e., ∏_{t∈T} E_t := σ(π_t : t ∈ T) = σ({π_t^{-1}(A) : A ∈ E_t}).

A topological space E is called separable if there exists a countable set D ⊂ E such that D is dense in E. By definition, a topological space (E, O) is Polish if E is separable and there exists a complete metric defining the topology on E. We always equip Polish spaces with the Borel-σ-field B(E), which is the σ-field generated by the open sets.

The reason why we are interested in Polish spaces is that for random variables taking values in Polish spaces, certain useful results are true that do not hold in general, since they make use of the following fact.

Lemma 1.1 (Probability measures on Polish spaces are tight). Each probability measure P on a Polish space (E, O) is tight, i.e., for all ε > 0 there is a compact set K ⊆ E such that P(K) ≥ 1 − ε.

Proof. Let (x_k)_{k∈N} be dense in (E, O), and let P be a probability measure on (E, B(E)). Given ε > 0 and a complete metric d on (E, O), we can choose N_1, N_2, ... such that

(1.1) P( ∪_{k=1}^{N_n} {x' : d(x', x_k) < 1/n} ) ≥ 1 − ε/2^n.

Let K be the closure of ∩_{n≥1} ∪_{k=1}^{N_n} {x' : d(x', x_k) < 1/n}. Then K is totally bounded⁴, and hence compact, and we have P(K) ≥ 1 − Σ_{n=1}^∞ ε 2^{-n} = 1 − ε.

For example, the following result states that provided the state space (E, O) is Polish, for each projective family of probability measures there exists a projective limit.

Theorem 1.2 (Percy J. Daniell [Dan19], Andrei N. Kolmogorov [Kol33]). Let (E_t)_{t∈T} be a (possibly uncountable) collection of Polish spaces and let µ_S (S ⊂ T finite) be probability measures on ∏_{t∈S} E_t such that

(1.2) µ_{S'} ∘ (π_S)^{-1} = µ_S,   S ⊂ S' ⊂ T, S, S' finite,

where π_S denotes the projection onto ∏_{t∈S} E_t. Then there exists a unique probability measure µ_T on ∏_{t∈T} E_t, equipped with the product σ-field, such that

(1.3) µ_T ∘ π_S^{-1} = µ_S,   S ⊂ T, S finite.

Proof. For E_t ≡ R, see e.g. Theorem 2.2.2 in [KS88].

³More precisely: completeness depends on the uniform structure defined by the metric. For the theory of uniform spaces, see for example [Kel55].

⁴Recall that a set A is totally bounded if for each ε > 0, A possesses a finite ε-net, where an ε-net for A is a finite collection of points {x_k} with the property that for each x ∈ A there is an x_k such that d(x, x_k) < ε.

A consequence of Kolmogorov's extension theorem is that if {µ_S : S ⊂ T finite} are probability measures satisfying the consistency relation (1.2), then there exist random variables (X_t)_{t∈T}, defined on some probability space (Ω, F, P), such that L((X_t)_{t∈S}) = µ_S for each finite S ⊂ T. (The canonical choice is Ω = ∏_{t∈T} E_t.)

Exercise 1.3. For n ∈ N, k_i ∈ {0, 1}, i = 0, ..., n, and 0 =: t_0 < t_1 < ··· < t_n, let τ_n := min{i ∈ {1, ..., n} : k_i = 1} (with τ_n := n + 1 if k_1 = ··· = k_n = 0), and set

(1.4) µ^{t_1,...,t_n}(k_1, ..., k_n) := 1_{{0 ≤ k_1 ≤ ··· ≤ k_n ≤ 1}} · { e^{−t_{τ_n − 1}} − e^{−t_{τ_n}}, if τ_n ≤ n,
                                                                     e^{−t_n},                         if τ_n = n + 1. }

(i) Show that the collection {µ^{t_1,...,t_n} : 0 =: t_0 < t_1 < ··· < t_n} of probability measures on {k ∈ {0, 1}^n : 0 ≤ k_1 ≤ ··· ≤ k_n ≤ 1} satisfies the consistency condition (1.2).
(ii) Can you find one (or even more than one) {0, 1}-valued process X with

(1.5) P{X_{t_1} = k_1, ..., X_{t_n} = k_n} = µ^{t_1,...,t_n}(k_1, ..., k_n),   0 ≤ k_1 ≤ ··· ≤ k_n ≤ 1?

1.2. Stochastic processes. A stochastic process with index set T and state space E is a collection of random variables X = (X_t)_{t∈T} (defined on a probability space (Ω, F, P)) with values in E. We will usually be interested in the case that T = [0, ∞) and E is a Polish space. We interpret X = (X_t)_{t∈[0,∞)} as a quantity whose value is determined by chance and which develops in time.

A stochastic process is called measurable if the map (t, ω) ↦ X_t(ω) from [0, ∞) × Ω into E is measurable. The functions t ↦ X_t(ω) (with ω ∈ Ω) are called the sample paths of the process X.

Lemma 1.4 (Right continuous sample paths). If X has right continuous sample paths, then X is measurable.

Proof. Define processes X^(n) by X^(n)_t := X_{⌊nt+1⌋/n}. Then, for each measurable set A ⊆ E,

{(t, ω) : X^(n)_t(ω) ∈ A} = ∪_{k=0}^∞ [k/n, (k+1)/n) × X^{-1}_{(k+1)/n}(A),

so X^(n) is measurable for each n ≥ 1. By the right-continuity of the sample paths, X^(n) → X pointwise as n → ∞, so X is measurable.

By definition, the laws L(X_{t_1}, ..., X_{t_n}) with 0 ≤ t_1 < ··· < t_n are called the finite-dimensional distributions of X. If X and Y are stochastic processes with the same finite dimensional distributions, then we say that Y is a version of X (and vice versa). Here X and Y need not be defined on the same probability space. If X and Y are stochastic processes defined on the same probability space, then we say that Y is a modification of X if X_t = Y_t a.s.

for every t ≥ 0. Note that if Y is a modification of X, then X and Y have the same finite dimensional distributions. We say that X and Y are indistinguishable if X_t = Y_t for all t ≥ 0, a.s.⁵

Example (Modification). Let (Ω, F, P) = ([0, 1], B[0, 1], ℓ), where ℓ is the Lebesgue measure. For a given x ∈ (0, ∞), define [0, 1]-valued stochastic processes X^x and Y by

(1.6) X^x_t(ω) := { t, if t ≠ xω,
                    0, if t = xω, }

and

(1.7) Y_t(ω) := t,   t ∈ [0, 1].

Then Y is a modification of X^x, but X^x and Y are not indistinguishable.

Lemma 1.5 (Right continuous modifications). If Y is a modification of X and X and Y have right-continuous sample paths, then X and Y are indistinguishable.

Proof. If Y is a modification of X, then X_t = Y_t for all t ∈ Q, a.s. By right-continuity of the sample paths, this implies that X_t = Y_t for all t ≥ 0, a.s.

We will usually be interested in stochastic processes with sample paths that have right limits X_{t+} := lim_{s↓t} X_s for each t ≥ 0 and left limits X_{t−} := lim_{s↑t} X_s for each t > 0. In practice nobody can measure time with infinite precision, so when we model a real process it is a matter of taste whether we assume that the sample paths are right or left continuous; it is tradition to assume that they are right continuous. (Lemmas 1.4 and 1.5 hold equally well for processes with left continuous sample paths.) Note that a consequence of this assumption is that the sample paths cannot have a jump at time t = 0; this will actually be convenient later on. In the next section we study the space of all paths that are right-continuous with left limits in more detail.

1.3. Cadlag sample paths. Let (E, O) be a metrizable topological space. A function w : [0, ∞) → E such that w is right continuous and w(t−) exists for each t > 0 is called a cadlag function (from the French "continu à droite, limites à gauche"). The space of all such functions is denoted by

(1.8) D_E[0, ∞) := {w : [0, ∞) → E : w(t) = w(t+) ∀ t ≥ 0, w(t−) exists ∀ t > 0}.

⁵Note the order of the statements: If Y is a modification of X, then for each t ≥ 0 there is a measurable set Ω*_t ⊂ Ω with P(Ω*_t) = 1 such that X_t(ω) = Y_t(ω) for all ω ∈ Ω*_t. If X and Y are indistinguishable, then there exists a measurable set Ω* (independent of t) with P(Ω*) = 1 such that X_t(ω) = Y_t(ω) for all t ≥ 0 and ω ∈ Ω*.

We begin by observing that functions in D_E[0, ∞) are better behaved than one might suspect.

Lemma 1.6 (Only countably many jumps). If w ∈ D_E[0, ∞), then w has at most countably many points of discontinuity.

Proof. For n = 1, 2, ..., and d a metric on (E, O), let

(1.9) A_n := {t > 0 : d(w(t), w(t−)) > 1/n}.

Since w has limits from the right and the left, A_n cannot possess cluster points. Hence A_n is countable for all n = 1, 2, ..., and the set ∪_{n≥1} A_n of all discontinuities of w is countable too.

In order to be able to do probability theory with random variables taking values in D_E[0, ∞), we want to equip D_E[0, ∞) with a topology such that in this topology D_E[0, ∞) is Polish. We will see that this is possible provided that (E, O) is Polish. To motivate the topology that we will choose, we first take a look at the space

(1.10) C_E[0, ∞) := {continuous functions w : [0, ∞) → E}.

Lemma 1.7 (Uniform convergence on compacta). Let (E, d) be a metric space. Then the following conditions on functions w_n, w ∈ C_E[0, ∞) are equivalent.

(a) For all T > 0,

(1.11) lim sup d(wn(t), w(t)) = 0. n t [0,T ] →∞ ∈ (b) For all (tn)n N,t [0, ) such that tn t, ∈ ∈ ∞ n−→→∞ (1.12) lim wn(tn)= w(t). n →∞

Proof. (a) (b). If tn t then there is a T > 0 such that tn,t T for all n. Now ⇒ → ≤ d w (t ), w(t) d w(t ), w(t) + d w (t ), w(t ) n n ≤ n n n n (1.13)  dw(tn), w(t) + sup d wn(s), w(s) 0. ≤ s [0,T ] n−→→∞  ∈  by (a) and the continuity of w. (b) (a). Imagine that there exists a T > 0 such that ⇒ (1.14) lim sup sup d(wn(t), w(t)) = ε> 0. n t [0,T ] →∞ ∈ Then we can choose sn [0, T ] such that lim supn d(wn(sn), w(sn)) = ∈ →∞ ε. By the compactness of [0, T ] we can choose n1 < n2 < such that · · · ε limm snm = t for some t [0, T ] and d(wnm (snm ), w(snm )) for each →∞ ∈ ≥ 2 8 JAN SWART AND ANITA WINTER

m. Hence, d wnm (snm ), w(t) + d w(snm ), w(t) d(wnm (snm ), w(snm )) ε ≥ ≥ 2 . By continuity, d(w(snm ), w(t)) 0. We therefore find that  m−→→∞  ε (1.15) lim sup d(wn(sn), w(t)) n ≥ 2 →∞ which contradicts (1.12).

If wn, w are as in Lemma 1.7 then because of Property (a) we say that wn converges to w uniformly on compacta. Property (b) shows that this definition does not depend on the choice of the metric on E, i.e., if d and d˜are equivalent metrics on E then w w uniformly on compacta w.r.t. d if and n → only if wn w uniformly on compacta w.r.t. d˜. The topology on E[0, ) of uniform→ convergence on compacta is metrizable. A possible choiceC ∞ of a metric on E[0, ) generating the topology of uniform convergence on compacta is forC example:∞

(1.16) d_u.c.(w_1, w_2) := ∫_0^∞ ds e^{−s} sup_{t∈[0,∞)} [1 ∧ d(w_1(t∧s), w_2(t∧s))].

Remark. If d is a metric on (E, O), then 1 ∧ d is also a metric, and both metrics are equivalent.
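The metric in (1.16) is easy to evaluate approximately. The following small numerical sketch (an illustration only, assuming E = R with d(x, y) = |x − y|; the truncation of the integral and the grid size are arbitrary choices) indicates that d_u.c.(w_n, w) → 0 for w_n(t) = sin(t + 1/n) and w(t) = sin(t), which converge uniformly on compacta.

```python
import numpy as np

# Rough numerical approximation of d_uc from (1.16) for E = R, d(x,y) = |x-y|.
# The integral over s is truncated at s_max and discretized.
def d_uc(w1, w2, s_max=20.0, n_grid=2000):
    s = np.linspace(0.0, s_max, n_grid)
    # sup_{t in [0,infty)} 1 ∧ |w1(t∧s) - w2(t∧s)| equals the running supremum
    # of 1 ∧ |w1 - w2| over [0, s], because t∧s ranges exactly over [0, s].
    diff = np.minimum(1.0, np.abs(w1(s) - w2(s)))
    running_sup = np.maximum.accumulate(diff)
    return np.sum(np.exp(-s) * running_sup) * (s[1] - s[0])

w = np.sin
for n in [1, 10, 100]:
    wn = lambda t, n=n: np.sin(t + 1.0 / n)
    print(n, d_uc(wn, w))   # decreases towards 0 as n grows
```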

On D_E[0, ∞), we could also define uniform convergence on compacta as in Lemma 1.7, Property (a), but this topology would be too strong for our purposes. For example, if E = R, we would like the functions w_n := 1_{[1+1/n, ∞)} to approximate the function w := 1_{[1, ∞)} as n → ∞, but sup_{t∈[0,2]} |w_n(t) − w(t)| = 1 for each n. We wish to find a topology on D_E[0, ∞) such that w_n → w whenever the jump times of the functions w_n converge to the jump times of w while the "rest" of the paths converges uniformly on compacta. The main result of this section is that such a topology exists and has nice properties.

Theorem 1.8 (Skorohod topology). Let (E, d) be a metric space. Then there exists a metric d^d_Sk on D_E[0, ∞) such that in this metric, D_E[0, ∞) is separable if E is separable, D_E[0, ∞) is complete if E is complete, and w_n → w if and only if for all T ∈ [0, ∞) there exists a sequence λ_n of strictly increasing, continuous functions λ_n : [0, T] → [0, ∞) with λ_n(0) = 0, such that

(1.17) lim_{n→∞} sup_{t∈[0,T]} |λ_n(t) − t| = 0

and, for (t_n)_{n∈N}, t ∈ [0, T],

(1.18) lim_{n→∞} w_n(λ_n(t_n)) = { w(t),  whenever t_n ↓ t,
                                   w(t−), whenever t_n ↑ t. }

Remark. The idea of the functions λ_n in (1.17) and (1.18) is that two functions w, w̃ are close in the topology on D_E[0, ∞) if a small deformation of the time scale makes them close in the (locally) uniform topology. The topology in Theorem 1.8 is called the Skorohod topology, after its inventor.
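To see the time deformations of Theorem 1.8 at work, consider again w_n = 1_{[1+1/n,∞)} and w = 1_{[1,∞)} on [0, T]. The sketch below (an illustration only; the piecewise linear choice of λ_n is one of many possibilities) checks numerically that sup_{t∈[0,T]} |λ_n(t) − t| → 0 and that w_n(λ_n(t)) = w(t) for all t, while the plain uniform distance between w_n and w stays equal to 1.

```python
import numpy as np

# lambda_n: a piecewise-linear, strictly increasing bijection of [0, T] that
# maps the jump time 1 of w onto the jump time 1 + 1/n of w_n.
T = 2.0
w = lambda t: (t >= 1.0).astype(float)

def make_lambda(n):
    a = 1.0 + 1.0 / n                      # jump time of w_n
    def lam(t):
        t = np.asarray(t, dtype=float)
        return np.where(t <= 1.0, a * t, a + (T - a) * (t - 1.0) / (T - 1.0))
    return lam

ts = np.linspace(0.0, T, 10001)
for n in [2, 10, 100]:
    wn = lambda t, n=n: (t >= 1.0 + 1.0 / n).astype(float)
    lam = make_lambda(n)
    print(n,
          np.max(np.abs(lam(ts) - ts)),        # (1.17): sup |lambda_n(t) - t| -> 0
          np.max(np.abs(wn(lam(ts)) - w(ts))), # sup |w_n(lambda_n(t)) - w(t)| = 0
          np.max(np.abs(wn(ts) - w(ts))))      # uniform distance stays 1
```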

Our proof of Theorem 1.8 will follow Section 3.5 in [EK86]. Let Λ′ be the collection of strictly increasing functions λ mapping [0, ) onto [0, ). ∞ ∞ In particular, for all λ Λ′ we have λ(0) = 0, limt λ(t) = , and λ is continuous. Furthermore,∈ let Λ be the subclass of→∞ Lipschitz continuous∞ functions λ Λ such that ∈ ′ λ(t) λ(s) (1.19) λ := sup log − < . k k t s ∞ 0 s

Proof of Lemma 1.9. (i) For all λ Λ, ∈ λ(t) λ(s) λ = sup log − k k 0 s

(iii) Since for all λ Λ, ∈ λ λ 1 e−k k k k≥ − log λ(t)−λ(s) = sup 1 e−| t−s | (1.25) 0 s

(Counter-)Example. For n N, let ∈ n(n 2) 1 1 n2 −2 t, if t [0, 2 n2 ], − ∈ − 1 1 1 1 1 (1.27) λn(t) :=  nt + 2 (1 n), if t [ 2 n2 , 2 + n2 ],  − ∈ −  n(n 2) 2(n 1) 1 1 n2 −2 t n2−2 , t [ 2 + n2 , 1]. − − − ∈  Then (λn)n N in Λ, satisfies (1.22) but λn = n . ∈ k k n−→→∞ ∞

In analogy with (1.16), for v, w ∈ D_E[0, ∞), we define the Skorohod metric by

(1.28) d^d_Sk(v, w) := inf_{λ∈Λ} [ ‖λ‖ ∨ ∫_{[0,∞)} ds e^{−s} sup_{t∈[0,∞)} (1 ∧ d(v(t∧s), w(λ(t)∧s))) ].

The next lemma states that d^d_Sk is indeed a metric on D_E[0, ∞).

Lemma 1.10. (D_E[0, ∞), d^d_Sk) is a metric space.

Proof. For symmetry, recall Part (i) of Lemma 1.9, and notice that

(1.29) sup_{t∈[0,∞)} [1 ∧ d(v(t∧s), w(λ(t)∧s))] = sup_{t∈[0,∞)} [1 ∧ d(v(λ^{-1}(t)∧s), w(t∧s))]

for all λ ∈ Λ. This implies that d_Sk(v, w) = d_Sk(w, v) for all v, w ∈ D_E[0, ∞).

If d_Sk(v, w) = 0, then there exists a sequence (λ_n)_{n∈N} in Λ such that ‖λ_n‖ → 0 as n → ∞ and

(1.30) ℓ({s ∈ [0, s_0] : sup_{t∈[0,∞)} [1 ∧ d(v(t∧s), w(λ_n(t)∧s))] ≥ ε}) → 0 as n → ∞

for all ε > 0 and s_0 ∈ [0, ∞). Hence, by Part (iii) of Lemma 1.9 and (1.30), v(t) = w(t) for all continuity points t of w, and therefore, by Lemma 1.6 and right continuity of v and w, v = w.

It remains to show the triangle inequality. Recall Part (ii) of Lemma 1.9, and notice that for all t ∈ [0, ∞),

(1.31) sup_{s∈[0,∞)} {1 ∧ d(w(t∧s), u(t ∧ λ_1∘λ_2(s)))}
       ≤ sup_{s∈[0,∞)} {1 ∧ d(w(t∧s), v(t ∧ λ_2(s)))} + sup_{s∈[0,∞)} {1 ∧ d(v(t ∧ λ_2(s)), u(t ∧ λ_1∘λ_2(s)))}
       = sup_{s∈[0,∞)} {1 ∧ d(w(t∧s), v(t ∧ λ_2(s)))} + sup_{s∈[0,∞)} {1 ∧ d(v(t∧s), u(t ∧ λ_1(s)))}.

Combining (1.24) and (1.31) implies that d_Sk(w, u) ≤ d_Sk(w, v) + d_Sk(v, u).

Exercise 1.11. For n N, let vn := 1[0,1 2−n) and wn := 1[0,2−n). Decide ∈ − d whether the sequences (vn)n N and (wn)n N converge in E[0, ), dSk and, if so, determine the limit function.∈ ∈ D ∞ 

Proposition 1.12 (A convergence criterion). Let (w_n) be a sequence in D_E[0, ∞) and w ∈ D_E[0, ∞). Then the following are equivalent:

(a) d^d_Sk(w_n, w) → 0 as n → ∞.
(b) There exists a sequence (λ_n)_{n∈N} in Λ such that ‖λ_n‖ → 0 as n → ∞ and

(1.32) lim_{n→∞} sup_{t∈[0,T]} d(w_n(λ_n(t)), w(t)) = 0

for all T ∈ [0, ∞).
(c) For each T > 0, there exists a sequence (λ_n)_{n∈N} in Λ' (possibly depending on T) satisfying (1.26) and (1.32).
(d) For each T > 0, there exists a sequence (λ_n)_{n∈N} in Λ' (possibly depending on T) satisfying (1.17) and (1.18).

Corollary 1.13. The Skorohod topology does not depend on the choice of the metric on (E, ). O 12 JAN SWART AND ANITA WINTER

˜ d Proof of Corollary 1.13. If d, d are two equivalent metrics on (E, ) and dSk d˜ O and dSk are the associated Skorohod metrics, then formula (1.18) shows that d˜ wn w in dSk if and only if wn w in dSk. It is easy to see that two n−→→∞ n−→→∞ metrics are equivalent if every sequence that converges in one metric also converges in the other metric, and vice versa.6

Proof of Proposition 1.12. (a) (b). We start showing that (a) is equiva- ⇐⇒d lent to (b). Assume first that dSk(wn, w) 0 for a metric d on (E, ). By n−→→∞ O definition, then there exist sequences (λn)n N in Λ such that λn 0 ∈ k k n−→→∞ and

(1.33) ℓ s [0,s0]: sup 1 d wn(λn(t) s), w(t s) ε 0 { ∈ t [0, ) ∧ ∧ ∧ ≥ } n−→→∞ ∈ ∞  for all ε> 0 and s [0, ). 0 ∈ ∞ Hence, there is a subsequence (nk)k N such that d wn (λn (t) s), w(t ∈ k k ∧ ∧ s) 0 for almost every s [0, ), and thus for all continuity points s of k−→→∞ ∈ ∞ w. That is, there exist sequences (λn)n N in Λ and (sn)n N in [0, )  ∈ ∈ ↑ ∞ ∞ such that λn 0 and k k n−→→∞

(1.34) lim sup d wn(λn(t) sn), w(t sn) = 0. n →∞ t 0 ∧ ∧ ≥  Now for given T [0, ), sn T λn(T ) for all n sufficiently large. There- fore (1.34) implies∈ (1.32).∞ ≥ ∨ On the other hand, let a sequence (λn)n N in Λ satisfy the condition of (b). Let s [0, ). Then for each n N, ∈ ∈ ∞ ∈

sup d wn(λn(t) s), w(t s) t 0 ∧ ∧ ≥  1 = sup d wn(tn′ s), w(λn− (tn′ ) s) ′ t :=λn(t) 0 ∧ ∧ n ≥ (1.35) 1  sup d wn(tn′ s), w(λn− (tn′ s)) ≤ t′ 0 ∧ ∧ n≥ 1 1 + sup d w(λn− (tn′ s)), w(λn− (tn′ ) s) . ′ tn 0 ∧ ∧ ≥ 

6To see this, note that a set A is closed in the topology generated by a metric d if and only if x ∈ A for all xn ∈ A with xn → x in d. This shows that two metrics which define the same form of convergence have the same closed sets. Since open sets are the complements of closed sets, they also have the same open sets, i.e., they generate the same topology. MARKOV PROCESSES 13

We can estimate this further by

sup d wn(λn(r)), w(r) ≤ −1 ′ −1 r:=λn (tn s) [0,λn (s)] ∧ ∈  + sup d w(r), w(s) r:=λ−1(t′ ) [s,λ−1(s) s] ∨ n n ∈ n ∨  1 sup d w(λn− (s)), w(r) , −1 ′ −1 r:=λn (tn) s [λn (s) s,s] ∧ ∈ ∧  where the second half of the last inequality follows by considering the cases t s and t >s separately. Thus by (1.32), n′ ≤ n′ (1.36) lim sup 1 d wn(λn(t) s), w(t s) = 0 n →∞ t [0, ) ∧ ∧ ∧ ∈ ∞   for every continuity point s of w. Hence, applying the dominated conver- d gence theorem in (1.28) yields that dSk wn, w 0. n−→→∞ (b) (c). Obviously, assumption (c) is weaker than (b) (recall also ⇐⇒ N (1.26)). To see the other direction, let N be a positive integer, and (λn )n N ∈ in Λ′ satisfying (1.26) with T = N and such that (1.37) λN (t) := λN (N)+ t N, t N. n n − ≥ We want to construct a sequence (λn)n N in Λ such that ∈ λn 0, and • k k n−→→∞ b supt [0,T ] d wn(λn(t)), w(t) 0, for all T [0, ). • b ∈ n−→→∞ ∈ ∞ Notice that by (1.32) we can find a subsequence (nk)k N such that b ∈ N 1 (1.38) sup d wn(λn (t)), w(t) t [0,N] ≤ N ∈  for all n nN , while in general, we can not conclude from (1.26) that ≥ N lim infn λn = 0 (recall the counterexample given behind the proof of Lemma→∞ 1.9).k k We proceed as follows. First we are therefore going to construct a se- N quence (λn )n N in Λ whose Lipschitz constant goes to 1 as N and ∈ N → ∞ N which is obtained by disturbing (λn )n N such that the dilatation of λn convergesc to zero as N but mildly∈ enough that we can ensure that N → ∞ supt [0,N] d wn(λn (t)), wn(λn(t)) 0. ∈ N−→→∞ For that, define τ N := 0, and for all k 1, c 0  ≥ N N 1 N N inf t>τk 1 : d w(t), w(τk 1) > N , if τk 1 < , (1.39) τk := { − − } N− ∞ , ifτk 1 = .  ∞  − ∞ N Since w is right continuous, the sequence (τk )k N is strictly increasing as long as its terms remain finite. Since w has limits from∈ the left, the sequence has no cluster point. Now let for each n N, ∈ N N 1 N (1.40) sk,n := (λn )− (τk ), 14 JAN SWART AND ANITA WINTER where by convention (λN ) 1( )= . n − ∞ ∞ N Define a sequence (λn )n N in Λ by ∈ τ N τ N τ N + k+1− k (t sN ), if t [sN ,sN N), k c sN sN k,n k,n k+1,n k+1,n− k,n − ∈ ∧ N  (1.41) λn (t) := N  λn (N)+ t N, if t (N, ),  − ∈ ∞ c arbitrary, otherwise,  c where, by convention 1 = 1. With this convention and by (1.26),  ∞− ∞ N N N N N (1.42) λn = max log (τk+1 τk ) log (sk+1,n sk,n) 0, N n→∞ k k sk,n N − − − −→ ≤ and c

N 2 (1.43) sup d wn(λn (t)), wn(λn(t)) . t [0,N] ≤ N ∈  Since c (1.44) N sup d wn(λn (t)), w(t) t [0,N] ∈  c N N sup d wn(λn (t)), w(t) + sup d wn(λn (t)), wn(λn(t)) ≤ t [0,N] t [0,N] ∈  ∈  N 2 c sup d wn(λn (t)), w(t) + , ≤ t [0,N] N ∈  for all n N, (1.18) implies that we can choose a subsequence (nk)k N such ∈ ∈ N 1 N 3 that λn N and supt [0,N] d wn(λn (t)), w(t) N for all n nN . For k k≤ ∈ ≤ ≥ N 1 n 0 andfλn Λ′ satisfying (1.26). Definew ˜n(t) := wn(λn(t)) (t [0, T ]). Then we must∈ show that the following conditions are equivalent ∈ (i) lim sup d(w ˜n(t), w(t)) = 0. n t [0,T ] →∞ ∈ w(t) whenever tn t (ii) lim w˜n(tn)= ↓ , tn,t [0, T ]. n w(t ) whenever tn t ∈ →∞  − ↑ This is very similar to the proof of Lemma 1.7, with wn replaced byw ˜n. The implication (i) (ii) can be proved as in (1.13) using the facts that w(tn) w(t) if t t and⇒ w(t ) w(t ) if t t. To prove the implication (ii) (i)→ n ↓ n → − n ↑ ⇒ we assume that (i) does not hold and show that there exist n1 < n2 < · ·ε · such that limm snm = t for some t [0, T ] and d(w ˜nm (snm ), w(snm )) →∞ ∈ ≥ 2 for each m. Since either snm >t infinitely often, or snm

We next state that if the underlying space (E, ) is Polish then E[0, ) is Polish. O D ∞

Proposition 1.14 (Andrei N. Kolmogorov [Kol56]). If (E, O) is separable, then (D_E[0, ∞), d^d_Sk) is separable. If (E, d) is complete, then (D_E[0, ∞), d^d_Sk) is complete.

Remark. If E is Polish then [0, ) is separable and we can choose d such DE ∞ that E is complete in d, hence E[0, ) is complete in dSk, hence E[0, ) is Polish. D ∞ D ∞

We prepare the proof with the following problem:

Exercise 1.15. Let (E, ) be a separable topological space, and (αn)n N a countable dense ofOE. Show that the collection Γ of all functions∈ of the form

α , t [t ,t ), (1.45) w(t) := nk k 1 k α , t∈ [t −, ),  nK ∈ K ∞ where 0= t0 k such that ≥ k ∈ k k (1.47) λk sup 1 d wNk (λk(t) sk), wNk+1 (t sk) 2− . k k∨ t [0, ) ∧ ∧ ∧ ≤ ∈ ∞  Let then

(1.48) µk := lim λk+n λk+1 λk, n →∞ ◦···◦ ◦ and notice that µk exists uniformly on bounded intervals, is Lipschitz con- tinuous and satisfies

∞ k+1 (1.49) µ λ 2− , k k k≤ k l k≤ Xl=k 16 JAN SWART AND ANITA WINTER and hence, in particular, belongs to Λ. Since by (1.47), for all k 1, ≥ 1 1 sup 1 d wNk (µk− (t) sk), wNk+1 (µk−+1(t) sk) t [0, ) ∧ ∧ ∧ ∈ ∞ 1  1 = sup 1 d w (µ− (t) s ), w (λ (µ− )(t) s ) ∧ Nk k ∧ k Nk+1 k k ∧ k (1.50) t [0, ) ∈ ∞  = sup 1 d wNk (t sk), wNk+1 (λk(t) sk) t [0, ) ∧ ∧ ∧ ∈ ∞ k  2− , ≤ 1 completeness of E implies that uk := wNk µk− converges uniformly on bounded intervals to a function w : [0, ) ◦ E. Moreover, since u ∞ → k ∈ E[0, ), for all k 1, also w E[0, ). Therefore (wNk )k N and w satisfyD ∞ the conditions≥ of part (b) of∈ Proposition D ∞ 1.12, and hence we∈ conclude d that dSk(wNk , w) 0. k−→→∞

Let denote the Borel σ-algebra on [0, ), dd . Since we are going SE DE ∞ Sk to talk about probability measures on [0, ), it is important to E E know more about . D ∞ S SE  The following result states that E is just the σ-algebra generated by the coordinate variables. S Proposition 1.16 (Borel σ-field). If (E, ) is Polish, then the Borel-σ-field on [0, ) coincides with the σ-field generatedO by the coordinate projec- DE ∞ tions (ξt)t 0, defined as ≥ (1.51) ξ : [0, ) w w(t), t 0. t DE ∞ ∋ 7→ ≥ coor Proof. Let E denote the σ-algebra generated by the coordinate maps, i.e., S (1.52) coor := σ(ξ : t [0, )). SE t ∈ ∞ coor We start showing that E E. For a given ε> 0, t [0, ) and f a bounded continuous functionS on⊆E S, consider the following∈ map:∞ 1 t+ε (1.53) f ε : [0, ) w dsf(ξ (w)) R. t DE ∞ ∋ 7→ ε s ∈ Zt ε It is easy to check that ft is continuous on E[0, ), and hence Borel ε D ∞ measurable. Moreover, since limε 0 ft = f ξt, we find that f ξt is Borel measurable for every bounded and↓ continuous◦ function f, and hence◦ also for all bounded functions f. Consequently, 1 (1.54) ξ− (Γ) := w [0, ) : ξ (w) Γ , Γ (E). t { ∈ DE ∞ t ∈ } ∈ SE ∈B coor That is, E E. To prepareS ⊆ the S other direction, notice first that if D [0, ) is dense then ⊆ ∞ (1.55) coor = σ(ξ : t D). SE t ∈ MARKOV PROCESSES 17

Indeed, for each t [0, ), there exists a sequence (tn)n N in D [t, ) with ∈ ∞ ∈ ∩ ∞ (tn) t, as n . Therefore, ξt = limn ξtn is σ(ξt : t D)-measurable. ↓ →∞ →∞ ∈ Assume now that (E, ) is separable. Fix n N and 0 =: t0

If E is a metrizable space, then we denote the space of continuous func- tions w : [0, ) E by [0, ). ∞ → CE ∞ Lemma 1.17 (Continuous functions). The space [0, ) is a closed subset CE ∞ of E[0, ). The induced topology on E[0, ) is the topology of uniform convergenceD ∞ on compact sets. C ∞

Proof. For closedness, let (wn)n N be a sequence of functions in E[0, ), ∈ C ∞ and w E[0, ) such that dSk(wn, w) 0. We have to show that w ∈ D ∞ n−→→∞ ∈ E[0, ). By condition (c) of Proposition 1.12, for all T [0, ), there C ∞ T ∈ ∞ exists a sequence (λn )n N in Λ′ satisfying (1.26) and (1.32). Hence, for all T [0, ) and ε> 0, ∈ ∈ ∞ by (1.32), there exists N = N(T, ε) such that for all n N and • t [0, T ], λ (t) t < ε, and ≥ ∈ | n − | by continuity of wn, there exists δ = δ(ε) > 0 such that for all • s,t [0, T ] with s t < δ, d(w (t), w (s)) < ε. ∈ | − | n n 18 JAN SWART AND ANITA WINTER

Combining both yields that for all n N(T, δ(ε)) and t [0, T ], ≥ ∈ (1.61) d(wn(t), wn(λn(t))) < ε. Thus, (1.61) together with (1.26) implies

sup d(wn(t), w(t)) t [0,T ] ∈ (1.62) sup d(wn(λn(t)), wn(t))+ sup d(wn(λn(t)), w(t)) ≤ t [0,T ] t [0,T ] ∈ ∈ 0. n−→→∞ This is equivalent to uniform convergence of (wn)n N against w on compacta. In particular, the limit function w is continuous. ∈

The next lemma shows that stochastic processes with cadlag sample paths are just random variables with values in a rather large and complicated space. Lemma 1.18 (Processes with cadlag sample paths). A function (t,ω) 7→ Xt(ω) is a stochastic process with Polish state space E and cadlag sample paths if and only if ω (Xt(ω))t 0 is a E[0, )-valued random vari- 7→ ≥ D ∞ able. Two E-valued stochastic processes X and X˜ with cadlag sample paths have the same finite dimensional distributions if and only if, considered as [0, )-valued random variables, they have the same laws (X) and (X˜). DE ∞ L L Proof. Let X :Ω E[0, ) denote the function ω X(ω) := (Xt(ω))t 0. By Proposition 1.16,→ D the Borel-∞ σ-field on [0, )7→ is generated by the≥ co- DE ∞ ordinate projections (ξt)t 0. Therefore, the function X is measurable if t 1 ≥ t 1 and only if X− (ξt− (A)) for all A (E). Since X− (ξt− (A)) = 1 1 ∈ F ∈ B (ξt X)− (A)= Xt− (A) this is equivalent to the statement that the (Xt)t 0 are◦ random variables. ≥ The finite dimensional distributions of an E-valued stochastic process X are uniquely determined by all probabilities of the form (1.63) P X A ,...,X A { t1 ∈ 1 tn ∈ n} with 0 t1 tn and A1,...,An (E). The of all subsets of [0, ≤) of≤···≤ the form w [0, ) :∈w B A , . . . , w A is closed DE ∞ { ∈ DE ∞ t1 ∈ 1 tn ∈ n} under finite intersections and generates the Borel-σ-field on E[0, ), so the probabilities of the form (1.63) uniquely determine the lawD (X)∞ of X, considered as [0, )-valued random variable. L DE ∞

1.4. Compactification of Polish spaces. In this section we collect some important facts about Polish spaces that will be useful later on. In par- ticular, we will see that every Polish space can be embedded in a compact space. Compact metrizable spaces are, in a sense, the “nicest” topological spaces. A countable product i N Ei of compact metrizable spaces, equipped with ∈ Q MARKOV PROCESSES 19 the product topology, is compact and metrizable [Kel55, Theorem 4.14].7 Every compact metrizable space is separable.8 Conversely, every separable metrizable space can be embedded in a compact metrizable space. Definition 1.19. By definition, a compactification of a topological space E is a compact topological space E such that E E, the topology on E is the induced topology from E, and E is the closure⊆ of E.

Remark. If Ē is a compactification of E, then E is compact if and only if E = Ē.

The next proposition can be found in [Kel55, Theorem 4.17] or [Cho69, Theorem 6.3]. Proposition 1.20 (Metrizable compactifications). Every separable metriz- able space E has a metrizable compactification E.

Definition 1.21 (Product topology). Let ((E_k, O_k))_{k∈N} be metrizable topological spaces. The product topology O on ∏_{k=1}^∞ E_k is the coarsest topology on ∏_{k=1}^∞ E_k such that all coordinate projections π_i : ∏_{k=1}^∞ E_k → E_i are continuous.

Remark. Let ((E_k, d_k))_{k∈N} be metric spaces. Then the product topology O on ∏_{k=1}^∞ E_k can be metrized by

(1.64) d(x, y) := Σ_{k=1}^∞ 2^{−k} [1 ∧ d_k(x_k, y_k)]

for all x := (x_1, x_2, ...) and y := (y_1, y_2, ...) in ∏_{k=1}^∞ E_k.

Proof (sketch). Equip [0, 1]^N with the product topology. Then [0, 1]^N is compact and metrizable. Using Urysohn's lemma, it can be shown that there exists a countable family (f_i)_{i∈N} of continuous functions f_i : E → [0, 1] such that the map f : E → [0, 1]^N defined by f(x) := (f_i(x))_{i∈N} is open and one-to-one. Since f is obviously continuous, it follows that f is a homeomorphism between E and f(E). Identifying E with its image f(E) and taking for Ē the closure of f(E) in [0, 1]^N, we obtain the required compactification.

Unfortunately, for general separable metrizable spaces, E may be a very ‘bad’ (even non-measurable) subset of its compactification E. For Polish spaces, and in particular for locally compact spaces, the situation is better. In what follows, all spaces are separable and metrizable.

7Uncountable products of compact metrizable spaces are still compact but no longer metrizable. 8This follows from the fact that for a metric space compact ⇒ totally bounded ⇒ countable basis for the topology ⇒ separable. 20 JAN SWART AND ANITA WINTER

Definition 1.22 (Locally compact). We say that E is locally compact if for each x E there exists an open set O and a compact set C such that x O C.∈ ∈ ⊂ Exercise 1.23. Let E be a . Show that E is separable, and that there exists compact sets (Ci)i N such that E = i N Ci. ∈ ∈ We need the following facts. S Proposition 1.24 (Subsets of locally compact and Polish spaces). (i) A subset F of a locally compact space E is itself locally compact in the induced topology if and only if F E is the intersection of an open set with a closed set. ⊂ (ii) A subset F of a Polish space E is itself Polish in the induced topology if and only if F E is a countable intersection of open sets. ⊂ Proof. For Part (i), see [Bou64, 8.16]. For Part (ii), see [Bou58, Section 6, Theorem 1]. §

Remark. Sets that are the intersection of an open set with a closed set are called locally closed. Sets that are a countable intersection of open sets are 9 called Gδ-sets. Every closed set is a Gδ-set.

The following is a immediate consequence of Proposition 1.24. Corollary 1.25. Let E be a separable metrizable space and E is a metrizable compactification of E. Then E is locally compact if and only if E is an open subset of E. • E is Polish if and only if E is a countable intersection of open sets • in E.

Exercise 1.26. Prove Corollary 1.25. In particular, (1.65) E compact E locally compact E Polish.10 ⇒ ⇒ If E is locally compact but not compact, then there exists a metrizable compactification E of E such that E E consists of one point (usually de- \ noted by ). In this case, E is the set ∞ (1.66) E∞ := E ∪ {∞} and by definition a subset U E¯ is open if either ⊆ 9 1 If (E,d) is a metric space and A ⊆ E is closed, then On := {x ∈ E : d(x,A) < n } are open sets with A = Tn On. 10If E is a separable metrizable space and there exists a metrizable compactification E of E such that E ⊂ E is a Borel measurable set, an , or a universally measurable set, then E is called a Lusin space, a Souslin space, or a Radon space, respectively. MARKOV PROCESSES 21

U and U is open in the original topology. • ∞ 6∈ U and E¯ U is compact in the original topology of E. • ∞∈ \ We call E∞ the one-point compactification of E. As an application of Proposition 1.24, we prove: Proposition 1.27 (Product spaces). (i) A finite product E E of locally compact spaces is locally 1 ×···× n compact but a countably infinite product i N Ei is not, unless all ∈ but finitely many Ei are compact. Q (ii) A countable product i N Ei of Polish spaces is Polish. ∈ Q Proof. Let E (i N) be locally compact spaces and let E be metrizable i ∈ i compactifications of the Ei. Then i N Ei is a metrizable compactification ∈ of i N Ei. Let πi denote the projection on Ei. If all but finitely many Ei ∈ Q are compact, then there is an n such that Ei = Ei for all i>n. Therefore Q n 1 i N Ei = i=1 πi− (Ei) is an open subset of i N Ei, hence i N Ei is ∈ ∈ ∈ locally compact. If there are infinitely many Eik that are not compact, then Q T (k) Q (kQ) choose x = (xi)i N i N Ei and x i N Ei such that xi = xi for ∈ ∈ ∈ (k) ∈ (k) ∈ (k) all i = ik and xi Eik Eik . Then x i N Ei and x x in the 6 k ∈ Q \ Q6∈ ∈ → product topology, which proves that i N Ei i N Ei is not closed, hence ∈ \Q ∈ i N Ei is not open, hence i N Ei is not locally compact. ∈ ∈ Q Q If the Ei are Polish, then each Ei is a countable intersection of open Q Q i subsets of Ei, say Ei = j Oij. Then i N Ei = i,j πi− (Oij) is a countable ∈ intersection of open subsets of i N Ei, hence i N Ei is Polish. T ∈ Q T∈ Q Q Definition 1.28 (Separating points). We say that a family (fi)i I of func- tions on a space E separates points if for each x = y there exists∈ an i I such that f (x) = f (y). 6 ∈ i 6 i The next (deep) result is often very useful.

Proposition 1.29 (Borel σ-field). Let E, (Ei)i N be Polish spaces and let ∈ (fi)i N be a countable family of measurable functions fi : E Ei that separates∈ points. Then σ(f : i N)= (E). → i ∈ B Proof. See [Sch73, Lemma II.18]. Warning: the statement is false for un- countable families (fi)i I . For example, if E = [0, 1], then the functions ∈ (1 x )x [0,1] separate points, but they generate the σ-field := A [0, 1] : { } ∈ S { ⊂ A countable or [0, 1] A countable . \ } A simple application is:

Corollary 1.30 (Product σ-field). If (Ei)i N are Polish spaces, then the ∈ Borel-σ-field ( i N Ei) coincides with the product-σ-field i N (Ei). B ∈ ∈ B Q Q 22 JAN SWART AND ANITA WINTER

Proof. Let πi denote the projection on Ei. Then the functions (πi)i N are continuous (hence certainly measurable) and separate points. ∈

Note that Proposition 1.29 also implies that if E is Polish, then the Borel- σ-field on E[0, ) coincides with the σ-field generated by the coordinate projectionsD ξ : ∞t Q [0, ) . This strengthens Proposition 1.16! { t ∈ ∩ ∞ } MARKOV PROCESSES 23

2. Markov processes

In the previous section, we have studied stochastic processes in general, and stochastic processes with cadlag sample paths in particular. In the present section we take a look at a special class of stochastic processes, namely those which have the Markov property, and in particular at those whose transition probabilities are time-homogeneous. We will see how such time-homogeneous transition probabilities can be interpreted as semigroups. In the next sections we will then see how a certain type of these semigroups, namely those which have the Feller property, may be constructed from their generators, and how such semigroups give rise to Markov processes with cadlag sample paths.

2.1. The Markov property. We start by recalling the notion of conditional expectation. Let (Ω, F, P) be our underlying probability space. For any sub-σ-field H ⊆ F, let

(2.1) B(H) := {f : Ω → R : f H-measurable and bounded}.

Definition 2.1. The conditional expectation of a random variable F ∈ B(F) given H, denoted by E_H[F] or E[F | H], is a random variable such that

(2.2) (1) E_H(F) ∈ B(H),
      (2) E[E_H(F) H] = E[F H] for all H ∈ B(H).

The random variable E_H[F] is almost surely defined through these two conditions (with respect to the restriction of P to H). Some elementary properties of the conditional expectation are:

(2.3) ("continuity")  E_H(F_i) ↑ E_H(F) a.s. whenever F_i ↑ F,
      ("projection")  E_G[E_H[F]] = E_G[F] a.s. for all G ⊆ H,
      ("pull out")    E_H(F) H = E_H(F H) a.s. for all H ∈ B(H).

We write P(A | H) := E_H[1_A] (A ∈ F), and for any random variable G we abbreviate E_G[F] = E[F | G] := E[F | σ(G)] and P(A | G) := P(A | σ(G)).

Proof of the pull out property. We need to check that E_H[F] H satisfies (2.2). Indeed, E_H[F] H ∈ B(H) since E_H[F] ∈ B(H) and H ∈ B(H), and applying (2.2)(2) twice we see that E[E_H(F) H H'] = E[F H H'] = E[E_H[F H] H'] for all H' ∈ B(H), which shows that E_H[F] H satisfies (2.2)(2).

Lemma 2.2 (Conditional expectation). It suffices to check (2.2)(2) for H of the form H = 1_A with A ∈ G, where G is closed under finite intersections, there exist A_i ∈ G such that A_i ↑ Ω, and σ(G) = H.

Before we prove Lemma 2.2, we recall a basic fact from measure theory. A collection D of subsets of Ω is called a Dynkin system if

(1) Ω ∈ D,
(2) A, B ∈ D, A ⊇ B ⇒ A \ B ∈ D, and
(3) A_n ∈ D, A_n ↑ A ⇒ A ∈ D.

Lemma 2.3. Let C be a collection of subsets of Ω which is closed under finite intersections. Then the smallest Dynkin system which contains C is equal to σ(C).

Proof. See any book on measure theory.

Proof of Lemma 2.2. Set := A : E[E (F )1A] = E[F 1A] . By the linearity and continuity ofD the conditional{ ∈ H expectation,H A, B , A} B ∈ D ⊇ ⇒ A B and An , An A A . Since we are assuming that there\ ∈ exists D A ∈such D that↑A ⇒Ω, we also∈ D have Ω , so is a Dynkin i ∈ G i ↑ ∈ D D system. Therefore, by Lemma 2.3, E[E (F )1A] = E[F 1A] for all A H ∈ G implies E[E (F )1A]= E[F 1A] for all A . The general statement follows by approximationH with simple functions,∈H using the linearity and continuity of E . H

Example. Let Z be uniformly distributed on [0, 1], X := cos(2πZ) and Y := sin(2πZ). Then (X, Y) is uniformly distributed on {(x, y) : x² + y² = 1}, and hence a version of the conditional distribution of X given Y is

(2.4) P({X ∈ C} | Y) = ½ δ_{√(1−Y²)}(C) + ½ δ_{−√(1−Y²)}(C).

Moreover, we find E_Y[X] = 0 and E_Y[X²] = 1 − Y².

Let X be a stochastic process with values in a Polish space (E, O). For each t ≥ 0, we introduce the σ-fields

(2.5) F^X_t := σ(X_s ; 0 ≤ s ≤ t)

and

(2.6) G^X_t := σ(X_u ; t ≤ u).

Note that F^X_t is the collection of events that refer to the behavior of the process X up to time t. That is, F^X_t contains all "information" that can be obtained by observing the process X up to time t. Likewise, G^X_t contains all information that can be obtained by observing the process X after time t.

Proposition 2.4 (Markov property). The following four conditions on X are equivalent.

(a) For all A ∈ F^X_t, B ∈ G^X_t, and t ≥ 0,

(2.7) P(A ∩ B | X_t) = P(A | X_t) P(B | X_t) a.s.

(b) For all B ∈ G^X_t, t ≥ 0,

(2.8) P(B | X_t) = P(B | F^X_t) a.s.

(c) For all C ∈ B(E) and 0 ≤ s ≤ t,

(2.9) P({X_t ∈ C} | F^X_s) = P({X_t ∈ C} | X_s) a.s.

(d) For all C_1, C_2, ... ∈ B(E) and 0 ≤ t_1 ≤ ··· ≤ t_n,

(2.10) P{X_{t_1} ∈ C_1, ..., X_{t_n} ∈ C_n}
       = E[ 1_{{X_{t_1}∈C_1}} E_{X_{t_1}}[ 1_{{X_{t_2}∈C_2}} E_{X_{t_2}}[ ··· E_{X_{t_{n−1}}}[ 1_{{X_{t_n}∈C_n}} ] ··· ] ] ].

Remark. If X satisfies the equivalent conditions from Proposition 2.4, then we say that X has the Markov property. Note that condition (a) says that the future and the past are conditionally independent given the present. Condition (b) says that the behavior of X after time t depends on the behavior of X before time t only through the state of X at time t.

Exercise 2.5 (Gaussian processes with Markov property). A stochastic process X := (X_t)_{t∈[0,∞)} is called Gaussian if for all n ∈ N and (t_1, ..., t_n) ∈ [0, ∞)^n, the random vector (X_{t_1}, ..., X_{t_n}) is normally distributed with mean (µ_{t_1}, ..., µ_{t_n}) ∈ R^n and covariance function Γ(s, t) := E[(X_s − µ_s)(X_t − µ_t)].

Show that a centered (i.e., µ ≡ 0) Gaussian process has the Markov property if and only if for all s, t, u ∈ [0, ∞) with s < t < u,

(2.11) Γ(s, u) Γ(t, t) = Γ(s, t) Γ(t, u).
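The criterion (2.11) is easy to test numerically for a given covariance function. The following sketch (an illustration of the criterion only, not a solution of the exercise; the fractional Brownian covariance and the Hurst index H = 3/4 are choices made here for contrast) checks that Γ(s, t) = s ∧ t, the covariance of Brownian motion, satisfies (2.11), while the fractional Brownian covariance with H ≠ 1/2 does not.

```python
import numpy as np

# Randomized check of Gamma(s,u) Gamma(t,t) == Gamma(s,t) Gamma(t,u) for s < t < u.
def worst_violation(gamma, trials=10000, seed=0):
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        s, t, u = np.sort(rng.uniform(0.1, 10.0, size=3))
        worst = max(worst, abs(gamma(s, u) * gamma(t, t) - gamma(s, t) * gamma(t, u)))
    return worst

bm  = lambda s, t: min(s, t)                                        # Brownian motion
fbm = lambda s, t, H=0.75: 0.5 * (s**(2*H) + t**(2*H) - abs(s - t)**(2*H))  # fractional BM

print(worst_violation(bm))    # ~0: Brownian motion satisfies (2.11)
print(worst_violation(fbm))   # clearly positive: fBM with H = 3/4 violates (2.11)
```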

X Proof of Proposition 2.4. (a) (b): We have to show that for all A t and B X ⇒ ∈ F ∈ Gt (2.12) E 1 P(B X ) = P A B . A | s ∩ By the projection property and by the pull out property applied to H := P(B X ), | s E 1AP(B Xs) = E EX [1AP(B Xs)] (2.13) | s | = E P(A X )P(B X )    | s | s  By (a),  

E P(A Xs)P(B Xs) = E P(A B Xs) (2.14) | | ∩ | = P A B .    ∩   (b) (c): This follows by applying (2.8) to t := s and B := X C X . ⇒ { t ∈ } ∈ Gs (c) (d): Since t1 t2 . . ., repeated use of the projection property and the⇒ pull out propertyF ⊆ Fgive⊆

P Xt1 C1,...,Xtn Cn = E[1 Xt C1 1 Xt Cn ] { ∈ ∈ } { 1 ∈ } · · · { n ∈ } (2.15) X X X = E 1 Xt C1 E 1 Xt C2 E [ E [1 Xtn Cn , { 1 ∈ } Ft1 { 2 ∈ } Ft2 · · · Ftn−1 { ∈ } and the right handh side of (2.15) equals the right hand side of (2.10) byi (b). 26 JAN SWART AND ANITA WINTER

(d) (c): By approximation with simple functions it follows from (d) that for⇒ any 0 t t and F B(σ(X ),...,F B(σ(X ), ≤ 1 ≤···≤ n 1 ∈ t1 n ∈ tn

E[F1 Fn] (2.16) · · · = E F1EX F2EX EX [Fn] . t1 t2 · · · tn−1 h   i Let 0 s s = s t, C ,...,C ,C (E). Using (c) and ≤ 1 ≤ ··· ≤ m ≤ 1 m ∈ B applying (2.16) to n = m 1 and Fn = 1 Xsm Cm EXs [1 Xt C ], we find that − { ∈ } { ∈ }

E 1 Xs C1,...,Xs Cm 1 Xt C { 1 ∈ m ∈ } { ∈ } (2.17) = E 1 Xs C1 EXs EXs 1 Xs Cm EXs [1 Xt C ] { 1 ∈ } 1 · · · m−1 { m ∈ } { ∈ } h i = E 1 Xs C1,...,Xs  Cm EXs [1 X t C ] .  { 1 ∈ m ∈ } { ∈ } It follows from Lemma 2.2 that 

(2.18) EXs [1 Xt C ]= E X [1 Xt C ]. { ∈ } Fs { ∈ }

(c) (b): By approximation with simple functions it follows from (c) that for⇒ all F B(σ(X )) and 0 s t, ∈ t ≤ ≤ (2.19) E[F X ]= E[F X ] a.s. |Fs | s

Let 0 t u1 um and C1,...,Cm (E). Then repeated use of the projection≤ ≤ property,≤···≤ the pull out property,∈B and (2.19) give

X E 1 Xu C1,...,Xum Cm Ft { 1 ∈ ∈ } X X X (2.20) = E 1 Xu C1 E  E [1 Xum Cm ] Ft { 1 ∈ } Fu1 · · · Fum−1 { ∈ } = EXt 1 Xu C1 EXu  EXu [1 Xu Cm ] . { 1 ∈ } 1 · · · m−1 { m ∈ }    In the last step we have applied (2.19) first to 1 Xu Cm B(σ(Xun )), { m ∈ } ∈ then to 1 Xu Cm EXu [1 Xu Cm ] B(σ(Xun−1 )), and so on. It { m−1 ∈ } m−1 { m ∈ } ∈ follows that E X 1 is σ(Xt)-measurable. Therefore, by t Xu1 C1,...,Xum Cm the projectionF property,{ ∈ ∈ }  

EXt 1 Xu C1,...,Xu Cm { 1 ∈ m ∈ }

(2.21)  X  = EXt E 1 Xu C1,...,Xum Cm Ft { 1 ∈ ∈ } Xh i = E 1 Xu C1,...,Xum Cm .  Ft { 1 ∈ ∈ }   The class of all sets A such that EX [1A]= E X [1A] forms a Dynkin system, t t so by Lemma 2.3 we arrive at (b). F MARKOV PROCESSES 27

(b) (a): Indeed, by (2.8) and the pull out property, for all D σ(X ), ⇒ ∈ t X P A B D = E P(B t )1A D ∩ ∩ |F ∩  = EP(B Xt)1A D  (2.22) | ∩ = EEXt P(B Xt)1A D | ∩ = EhP(AX )P(B X )1 i, | t | t D which proves (2.7) because P(A X )P (B X ) B(σ(X )). | t | t ∈ t

2.2. Transition probabilities. Let E, F be Polish spaces. By definition, a probability kernel from E to F is a function K : E (F ) [0, 1] such that ×B → (1) For fixed x E, K(x, ) is a probability measure on F . (2) For fixed A∈ (F ), K·( , A) is a measurable function on E. ∈B · If E = F then we say that K is a probability kernel on E.

Example. For all x ∈ R and A ∈ B(R), set

(2.23) K(x, A) := ∫_A dy (1/√(2π)) exp(−(x − y)²/2).

Then K is a probability kernel on R.

There is another way of looking at probability kernels that is often very useful. For any Polish space E we define (2.24) B(E) := f : E R : f Borel measurable and bounded . { → } Lemma 2.6 (Probability kernels). If K is a probability kernel from E to F then the operator K : B(F ) B(E) defined by → (2.25) Kf(x) := K(x, dy) f(y), x E, f (F ), ∈ ∈B ZF satisfies (1) K is conservative, i.e., K1 = 1. (2) K is positive, i.e., Kf 0 for all f 0. ≥ ≥ (3) K is linear, i.e., K(λ1f1+λ2f2)= λ1K(f1)+λ2K(f2) for all f1,f2 B(F ) and λ , λ R. ∈ 1 2 ∈ (4) K is continuous with respect to monotone sequences, i.e., K(fi) K(f) for all f f, f ,f B(F ). ↑ i ↑ i ∈ Conversely, every operator K : B(F ) B(E) with these properties corre- sponds to a probability kernel from E to→F as in (2.25). 28 JAN SWART AND ANITA WINTER

Proof. If K is a probability kernel from E to F then the operator K : B(F ) B(E) defined in (2.25) maps B(F ) into B(E) since K( , A) is measurable→ for each A (F ), and the operator K has the properties· (1)– (4)) since K(x, ) is a probability∈B measure for each x E. Conversely, if · ∈ K : B(F ) B(E) satisfies (1)–(4) then K(x, A) := K1A(x) is measurable as a function→ of x for each A (F ) since the operator K maps B(F ) into B(E) and K(x, ) is a probability∈B measure by (1)–(4). · Remark. If E is a set consisting of one point, say E = 0 , then a probability kernel from E to F is just a probability measure K(0{ ,}) = µ, say. In this case B(E) is isomorphic to R and the operator in (2.25),· considered as an operator from B(F ) to R, is given by

(2.26) µf := µ(dy)f(y), f (F ). ∈B ZF
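The operator point of view of Lemma 2.6 can be made concrete for the Gaussian kernel (2.23). The following sketch (an illustration only; the sample size and the test function are arbitrary choices) approximates Kf(x) = ∫ K(x, dy) f(y) by Monte Carlo and checks conservativeness, K1 = 1, as well as one explicitly computable value.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo approximation of Kf(x) for the kernel K(x, .) = N(x, 1) of (2.23).
def K(f, x, n=200000):
    y = x + rng.standard_normal(n)      # y ~ K(x, .)
    return f(y).mean()

one = lambda y: np.ones_like(y)
f   = lambda y: np.cos(y)

for x in [-1.0, 0.0, 2.0]:
    print(x,
          K(one, x),                    # conservative: K1 = 1
          K(f, x),                      # Monte Carlo value of Kf(x)
          np.exp(-0.5) * np.cos(x))     # exact value: E[cos(x+Z)] = e^{-1/2} cos(x)
```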

If E, F , and G are Polish spaces, K is a probability kernel from E to F , and L is a probability kernel from F to G, then the composition of the operators L : B(G) B(F ) and K : B(F ) B(E) yields an operator KL : B(G) B(E)→ that corresponds to the composite→ kernel KL from E to G given by→

(2.27) (KL)(x, A) := K(x, dy)L(y, A) (x E, A (G)). ∈ ∈B ZF The following result states that conditional probabilities of random vari- ables with values in Polish spaces are associated with probability kernels. Proposition 2.7 (Conditional probability kernel). Let X,Y be random variables with values in Polish spaces E and F , respectively. Then there exists a probability kernel P from E to F such that for all A (F ), ∈B P Y A X = P (X, A), a.s. { ∈ | } The kernel P is unique up to a.s. equality with respect to (X). L Proof (sketch). Let (F ) be the space of finite measures on F , equipped with σ-field generatedM by the mappings µ µ(A) with A (F ). Define a function M : (E) (F ) by 7→ ∈B B → M (2.28) M(B)(A) := P X B, Y A . { ∈ ∈ } Then M( ) = 0, the zero measure, and M is σ-additive, so we may interpret M as a measure∅ on (E, (E)) with values in (F ). Moreover, P X B = B M { ∈ } 0 implies M(B) = 0, so M is absolutely continuous with respect to PX , the law of X. It follows from the fact that F is a Polish space that the space (F ) has the Radon-Nikodym property, i.e., the Radon-Nikodym theorem Malso holds for (F )-valued measures and functions. As a result, there exists M MARKOV PROCESSES 29 a (F )-valued measurable function x P (x, ) from E to (F ), unique M 7→ · M up to a.s. equality with respect to PX , such that

(2.29) M(B)= P (x, )P (dx). · X ZB It is not hard to check that a function x P (x, ) from E to (F ) is measurable if and only if P is a probability kernel7→ from· E to F . NowM (2.29) says that (2.30) E[P (X, A)1 ]= P (x, A)P (dx)= M(B)(A)= P X B, Y A , B X { ∈ ∈ } ZB which is equivalent to (2.7).

Remark. Proposition 2.7 remains true if only F is Polish and E is any mea- surable space.

It follows from Proposition 2.7 that for any stochastic process X in E there exist probability kernels (Ps,t)0 s t on E such that for all A (E), and 0 s t, ≤ ≤ ∈B ≤ ≤ (2.31) P X A X = P (X , A), a.s. { t ∈ | s} s,t s We call (Ps,t)0 s t the transition probabilities of X. ≤ ≤ Proposition 2.8 (Markov transition probabilities). Let X be a stochastic process with values in E and let (Ps,t)0 s t be probability kernels on E. Then the following conditions are equivalent:≤ ≤ (a) For all C (E) and 0 s t, ∈B ≤ ≤ (2.32) P( X C X )= P (X ,C), a.s. { t ∈ }|Fs s,t s (b) X has the Markov property, and for all C (E) and 0 s t, ∈B ≤ ≤ (2.33) P( X C X )= P (X ,C), a.s. { t ∈ }| s s,t s (c) For all C1,...,Cn (E) and 0= t0 t1 tn, (2.34) ∈B ≤ ≤···≤ P X C ,...,X C { t1 ∈ 1 tn ∈ n}

= P X0 dx0 Pt0,t1 (x0, dx1) Ptn−1,tn (xn 1, dxn). { ∈ } · · · − Z ZC1 ZCn

Proof. (a) (b): It follows from (a) that P( X C X ) is measur- ⇒ { t ∈ }|Fs able with respect to σ(Xt), and therefore P( Xt C Xs) = E[P( Xt X X { ∈ }| { ∈ C s ) Xs] = P( Xt C s ), a.s. By condition (c) of Proposition 2.4, X}|Fhas the| Markov{ property.∈ }|F (b) (a): Since X has the Markov property, by condition (c) of Proposi- ⇒ X X tion 2.4, E[P( Xt C s ) Xt] = P( Xt C Xs) = P( Xt C s ), a.s. { ∈ }|F | { ∈ }| { ∈ }|F 30 JAN SWART AND ANITA WINTER

(b) (c): Since X has the Markov property, X satisfies condition (d) of ⇒ Proposition 2.4. Using the fact that P( Xt C Xs) = Ps,t(Xs,C), a.s., we arrive at (c). { ∈ }| (c) (b): We start by showing that for all C (E) and 0 s t), ⇒ ∈B ≤ ≤ (2.35) P( X C X )= P (X ,C), { t ∈ }| s s,t s where (Ps,t)0 s t are the probability kernels in (c). Since Ps,t(Xs,C) is ≤ ≤ measurable with respect to σ(Xs), by the definition of the conditional prob- ability, it suffices to show that E[Ps,t(Xs,C)1 Xs B ]= P Xs B, Xt C { ∈ } { ∈ ∈ } for all B,C (E). Indeed, by (c), (2.36) ∈B

E[Ps,t(Xs,C) 1 Xs B ]= P X0 dx0 P0,s(x0, dx1)Ps,t(x1,C) } { ∈ } { ∈ } Z ZB = P X B, X C . { s ∈ t ∈ } This proves (2.35), i.e., the (Ps,t)0 s t are the transition probabilities of X. It follows that X satisfies condition≤ ≤ (d) from Proposition 2.4, so X has the Markov property.

2.3. Transition functions and Markov semigroups. Condition (c) of Proposition 2.8 shows that the finite dimensional distributions of a process X with the Markov property are uniquely determined by its transition probabilities (P_{s,t})_{0≤s≤t} and its initial law L(X_0). We will mainly be interested in the case that the transition probabilities can be chosen in such a way that P_{s,t} is a function of t − s only. This leads to the following definition. Recall that the delta-measure δ_x in a point x is defined as

(2.37) δ_x(A) = { 1, x ∈ A,
                  0, x ∉ A. }

Definition 2.9 (Transition function). By definition, a transition function on E is a collection (P_t)_{t≥0} of probability kernels on E such that

(1) (Initial law) For all x ∈ E,

(2.38) P_0(x, ·) := δ_x,

(2) (Chapman-Kolmogorov equation) For all x ∈ E, A ∈ B(E), and s, t ≥ 0,

(2.39) ∫ P_s(x, dy) P_t(y, A) = P_{s+t}(x, A).

We make the following observation.

Lemma 2.10 (Markov semigroups). A collection (Pt)t 0 of probability ker- nels on E is a transition function if and only if the≥ associated operators P : B(E) B(E) defined by t → (2.40) P f(x) := P (x, dy)f(y) x E, f B(E) t t ∈ ∈ ZE MARKOV PROCESSES 31 satisfy

(1) P0f = f (f B(E)), (2) P P = P ∈(s,t 0). s t s+t ≥ Properties (1) and (2) from Lemma 2.10 say that the operators (Pt)t 0 ≥ form a semigroup. If (Pt)t 0 is a transition function then we call the asso- ciated semigroup of operators≥ on B(E) a Markov semigroup. Proposition 2.11 (Markov processes). Let X be a stochastic process with values in E and let (Pt)t 0 be a transition function on E. Then the following conditions are equivalent:≥ (a) For all f B(E) and 0 s t, ∈ ≤ ≤ X (2.41) E[f(Xt) s ]= Pt sf(Xs), a.s. |F − (b) X has the Markov property, and for all A (E) and 0 s t, ∈B ≤ ≤ (2.42) P( Xt A Xs)= Pt s(Xs, A), a.s. { ∈ }| − (c) For all A1,...,An (E) and 0= t0 t1 tn, (2.43) ∈B ≤ ≤···≤ P X A ,...,X A { t1 ∈ 1 tn ∈ n}

= P X0 dx0 Pt1 t0 (x0, dx1) Ptn tn−1,(xn 1, dxn). { ∈ } − · · · − − Z ZA1 ZAn Proof. We claim that condition (a) is equivalent to (a)’ For all A (E) and 0 s t, ∈B ≤ ≤ X (2.44) P( Xt A s )= Pt s(Xs, A), a.s. { ∈ }|F − Indeed, the implication (a) (a)’ is obvious, while the converse follows by approximation with simple⇒ functions. Therefore the statement follows di- rectly from Proposition 2.8.

Proposition 2.12 (Construction from semigroup). Let E be a Polish space, (Pt)t 0 a transition function on E, and µ a probability measure on E. Then there≥ exists a stochastic process X, unique in finite dimensional distribu- tions, such that X satisfies the equivalent conditions (a)–(c) from Proposi- tion 2.11. Proof. By condition (c) from Proposition 2.11, it suffices to show that there exists a stochastic process X with finite dimensional distributions given by P X A ,...,X A { t1 ∈ 1 tn ∈ n} (2.45) = µ(dx0) Pt1 t0 (x0, dx1) Ptn tn−1 (xn 1, dxn) − · · · − − Z ZA1 ZAn for all n N,0= t0 t1 tn and A1,...,An (E). By the Chap- man Kolmogorov∈ equation≤ ≤···≤ for transition functions, these∈B finite dimensional distributions are consistent in the sense of Theorem 1.2, so there exists a 32 JAN SWART AND ANITA WINTER stochastic process (Xt)t 0 with the finite dimensional distributions in (2.45). ≥

If X is a stochastic process with the Markov property and there exists a transition function (P_t)_{t≥0} such that X satisfies the equivalent conditions (a)–(c) from Proposition 2.11, then we say that X is time-homogeneous. Note that by (c), the finite dimensional distributions of X are uniquely determined by L(X_0) and (P_t)_{t≥0}. We call X the Markov process with semigroup (P_t)_{t≥0}, started in the initial law L(X_0).

Example (Transition function of Brownian motion). Very often it is not possible to give the transition function of a process explicitly. An exception is Brownian motion. Here, for all 0 ≤ s < t, x ∈ R, and A ∈ B(R),

(2.46) P_{s,t}(x, A) = P_{t−s}(x, A) = ∫_A dy (1/√(2π(t − s))) exp(−(x − y)²/(2(t − s))).
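The Chapman-Kolmogorov equation (2.39) can be checked numerically for the Brownian transition function (2.46): composing the Gaussian kernels with variances s and t must reproduce the Gaussian kernel with variance s + t. The sketch below does this by quadrature on a truncated grid (grid, truncation, and the chosen points are arbitrary choices for illustration).

```python
import numpy as np

# Gaussian transition density of Brownian motion, cf. (2.46).
def p(t, x, y):
    return np.exp(-(x - y) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

y = np.linspace(-30, 30, 6001)
dy = y[1] - y[0]
s, t, x, z = 0.7, 1.8, 0.3, -1.2

lhs = np.sum(p(s, x, y) * p(t, y, z)) * dy   # int p_s(x, y) p_t(y, z) dy
rhs = p(s + t, x, z)                         # p_{s+t}(x, z)
print(lhs, rhs)                              # agree up to discretization error
```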

Exercise 2.13 (Time reversal and time-homogeneity). Let T > 0 and let (Xt)t [0,T ] be a stochastic process with index set [0, T ]. How would you define ∈ the Markov property for such a process? Show that if (Xt)t [0,T ] has the ∈ Markov property then the time-reversed process (XT t)t [0,T ] also has the − ∈ Markov property. If (Xt)t [0,T ] is time-homogeneous, then is (XT t)t [0,T ] in general also time-homogeneous?∈ (Hint: it may be easier to investigate− ∈ the latter question for Markov chains (Xi)i 0,...,n .) ∈{ }

2.4. Forward and backward equations. In Proposition 2.12 we have seen that for a given initial law L(X_0) and transition function (Markov semigroup) (P_t)_{t≥0}, there exists a Markov process X, which is unique in finite dimensional distributions. There are two reasons why we are not satisfied with this result. The first reason is that Proposition 2.12 says nothing about the sample paths of X, which we would like to be cadlag. The second reason is that Proposition 2.12 says nothing about how to construct transition functions (P_t)_{t≥0} in the first place. Examples such as (2.46), where we can explicitly write down a transition function, are rare. There are basically two more general approaches towards obtaining transition functions.

Identify probability kernels K on E with operators K : B(E) → B(E) as in Lemma 2.6, and probability measures µ on E with functions µ : B(E) → R. In a first attempt to obtain a transition function (P_t)_{t≥0}, we fix x ∈ E and consider the probability measures

µ_t := P_t(x, ·)   (t ≥ 0).

Then

µ_t f = ∫ P_t(x, dy) f(y) = P_t f(x)   (t ≥ 0),

and therefore

µ_{t+ε} f = P_{t+ε} f(x) = P_t P_ε f(x) = µ_t P_ε f   (t, ε ≥ 0).

Therefore, we can try to define an operator H, acting on probability measures, by

Hµ := lim_{ε↓0} ε^{−1} (µ P_ε − µ),

and then try to solve

(2.47) ∂/∂t µ_t = H µ_t,   µ_0 = δ_x

for fixed x ∈ E. Equation (2.47) is called the forward equation.

In the second approach, we fix f ∈ B(E) and consider the functions

u_t := P_t f   (t ≥ 0).

Then

u_{t+ε} = P_{t+ε} f = P_ε P_t f = P_ε u_t   (t, ε ≥ 0).

Therefore, we can try to define an operator G, acting on functions f, by

Gf := lim_{ε↓0} ε^{−1} (P_ε f − f),

and then try to solve

(2.48) ∂/∂t u_t = G u_t,   u_0 = f

for fixed f ∈ B(E). Equation (2.48) is called the backward equation.
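On a finite state space both approaches reduce to linear algebra: H and G are then given by a rate matrix Q (nonnegative off-diagonal entries, zero row sums), P_t = e^{tQ}, µ_t = µ_0 P_t solves the forward equation, and u_t = P_t f solves the backward equation. The following sketch verifies this numerically (the matrix Q, the time t, and the function f are arbitrary choices for illustration; scipy is assumed to be available).

```python
import numpy as np
from scipy.linalg import expm

# A rate matrix Q on a three-point state space (row sums zero).
Q = np.array([[-2.0,  1.0,  1.0],
              [ 0.5, -0.5,  0.0],
              [ 1.0,  2.0, -3.0]])
mu0 = np.array([1.0, 0.0, 0.0])     # mu_0 = delta_x with x the first state
f   = np.array([0.0, 1.0, 5.0])

t, h = 0.8, 1e-5
Pt = expm(t * Q)                    # P_t = exp(tQ)

mu_t, u_t = mu0 @ Pt, Pt @ f
dmu = (mu0 @ expm((t + h) * Q) - mu_t) / h   # numerical d/dt mu_t
du  = (expm((t + h) * Q) @ f - u_t) / h      # numerical d/dt u_t
print(np.allclose(dmu, mu_t @ Q, atol=1e-3))  # forward:  d/dt mu_t = mu_t Q
print(np.allclose(du,  Q @ u_t,  atol=1e-3))  # backward: d/dt u_t  = Q u_t
```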

3. Feller semigroups

3.1. Weak convergence. Let E be a Polish space. By definition,

(3.1)  C_b(E) := {f : E → R : f bounded and continuous}

is the space of all bounded continuous real-valued functions on E. We equip C_b(E) with the supremum norm

(3.2)  ‖f‖ := sup_{x∈E} |f(x)|.

With this norm, C_b(E) is a Banach space. If E is compact then every continuous function is bounded, so we simply write C(E) = C_b(E). In this case C(E) is moreover separable. By definition,

(3.3)  M_1(E) := {µ : µ probability measure on (E, B(E))}

is the space of all probability measures on E. We equip M_1(E) with the topology of weak convergence. We say that a sequence of measures µ_n ∈ M_1(E) converges weakly to a limit µ ∈ M_1(E), denoted as µ_n ⇒ µ, if

(3.4)  µ_n f → µf as n → ∞, for all f ∈ C_b(E).

(Recall the notation µf := ∫ f dµ from (2.26).) This notion of convergence indeed comes from a topology.

Proposition 3.1 (Prohorov metric). Let (E, d) be a separable metric space. For any A ⊆ E and r > 0, put A^r := {x ∈ E : inf_{y∈A} d(x, y) < r}. Then

(3.5)  d_Pr(µ_1, µ_2) := inf{ r > 0 : µ_1(A) ≤ µ_2(A^r) + r  ∀A ⊆ E closed }
                       = inf{ r > 0 : ∃µ ∈ M_1(E × E) s.t. µ(A × E) = µ_1(A), µ(E × A) = µ_2(A) ∀A ∈ B(E),
                                      µ({(x_1, x_2) ∈ E × E : d(x_1, x_2) ≥ r}) ≤ r }

defines a metric on M_1(E) generating the topology of weak convergence. The space (M_1(E), d_Pr) is separable. If (E, d) is complete, then (M_1(E), d_Pr) is complete.

Proof. See [EK86, Theorems 3.1.2, 3.1.7, and 3.3.1].

The second formula for d_Pr in (3.5) says that

(3.6)  d_Pr(µ_1, µ_2) = inf{ r > 0 : P[d(X_1, X_2) ≥ r] ≤ r, L(X_1) = µ_1, L(X_2) = µ_2 },

where the infimum is over all pairs of random variables (X_1, X_2) with laws µ_1 and µ_2, respectively.

Formula (3.4) shows that the topology of weak convergence on M_1(E) does not depend on the choice of the metric on E. In other words, if d, d̃ are equivalent metrics on E and d_Pr and d̃_Pr are the associated Prohorov metrics on M_1(E), then d_Pr and d̃_Pr are equivalent. Proposition 3.1 moreover shows that M_1(E) is Polish if E is Polish.
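The coupling formula (3.6) is convenient for numerical upper bounds: any particular coupling (X_1, X_2) of µ_1 and µ_2 yields the bound d_Pr(µ_1, µ_2) ≤ inf{r > 0 : P[d(X_1, X_2) ≥ r] ≤ r}. The sketch below is an illustration only; the two Gaussian laws and the comonotone coupling are arbitrary choices of ours, and the coupling bound need not be the exact Prohorov distance.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two laws on R: mu_1 = N(0,1), mu_2 = N(0.3,1), coupled comonotonically
    # (same underlying normal variable), which is a good coupling for a shift.
    N = 100_000
    Z = rng.standard_normal(N)
    X1, X2 = Z, Z + 0.3
    D = np.abs(X1 - X2)            # d(X1, X2) under this coupling (here constant 0.3)

    # Upper bound on d_Pr: the smallest r with P[d(X1, X2) >= r] <= r.
    rs = np.linspace(1e-4, 1.0, 2000)
    bound = min(r for r in rs if np.mean(D >= r) <= r)
    print("coupling bound on d_Pr:", bound)   # approximately 0.3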

The next proposition can be found in [EK86, Theorem 3.2.2].

Proposition 3.2 (Prohorov). Let E be Polish. Then a set 𝒦 ⊆ M_1(E) is compact if and only if 𝒦 is closed and

(3.7)  ∀ε > 0 ∃K ⊂ E compact s.t. µ(E \ K) ≤ ε ∀µ ∈ 𝒦.

Property (3.7) is called the tightness of the set 𝒦. Note that Proposition 3.2 implies in particular that M_1(E) is compact if E is compact.

Exercise 3.3. Let E be Polish. Show that if 𝒦 ⊂ M_1(E) is tight, then its closure is compact.

3.2. Continuous kernels and Feller semigroups. For a proof of the following proposition, see for example [RS80, Theorem IV.14]. Proposition 3.4 (Probability measures as positive linear forms). Let E be compact and metrizable. A probability measure µ 1(E) defines through (2.26) a function µ : (E) R with the following∈ properties M C → (1) (normalization) µ1 = 1. (2) (positivity) µf 0 for all f 0. ≥ ≥ (3) (linearity) µ(λ1f1 + λ2f2)= λ1µ(f1)+ λ2µ(f2) for all λ , λ R, f ,f (E). 1 2 ∈ 1 2 ∈Cb Conversely, each function µ : (E) R with these properties corresponds through (2.26) to a probabilityC measure→ µ (E). ∈ M1 Let E, F be compact metrizable spaces and let 1(E), 1(F ) be the spaces of probability measures on E and F , respectively,M equippedM with the topology of weak convergence. By definition, a probability kernel K from E to F is continuous if the map x K(x, ) from E to (F ) is continuous. 7→ · M1 Proposition 3.5 (Continuous probability kernels). A continuous probability kernel K from E to F defines through (2.25) an operator K : (F ) (E) with the following properties C →C (1) (conservativeness) K1 = 1. (2) (positivity) Kf 0 for all f 0. ≥ ≥ (3) (linearity) K(λ1f1 + λ2f2)= λ1K(f1)+ λ2K(f2) for all λ , λ R, f ,f (E). 1 2 ∈ 1 2 ∈C Conversely, each operator K : (F ) (E) with these properties corre- sponds through (2.25) to a continuousC → probability C kernel K from E to F . Proof. By Proposition 3.4, the properties (1)–(4) from Proposition 3.5 are equivalent to the statement that for fixed x E, K(x, ) is a probability measure on F . (Note that tightness is automatic∈ since F·is compact.) The statement that K maps (F ) into (E) means that K(xn, dy)f(y) K(x, dy)f(y) whenever Cx x andC f (F ). This is equivalent to the→ n → ∈ C R statement that K(xn, ) K(x, ) whenever xn x, i.e., x K(x, ) is Rcontinuous. · ⇒ · → 7→ · 36 JAN SWART AND ANITA WINTER

Exercise 3.6. Show that properties (1)–(3) from Proposition 3.5 imply that K : (F ) (E) is continuous, i.e., Kf Kf whenever f f 0. C →C n → k n − k→ It is easy to see (for example from Proposition 3.5) that the composi- tion (in the sense of (2.27)) of two continuous probability kernels is again continuous. Let E be a compact metrizable space. By definition, we say that a transi- tion probability (Pt)t 0 on E is continuous if the map (t,x) Pt(x, ) from ≥ 7→ · [0, ) E into 1(E) is continuous. Here we equip [0, ) E with the product∞ × topologyM and (E) with the topology of weak convergence.∞ × M1 Proposition 3.7 (Feller semigroups). Let (Pt)t 0 be a continuous transition ≥ probability on E. Then the operators (Pt)t 0 defined in (2.40) map (E) into (E) and, considered as operators from ≥(E) into (E), they satisfyC C C C (1) P is conservative for each t 0, i.e., P 1 = 1. t ≥ t (2) Pt is positive for each t 0, i.e., Ptf 0 for all f 0. (3) P is linear for each t ≥0. ≥ ≥ t ≥ (4) The (Pt)t 0 form a semigroup, i.e., P0f = f for all f (E) and P P ≥= P for all s,t 0. ∈C s t s+t ≥ (5) (Pt)t 0 is strongly continuous, i.e., limt 0 Ptf f = 0 for all f ≥(E). → k − k ∈C Conversely, each collection of operators (Pt)t 0 from (E) into (E) with these properties corresponds through (2.40) to≥ a continuouC s transitionC prob- ability on E.

A collection of operators (Pt)t 0 from (E) into (E) with the properties (1)–(5) from Proposition 3.7 is≥ called a CFeller semigroupC .

Proof of Proposition 3.7. By the definition of weak convergence of proba- bility measures, a transition probability (Pt)t 0 on E is continuous if and ≥ only if the function (t,x) Ptf(x) from [0, ) E into R is continuous for each f (E). We claim7→ that this is equivalent∞ × to the statement that P f (E)∈ for C all t 0 and t ∈C ≥ (5)’ lims t Psf Ptf = 0 for all f (E), t 0. → k − k ∈C ≥ Assume that Ptf (E) for all t 0 and (5)’ holds. Choose (tn,xn) (t,x). Then ∈ C ≥ →

Ptn f(xn) Ptf(x) Ptn f(xn) Ptf(xn) + Ptf(xn) Pt(x) (3.8) | − | ≤ | − | | − | Pt f Ptf + Ptf(xn) Pt(x) 0, n n ≤ k − k | − | −→→∞ which shows that (t,x) Ptf(x) is continuous. Conversely, if (t,x) P f(x) is continuous then7→ obviously we must have P f (E) for all t 7→0. t t ∈C ≥ Now assume that (5)’ does not hold. Then we can find ε > 0, tn t, and x E such that → n ∈ (3.9) P f(x ) P f(x ) ε. | tn n − t n |≥ MARKOV PROCESSES 37

Since E is compact, we can choose a convergent subsequence x_{n_m} → x. Then (t_{n_m}, x_{n_m}) → (t, x), but since

(3.10)  |P_{t_{n_m}} f(x_{n_m}) − P_t f(x)| ≥ |P_{t_{n_m}} f(x_{n_m}) − P_t f(x_{n_m})| − |P_t f(x_{n_m}) − P_t f(x)|,

we have liminf_{m→∞} |P_{t_{n_m}} f(x_{n_m}) − P_t f(x)| ≥ ε by the continuity of P_t f, which shows that P_{t_{n_m}} f(x_{n_m}) does not converge to P_t f(x), i.e., (t, x) ↦ P_t f(x) is not continuous. (Note that this is very similar to the proof of Lemma 1.7.)

It follows from Proposition 3.5 that a collection (P_t)_{t≥0} of operators on C(E) satisfying (1)–(4) corresponds to a transition probability on E with the property that P_t is a continuous probability kernel for each fixed t ≥ 0. It therefore suffices to show that (5) is equivalent to (5)'. The implication (5)' ⇒ (5) is trivial. Conversely, if (5) holds then

(3.11)  lim_{t_n ↓ t} ‖P_{t_n} f − P_t f‖ = lim_{t_n ↓ t} ‖P_{t_n−t}(P_t f) − (P_t f)‖ = 0

by the semigroup property and (5) applied to P_t f. This shows that t ↦ P_t f, considered as a function from [0, ∞) into C(E), is continuous from the right. To prove also continuity from the left, we note that

(3.12) lim Ptn f Ptf = lim Ptn (f Pt tn f) lim f Pt tn f = 0, tn t k − k tn t k − − k≤ tn t k − − k ↑ ↑ ↑ where we have the semigroup property, (5), and the fact that

Ptf = sup Pt(x, dy)f(y) k k x E E (3.13) ∈ Z

sup Pt(x, dy) f(y) sup f(y) = f . ≤ x E E ≤ y E | | k k ∈ Z ∈

3.3. Banach space calculus. Let (V, ) be a Banach space, equipped with the topology generated by the norm.k · k We need to develop calculus for V -valued functions. The next proposition defines the Riemann integral for continuous V -valued functions. Since this is very similar to the usual Riemann integral, we skip the proof. Proposition 3.8 (Riemann integral). Let u : [a, b] V be continuous and let → (n) (n) (n) (n) (n) (n) (3.14) a = t0 s1 tt tmn 1 smn tmn = b ≤ ≤ ≤···≤ − ≤ ≤ satisfy (n) (n) (3.15) lim sup t t : k = 1,...,mn = 0. n k k 1 →∞ { − − } Then the limit b mn (3.16) u(t)dt := lim u(s(n))(t(n) t(n) ) n k k k 1 a →∞ − − Z Xk=1 (n) (n) exists and does not depend on the choice of the tk and sk . 38 JAN SWART AND ANITA WINTER
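For V = R^d with the supremum norm, the Riemann sums in (3.16) can be computed directly. The sketch below is an illustration only; the vector-valued integrand u and the equidistant partition with left endpoints are arbitrary choices of ours.

    import numpy as np

    def u(t):
        # a continuous function u : [0,1] -> R^2 (viewed as a Banach space with the sup norm)
        return np.array([np.sin(t), t ** 2])

    def riemann(u, a, b, m):
        # Riemann sum over an equidistant partition t_k, with evaluation points s_k = t_{k-1}
        t = np.linspace(a, b, m + 1)
        return sum(u(t[k - 1]) * (t[k] - t[k - 1]) for k in range(1, m + 1))

    approx = riemann(u, 0.0, 1.0, 100_000)
    exact = np.array([1.0 - np.cos(1.0), 1.0 / 3.0])
    print(np.max(np.abs(approx - exact)))   # sup-norm error, small for fine partitions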

If a < b and u : [a, b) V is continuous then we define ≤∞ → b c (3.17) u(t)dt := lim u(t)dt, c b Za ↑ Za whenever the limit exists. In this case we say that u is integrable over [a, b). In case b< and u : [a, b] V is continuous this coincides with our earlier ∞ b → definition of a u(t)dt. Lemma 3.9R (Infinite integrals). Let a < b , let u : [a, b) V be ≤ ∞ → continuous and b u(t) dt< . Then u is integrable over [a, b) and a k k ∞ R b b (3.18) u(t)dt u(t) dt. a ≤ a k k Z Z

Proof. Since u is continuous and f f is continuous, the function t u(t) is continuous. First consider the7→ kcasek that b< and that u : [a, b] 7→ k k (n) (n) ∞ → V is continuous. Choose tk and sk as in (3.14) and (3.15). Then mn mn (n) (n) (n) (n) (n) (n) (3.19) u(sk )(tk tk 1) u(sk ) (tk tk 1). − − ≤ k k − − k=1 k=1 X X Taking the limit n we arrive at (3.18). If a < b and u : [a, b) V is continuous then→∞ it follows that for each a c c ≤∞< b → ≤ ≤ ′ c′ c c′ (3.20) u(t)dt u(t)dt u(t) dt. − ≤ k k Za Za Zc c If ∞ u(t) dt < and c b then (3.20) implies that i u(t)dt is a k k ∞ i ↑ a i 1 a Cauchy sequence, and hence, by the completeness of V , u is integrable≥ R R  over [a, ). Taking the limit in (3.18) we see that this estimate holds in the more general∞ case as well.

Let I be an interval. We say that a function u : I V is continuously differentiable if for each t I the limit → ∈ ∂ 1 (3.21) u(t) := lim h− (u(t + h) u(t)) ∂t h 0 − → ∂ exists, and t ∂t u(t) is continuous on I. We skip the proof of the next result. 7→ Proposition 3.10 (Fundamental theorem of calculus). Assume that u : [a, b] V is continuously differentiable. Then → b (3.22) ∂ u(t)dt = u(b) u(a). ∂t − Za So far, when we talked about a linear operator A on a normed linear space N, we always meant a linear map A : N N that is defined on all of N. It will be convenient to generalize this definition→ such that operators need no longer be defined on the whole space. MARKOV PROCESSES 39

Definition 3.11 (Linear operators). A linear operator on a normed space (N, ) is a pair ( (A), A) where (A) N is a linear subspace of N and A : k ·( kA) N is aD linear map. TheD graph⊆ of such a linear operator is the linearD space→ (3.23) (A) := (f,Af) : f (A) N N. G { ∈ D }⊆ × We say that a linear operator is closed if its graph (A) is a closed subspace of N N, equipped with the product topology. G × Note that a linear operator (including its domain!) is uniquely charac- terized by its graph. In fact, every linear subspace N N with the property that G ⊂ × (3.24) (f, g) , (f, g˜) g =g ˜ ∈ G ∈G ⇒ is the graph of a linear operator ( (A), A).11 Note that the fact that D A is closed means that if fi (A) are such that limi fi =: f and ∈ D →∞ limi Afi =: g exist, then f (A) and Af = g. We→∞ recall a few facts from functional∈ D analysis. Theorem 3.12 (). Let (N, ) be a normed linear space and let ( (A), A) be a linear operator on Nkwith · k (A) = N. Then one has the relationsD (a) (b) (c) between the statements:D ⇔ ⇒ (a) A is continuous, i.e., Afn Af 0 whenever fn f 0. (b) A is bounded, i.e., therek exists− a constantk→ K such thatk −Afk→ K f for all f L. k k≤ k k (c) A is closed.∈ If N is complete then all statements are equivalent. To see that unbounded closed operators have nice properties, we prove the following fact, that will be useful later. Lemma 3.13 (Closed operators and integrals). Let V be a Banach space and let ( (A), A) be a closed linear operator on V . Let a < b , let u : [a, b)D V be continuous, u(t) (A) for all t [a, b), t ≤ ∞Au(t) → ∈ D ∈ 7→ continuous, b u(t) dt< , and b Au(t) dt< . Then a k k ∞ a k k ∞ R b R b b (3.25) u(t)dt (A) and A u(t)dt = A u(t)dt. ∈ D Za Za Za Proof. We first prove the statement for the case that u and Au are contin- (n) (n) uous functions on a bounded time interval [a, b]. Choose tk and sk as in (3.14) and (3.15). Define

mn (n) (n) (n) (3.26) fn := u(sk )(tk tk 1). − − Xk=1 11Sometimes the concept of a linear operator is generalized even further in the sense that condition (3.24) is dropped. In this case, one talks about multi-valued operators. 40 JAN SWART AND ANITA WINTER

Then f (A) and n ∈ D mn (n) (n) (n) (3.27) Afn = Au(sk )(tk tk 1). − − Xk=1 By our assumptions, b b (3.28) fn u(t)dt and Afn Au(t)dt. n−→ n−→ →∞ Za →∞ Za Since A is closed, it follows that (3.25) holds. The statement for intervals of the form [a, b) follows by approximation with compact intervals, again using the fact that A is closed.

3.4. Semigroups and generators. Let (V, ‖·‖) be a Banach space. By definition, a (linear) semigroup on V is a collection of everywhere defined linear operators (S_t)_{t≥0} on V such that S_0 f = f for all f ∈ V and S_s S_t = S_{s+t} for all s, t ≥ 0. We say that (S_t)_{t≥0} is a contraction semigroup if S_t is a contraction for each t ≥ 0, i.e.,

(3.29)  ‖S_t f‖ ≤ ‖f‖  ∀f ∈ V.

We say that a semigroup (S_t)_{t≥0} is strongly continuous if

(3.30)  lim_{t→0} S_t f = f  ∀f ∈ V.

Example. A Feller semigroup on C(E), where E is compact and metrizable and C(E) is equipped with the supremum norm, is a strongly continuous contraction semigroup. (See Proposition 3.7 and (3.13).)

Remark. If (St)t 0 is a strongly continuous contraction semigroup on a Ba- nach space V , then≥ exactly the same proof as in (3.11)–(3.12) shows that t S f is a continuous map from [0, ) into V for each f V . 7→ t ∞ ∈

Let (S_t)_{t≥0} be a strongly continuous contraction semigroup on a Banach space V. By definition, the generator of (S_t)_{t≥0} is the linear operator (D(G), G), where

(3.31)  D(G) := {f ∈ V : lim_{t→0} t^{−1}(S_t f − f) exists},

and

(3.32)  Gf := lim_{t→0} t^{−1}(S_t f − f).

Exercise 3.14 (Generator of a deterministic process). Define a continuous transition probability (P_t)_{t≥0} on [−1, 1] by

(3.33)  P_t(x, ·) := δ_{x e^{−t}}(·)   (x ∈ [−1, 1]).

Determine the generator (D(G), G) of the corresponding Feller semigroup (P_t)_{t≥0} on C([−1, 1]).

Proposition 3.15 (Generators). Let V be a Banach space, let (St)t 0 be a strongly continuous contraction semigroup on V , and let ( (G), G) ≥be its generator. Then (G) is dense in V and ( (G), G) is closed.D For each f D D ∈ (G), the function t Stf from [0, ) to V is continuously differentiable, DS f (G) for all t 7→0, and ∞ t ∈ D ≥ (3.34) ∂ S f = GS f = S Gf (t 0). ∂t t t t ≥

Proof. For each h> 0 and f V we have ∈ 1 1 1 (3.35) h− (S S )f = h− (S S ) S f = S h− (S S ) f. t+h − t h − 0 t t h − 0 1 If f (G) then limh 0 h− (Sh S0)f = Gf . Since St is a contraction it ∈ D ↓ − is continuous, so the right-hand side of (3.35) converges to StGf. It follows that the other expressions converge as well, so S f (G) for all t 0 and t ∈ D ≥ 1 (3.36) lim h− (St+h St)f = GStf = StGf (t 0). h 0 − ≥ ↓ If t> 0 and 0 < h t then ≤ 1 1 (3.37) h− (Stf St h)f = St h h− (Sh S0) f StGf as h 0. − − − − → ↓ Here the convergence follows from the estimates 1 St h h− (Sh S0) f StGf k − { − } − k 1 (3.38) St h h− (Sh S0) f St hGf + St hGf StGf ≤ k − { − } − − k k − − k 1 h− (Sh S0) f Gf + St hGf StGf . ≤ k{ − } − k k − − k Formula (3.37) shows that the time derivatives of Stf from the left exist and are equal to the derivatives from the right. It follows that t Stf is continuously differentiable and (3.34) holds. 7→ To prove the other statements, we start by showing that for any f V , ∈ t t (3.39) S f ds (G) and G S f ds = S f f. s ∈ D s t − Z0 Z0 We have already seen that for any f V the function t Stf is continuous, t ∈ 7→ so 0 Ssf ds is well-defined. For each t 0, St is a contraction, hence continuous, hence closed, so by Lemma 3.13≥ (3.40)R t t 1 1 h− (Sh S0) Ssf ds = h− Ss+h Ss f ds − 0 0 − t+Zh t Z t+h h 1 1  1 = h− Ssf ds Ssf ds = h− Ssf ds h− Ssf ds. h − 0 t − 0 n Z Z o Z Z Letting h 0 we arrive at (3.39). Since for→ each f V ∈ t 1 (3.41) lim t− Ssf ds = f, t 0 ↓ Z0 42 JAN SWART AND ANITA WINTER formula (3.39) shows that (G) is dense in V . To show that ( (G), G) is D D closed, choose fn (G) such that limn fn =: f and limn Gfn =: g exist. By (3.34) and∈ D the fundamental theorem→∞ of calculus, →∞ t (3.42) S f f = S Gf ds (t> 0). t n − n s n Z0 Letting n , using the fact that Ss(Gfn fn) Gfn fn for each s 0, we find→∞ that k − k ≤ k − k ≥ t (3.43) S f f = S g ds (t> 0). t − s Z0 Dividing by t and letting t 0 we conclude that f (G) and Gf = g. → ∈ D

3.5. Dissipativity and the maximum principle. Let E be a compact metrizable space. We say that a linear operator ( (A), A) on (E) satisfies the positive maximum principle if D C (3.44) Af(x) 0 whenever f(x) 0 and f(y) f(x) y E. ≤ ≥ ≤ ∀ ∈ This says that Af(x) 0 whenever f assumes a positive maximum over E in x. ≤ Proposition 3.16 (Generators of Feller semigroups). Let E be compact and metrizable, let (Pt)t 0 be a Feller semigroup on (E), and let ( (G), G) be its generator. Then≥ C D (1) 1 (G) and G1 = 0. (2) ∈(G D) is dense in (E). (3) (D (G), G) is closed.C (4) (D(G), G) satisfies the positive maximum principle. D Proof. Property (1) follows from the fact that Pt1 = 1 for all t 0. Prop- erties (2)–(3) follow from Proposition 3.15. To prove (4), assume≥ that f (G), x E, f(x) 0, and f(y) f(x) y E. Then ∈ D ∈ ≥ ≤ ∀ ∈ (3.45) P f(x)= P (x, dy)f(y) f(x) (t 0), t t ≤ ≥ Z and therefore 1 (3.46) Gf(x) := lim t− (Ptf(x) f(x)) 0. t 0 − ≤ → (Note that the limit exists by our assumption that f (G).) ∈ D Generators of strongly continuous contraction semigroups have an impor- tant property that we have not mentioned so far. Definition 3.17. A linear operator ( (A), A) on a Banach space V is called dissipative if (λ A)f λ f forD every f (A) and λ> 0. k − k≥ k k ∈ D MARKOV PROCESSES 43

Note that an equivalent formulation of dissipativity is that (3.47) f (1 εA)f (f (A), ε> 0). k k ≤ k − k ∈ D 1 This follows by setting ε = λ− in (λ A)f λ f and multiplying both sides of the inequality by ε. k − k≥ k k Lemma 3.18 (Contractions and dissipativity). If C is an (everywhere de- fined) contraction and r> 0 then r(C 1) is dissipative. − Proof. If C is a contraction then for each ε > 0 and f V , one has (1 εr(C 1))f = f εrCf + εrf (1 + rε) f rε Cf∈ f . k − − k k − k≥ k k− k k ≥ k k Lemma 3.19 (Maximum principle and dissipativity). Let E be compact and metrizable and let ( (A), A) be a linear operator on (E). If A satisfies the positive maximum principle,D then A is dissipative. C Proof. Assume that f (A). Since E is compact there exists an x E with f(x) f(y) for∈ all Dy E. If f(x) 0, then by the positive maximum∈ principle| Af| ≥(x | ) 0| and therefore∈ (1 εA≥ )f f(x) εAf(x) f(x)= f . If f(x) ≤0, then by the factk that− A isk ≥linear | also− f |≥(A) and k(1k εA)f =≤ (1 εA)( f) f = f . − ∈ D k − k k − − k≥k− k k k Exercise 3.20. Let ( (A ), A ) be the operator on [0, 1] given by D WF WF C (A ) := 2[0, 1], D WF C (3.48) 2 A f(x) := 1 x(1 x) ∂ f(x), x [0, 1]. WF 2 − ∂x2 ∈ Show that AWF satisfies the positive maximum principle. For which values of c does the operator 2 (3.49) 1 x(1 x) ∂ + c( 1 x) ∂ 2 − ∂x2 2 − ∂x satisfy the positive maximum principle? Lemma 3.21 (Laplace equation and dissipativity). Let V be a Banach space, let (St)t 0 be a strongly continuous contraction semigroup on V , and let ( (G), G) be≥ its generator. Then G is dissipative. Moreover, for each λ> D0 and f V , the Laplace equation ∈ (3.50) p (G) and (λ G)p = f ∈ D − has a unique solution, which is given by

∞ λt (3.51) p = Stfe− dt. Z0 Proof. For each λ> 0, define U : V V by λ → ∞ λt (3.52) Uλf := Stfe− dt. Z0 λt 1 Since 0∞e− dt = λ− we have 1 (3.53)R Uλf λ− f (λ> 0, f V ). k k≤ k k ∈ 44 JAN SWART AND ANITA WINTER

Since (compare (3.40))

1 1 ∞ λt h− (Sh S0)Uλf = h− (St+h St)fe− dt − 0 − (3.54) Z h 1 λh ∞ λt 1 λh λt = h− (e 1) S fe− dt h− e S fe− dt, − t − t Z0 Z0 letting h 0 we find that U f (G) and GU f = λU f f, i.e., → λ ∈ D λ λ − (3.55) (λ G)U f = f (λ> 0, f V ). − λ ∈ If f (G), then, using Lemma 3.13 and Proposition 3.15, (3.56)∈ D ∞ λt ∞ λt ∞ λt UλGf = StGfe− dt = GStfe− dt = G Stfe− dt = GUλf, Z0 Z0 Z0 so that by (3.55) we also have

(3.57) U (λ G)f = f (λ> 0, f (G)). λ − ∈ D

It follows that (λ G) is a bijection from (G) to V and that Uλ is its inverse. Now (3.53)− implies that D

1 (3.58) f = U (λ G)f λ− (λ G)f (λ> 0, f (G)), k k k λ − k≤ k − k ∈ D which shows that G is dissipative.
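For a bounded generator on V = R^d the formula (3.51) can be checked numerically: the solution of the Laplace equation, obtained by a matrix solve, should coincide with the Laplace transform ∫_0^∞ e^{−λt} S_t f dt of the semigroup. The sketch below is an illustration only; the rate matrix G, the vector f, and the value of λ are arbitrary choices of ours.

    import numpy as np
    from scipy.linalg import expm

    # A bounded generator on V = R^3 (a Markov rate matrix, rows summing to zero).
    G = np.array([[-2.0, 1.5, 0.5],
                  [ 1.0, -1.0, 0.0],
                  [ 0.3, 0.7, -1.0]])
    f = np.array([1.0, -2.0, 0.5])
    lam = 0.8

    # p = (lam - G)^{-1} f: the unique solution of (lam - G) p = f.
    p_direct = np.linalg.solve(lam * np.eye(3) - G, f)

    # p = int_0^infty e^{-lam t} S_t f dt with S_t = e^{Gt}, truncated and discretized.
    dt, T = 0.001, 40.0
    step = expm(G * dt)
    v, t = f.copy(), 0.0
    p_laplace = np.zeros(3)
    for _ in range(int(T / dt)):
        p_laplace += np.exp(-lam * t) * v * dt
        v = step @ v              # advance S_t f by one time step
        t += dt

    print(np.max(np.abs(p_direct - p_laplace)))   # small discretization/truncation error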

Lemma 3.22 (Cauchy equation). Let ( (A), A) be a dissipative linear op- erator on a Banach space V . Assume thatD f (A), u : [0, ) V is continuously differentiable, u(t) (A) for all ∈t D0, and that ∞u solves→ the Cauchy equation ∈ D ≥

∂ u(t)= Au(t) (t 0), (3.59) ∂t u(0) = f. ≥  Then u(t) f for all t 0. In particular, by linearity, solutions to the Cauchyk equationk ≤ k (3.59)k are≥ unique.

Proof. Since A is dissipative, by (3.47), f (1 tA)f for all f (A) and t> 0, so k k ≤ k − k ∈ D

u(t) u(0) k k − k k (1 tA)u(t) u(t) (u(t) u(0)) ≤ k − k − k − − k t (3.60) = u(t) tAu(t) u(t) Au(s)ds k − k− − 0 t Z t

tAu(t) Au(s )ds Au(t) Au(s) ds. ≤ − 0 ≤ 0 k − k Z Z

MARKOV PROCESSES 45

(n) (n) (n) (n) (n) Choose 0 = t0 t1 tmn = t such that limn sup tk tk 1 : ≤ ≤···≤ (n) →∞(n) { − − k = 1,...,mn = 0. Applying (3.60) to u(tk ) u(tk 1) we see that } k k − k − k mn (n) (n) u(t) u(0) = u(tk ) u(tk 1) k k − k k k k − k − k k=1 (3.61) (n) mn t X k (n) Au(tk ) Au(s) ds. ≤ t(n) − k=1 Z k−1 X It is not hard to check that (n) (3.62) lim sup sup Au(tk ) Au(s) = 0, n (n) (n) k − k →∞ k=1,...,mn t s

Proof. Let (St)t 0 and (S˜t)t 0 be strongly continuous contraction semi- groups on V with≥ the same generator≥ ( (G), G). By Proposition 3.15, for D each f (G), the functions u(t) := Stf andu ˜(t) := S˜tf solve the Cauchy equation∈ D ∂ u(t)= Gu(t) (t 0), (3.63) ∂t u(0) = f. ≥  By Lemmas 3.21 and 3.22, S f = S˜ f for all t 0, f (G). By Proposi- t t ≥ ∈ D tion 3.15, (G) is dense in V so using the continuity of S and S˜ we find D t t that S f = S˜ f for all t 0, f V . t t ≥ ∈ Remark. An alternative proof of Corollary 3.23 uses the fact that the Laplace equation (3.50) has a unique solution.

Corollary 3.24 (Bounded generators). Let V be a Banach space and let A : V V be a bounded dissipative linear operator. Then A generates a → strongly continuous contraction semigroup (St)t 0 on V , which is given by ≥ ∞ 1 (3.64) S f = e Atf := (At)nf (t 0). t n! ≥ Xn=0 Proof. Using the fact that A is bounded it is not hard to prove that the infinite sequence in (3.64) converges, defines a strongly continuous semigroup (St)t 0 on V , and that Stf solves the Cauchy equation ≥ ∂ u(t)= Au(t) (t 0), (3.65) ∂t u(0) = f. ≥  46 JAN SWART AND ANITA WINTER

It follows from Lemma 3.22 that (St)t 0 is a contraction semigroup. ≥ Exercise 3.25. Let E be compact and metrizable, let K be a continuous probability kernel on E, and r 0 a constant. Define an (everywhere de- fined) linear operator on (E) by≥ C (3.66) Gf := r(Kf f) (f (E)). − ∈C Show that G generates a Feller semigroup. How would you describe the corresponding Markov process on E?
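When E is finite, the bounded generator Gf = r(Kf − f) of Exercise 3.25 is a matrix, and the semigroup of Corollary 3.24 can be computed by truncating the exponential series (3.64). The sketch below is a minimal illustration; the kernel K, the rate r, the number of series terms, and the helper name semigroup are our own choices.

    import numpy as np

    # Finite state space E = {0, 1, 2}; K is a stochastic matrix, r a jump rate.
    K = np.array([[0.0, 1.0, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.2, 0.3, 0.5]])
    r = 2.0
    G = r * (K - np.eye(3))          # Gf = r(Kf - f)

    def semigroup(G, t, terms=60):
        # S_t = e^{tG} via the truncated series sum_n (tG)^n / n!, cf. (3.64)
        S, term = np.eye(len(G)), np.eye(len(G))
        for n in range(1, terms):
            term = term @ (t * G) / n
            S = S + term
        return S

    P = semigroup(G, 0.7)
    print(P.sum(axis=1))             # rows sum to 1: P_t is again a probability kernel
    print(np.allclose(semigroup(G, 0.3) @ semigroup(G, 0.4), P))   # semigroup property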

3.6. Hille-Yosida: different formulations. By definition, the range of a linear operator (D(A), A) on a Banach space V is the space

(3.67)  R(A) := {Af : f ∈ D(A)}.

Here is a version of the celebrated Hille-Yosida theorem:

Theorem 3.26 (Hille-Yosida). A linear operator (D(G), G) on a Banach space V is the generator of a strongly continuous contraction semigroup if and only if
(1) D(G) is dense.
(2) G is dissipative.
(3) There exists a λ > 0 such that R(λ − G) = V.

Note that condition (3) says that there exists a λ > 0 such that for each f ∈ V, there exists a solution p ∈ D(G) to the Laplace equation (λ − G)p = f. Thus, the necessity of the conditions (1)–(3) follows from Proposition 3.15 and Lemma 3.21.

Before we turn to the proof of Theorem 3.26, we first discuss some of its merits, drawbacks, and consequences. The Hille-Yosida theorem is actually seldom applied in the form in which we have stated it above. The reason is that in most cases of interest, the domain D(G) of the generator of the semigroup that one is interested in is not known explicitly. Rather, one knows the action of G on certain well-behaved elements of the Banach space (for example, sufficiently differentiable functions) and wishes to extend this action to a generator of a strongly continuous semigroup. Since generators are always closed (recall Proposition 3.15), one is naturally led to the following definition.

Definition 3.27. Let (D(A), A) be a linear operator on a Banach space V and let G = G(A) := {(f, Af) : f ∈ D(A)} be its graph. Let Ḡ denote the closure of G in V × V, equipped with the product topology. If Ḡ is itself the graph of a linear operator (D(Ā), Ā), then we say that (D(A), A) is closable and we call (D(Ā), Ā) the closure of (D(A), A).

Here is the form of the Hille-Yosida theorem in which it is usually applied:

Theorem 3.28 (Hille-Yosida, second version). A linear operator ( (A), A) on a Banach space V is closable and its closure generates a stronglyD contin- uous contraction semigroup if and only if (1) (A) is dense. (2) AD is dissipative. (3) There exists a λ> 0 such that (λ A) is dense in V . R − Since we are mainly interested in Feller semigroups, we will usually need the following version of Theorem 3.28: Theorem 3.29 (Hille-Yosida for Feller semigroups). Let E be compact and metrizable. A linear operator ( (A), A) on (E) is closable and its closure D C A generates a Feller semigroup if and only if

(1) There exist fn (A) such that fn 1 and Afn 0. (2) (A) is dense.∈ D → → (3) AD satisfies the positive maximum principle. (4) There exists a λ> 0 such that (λ A) is dense in (E). R − C In (1), the convergence is in (E), i.e., fn 1 0 and Afn 0. It suffices, of course, if 1 (A)C and A1 =k 0. − k → k k → We have already seen∈ how D to check the positive maximum principle in an explicit set-up. To check that a subset of (E) is dense, the next theorem is often useful. For a proof, see [RS80, TheoremC IV.9]. Theorem 3.30 (Stone-Weierstrass). Let E be compact and metrizable. As- sume that (E) separates points and D⊂C (1) 1 . ∈ D (2) f1f2 for all f1,f2 . (3) λ f ∈+ Dλ f for all∈f D,f and λ , λ R. 1 1 2 2 ∈ D 1 2 ∈ D 1 2 ∈ Then is dense in (E). D C In view of Theorem 3.30, the Conditions (1)–(3) from Theorem 3.29 are usually easy to check. The hard condition is usually condition (4), which says that there exists a (E) and a λ> 0 such that for each f , there exists a solution p D⊂C(A) to the Laplace equation (λ A)p = f. It∈ actually D suffices to find solutions∈ D to a Cauchy equation. This is− not easier but perhaps a bit more intuitive: Lemma 3.31 (Cauchy and Laplace equations). Let (A, (A)) be a densely defined dissipative linear operator on a Banach space V , fD V , and assume that u : [0, ) V is continuously differentiable, u(t) ∈(A) for all t 0, and u solves∞ the→ Cauchy equation ∈ D ≥ ∂ u(t)= Au(t) (t 0), (3.68) ∂t u(0) = f. ≥  λt Then A is closable and p := 0∞ u(t)e− dt satisfies the Laplace equation (3.69) p (AR ) and (λ A)p = f. ∈ D − 48 JAN SWART AND ANITA WINTER

Proof. By Lemma 3.22, u(t) f for all t 0 so ∞ u(t)e λt dt< . k k ≤ k k ≥ 0 k − k ∞ By Lemma 3.36 below, (A, (A)) is closable. By Proposition 3.13, p (A) R and D ∈ D ∞ λt ∞ ∂ λt Ap = A u(t)e− dt = ( ∂t u(t))e− dt (3.70) Z0 Z0 ∞ λt ∞ ∂ λt = u(t)e− u(t)( ∂t e− )dt = f + λp, t=0 − 0 − Z which shows that (λ A)p = f. −

Exercise 3.32. Show that the closure of the operator AWF from Exer- cise 3.20 generates a Feller semigroup on [0, 1]. Hint: use the space of all polynomials on [0, 1]. C

3.7. Dissipative operators. Before we embark on the proofs of the various versions of the Hille-Yosida theorem we study dissipative operators in more detail. In doing so, it will be convenient to use the formalism of multi-valued operators. By definition, a multi-valued (linear) operator on a Banach space V is a linear subspace (3.71) V V. G ⊆ × We say that is single-valued if satisfies (3.24). In this case, is the graph of someG linear operator ( (AG), A) on V . We call G D ( ) := f : g V s.t. (f, g) , (3.72) D G { ∃ ∈ ∈ G} ( ) := g : f V s.t. (f, g) R G { ∃ ∈ ∈ G} the domain and range of . We say that is bounded if there exists a constant K such that G G (3.73) g K f (f, g) . k k≤ k k ∀ ∈ G We say that is a contraction if g f for all (f, g) . Note that if is single-valuedG and is the graphk k ≤ of k ( k(A), A), then these∈ G definitions coincideG with the correspondingG definitionsD for A. Lemma 3.33 (Bounded operators). Let V be a Banach space and let be a bounded (possibly multivalued) linear operator on . Then is single-valued.G G G Moreover, ( )= ( ) and is closed if and only if ( ) is closed. D G D G G D G Proof. Assume that (f, g), (f, g˜) . Then by linearity (0, g g˜) , and by boundedness g g˜ K 0 ∈= G 0, hence g =g ˜. It follows− that∈ Gis the graph of a boundedk − lineark≤ operatork k ( (A), A) on V . G One has D

(A)= f V : fn (A) s.t. fn f , (3.74) D ∈ ∃ ∈ D → (A)= f V : fn (A), g V s.t. fn f, Afn g . D ∈ ∃ ∈ D ∈ → → Therefore, the inclusion (A) (A) is obvious. Conversely, assume that f ( ), f f for someD f⊇ DV . Then Af Af K f f , n ∈ D G n → ∈ k n − mk ≤ k n − mk MARKOV PROCESSES 49 which shows that the Af form a Cauchy sequence. Therefore Af g for n n → some g V , which shows that (A) (A). ∈ D ⊆ D If ( (A), A) is closed then by what we have just proved (A)= (A)= (A),D so (A) is closed. Conversely, if (A) is closed andD f D ( ), D D D n ∈ D G fn f, Afn g, then f (A) by the fact that (A) is closed and A(→f f) →K f f ∈ D0 by the boundedness ofD ( (A), A), which k n − k ≤ k n − k → D shows that g = limn Afn = Af, and therefore (f, g) (A). This shows that ( (A), A) is closed.→∞ ∈ G D

Let V V again be a multivalued operator and let λ1, λ2 be constants. We defineG ⊆ × (3.75) λ + λ := (f, λ f + λ g) : (f, g) . 1 2G { 1 2 ∈ G} Note that if is the graph of a single-valued operator ( (A), A), then λ + λ is theG graph of ( (A), λ + λ A). We define D 1 2G D 1 2 1 (3.76) − := (g,f) : (f, g) . G { ∈ G} If is the graph of a single-valued operator ( (A), A) and A is a bijection G 1 D 1 from (A) to (A), then − is the graph of ( (A), A− ). Extending our earlierD definitionR (see (3.47)),G we say that is dissipativeR if G (3.77) f f εg (f, g) , ε> 0. k k ≤ k − k ∀ ∈ G Lemma 3.34 (Closures). Let V be a Banach space and let V V be a multivalued linear operator on V . Then G ⊆ ×

(i) λ1 + λ2 = λ1 + λ2 for all λ1, λ2 R, λ2 = 0. G 1 G ∈ 6 (ii) 1 = − . G− G (iii) If is dissipative then is dissipative. G G Proof. Since λ = 0, 2 6 λ + λ = (f, h) : (f , λ f + λ g ) (λ + λ ), 1 2G ∃ n 1 2 n ∈ 1 2G fn f, λ1f + λ2gn h (3.78)  → → = (f, λ + λ g) : (f , g ) , 1 2 ∃ n n ∈ G f f, g g = λ + λ .  n → n → 1 2G The proof of (ii) is similar but easier. To prove (iii), note that if (f, g) , then there exist (f , g ) such that f f and g g, and therefore,∈ byG n n ∈ G n → n → the dissipativity of , f = limn fn limn fn εgn = f εg . G k k →∞ k k≤ →∞ k − k k − k

Lemma 3.35 (Dissipativity and range). Let be dissipative and ε > 0. 1 G Then (1 ε )− is a contraction. Moreover, (1 ε )= (1 ε ) and is closed− if andG only if (1 ε ) is closed. R − G R − G G R − G Proof. If is dissipative then f f εg for all (f, g) . This means thatG h f for all (kh, fk) ≤ k(1 − ε k) 1. This shows∈ that G (1 k k ≤ k k ∈ − G − − ε ) 1 is a contraction. Therefore, by Lemmas 3.33 and 3.34, (1 ε ) = G − R − G 50 JAN SWART AND ANITA WINTER

1 1 1 ((1 ε )− ) = ((1 ε )− ) = ((1 ε )− ) = (1 ε ), and is D − G D1 − G D − G 1 R − G G closed (1 ε )− is closed ((1 ε )− ) is closed (1 ε ) is closed. ⇔ − G ⇔ D − G ⇔ R − G

Lemma 3.36 (Dissipativity and closability). Let ( (A), A) be dissipative and assume that (A) is dense in V . Then ( (A), AD) is closable. D D Proof. Let be the graph of ( (A), A). By Lemma 3.34, is dissipative, G D G while obviously ( ) is dense in V . We need to show that is single-valued. D G G By linearity, it suffices to show that (0, g) implies g = 0. So imagine ∈ G that (0, g) . Since ( ) is dense in V there exist (g , h ) such that ∈ G D G n n ∈ G gn g. Since is dissipative, 0+ εgn (0 + εgn) ε(g + εhn) for each ε>→0. It followsG that g kg g kεh ≤ k for each−ε> 0. Lettingk ε 0 k nk ≤ k n − − nk → and then n we find that g = limn gn limn gn g = 0. →∞ k k →∞ k k≤ →∞ k − k

3.8. Resolvents.

Definition 3.37 (Resolvents). By definition, the resolvent set of a closed linear operator ( (A), A) on a Banach space V is the set D ρ(A) := λ R : (λ A) : (A) V is a bijection, (3.79) ∈ − 1D → (λ A)− is a bounded operator .  − 1 If λ ρ(A) then the bounded operator (λ A)− : V (A) is called the resolvent∈ of A (at λ). − → D Note that λ ρ(A) implies that λ is not an eigenvalue of A. For imagine ∈ 1 that Ap = λp for some p (A). Then p = (λ A)− (λ A)p = (λ A) 10 = 0. Note furthermore∈ D that the generator− ( (G), G)− of a strongly− − D continuous contraction semigroup (St)t 0 never has eigenvalues λ> 0. For if Gf = λf with f 0 then u(t) :=≥ feλt solves the Cauchy equation ∂ ≥ λt ∂t u(t)= Gu(t) and therefore Stf = e f, contradicting contractiveness. Exercise 3.38. Show that if ( (A), A) is not closed then the set ρ(A) in (3.79) is always empty. D

Lemma 3.39 (Resolvent set is open). Let A be a closed linear operator on a Banach space V . Then the resolvent set ρ(A) is an open subset of R.

1 Proof. Assume that λ ρ(A). Then (λ A)− is a bounded operator, so ∈ 1 − there exists a K such that (λ A)− f K f for all f V . Now let λ λ < K 1. Then the infinitek − sum k ≤ k k ∈ | ′ − | −

∞ n (n+1) (3.80) Sf := (λ λ′) (λ A)− f (f V ) − − ∈ Xn=0 MARKOV PROCESSES 51 defines a bounded operator S : V (A), and (3.81) → D

∞ n n+1 (λ′ A)Sf = (λ A) (λ λ′) (λ λ′) (λ A)− f − − − − − − n=0  X ∞ n n ∞ n+1 n+1 = (λ λ′) (λ A)− f (λ λ′) (λ A)− f = f − − − − − nX=0 nX=0 for each f V . In the same way we see that S(λ′ A)f = f for each ∈ − 1 f (A)so(λ′ A) : (A) V is a bijection and its inverse S = (λ′ A)− is∈ a Dbounded operator.− D → −

Exercise 3.40. Let A be a closed linear operator on a Banach space V and λ, λ′ ∈ ρ(A), λ ≠ λ′. Prove the resolvent identity

(3.82)  (λ − A)^{−1}(λ′ − A)^{−1} = [(λ − A)^{−1} − (λ′ − A)^{−1}] / (λ′ − λ) = (λ′ − A)^{−1}(λ − A)^{−1}.

According to [EK86, page 11]: 'Since (λ − A)(λ′ − A) = (λ′ − A)(λ − A) for all λ, λ′ ∈ ρ(A), we have (λ′ − A)^{−1}(λ − A)^{−1} = (λ − A)^{−1}(λ′ − A)^{−1}.' Do you agree with this argument?

Lemma 3.41 (Resolvent set of dissipative operator). Let ρ(A) be the resolvent set of a closed dissipative operator (D(A), A) on a Banach space V and let ρ⁺(A) := ρ(A) ∩ (0, ∞). Then

(3.83)  ρ⁺(A) = {λ > 0 : R(λ − A) = V}

and either ρ⁺(A) = ∅ or ρ⁺(A) = (0, ∞).

Proof. If A is a dissipative operator and λ > 0, then by Lemma 3.35, (λ − A)^{−1} = λ^{−1}(1 − λ^{−1}A)^{−1} is a bounded operator (and therefore single-valued by Lemma 3.33). This proves (3.83). To see that ρ⁺(A) is either ∅ or (0, ∞), by Lemma 3.39 it suffices to show that ρ⁺(A) ⊂ (0, ∞) is closed. Choose λ_n ∈ ρ⁺(A), λ_n → λ ∈ (0, ∞). We need to show that R(λ − A) = V. Since A is closed, by Lemma 3.35, it suffices to show that R(λ − A) is dense in V. Choose g ∈ V and define g_n := (λ − A)(λ_n − A)^{−1} g. Then g_n ∈ R(λ − A) and, since A is dissipative,

(3.84)  ‖g − g_n‖ = ‖((λ_n − A) − (λ − A))(λ_n − A)^{−1} g‖ = |λ_n − λ| ‖(λ_n − A)^{−1} g‖ ≤ |λ_n − λ| λ_n^{−1} ‖g‖ → 0 as n → ∞.
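For matrices, the resolvent identity (3.82) is easy to verify numerically. The quick sketch below is an illustration only; the matrix A and the values of λ, λ′ are arbitrary choices of ours.

    import numpy as np

    A = np.array([[-1.0, 0.4, 0.6],
                  [ 0.2, -0.9, 0.7],
                  [ 0.1, 0.1, -0.2]])
    lam, lam2 = 1.5, 0.4
    I = np.eye(3)

    R1 = np.linalg.inv(lam * I - A)    # (lambda  - A)^{-1}
    R2 = np.linalg.inv(lam2 * I - A)   # (lambda' - A)^{-1}

    lhs = R1 @ R2
    mid = (R1 - R2) / (lam2 - lam)
    rhs = R2 @ R1
    print(np.allclose(lhs, mid), np.allclose(mid, rhs))   # both True: resolvents commute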

3.9. Hille-Yosida: proofs. Let (V, ) be a Banach space. By definition, the operator norm of an everywherek·k defined bounded linear operator A : V V is → (3.85) A := inf K > 0 : Af K f f V . k k { k k≤ k k ∀ ∈ } 52 JAN SWART AND ANITA WINTER

We say that a collection of (everywhere defined) bounded linear operators on V is uniformly boundedA if sup A : A < . The following fact is well-known, see for example [RS80,{k Theoremk ∈ A} III.9]∞ Proposition 3.42 (Principle of uniform boundedness). Let (V, ) be a Banach space and let be a collection of bounded linear operatorskA · k: V V . Assume that sup AAf : f V < for each f V . Then →is uniformly bounded. {k k ∈ } ∞ ∈ A

Lemma 3.43 (Order of limits). Let C,Cn be bounded linear operators on a Banach space V . Assume that limn Cnf = Cf for all f V . Then the →∞ ∈ Cn are uniformly bounded and (3.86) lim lim Cnfm = lim lim Cnfm = lim Cnfn = Cf fn f. n m m n n →∞ →∞ →∞ →∞ →∞ ∀ → Proof. By the continuity of the Cn,

(3.87) lim lim Cnfm = lim Cnf = Cf. n m n →∞ →∞ →∞ By the continuity of C,

(3.88) lim lim Cnfm = lim Cfm = Cf, m n m →∞ →∞ →∞ Since limn Cnf = Cf one has supn Cnf < for each f V , so →∞ k k k k k k ∞ ∈ by the principle of uniform boundedness the Cn are uniformly bounded. Set K := sup C . Then { } n k nk Cnfn Cf Cnfn Cnf + Cnf Cf (3.89) k − k ≤ k − k k − k K f f + C f Cf , ≤ k n − k k n − k which shows that limn Cnfn = Cf. →∞

Exercise 3.44. Let V be a Banach space, let Cn be uniformly bounded linear operators on V . Let f,f V , f f. Assume that the limit m ∈ m → limn Cnfm =: gm exists for all m. Show that the limit limm gm exists and →∞ →∞

(3.90) lim Cnf = lim gm. n m →∞ →∞ Corollary 3.45. Let V be a Banach space, let V be dense and let Cn be (everywhere defined) uniformly bounded linearD operators⊂ on V . Assume that limn Cnf exists for all f . Then there exists a bounded linear operator C→∞on V such that C f ∈Cf D for all f V . n → ∈ Proof. By Exercise 3.44, the set f V : limn Cnf exists is closed, { ∈ →∞ } so by our assumptions the limit limn Cnf exists for all f V . Define →∞ ∈ Cf := limn Cnf. It is easy to see that C is linear and bounded. →∞ MARKOV PROCESSES 53

Proof of Theorem 3.26. By Proposition 3.15 and Lemma 3.21, the condi- tions (1)–(3) are necessary. Conversely, if (1)–(3) hold, then by Lemma 3.41, 1 (1 εG)− : V (G) is a bounded operator for each ε> 0. By definition, the−Yosida approximation→ D of G (at ε> 0) is the everywhere defined bounded (by Lemma 3.35) linear operator 1 1 (3.91) G f := ε− (1 εG)− 1 f (f V ). ε − − ∈ One has 

(3.92) lim Gεf = Gf (f (G)). ε 0 ∈ D → 1 To see this, recall that (1 εG)− (1 εG)f = f for all f (G) so that (1 εG) 1f f = ε(1 εG−) 1Gf for− all f (G), and therefore∈ D by (3.91) − − − − − ∈ D 1 (3.93) G f = (1 εG)− Gf (f (G)). ε − ∈ D In order to prove (3.92), by (3.93) it suffices to show that 1 (3.94) lim(1 εG)− f = f (f V ). ε 0 − ∈ → By (3.91), (3.93), and the fact that (1 εG) 1 is a contraction − − 1 1 (3.95) (1 εG)− f f = ε G f = ε (1 εG)− Gf ε Gf k − − k k ε k k − k≤ k k (f (G)). This proves (3.94) for f (G). By Corollary 3.45 and the fact∈ that D (G) is dense, we conclude that∈ D (3.94) holds for each f V . By LemmaD 3.35, (1 εG) 1 is a contraction, so by (3.91) and Lemma∈ 3.18, − − Gε is dissipative. Therefore, by Corollary 3.24, Gε generates a strongly ε Gεt continuous contraction semigroup (St )t 0 = (e )t 0 on V . We will show that the limit ≥ ≥ ε (3.96) Stf := lim St f ε 0 → exists for all t 0 and f V and defines a strongly continuous contraction ≥ ∈ semigroup (St)t 0 with generator G. ≥ It follows from Exercise 3.40 that GεGε′ = Gε′ Gε for all ε, ε′ > 0. Conse- G ′ t quently also Gε and e ε commute, so eGεtf eGε′ tf k −t k ∂ Gεs Gε′ (t s) = ∂s e e − fds 0 Zt ∂ Gεs Gε′ (t s) Gεs ∂ Gε′ (t s) ( ∂s e )e − f + e ( ∂s e − )f ds (3.97) ≤ 0 Z t Gεs G ′ (t s) = e e ε − (Gε Gε′ )f ds 0 − Z t

(G G ′ )f ds = t (G G ′ )f . ≤ k ε − ε k k ε − ε k Z0 Note that we have used commutativity in the last equality. It follows from Gε t (3.92) and (3.97) that for each f (G), t 0, and εn 0, (e n f)n 0 is a Cauchy sequence, and therefore∈ the D limit in≥ (3.96) exists→ for all f ≥(G) ∈ D 54 JAN SWART AND ANITA WINTER

ε and t 0. By Corollary 3.45 and the fact that the St are contractions, the limit in≥ (3.96) exists for all f V . With a bit more effort it is possible to see that the limit is locally uniform∈ in t, i.e., ε (3.98) lim sup St f Stf = 0 T > 0, f V. ε 0 0 s T k − k ∀ ∈ → ≤ ≤ It remains to show that the operators (St)t 0 defined in (3.96) form a strongly continuous contraction semigroup with≥ generator G. It is easy to see that they are contractions. For the semigroup property, we note that by Lemma 3.43 ε ε ε (3.99) StSsf = lim St Ss f = lim St+sf = St+sf (f V ). ε 0 ε 0 ∈ → → To see that (St)t 0 is strongly continuous, we note that ≥ ε ε (3.100) lim Stf f = lim lim St f f = lim lim St f f = 0, t 0 k − k t 0 ε 0 k − k ε 0 t 0 k − k → → → → → where the interchanging of limits is allowed by (3.98). In order to prove that 1 G is the generator of (St)t 0 it suffices to show that limt 0 t− (Stf f)= Gf ≥ → − for all f (G). For if this is the case, then the generator ( (G˜), G˜) of ∈ D D (St)t 0 is an extension of the operator ( (G), G). Since both (λ G) : ≥ D − (G) (λ G)= V and (λ G˜) : (G˜) V are bijections, this is only D → R − − D → possible if ( (G˜), G˜) = ( (G), G). D D 1 In order to show that limt 0 t− (Stf f)= Gf for all f (G) it suffices to show that → − ∈ D t (3.101) S f f = S Gfds (f (G)). t − s ∈ D Z0 By Proposition 3.15, t (3.102) Sεf f = SεG fds (f V ). t − s ε ∈ Z0 Using (3.98) and (a simple extension of) Lemma 3.43, ε (3.103) lim sup Ss Gεf SsGf = 0 (f (G)). ε 0 0 s t k − k ∈ D → ≤ ≤ Inserting this into (3.102) we arrive at (3.101).
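The Yosida approximation (3.91) is concrete enough to compute for a matrix generator: G_ε = ε^{−1}((1 − εG)^{−1} − 1) is bounded, and e^{G_ε t} → e^{Gt} as ε → 0, in line with (3.96)–(3.98). The sketch below is an illustration only; the rate matrix G, the vector f, and the list of ε values are arbitrary choices of ours.

    import numpy as np
    from scipy.linalg import expm

    G = np.array([[-1.0, 1.0, 0.0],
                  [ 0.5, -1.5, 1.0],
                  [ 0.0, 2.0, -2.0]])
    f = np.array([1.0, 0.0, -1.0])
    t = 1.0
    I = np.eye(3)

    exact = expm(t * G) @ f
    for eps in [0.5, 0.1, 0.01, 0.001]:
        G_eps = (np.linalg.inv(I - eps * G) - I) / eps    # Yosida approximation (3.91)
        approx = expm(t * G_eps) @ f
        print(eps, np.max(np.abs(approx - exact)))        # error decreases as eps -> 0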

Proof of Theorem 3.28. By Theorem 3.26, conditions (1) and (2) are obvi- ously necessary. (Note that in general (A) (A) so if (A) is not dense D ⊆ D D then (A) is not dense.) By Lemma 3.35, condition (3) is also necessary. Conversely,D if ( (A), A) satisfies (1)–(3) then by Lemma 3.36, A is clos- D able, while by Lemma 3.35, (λ A) = V , so that by Lemma 3.41, A satisfies the conditions of TheoremR − 3.26.

Proof of Theorem 3.29. By Proposition 3.16 and Theorem 3.28, the condi- tions (1)–(4) are necessary. We have seen in Lemma 3.19 that the positive maximum principle implies that A is dissipative, so if A satisfies (1)–(4) then by Theorem 3.28 A generates a strongly continuous contraction semigroup MARKOV PROCESSES 55 on (E). If 1 (A) and A1 = 0 then u := 1 solves the Cauchy equation C ∈ D t ∂ u = Au so by Proposition 3.15 and Lemma 3.22 P 1 = 1 for all t 0. ∂t t t t ≥ To finish the proof, we must show that Ptf 0 for all f 0. This would be easy using the Cauchy equation if we would≥ know that≥A satisfies the positive maximum principle; unfortunately it is not straightforward to show the latter. Therefore we use a different approach. We know that 1 (1 εA) is dense for all ε> 0 and that (1 εA)− : (1 εA) (A) is R − − 1 R − → D a bounded operator. We claim that (1 εA)− maps nonnegative functions into nonnegative functions. Indeed, if f− (A) does not satisfy f 0, then f assumes a negative minimum over E∈in D some point x, and therefore≥ by the positive maximum principle applied to f, − (3.104) (1 εA)f(x) f(x) < 0, − ≤ which shows that not (1 εA)f 0. Thus (1 εA)f 0 implies f 0, i.e., (1 εA) 1 maps nonnegative− ≥ functions into− nonnegative≥ functions.≥ By − − approximation it follows that also (1 εA) 1 maps nonnegative functions − − into nonnegative functions.12 Let A = ε 1((1 εA) 1 1) be the Yosida ε − − − − approximation of A. Then f 0 implies ≥ n A t εt ε 1(1 εA) 1t εt ∞ ε− n (3.105) e ε f = e− e − − f = e− (1 εA)− f 0. − n! − ≥ nX=0 Letting ε 0 we conclude that P f 0. → t ≥

12Since the set P := {f ∈C(E) : f ≥ 0} is the closure of its interior and R(1 − εA) is dense in C(E), it follows that R(1 − εA) ∩P is dense in P. 56 JAN SWART AND ANITA WINTER

4. Feller processes

4.1. Markov processes. Let E be a Polish space. In Proposition 2.12 we have seen that for a given initial law L(X_0) and transition function (or, equivalently, Markov semigroup) (P_t)_{t≥0} on E, there exists a Markov process X, which is unique in finite dimensional distributions. We are not satisfied with this result, however, since we do not know in general if X has a version with cadlag sample paths. This motivates us to change our definition of a Markov process. From now on, we work with the following definition.

Definition 4.1 (Markov process). By definition, a Markov process with transition function (P_t)_{t≥0} is a collection (P^x)_{x∈E} of probability laws on (D_E[0,∞), B(D_E[0,∞))) such that under the law P^x the stochastic process X = (X_t)_{t≥0} given by the coordinate projections

(4.1)  X_t(w) := ξ_t(w) = w(t)   (w ∈ D_E[0,∞), t ≥ 0)

satisfies the equivalent conditions (a)–(c) from Proposition 2.11 and one has P^x{X_0 = x} = 1.

Sometimes we denote a Markov process by a pair (X, (P^x)_{x∈E}), since we want to indicate which symbol we use for the coordinate projections. Note that a Markov process (P^x)_{x∈E} is uniquely determined by its transition function (P_t)_{t≥0}. We do not know if to each transition function (P_t)_{t≥0} there exists a corresponding Markov process (P^x)_{x∈E}. The problem is to show cadlag sample paths. Indeed, by Proposition 2.12, there exists for each x ∈ E a stochastic process X^x = (X^x_t)_{t≥0} such that X^x_0 = x and X^x satisfies the equivalent conditions (a)–(c) from Proposition 2.11. If for each x ∈ E we can find a version of X^x with cadlag sample paths, then the laws P^x := L(X^x), considered as probability measures on D_E[0,∞), form a Markov process in the sense of Definition 4.1.

We postpone the proof of the next theorem till later.

Theorem 4.2 (Feller processes). Let E be compact and metrizable and let (P_t)_{t≥0} be a Feller semigroup on C(E). Then there exists a Markov process (P^x)_{x∈E} with transition function (P_t)_{t≥0}.

Thus, each Feller semigroup (P_t)_{t≥0} defines a unique Markov process (P^x)_{x∈E}. We call this the Feller process with Feller semigroup (P_t)_{t≥0}. If (D(G), G) is the generator of (P_t)_{t≥0}, then we also say that (P^x)_{x∈E} is the Feller process with generator G.

We develop some notation and terminology for general Markov processes in Polish spaces.

Lemma 4.3 (Measurability). Let E be Polish and let (P^x)_{x∈E} be a Markov process with transition function (P_t)_{t≥0}. Then (x, A) ↦ P^x(A) is a probability kernel from E to D_E[0,∞).

Proof. By definition, P^x is a probability measure on D_E[0,∞) for each fixed x ∈ E. Formula (4.1) shows that for fixed A of the form A = {w ∈

D_E[0,∞) : w_{t_1} ∈ A_1, ..., w_{t_n} ∈ A_n} with A_1, ..., A_n ∈ B(E), the function x ↦ P^x(A) is measurable. Since

(4.2)  D := {A ∈ B(D_E[0,∞)) : x ↦ P^x(A) is measurable}

is a Dynkin system and since the coordinate projections generate the Borel-σ-field on D_E[0,∞), the same is true for all A ∈ B(D_E[0,∞)).

x x E If (P ) ∈ is a Markov process with transition function (Pt)t 0 and µ is a probability measure on E, then using Lemma 4.3 we define a≥ probability measure Pµ on ( [0, ), ( [0, ))) by DE ∞ B DE ∞ (4.3) Pµ(A) := µ(dx) Px(A) (A ( [0, ))). ∈B DE ∞ ZE Under Pµ, the stochastic process given by the coordinate projections X = (Xt)t 0 satisfies the equivalent conditions (a)–(c) from Proposition 2.11 and x ≥ µ P X0 = µ. We call (X, P ) the Markov process with transition func- { ∈ · } x µ tion (Pt)t 0 started in the initial law µ. We let E , E denote expectation with respect≥ Px, Pµ, respectively. Recall that two stochastic processes with the same finite dimensional x x E distributions are called versions of each other. Thus, if (X, (P ) ∈ ) is a Markov process with transition function (Pt)t 0, then a stochastic process ≥ X′, defined on any probability space, which has the same finite dimensional distributions as X under the law Pµ, is called a version of the Markov pro- cess with semigroup (Pt)t 0 and initial law µ. This is equivalent to the ≥ statement that X′ satisfies the equivalent conditions (a)–(c) from Proposi- tion 2.11 and (X0′ )= µ. If X′ has moreover cadlag sample paths then this is equivalent toL (X ) = Pµ, where we view X as a random variable with L ′ ′ values in E[0, ). We are usually only interested in versions with cadlag sample paths.D ∞

4.2. Jump processes. Jump processes are the simplest Markov processes. We have already met them in Exercise 3.66. Proposition 4.4 (Jump processes). Let E be Polish, let K be a probability kernel on E, and let r 0 be a constant. Define G : B(E) B(E) by ≥ → (4.4) Gf := r(Kf f) (f B(E)), − ∈ and put

∞ 1 (4.5) P f := e Gtf := (Gt)nf (f B(E), t 0). t n! ∈ ≥ nX=0 Then (Pt)t 0 is a Markov semigroup and there exists a Markov process x x E ≥ (P ) ∈ corresponding to (Pt)t 0. If E is compact and K is a continuous x x E ≥ probability kernel then (P ) ∈ is a Feller process. 58 JAN SWART AND ANITA WINTER

Note that the infinite sum in (4.5) converges uniformly since r(Kf f) 2r f for each f B(E). Before we prove Proposition 4.4k we first− lookk ≤ at ak specialk case. ∈

Example: (Poisson process with rate r). Let E := N, K(x, {y}) := 1_{{y=x+1}}, and r > 0. Hence

(4.6)  Gf(x) = r(f(x+1) − f(x))   (f ∈ B(N)).

Then the Markov semigroup in (4.5) is given by

    P_t f(x) = e^{Gt} f(x) = e^{rt(K−1)} f(x) = e^{−rt} e^{rtK} f(x)
             = e^{−rt} Σ_{n≥0} (rt)^n/n! K^n f(x) = e^{−rt} Σ_{n≥0} (rt)^n/n! f(x+n),

hence

(4.7)  P_t(x, {x+n}) = 1_{{n≥0}} e^{−rt} (rt)^n / n!.

We call the associated Markov process (N, (P^x)_{x∈N}) the Poisson process with intensity r. By condition (a) from Proposition 2.11,

(4.8)  P^x({N_t − N_s = n} | F_s) = 1_{{n≥0}} e^{−r(t−s)} (r(t−s))^n / n!.

This says that N_t − N_s is Poisson distributed with mean r(t−s). Since the right-hand side of (4.8) does not depend on N_s, the random variable N_t − N_s is independent of (N_u)_{u≤s}. It follows that if (N_t)_{t≥0} is a version of the Poisson process started in any initial law, then for any 0 ≤ t_1 ≤ ··· ≤ t_n, the random variables

    N_{t_1} − N_0, ..., N_{t_n} − N_{t_{n−1}}

are independent and Poisson distributed with means r(t_1 − 0), ..., r(t_n − t_{n−1}). Recall that if P, Q are independent Poisson distributed random variables with means p and q, then P + Q is Poisson distributed with mean p + q.

Poisson processes describe the occurrence of rare events. For each n ≥ 1, let (X^{(n)}_i)_{i∈N} be a Markov chain in N with X^{(n)}_0 = 0 and transition probabilities

    P(X^{(n)}_{i+1} = y | X^{(n)}_0, ..., X^{(n)}_i) = p^{(n)}(X^{(n)}_i, y),

where

    p^{(n)}(x, y) := 1 − 1/n  if y = x,
                     1/n      if y = x + 1,
                     0        otherwise.

Fix r > 0 and define processes (N^{(n)}_t)_{t≥0} by

    N^{(n)}_t := X^{(n)}_{⌊nrt⌋}   (t ≥ 0).

Then

    P(N^{(n)}_t = k) = (m choose k) (1/n)^k (1 − 1/n)^{m−k},   where m := ⌊nrt⌋.

It follows that N^{(n)}_t converges as n → ∞ to a Poisson distributed random variable with mean rt. With a bit more work one can see that the stochastic process N^{(n)} converges in finite dimensional distributions to a Poisson process with intensity r, started in N_0 = 0. The next exercise shows how to construct versions of the Poisson process with cadlag sample paths.

Exercise 4.5. Let (σk)k 1 be independent exponentially distributed random 1 ≥ n variables with mean r− and set τn := k=1 σk. Show that (4.9) N := max n 0 : τP t (t 0) t { ≥ n ≤ } ≥ defines a version N = (Nt)t 0 of the Poisson process with rate r started in ≥ N0 = 0.
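The construction in Exercise 4.5 is straightforward to simulate: sum independent exponential holding times and count how many fall below t. The sketch below is an illustration only (the rate, time horizon, helper name, and sample size are our own choices); it also checks the Poisson marginal (4.7) empirically.

    import numpy as np

    rng = np.random.default_rng(1)
    r, t = 2.0, 3.0

    def poisson_value_at(r, t, rng):
        # N_t = max{n >= 0 : tau_n <= t}, tau_n = sigma_1 + ... + sigma_n,
        # with sigma_k i.i.d. exponential with mean 1/r, cf. (4.9)
        n, tau = 0, 0.0
        while True:
            tau += rng.exponential(1.0 / r)
            if tau > t:
                return n
            n += 1

    samples = np.array([poisson_value_at(r, t, rng) for _ in range(100_000)])
    print(samples.mean(), r * t)                    # empirical mean vs. theoretical mean rt
    print(np.mean(samples == 0), np.exp(-r * t))    # P[N_t = 0] = e^{-rt}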

Proof of Proposition 4.4. The case r = 0 is trivial so assume r> 0. For each x x x E, let (Yn )n 0 be a Markov chain started in Y0 = x with transition kernel∈ K i.e., ≥ x x x x (4.10) P( Yn A Y0 ,...,Yn 1)= K(Yn 1, A) a.s. (A (E)). { ∈ }| − − ∈B Let (σk)k 1 be independent exponentially distributed random variables with 1≥ x n mean r− , independent of Y , and set τn := k=1 σk. Define a process x x X = (Xt )t 0 by ≥ P (4.11) Xx := Y x if τ t<τ . t n n ≤ n+1 We claim that Xx satisfies the equivalent conditions (a)–(c) from Propo- sition 2.11. Since Xx obviously has cadlag sample paths this then implies x x that P := (X ) defines a Markov process with semigroup (Pt)t 0. If E is compact andL K is a continuous probability kernel then we have≥ already seen in Exercise 3.66 that (Pt)t 0 is a Feller semigroup. To see that Xx defined in≥ (4.11) satisfies condition (c) from Proposi- tion 2.11, let N = (Nt)t 0 be a Poisson process with intensity r, started in ≥ x N0 = 0, independent of Y . Then (4.11) says that (4.12) Xx := Y x (t 0), t Nt ≥ i.e., Xx jumps according to the kernel K at random times that are given by a Poisson process with intensity r. It follows that for any f B(E), ∈ k x ∞ x ∞ rt (rt) k E[f(Y )] = P N = k E[f(Y )] = e− K f(x) Nt { t } k k! Xk=0 Xk=0 rtertK etr(K 1) e tG = e− f(x)= − f(x)= f(x). The proof that Xx satisfies condition (c) from Proposition 2.11 goes basically the same. Let 0 t t and f ,...,f B(E). Then, since a ≤ 1 ≤ ··· ≤ n 1 n ∈ 60 JAN SWART AND ANITA WINTER

Poisson process has independent increments, x x E f1(YN ) fn(YN ) t1 · · · tn  ∞ ∞  = P N = k P N N = k · · · { t1 1}··· { tn − tn−1 n} kX1=1 kXn=1 E f (Y x ) f (Y x ) 1 k1 n k1+ +kn · · · · ··· ∞ ∞ k1 kn rt1 (rt1)  r(tn tn−1) (r(tn tn 1)) = e− e− − − − · · · k1! · · · kn! kX1=1 kXn=1 Kk1 f Kk2 f Kkn f (x) · 1 2 · · · n = e ttGf e(t2 t1)Gf e(tn tn 1)Gf (x). 1 − 2 · · · − − n

Inserting fi = 1Ai we see that condition (c) from Proposition 2.11 is satisfied.

Remark. Jump processes can be approximated with Markov chains. Let E be Polish, K a probability kernel on E, x E, and r > 0. For each n 1, (n) (n) ∈ ≥ let (Y )i 0 be a Markov chain with Y0 = x and transition probabilities i ≥ P( Y (n) Y (n),...,Y (n))= K(n)(Y (n), ), { i+1 ∈ · }| 0 i i · where K(n)(y, dz)= 1 K(y, dz) + (1 1 )δ (dz). n − n y (n) Then the processes (Xt )t 0 given by ≥ (n) (n) Xt := Y nrt ⌊ ⌋ converge as n in finite dimensional distributions to the → ∞ X with jump kernel K, jump rate r, and initial condition X0 = x.

A well-known example of a jump process is continuous-time . Example: (Random walk). Let d 1 and let p : Zd R be a probability d ≥ → d distribution on Z , i.e., p(x) 0 for all x Z and x p(x) = 1. Let E := Z (equipped with the discrete≥ topology), K∈(x, y ) := p(y x), r > 0, and { x} xPZd − define G as in (4.4). The jump process (X, (P ) ∈ ) with generator G is called the (continuous-time) random walk that jumps from x to y with rate rp(y x). − Let (Zk)k 1 be independent random variables with distribution P Zk = ≥ d { x = p(x) (x Z ) and let (σk)k 1 be independent exponentially distributed } ∈ ≥1 x random variables with mean r− , independent of (Zk)k 1. Set Yn := x + n n ≥ k=1 Zk, τn := k=1 σk, and put (4.13) Xx := Y x if τ t<τ P P t n n ≤ n+1 x x Then X = (Xt )t 0 is a version of the random walk that jumps from x to y with rate rp(y ≥x), started in Xx = x. − 0 MARKOV PROCESSES 61
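The representation (4.13) gives a direct simulation recipe for the continuous-time random walk: draw jump times from a Poisson clock with rate r and displacements from p. The sketch below is an illustration only; the step distribution p on Z, the rate, the horizon, and the helper name are arbitrary choices of ours.

    import numpy as np

    rng = np.random.default_rng(2)
    r, T, x0 = 1.5, 10.0, 0

    # step distribution p on Z: asymmetric nearest-neighbour steps
    steps, probs = np.array([-1, 1]), np.array([0.4, 0.6])

    def random_walk_path(x0, r, T, rng):
        # returns jump times tau_n and positions Y_n, with X_t = Y_n for tau_n <= t < tau_{n+1}
        times, positions = [0.0], [x0]
        t = rng.exponential(1.0 / r)
        while t <= T:
            times.append(t)
            positions.append(positions[-1] + rng.choice(steps, p=probs))
            t += rng.exponential(1.0 / r)
        return np.array(times), np.array(positions)

    # E[X_T - x0] = r * T * E[Z_1]; check by Monte Carlo
    mean_step = float((steps * probs).sum())
    end = [random_walk_path(x0, r, T, rng)[1][-1] for _ in range(5000)]
    print(np.mean(end), x0 + r * T * mean_step)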

Often, one wants to consider jump processes in which the jump rate r is a function of the position of the process. As long as r is a bounded function, such jump processes exist by a trivial extension of Proposition 4.4. Proposition 4.6 (Jump processes with non-constant rate). Let E be Polish, let K be a probability kernel on E, and let r B(E) be nonnegative. Define G : B(E) B(E) by ∈ → (4.14) Gf := r(Kf f) (f B(E)). − ∈ Gt Then Ptf := e f (f B(E), t 0) defines a Markov semigroup and there ∈ x x E ≥ exists a Markov process (P ) ∈ corresponding to (Pt)t 0. If E is compact x x E ≥ and K and r are continuous then (P ) ∈ is a Feller process.

Proof. Set R := supx E r(x) and define ∈ r(x) r(x) (4.15) K′(x, dy) := K(x, dy)+ 1 δ (dy). R − R x   Then Gf = R(K f f) so we are back at the situation in Proposition 4.4. ′ −

Example: (Moran model). Fix n 1, put E := 0, 1,...,n , and ≥ { } (4.16) G f(x) := 1 x(n x) f(x +1)+ f(x 1) 2f(x) , X 2 − − − 1 which corresponds to setting r(x)=x(n x) and K(x, y ) := 2 1 y=x+1 + 1 − { } { } 2 1 y=x 1 for x = 1,...,n 1 . Observe that since r(0) = r(n) = 0 it is ir- { − } { − } relevant how we define K(0, ) and K(n, ). The jump process (X, (Px)x E) · · ∈ with generator GX is called the Moran model with population size n. The Moran model arises in the following way. Consider n organisms that are divided into two types, denoted by 0, 1. (For example, 0 might represent n a white flower and 1 a red one.) Let Sn := 0, 1 = y = (y(1),...,y(n)) : y(i) 0, 1 i be the set of all different ways{ in} which{ we can assign types to these∈ { n }∀organisms.} Put v (y)(i) := y(j) and v (y)(k) := y(k) if k = i. ij ij 6 Then vij(y) is the configuration in which the i-th organism has adopted y Sn the type of the j-th organism. Let (Y, P ∈ ) be the Markov process with generator (4.17) G f(y) := 1 f(v (y)) f(y)). Y 2 ij − ij X This means that each unordered pair i, j of organisms is selected with rate 1, and then one of these organisms,{ chosen} with equal probabilities, takes over the type of the other one. (Note that there is no harm in including y i = j in the sum in (4.17) since vii(y) = y.) Now if Y is a version of the y Markov process with generator GY started in Y0 = y, then n (4.18) Xx := Y y(i) (t 0) t t ≥ Xi=1 62 JAN SWART AND ANITA WINTER

n is a version of the Moran model started in x := i=1 y(i). To see this, at n least intuitively, note that if x = i=1 y(i) then x(n x) is the number of unordered pairs i, j of organisms such that i andP j−have different types, 1 { } P and therefore 2 x(n x) is the total rate of 1’s changing to 0’s, which equals the total rate of 0’s− changing to 1’s.
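Since the Moran model is a jump process with bounded, state-dependent rate r(x) = x(n − x), Proposition 4.6 applies and it can be simulated with exponential holding times. The sketch below is an illustration only; the population size, the initial state, the sample size, and the helper name are our own choices.

    import numpy as np

    rng = np.random.default_rng(3)
    n, x0 = 20, 5

    def moran_sample(n, x0, rng):
        # Moran model as a jump process: from x, wait an exponential time with
        # rate r(x) = x(n - x), then jump to x - 1 or x + 1 with probability 1/2 each.
        x, t = x0, 0.0
        while 0 < x < n:                  # 0 and n are absorbing, since r(0) = r(n) = 0
            t += rng.exponential(1.0 / (x * (n - x)))
            x += rng.choice([-1, 1])
        return x, t

    res = [moran_sample(n, x0, rng) for _ in range(20_000)]
    fixed_at_n = np.mean([x == n for x, _ in res])
    print(fixed_at_n, x0 / n)             # X_t is a martingale, so P[fixation at n] = x0/n
    print(np.mean([t for _, t in res]))   # mean time to fixation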

4.3. Feller processes with compact state space. We will now take a look at some examples of Markov processes that are not jump processes. All processes that we will look at are processes on compact subsets of $\mathbb{R}^d$ with continuous sample paths (although we will not prove the latter here). One should keep in mind that there are many more possibilities for a Markov process not to be a jump process. For example, there are processes that have a combination of continuous and jump dynamics, or that make infinitely many (small) jumps in each open time interval.

Let $d \geq 1$, let $D \subset \mathbb{R}^d$ be a bounded open set and let $\overline{D}$ denote its closure. Let $f|_{\overline{D}}$ denote the restriction of a function $f : \mathbb{R}^d \to \mathbb{R}$ to $\overline{D}$. By definition:

(4.19)  $\mathcal{C}^2(\overline{D}) := \big\{f|_{\overline{D}} : f : \mathbb{R}^d \to \mathbb{R} \text{ twice continuously differentiable}\big\}$.

Let $M^d_+$ denote the space of real $d \times d$ matrices $m$ that are symmetric, i.e., $m_{ij} = m_{ji}$, and nonnegative definite, i.e.,

(4.20)  $\sum_{ij} v_i m_{ij} v_j \geq 0 \quad \forall v \in \mathbb{R}^d$.

Let $a : \overline{D} \to M^d_+$ and $b : \overline{D} \to \mathbb{R}^d$ be continuous functions and let $(\mathcal{D}(A), A)$ be the linear operator on $\mathcal{C}(\overline{D})$ defined by

(4.21)  $Af(x) := \tfrac12 \sum_{ij} a_{ij}(x) \dfrac{\partial^2}{\partial x_i \partial x_j} f(x) + \sum_i b_i(x) \dfrac{\partial}{\partial x_i} f(x) \quad (x \in \overline{D})$.

For $x \in D$ these derivatives are defined in the obvious way, that is, $\frac{\partial}{\partial x_i} f(x) = \lim_{\varepsilon \to 0} \varepsilon^{-1}\big(f(x + \varepsilon \delta_i) - f(x)\big)$. For $x$ in the boundary $\partial D := \overline{D} \setminus D$ we have to be a bit careful, since it may happen that $x + \varepsilon \delta_i \notin \overline{D}$ for all $\varepsilon \neq 0$. By definition, each $f \in \mathcal{C}^2(\overline{D})$ can be extended to a twice continuously differentiable function $\overline{f}$ on all of $\mathbb{R}^d$. Therefore, we define

(4.22)  $\dfrac{\partial}{\partial x_i} f := \Big(\dfrac{\partial}{\partial x_i} \overline{f}\Big)\Big|_{\overline{D}}$.

To see that this definition does not depend on the choice of the extension $\overline{f}$, note that if $\hat{f}$ is another extension, then $\frac{\partial}{\partial x_i}\overline{f} = \frac{\partial}{\partial x_i}\hat{f}$ on $D$. By continuity, $\frac{\partial}{\partial x_i}\overline{f} = \frac{\partial}{\partial x_i}\hat{f}$ on $\overline{D}$.$^{13}$

$^{13}$Alternatively, we might have defined $\mathcal{C}^2(\overline{D})$ as the space of all functions $f : \overline{D} \to \mathbb{R}$ whose partial derivatives up to second order exist on $D$ and can be extended to continuous functions on $\overline{D}$. For 'nice' (for example convex) domains $D$ this definition coincides with the definition in (4.19). This is a consequence of Whitney's extension theorem, see [EK86, Appendix 6].

We ask ourselves when the closure of $A$ generates a Feller process in $\overline{D}$, i.e., satisfies the conditions (1)–(3) from Theorem 3.28. By the Stone-Weierstrass theorem (Theorem 3.30), $\mathcal{C}^2(\overline{D})$ is dense in $\mathcal{C}(\overline{D})$, so condition (1) is always satisfied.

If $f \in \mathcal{C}^2(\overline{D})$ assumes its maximum in a point $x \in D$, then $Af(x) \leq 0$. This is a consequence of the fact that $a(x)$ is nonnegative definite. In fact, since $a(x)$ is symmetric, it can be diagonalized. Therefore, for each $x$ there exist orthonormal vectors $e_1(x), \ldots, e_d(x) \in \mathbb{R}^d$ and constants $a_1(x), \ldots, a_d(x)$ such that

(4.23)  $\sum_{ij} a_{ij}(x) \dfrac{\partial^2}{\partial x_i \partial x_j} f(x) = \sum_{k} a_k(x) \dfrac{\partial^2}{\partial \varepsilon^2}\Big|_{\varepsilon = 0} f\big(x + \varepsilon e_k(x)\big)$.

Since $a$ is nonnegative definite, the constants $a_k(x)$ are all nonnegative, and if $f$ assumes its maximum in $x$ then $\frac{\partial^2}{\partial \varepsilon^2}\big|_{\varepsilon=0} f(x + \varepsilon e_k(x)) \leq 0$ for each $k$.

Exercise 4.7. Let $D := \{x \in \mathbb{R}^2 : |x| < 1\}$ be the open unit ball in $\mathbb{R}^2$ and put

(4.24)  $\begin{pmatrix} a_{11}(x) & a_{12}(x) \\ a_{21}(x) & a_{22}(x) \end{pmatrix} := \begin{pmatrix} x_2^2 & -x_1 x_2 \\ -x_1 x_2 & x_1^2 \end{pmatrix}$

and

(4.25)  $(b_1(x), b_2(x)) := c\,(x_1, x_2)$.

For which values of $c$ does the operator $A$ in (4.21) satisfy the positive maximum principle?

The preceding exercise shows that it is not always easy to see when $A$ satisfies the positive maximum principle also for $x \in \partial D$. If this is the case, however, and by some means one can also check condition (4) from Theorem 3.29, then $A$ generates a Feller process $(X, (\mathbb{P}^x)_{x \in \overline{D}})$ in $\overline{D}$. We will later see that under $\mathbb{P}^x$, $X$ has a.s. continuous sample paths. We call $(X, (\mathbb{P}^x)_{x \in \overline{D}})$ the diffusion with drift $b$ and local diffusion rate (or diffusion matrix) $a$. The next lemma explains the meaning of the functions $a$ and $b$.

Lemma 4.8 (Drift and diffusion rate). Assume that the closure of the operator $A$ in (4.21) generates a Feller semigroup $(P_t)_{t \geq 0}$. Then, as $t \to 0$,

(4.26)
(i)  $\int_{\overline{D}} P_t(x, \mathrm{d}y)(y_i - x_i) = b_i(x)\,t + o(t)$,
(ii)  $\int_{\overline{D}} P_t(x, \mathrm{d}y)(y_i - x_i)(y_j - x_j) = a_{ij}(x)\,t + o(t)$,

for all $i, j = 1, \ldots, d$.

Proof. For any $f \in \mathcal{C}^2(\overline{D})$ we have by (3.32)

(4.27)  $\int_{\overline{D}} P_t(x, \mathrm{d}y)f(y) = P_t f(x) = f(x) + tAf(x) + o(t)$  as $t \to 0$.

Fix $x \in \overline{D}$ and set $f_i(y) := (y_i - x_i)$. Then $f_i(x) = 0$ and $Af_i(x) = b_i(x)$, and therefore (4.27) yields (4.26)(i). Likewise, inserting $f_{ij}(y) := (y_i - x_i)(y_j - x_j)$ into (4.27) yields (4.26)(ii).

By condition (a) from Proposition 2.11, Lemma 4.8 says that if $X$ is a version of the Markov process with generator $A$, started in any initial law, then

(4.28)
(i)  $\mathbb{E}\big[(X_{t+\varepsilon}(i) - X_t(i))\,\big|\,\mathcal{F}^X_t\big] = b_i(X_t)\,\varepsilon + o(\varepsilon)$,
(ii)  $\mathbb{E}\big[(X_{t+\varepsilon}(i) - X_t(i))(X_{t+\varepsilon}(j) - X_t(j))\,\big|\,\mathcal{F}^X_t\big] = a_{ij}(X_t)\,\varepsilon + o(\varepsilon)$.

Therefore, the functions $b$ and $a$ describe the mean and the covariance matrix of small increments of the diffusion process $X$.$^{14}$ If $a \equiv 0$ then

(4.29)  $\mathbb{P}^x\{X_t = x_t\ \forall t \geq 0\} = 1$,

where $t \mapsto x_t$ solves the differential equation

(4.30)  $\dfrac{\partial}{\partial t} x_t = b(x_t)$  with  $x_0 = x$.

In general, diffusions can be obtained as solutions to stochastic differential equations of the form

(4.31)  $\mathrm{d}X_t = \sigma(X_t)\,\mathrm{d}B_t + b(X_t)\,\mathrm{d}t$,

where $\sigma(x)$ is a matrix such that $\sum_k \sigma_{ik}(x)\sigma_{jk}(x) = a_{ij}(x)$ and $B$ is a $d$-dimensional Brownian motion, but this falls outside the scope of this section.

Example (Wright-Fisher diffusion). By Exercise 3.32, the operator $A_{\mathrm{WF}}$ from Exercise 3.20 generates a diffusion process $(Y, (\mathbb{P}^y)_{y \in [0,1]})$ in $[0,1]$. This diffusion is known as the Wright-Fisher diffusion. For each $n \geq 1$, let $X^n$ be a Moran model in $\{0, \ldots, n\}$ (see (4.16)) with some initial law $\mathcal{L}(X^n_0)$. Define

(4.32)  $Y^n_t := \tfrac{1}{n} X^n_t$,

and assume that $\mathcal{L}(Y^n_0) \Rightarrow \mu$ for some probability measure $\mu$ on $[0,1]$. We claim that $\mathcal{L}(Y^n) \Rightarrow \mathcal{L}(Y)$, where $Y$ is a version of the Wright-Fisher diffusion with initial law $\mathcal{L}(Y_0) = \mu$. The proof of this fact relies on some deep results that we have not seen yet, so we will only give a heuristic argument. Let $(P^n_t)_{t \geq 0}$ be the transition function of $Y^n$. Then by (4.16),

(4.33)  $P^n_t(y, \cdot\,) = \delta_y + t\, n^2 y(1-y)\big(\tfrac12 \delta_{y + 1/n} + \tfrac12 \delta_{y - 1/n} - \delta_y\big) + o(t)$,

and therefore

(4.34)
(i)  $\int P^n_t(y, \mathrm{d}z)(z - y) = o(t)$,
(ii)  $\int P^n_t(y, \mathrm{d}z)(z - y)^2 = t\, y(1-y) + o(t)$.

$^{14}$ Indeed, since the mean of $P_t(x, \cdot\,)$ is $x$ plus a term of order $t$, the covariance matrix of $P_t(x, \cdot\,)$ is equal to $\int_{\overline{D}} P_t(x, \mathrm{d}y)(y_i - x_i)(y_j - x_j)$ up to an error term of order $t^2$.

Thus, at least the first and second moments of small increments of the process $Y^n$ converge to those of $Y$.
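The rescaling (4.32) can also be checked numerically. The sketch below simulates the Moran model, rescales as in (4.32), and compares the mean and variance of $Y^n_t$ with closed-form moments of the Wright-Fisher limit. Those comparison values ($\mathbb{E}[Y_t] = y_0$ and $\mathrm{Var}[Y_t] = y_0(1-y_0)(1 - e^{-t})$) follow from moment computations with the generator in the spirit of Examples 5.2 and 5.19 below; they are part of this illustration, not of the text, and the parameter choices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def moran_value_at(n, x0, t):
    """Moran model (4.16) run up to time t (Gillespie); returns X^n_t."""
    s, x = 0.0, x0
    while 0 < x < n:
        s += rng.exponential(1.0 / (x * (n - x)))
        if s > t:
            break
        x += 1 if rng.random() < 0.5 else -1
    return x

y0, t, m = 0.3, 1.0, 5000
for n in (10, 100, 1000):
    samples = np.array([moran_value_at(n, int(round(y0 * n)), t) / n for _ in range(m)])
    # Wright-Fisher comparison moments (derived from the generator; see the lead-in above).
    print(n, "mean:", samples.mean(), "vs", y0,
          " variance:", samples.var(), "vs", y0 * (1 - y0) * (1 - np.exp(-t)))
```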

The next example shows that the domain of an operator is not only of technical interest, but can significantly contribute to the behavior of the corresponding Markov process.

Example (Brownian motion with absorption and reflection). Define linear operators $(\mathcal{D}(A_{\mathrm{ab}}), A_{\mathrm{ab}})$ and $(\mathcal{D}(A_{\mathrm{re}}), A_{\mathrm{re}})$ on $\mathcal{C}[0,1]$ by

(4.35)  $\mathcal{D}(A_{\mathrm{ab}}) := \{f \in \mathcal{C}^2[0,1] : f''(0) = 0 = f''(1)\}$,  $A_{\mathrm{ab}} f(x) := \tfrac12 f''(x)$  $(x \in [0,1])$,

and

(4.36)  $\mathcal{D}(A_{\mathrm{re}}) := \{f \in \mathcal{C}^2[0,1] : f'(0) = 0 = f'(1)\}$,  $A_{\mathrm{re}} f(x) := \tfrac12 f''(x)$  $(x \in [0,1])$.

Then the closures of $A_{\mathrm{ab}}$ and $A_{\mathrm{re}}$ generate Feller processes in $[0,1]$. The operator $A_{\mathrm{ab}}$ generates Brownian motion absorbed at the boundary and $A_{\mathrm{re}}$ generates Brownian motion reflected at the boundary.

To see that $A_{\mathrm{ab}}$ and $A_{\mathrm{re}}$ satisfy the positive maximum principle, note that if $f \in \mathcal{D}(A_{\mathrm{ab}})$ or $f \in \mathcal{D}(A_{\mathrm{re}})$ assumes its maximum in a point $x \in (0,1)$ then $\tfrac12 f''(x) \leq 0$. If $f \in \mathcal{D}(A_{\mathrm{ab}})$ assumes its maximum in a point $x \in \{0,1\}$ then $A_{\mathrm{ab}} f(x) = \tfrac12 f''(x) = 0$ by the definition of $\mathcal{D}(A_{\mathrm{ab}})$! Similarly, if $f \in \mathcal{D}(A_{\mathrm{re}})$ assumes its maximum in a point $x \in \{0,1\}$ then $A_{\mathrm{re}} f(x) = \tfrac12 f''(x) \leq 0$ because of the fact that $f'(x) = 0$ by the definition of $\mathcal{D}(A_{\mathrm{re}})$.

The fact that $A_{\mathrm{ab}}$ and $A_{\mathrm{re}}$ satisfy condition (4) from Theorem 3.29 follows from the theory of partial differential equations, see for example [Fri64].
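The different boundary behaviors can be seen in a crude simulation. The sketch below uses an Euler discretization of Brownian motion on $[0,1]$ with either absorption or reflection at the boundary; the discretization scheme and all parameters are assumptions made for the illustration, not constructions from the text.

```python
import numpy as np

rng = np.random.default_rng(4)

def bm_on_unit_interval(x0, t, dt, boundary):
    """Euler scheme for standard Brownian motion on [0,1] with either
    absorbing or reflecting boundary behaviour."""
    x = x0
    for _ in range(int(t / dt)):
        if boundary == "absorb" and (x <= 0.0 or x >= 1.0):
            return x                               # absorbed: stay at the boundary forever
        x += np.sqrt(dt) * rng.standard_normal()
        if boundary == "reflect":
            x = abs(x)                             # reflect at 0
            x = 2.0 - x if x > 1.0 else x          # reflect at 1
        else:
            x = min(max(x, 0.0), 1.0)
    return x

m, x0, t = 5000, 0.2, 2.0
for boundary in ("absorb", "reflect"):
    samples = np.array([bm_on_unit_interval(x0, t, 1e-3, boundary) for _ in range(m)])
    print(boundary, " mean:", samples.mean(),
          " mass at the boundary:", np.mean((samples == 0.0) | (samples == 1.0)))
```

The absorbed process accumulates mass at $\{0,1\}$, while the reflected process does not; this is exactly the difference encoded in the domains (4.35) and (4.36).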

4.4. Feller processes with locally compact state space. So far, we have only been able to treat Feller processes with compact state spaces. We will now show how to deal with processes with locally compact state spaces. We start with an example.

Example (Brownian motion). Fix $d \geq 1$ and define

(4.37)  $P_t(x, \mathrm{d}y) := (2\pi t)^{-d/2}\, e^{-\frac{1}{2t}|y - x|^2}\, \mathrm{d}y$.

Then $(P_t)_{t \geq 0}$ is a transition function on $\mathbb{R}^d$ and there exists a Markov process $(B, (\mathbb{P}^x)_{x \in \mathbb{R}^d})$ with continuous sample paths associated with $(P_t)_{t \geq 0}$.

Let $\overline{\mathbb{R}^d} := \mathbb{R}^d \cup \{\infty\}$ be the one-point compactification of $\mathbb{R}^d$ (compare with (1.66)) and define a Markov process $(\overline{\mathbb{P}}^x)_{x \in \overline{\mathbb{R}^d}}$ by

(4.38)  $\overline{\mathbb{P}}^x := \begin{cases} \mathbb{P}^x & \text{if } x \in \mathbb{R}^d, \\ \delta_\infty & \text{if } x = \infty, \end{cases}$

where $\delta_\infty$ denotes the delta-measure on the constant function $w(t) := \infty$ for all $t \geq 0$. Note that $\overline{\mathbb{P}}^x$ is a measure on $\mathcal{D}_{\overline{\mathbb{R}^d}}[0,\infty)$ while $\mathbb{P}^x$ is a measure on $\mathcal{D}_{\mathbb{R}^d}[0,\infty)$,

so when we say that $\overline{\mathbb{P}}^x = \mathbb{P}^x$ for $x \in \mathbb{R}^d$ we mean that $\overline{\mathbb{P}}^x$ is the image of $\mathbb{P}^x$ under the embedding map $\mathcal{D}_{\mathbb{R}^d}[0,\infty) \subset \mathcal{D}_{\overline{\mathbb{R}^d}}[0,\infty)$.

We claim that $(\overline{\mathbb{P}}^x)_{x \in \overline{\mathbb{R}^d}}$ is a Feller process with compact state space $\overline{\mathbb{R}^d}$. It is not hard to see that this is a Markov process with transition function

(4.39)  $\overline{P}_t(x, \cdot\,) := \begin{cases} P_t(x, \cdot\,) & \text{if } x \in \mathbb{R}^d, \\ \delta_\infty & \text{if } x = \infty. \end{cases}$

We must show that this transition function is continuous. This means that we must show that $(t,x) \mapsto \overline{P}_t f(x)$ is continuous for each $f \in \mathcal{C}(\overline{\mathbb{R}^d})$. Since $\overline{P}_t 1(x) = 1$, by subtracting a constant it suffices to show that $(t,x) \mapsto \overline{P}_t f(x)$ is continuous for each $f \in \mathcal{C}_0(\overline{\mathbb{R}^d}) := \{f \in \mathcal{C}(\overline{\mathbb{R}^d}) : f(\infty) = 0\}$. Assume that $(t_n, x_n) \to (t,x) \in [0,\infty) \times \overline{\mathbb{R}^d}$. Without loss of generality we may assume that $x_n \neq \infty$ for all $n$. We distinguish two cases.

1. If $x \neq \infty$, then by uniform convergence

(4.40)  $P_{t_n} f(x_n) = \int_{\mathbb{R}^d} (2\pi t_n)^{-d/2}\, e^{-\frac{1}{2t_n}|y - x_n|^2} f(y)\, \mathrm{d}y \underset{n \to \infty}{\longrightarrow} \int_{\mathbb{R}^d} (2\pi t)^{-d/2}\, e^{-\frac{1}{2t}|y - x|^2} f(y)\, \mathrm{d}y = P_t f(x)$.

2. If $x = \infty$, then for each compact set $C \subset \mathbb{R}^d$

(4.41)  $|P_{t_n} f(x_n)| \leq \|f\|\,(2\pi t_n)^{-d/2} \sup_{y \in C} e^{-\frac{1}{2t_n}|y - x_n|^2} + \sup_{y \in \mathbb{R}^d \setminus C} |f(y)|$.

Since for each $\varepsilon > 0$ we can find a compact set $C$ such that $\sup_{y \in \mathbb{R}^d \setminus C} |f(y)| \leq \varepsilon$, taking the limit $n \to \infty$ in (4.41) we find that $\limsup_{n \to \infty} |P_{t_n} f(x_n)| \leq \varepsilon$ for each $\varepsilon > 0$, and therefore, by (4.39),

(4.42)  $\lim_{n \to \infty} P_{t_n} f(x_n) = 0 = \overline{P}_t f(\infty)$.

We can use the compactification trick from the previous example more generally. We start with a simple observation. Let $E$ be locally compact but not compact, separable, and metrizable, and let $\overline{E} := E \cup \{\infty\}$ be its one-point compactification. Let $\mathcal{C}_0(E) := \{f \in \mathcal{C}_b(E) : \lim_{x \to \infty} f(x) = 0\}$ denote the separable Banach space of continuous real functions on $E$ vanishing at infinity, equipped with the supremum norm.

Lemma 4.9 (Compactification of Markov processes). Assume that $(\overline{\mathbb{P}}^x)_{x \in \overline{E}}$ is a Markov process in $\overline{E}$ with transition function $(\overline{P}_t)_{t \geq 0}$, and that

(1) (non-explosion)  $\overline{\mathbb{P}}^x\{X_t, X_{t-} \neq \infty\ \forall t \geq 0\} = 1$  for all $x \neq \infty$.

Let $\mathbb{P}^x$ and $P_t(x, \cdot\,)$ denote the restrictions of $\overline{\mathbb{P}}^x$ and $\overline{P}_t(x, \cdot\,)$ to $\mathcal{D}_E[0,\infty)$ and $E$, respectively. Then $(\mathbb{P}^x)_{x \in E}$ is a Markov process in $E$ with transition function $(P_t)_{t \geq 0}$. If moreover

(2) (non-implosion)  $\overline{\mathbb{P}}^\infty\{X_t = \infty\ \forall t \geq 0\} = 1$,

then for each $t \geq 0$, $P_t$ maps $\mathcal{C}_0(E)$ into itself and $(P_t)_{t \geq 0}$ is a strongly continuous contraction semigroup on $\mathcal{C}_0(E)$.

Proof. The fact that $(\mathbb{P}^x)_{x \in E}$ is a Markov process in $E$ with transition function $(P_t)_{t \geq 0}$ if the Feller process in $\overline{E}$ is non-explosive is almost trivial. We must only show that the event in condition (1) is actually measurable. Since $X = (X_t)_{t \geq 0}$ can be viewed as a random variable with values in $\mathcal{D}_{\overline{E}}[0,\infty)$, it suffices to show that $\mathcal{D}_E[0,\infty)$ is a measurable subset of $\mathcal{D}_{\overline{E}}[0,\infty)$. This follows from the fact that $\mathcal{D}_E[0,\infty)$ is Polish in the induced topology, so that by Proposition 1.24, $\mathcal{D}_E[0,\infty)$ is a countable intersection of open sets in $\mathcal{D}_{\overline{E}}[0,\infty)$.

Observe that there is a natural identification between the space $\mathcal{C}_0(E)$ and the closed subspace of $\mathcal{C}(\overline{E})$ given by $\mathcal{C}_0(\overline{E}) := \{f \in \mathcal{C}(\overline{E}) : f(\infty) = 0\}$. If the Feller process in $\overline{E}$ is not only non-explosive but also non-implosive, then

(4.43)  $\overline{P}_t f(\infty) = f(\infty) = 0$

for each $f \in \mathcal{C}_0(\overline{E})$, which shows that $P_t$ maps $\mathcal{C}_0(E)$ into itself. Since $(\overline{P}_t)_{t \geq 0}$ is a strongly continuous contraction semigroup on $\mathcal{C}(\overline{E})$, its restriction to the closed subspace $\mathcal{C}_0(\overline{E})$ is also a strongly continuous contraction semigroup.

The next proposition gives sufficient conditions for non-explosion and non-implosion in terms of the generator of a process. The function $f$ in condition (1) is an example of a Lyapunov function.

Proposition 4.10 (Non-explosion and non-implosion). Let $\overline{E}$ be the one-point compactification of a locally compact separable metrizable space $E$ and let $(\overline{\mathbb{P}}^x)_{x \in \overline{E}}$ be a Feller process in $\overline{E}$ with generator $(\mathcal{D}(G), G)$. If

(1) there exist functions $f, g : E \to \mathbb{R}$ such that $f \geq 0$, $\lim_{x \to \infty} f(x) = \infty$, and $\sup_{x \in E} g(x) < \infty$, and functions $f_n \in \mathcal{D}(G)$ such that $0 \leq f_n(x) \uparrow f(x)$ for all $x \in E$, $f_n(\infty) \uparrow \infty$, and $Gf_n \to g$ uniformly on compacta in $E$,

then $(\overline{\mathbb{P}}^x)_{x \in \overline{E}}$ is non-explosive. If

(2) $Gf(\infty) = 0$ for all $f \in \mathcal{D}(G)$,

then $(\overline{\mathbb{P}}^x)_{x \in \overline{E}}$ is non-implosive.

Proof. The proof that (1) implies that $(\overline{\mathbb{P}}^x)_{x \in \overline{E}}$ is non-explosive will be postponed to Section 5.6.

If (2) holds, then any solution to the Cauchy equation $\frac{\partial}{\partial t} u(t) = Gu(t)$ satisfies $\frac{\partial}{\partial t} u(t)(\infty) = 0$. Therefore, by Proposition 3.15, $\overline{P}_t f(\infty) = f(\infty)$ for all $f \in \mathcal{D}(G)$ and $t \geq 0$, where $(\overline{P}_t)_{t \geq 0}$ is the semigroup of $(\overline{\mathbb{P}}^x)_{x \in \overline{E}}$. Since $\mathcal{D}(G)$ is dense, it follows that $\overline{P}_t f(\infty) = f(\infty)$ for all $f \in \mathcal{C}(\overline{E})$, $t \geq 0$. This means that $\overline{P}_t(\infty, \cdot\,) = \delta_\infty$ for each $t \geq 0$. It follows that $\overline{\mathbb{P}}^\infty\{X_t = \infty\} = 1$ for all $t \geq 0$, which implies that $\overline{\mathbb{P}}^\infty\{X_t = \infty\ \forall t \in \mathbb{Q} \cap [0,\infty)\} = 1$, and

therefore $(\overline{\mathbb{P}}^x)_{x \in \overline{E}}$ is non-implosive by the right-continuity of sample paths.

Example (Feller diffusion). Identify $\mathcal{C}[0,\infty]$ with the space $\{f \in \mathcal{C}[0,\infty) : \lim_{x \to \infty} f(x) \text{ exists}\}$ and define an operator $(\mathcal{D}(A_{\mathrm{Fel}}), A_{\mathrm{Fel}})$ on $\mathcal{C}[0,\infty]$ by

(4.44)
$\mathcal{D}(A_{\mathrm{Fel}}) := \Big\{f \in \mathcal{C}^2[0,\infty) : \lim_{x \to \infty} f(x) \text{ exists and } \lim_{x \to \infty} x \tfrac{\partial^2}{\partial x^2} f(x) = 0\Big\}$,
$A_{\mathrm{Fel}} f(x) := x \tfrac{\partial^2}{\partial x^2} f(x)$  $(x \in [0,\infty))$.

We claim that the closure of $A_{\mathrm{Fel}}$ generates the semigroup of a non-explosive and non-implosive Feller process in $[0,\infty]$. It is not hard to check that $A_{\mathrm{Fel}}$ satisfies the positive maximum principle. Consider the class of Laplace functions $\{f_\lambda\}_{\lambda \geq 0}$ defined by

(4.45)  $f_\lambda(x) := e^{-\lambda x}$  $(x \in [0,\infty))$.

We calculate

(4.46)  $x \tfrac{\partial^2}{\partial x^2} f_\lambda(x) = \lambda^2 x\, e^{-\lambda x} \underset{x \to \infty}{\longrightarrow} 0$,

which shows that $f_\lambda \in \mathcal{D}(A_{\mathrm{Fel}})$ for all $\lambda \geq 0$. By the Stone-Weierstrass theorem (Theorem 3.30), the linear span of $\{f_\lambda\}_{\lambda \geq 0}$ is dense in $\mathcal{C}[0,\infty]$. We claim that for each $\lambda \geq 0$ there exists a solution $u$ to the Cauchy equation

(4.47)  $\tfrac{\partial}{\partial t} u(t) = A_{\mathrm{Fel}} u(t)$  $(t \geq 0)$,  $u(0) = f_\lambda$.

Indeed, it is easy to see that

(4.48)  $\tfrac{\partial}{\partial t}\, e^{-x/t} = x \tfrac{\partial^2}{\partial x^2}\, e^{-x/t}$  $(t > 0,\ x \in [0,\infty))$,

so the solution to (3.59) is given by

(4.49)  $u(t) = f_{\lambda_t}$  where  $\lambda_t := (\lambda^{-1} + t)^{-1}$  $(t \geq 0)$

if $\lambda > 0$, and $\lambda_t := 0$ $(t \geq 0)$ if $\lambda = 0$. It therefore follows from Theorem 3.29 and Lemma 3.31 that $A_{\mathrm{Fel}}$ generates a Feller semigroup on $\mathcal{C}[0,\infty]$.

Let $(\overline{\mathbb{P}}^x)_{x \in [0,\infty]}$ denote the corresponding Feller process. It is easy to see that the function $f(x) = x$ (with $g(x) = 0$) satisfies condition (1) from Proposition 4.10, so $(\overline{\mathbb{P}}^x)_{x \in [0,\infty]}$ is non-explosive. Since $\lim_{x \to \infty} A_{\mathrm{Fel}} f(x) = 0$ for all $f \in \mathcal{D}(A_{\mathrm{Fel}})$, it follows that $(\overline{\mathbb{P}}^x)_{x \in [0,\infty]}$ is also non-implosive.

Let $P_t(x, \cdot\,)$ denote the restriction of $\overline{P}_t(x, \cdot\,)$ to $[0,\infty)$. By what we have just proved, $(P_t)_{t \geq 0}$ is the transition function of a Markov process $(X, (\mathbb{P}^x)_{x \in [0,\infty)})$ on $[0,\infty)$, which is called the Feller diffusion.

Formula (4.49) tells us that the semigroup $(P_t)_{t \geq 0}$ maps Laplace functions into itself. Indeed,

(4.50)  $P_t f_\lambda = f_{\lambda_t}$  $(t, \lambda \geq 0)$

with $\lambda_t$ as in (4.49). This is closely related to the branching property of the Feller diffusion. By this, we mean that if $X^x$ and $X^y$ are independent

versions of the Feller diffusion started in $X^x_0 = x$ and $X^y_0 = y$, respectively, and $X^{x+y}$ is a version of the Feller diffusion started in $X^{x+y}_0 := x + y$, then

(4.51)  $\mathcal{L}(X^x_t + X^y_t) = \mathcal{L}(X^{x+y}_t)$  $(t \geq 0)$.

To see this, note that (4.50) says that

(4.52)  $\mathbb{E}^x\big[e^{-\lambda X_t}\big] = e^{-\lambda_t x}$  $(t, \lambda \geq 0)$.

By independence,

(4.53)  $\mathbb{E}\big[e^{-\lambda(X^x_t + X^y_t)}\big] = \mathbb{E}\big[e^{-\lambda X^x_t}\big]\,\mathbb{E}\big[e^{-\lambda X^y_t}\big] = e^{-\lambda_t x}\, e^{-\lambda_t y} = e^{-\lambda_t (x+y)} = \mathbb{E}\big[e^{-\lambda X^{x+y}_t}\big]$.

Since this holds for all $\lambda \geq 0$ and the linear span of the Laplace functionals is dense, (4.51) follows.
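Formula (4.52) can be checked by simulation. Comparing $A_{\mathrm{Fel}}$ with (4.21) and (4.31) suggests the SDE representation $\mathrm{d}X_t = \sqrt{2X_t}\,\mathrm{d}B_t$ for the Feller diffusion; this representation, the Euler scheme, and the parameter values are assumptions of the sketch below, not statements from the text, and the crude scheme is somewhat biased near $0$.

```python
import numpy as np

rng = np.random.default_rng(5)

def feller_diffusion(x0, t, dt):
    """Euler scheme for dX = sqrt(2 X) dB with X_0 = x0 (negative overshoots clipped to 0)."""
    x = x0
    for _ in range(int(t / dt)):
        x += np.sqrt(2.0 * max(x, 0.0) * dt) * rng.standard_normal()
        x = max(x, 0.0)
    return x

x0, t, lam, m = 2.0, 1.0, 0.7, 20000
samples = np.array([feller_diffusion(x0, t, 1e-3) for _ in range(m)])
lam_t = 1.0 / (1.0 / lam + t)                  # lambda_t as in (4.49)
print("E[e^{-lam X_t}] (Monte Carlo):", np.mean(np.exp(-lam * samples)))
print("e^{-lambda_t x}       (4.52):", np.exp(-lam_t * x0))
```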

Exercise 4.11. Let $(X, (\mathbb{P}^x)_{x \in [0,\infty)})$ be the Feller diffusion. Calculate the extinction probability $\mathbb{P}^x[X_t = 0]$ for each $t, x \geq 0$.

5. Harmonic functions and martingales

5.1. Harmonic functions. Let $(\mathbb{P}^x)_{x \in E}$ be a Feller process with a compact metrizable state space $E$, Feller semigroup $(P_t)_{t \geq 0}$ and generator $(\mathcal{D}(G), G)$.

Lemma 5.1 (Harmonic functions). The following conditions on a function $h \in \mathcal{C}(E)$ are equivalent:

(a) $P_t h = h$ for all $t \geq 0$.
(b) $h \in \mathcal{D}(G)$ and $Gh = 0$.

Proof. If $P_t h = h$ for all $t \geq 0$, then $\lim_{t \to 0} t^{-1}(P_t h - h) = 0$, so $h \in \mathcal{D}(G)$ and $Gh = 0$. Conversely, if $h \in \mathcal{D}(G)$ and $Gh = 0$, then the function $u_t := h$ $(t \geq 0)$ solves the Cauchy equation $\frac{\partial}{\partial t} u_t = Gu_t$ with initial condition $u_0 = h$, so by Propositions 3.15 and 3.22 it follows that $P_t h = u_t = h$ for all $t \geq 0$.

A function $h \in \mathcal{C}(E)$ satisfying the equivalent conditions (a) and (b) from Lemma 5.1 is called a harmonic function for the Feller process $(\mathbb{P}^x)_{x \in E}$.

Example 5.2 (Harmonic function for the Wright-Fisher diffusion). Let $A_{\mathrm{WF}}$ be as in Exercises 3.20 and 3.32 and let $(X, (\mathbb{P}^x)_{x \in [0,1]})$ be the Wright-Fisher diffusion, i.e., the Feller process with generator $A_{\mathrm{WF}}$. Then the function $h : [0,1] \to [0,1]$ given by $h(x) := x$ is harmonic for $X$. As a consequence, the Wright-Fisher diffusion satisfies

(5.1)  $\mathbb{E}^x[X_t] = x$  $(t \geq 0)$.

Proof. Since $\tfrac12 x(1-x)\,\tfrac{\partial^2}{\partial x^2}\, x = 0$, $h$ satisfies condition (b) from Lemma 5.1. As a consequence, $\mathbb{E}^x[X_t] = P_t h(x) = h(x) = x$ for all $t \geq 0$.
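Relation (5.1) is easy to probe numerically. Comparing $A_{\mathrm{WF}}$ with (4.21) and (4.31) suggests the SDE representation $\mathrm{d}X_t = \sqrt{X_t(1-X_t)}\,\mathrm{d}B_t$; as before, this representation and the Euler discretization are assumptions of the illustration, not claims of the text.

```python
import numpy as np

rng = np.random.default_rng(6)

def wright_fisher(x0, t, dt):
    """Euler scheme for dX = sqrt(X(1-X)) dB on [0,1], clipped to the interval each step."""
    x = x0
    for _ in range(int(t / dt)):
        x += np.sqrt(max(x * (1.0 - x), 0.0) * dt) * rng.standard_normal()
        x = min(max(x, 0.0), 1.0)
    return x

x0, m = 0.3, 20000
for t in (0.5, 2.0, 8.0):
    samples = np.array([wright_fisher(x0, t, 1e-3) for _ in range(m)])
    print("t =", t, " E^x[X_t] approx", samples.mean(), " (should stay near x =", x0, ")")
```

For large $t$ most sample paths sit at $0$ or $1$ (compare Example 5.19 below), yet the mean stays near $x$, as (5.1) predicts.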

Let $(\mathbb{P}^x)_{x \in E}$ be a Feller process with a compact metrizable state space $E$ and let $h \in \mathcal{C}(E)$ be harmonic. Then, if $X$ is a version of $(\mathbb{P}^x)_{x \in E}$ started in an arbitrary initial law, by condition (a) from Proposition 2.11,

(5.2)  $\mathbb{E}\big[h(X_t) \,\big|\, \mathcal{F}^X_s\big] = P_{t-s} h(X_s) = h(X_s)$  a.s.  $(0 \leq s \leq t)$.

This motivates the following definitions. By definition, a filtration is a family $(\mathcal{F}_t)_{t \geq 0}$ of $\sigma$-fields such that $\mathcal{F}_s \subset \mathcal{F}_t$ for all $0 \leq s \leq t$. An $\mathcal{F}_t$-martingale is a stochastic process $M$ such that $M_t$ is $\mathcal{F}_t$-measurable, $\mathbb{E}[|M_t|] < \infty$, and $\mathbb{E}[M_t \mid \mathcal{F}_s] = M_s$ for all $0 \leq s \leq t$. In the next sections we will study filtrations and martingales in more detail.

5.2. Filtrations. By definition, a filtered probability space is a quadruple $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ such that $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space and $(\mathcal{F}_t)_{t \geq 0}$ is a filtration on $\Omega$ with $\mathcal{F}_t \subset \mathcal{F}$ for all $t \geq 0$. For example, if $X$ is a stochastic process, then $(\mathcal{F}^X_t)_{t \geq 0}$, defined in (2.5), is the filtration generated by $X$. We say that a stochastic process $X$ is adapted to a filtration $(\mathcal{F}_t)_{t \geq 0}$, or simply $\mathcal{F}_t$-adapted, if $X_t$ is $\mathcal{F}_t$-measurable for each $t \geq 0$.

Definition 5.3 (Progressive processes). A stochastic process $X$ on $(\Omega, \mathcal{F}, \mathbb{P})$ is said to be progressively measurable with respect to $(\mathcal{F}_t)_{t \geq 0}$, or simply $\mathcal{F}_t$-progressive, if the map $(s, \omega) \mapsto X_s(\omega)$ from $[0,t] \times \Omega$ into $E$ is $\mathcal{B}[0,t] \times \mathcal{F}_t$-measurable for each $t \geq 0$.

Exercise 5.4. Let $X$ be a stochastic process and $(\mathcal{F}_t)_{t \geq 0}$ a filtration. Assume that $X$ is $\mathcal{F}_t$-adapted and that $X$ has right-continuous sample paths. Show that $X$ is $\mathcal{F}_t$-progressive. (Hint: adapt the proof of Lemma 1.4.)

If $(\mathcal{F}_t)_{t \geq 0}$ is a filtration, then

(5.3)  $\mathcal{F}_{t+} := \bigcap_{s > t} \mathcal{F}_s$  $(t \geq 0)$

defines a new, larger filtration $(\mathcal{F}_{t+})_{t \geq 0}$. If $\mathcal{F}_{t+} = \mathcal{F}_t$ for all $t \geq 0$, then we say that the filtration $(\mathcal{F}_t)_{t \geq 0}$ is right continuous. It is not hard to see that $(\mathcal{F}_{t+})_{t \geq 0}$ is right continuous.

Recall that the completion of a $\sigma$-field $\mathcal{F}$ with respect to a probability measure $\mathbb{P}$ is the $\sigma$-field

(5.4)  $\overline{\mathcal{F}} := \{A \subset \Omega : \exists B \in \mathcal{F} \text{ s.t. } 1_A = 1_B \text{ a.s.}\}$.

There is a unique extension of the probability measure $\mathbb{P}$ to a probability measure on $\overline{\mathcal{F}}$. If $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ is a filtered probability space, then

(5.5)  $\overline{\mathcal{F}}_t := \{A \subset \Omega : \exists B \in \mathcal{F}_t \text{ s.t. } 1_A = 1_B \text{ a.s.}\}$  $(t \geq 0)$

defines a new filtration $(\overline{\mathcal{F}}_t)_{t \geq 0}$. If $\overline{\mathcal{F}}_t = \mathcal{F}_t$ for all $t \geq 0$, then we say that the filtration $(\mathcal{F}_t)_{t \geq 0}$ is complete.$^{15}$ A random variable $X$ with values in a Polish space is $\overline{\mathcal{F}}_t$-measurable if and only if there exists an $\mathcal{F}_t$-measurable random variable $Y$ such that $X = Y$ a.s.

Lemma 5.5 (Usual conditions). If $(\mathcal{F}_t)_{t \geq 0}$ is a filtration, then

(5.6)  $\overline{\mathcal{F}}_{t+} := \bigcap_{s > t} \overline{\mathcal{F}}_s = \{A \subset \Omega : \exists B \in \mathcal{F}_{t+} \text{ s.t. } 1_A = 1_B \text{ a.s.}\}$  $(t \geq 0)$

defines a complete, right-continuous filtration.

Proof. It is easy to see that $\bigcap_{s>t} \overline{\mathcal{F}}_s$ is right continuous and that $\{A \subset \Omega : \exists B \in \mathcal{F}_{t+} \text{ s.t. } 1_A = 1_B \text{ a.s.}\}$ is complete. To see that the two formulas for $\overline{\mathcal{F}}_{t+}$ in (5.6) are equivalent, observe that $A \in \bigcap_{s>t} \overline{\mathcal{F}}_s$ implies that for every $n$ there exists a $B_n \in \mathcal{F}_{t+\frac1n}$ such that $1_A = 1_{B_n}$ a.s. Put $1_{B_\infty} := \liminf_{m \to \infty} 1_{B_m}$. Then $1_A = 1_{B_\infty}$ a.s., and since $1_{B_\infty} = \liminf_{m \geq n} 1_{B_m}$ we have $B_\infty \in \mathcal{F}_{t+\frac1n}$ for all $n$, hence $B_\infty \in \mathcal{F}_{t+}$. This shows that $A \in \{A \subset \Omega : \exists B \in \mathcal{F}_{t+} \text{ s.t. } 1_A = 1_B \text{ a.s.}\}$. Conversely, if there exists a $B \in \mathcal{F}_{t+}$ such that $1_A = 1_B$ a.s., then obviously $A \in \overline{\mathcal{F}}_s$ for all $s > t$, so $A \in \bigcap_{s>t} \overline{\mathcal{F}}_s$.

$^{15}$Warning: $\overline{\mathcal{F}}_t$ is not the same as the completion of the $\sigma$-field $\mathcal{F}_t$ with respect to the restriction of $\mathbb{P}$ to $\mathcal{F}_t$. The reason is that the class of null sets of the restriction of $\mathbb{P}$ to $\mathcal{F}_t$ is smaller than the class of null sets of $\mathbb{P}$. Because of this fact, some authors prefer to call $(\overline{\mathcal{F}}_t)_{t \geq 0}$ the augmentation, rather than the completion, of $(\mathcal{F}_t)_{t \geq 0}$.

A filtration that is complete and right-continuous is said to fulfill the usual conditions.

5.3. Martingales.

Definition 5.6 (Martingale). An $\mathcal{F}_t$-submartingale is a real-valued stochastic process $M$, adapted to a filtration $(\mathcal{F}_t)_{t \geq 0}$, such that $\mathbb{E}[|M_t|] < \infty$ for all $t \geq 0$ and

(5.7)  $\mathbb{E}[M_t \mid \mathcal{F}_s] \geq M_s$  a.s.  $(0 \leq s \leq t)$.

A stochastic process $M$ is called an $\mathcal{F}_t$-supermartingale if $-M$ is an $\mathcal{F}_t$-submartingale, and an $\mathcal{F}_t$-martingale if $M$ is both an $\mathcal{F}_t$-submartingale and an $\mathcal{F}_t$-supermartingale.

We can think of an $\mathcal{F}_t$-martingale as a model for a fair game of chance, where $M_t$ is the capital that a player holds at time $t \geq 0$ and $\mathcal{F}_t$ is the information available to the player at that time. Then (5.7) says that if the player holds a capital $M_s$ at time $s$, then the expected capital that the player will hold at a later time $t$, given the information at time $s$, is precisely $M_s$.

Lemma 5.7 (Martingale filtration). Let $(\mathcal{F}_t)_{t \geq 0}$ and $(\mathcal{G}_t)_{t \geq 0}$ be filtrations such that $\mathcal{F}_t \subset \mathcal{G}_t$ for all $t \geq 0$. Then every $\mathcal{G}_t$-submartingale that is $\mathcal{F}_t$-adapted is an $\mathcal{F}_t$-submartingale.

Proof. Since $\mathcal{F}_t \subset \mathcal{G}_t$, since $M$ is a $\mathcal{G}_t$-submartingale, and since $M$ is $\mathcal{F}_t$-adapted:

$\mathbb{E}[M_t \mid \mathcal{F}_s] = \mathbb{E}\big[\mathbb{E}[M_t \mid \mathcal{G}_s] \,\big|\, \mathcal{F}_s\big] \geq \mathbb{E}[M_s \mid \mathcal{F}_s] = M_s$  a.s.  $(0 \leq s \leq t)$.

In particular, it follows that every $\mathcal{F}_t$-submartingale is also an $\mathcal{F}^M_t$-submartingale. If a stochastic process $M$ is an $\mathcal{F}^M_t$-submartingale, $\mathcal{F}^M_t$-supermartingale, or $\mathcal{F}^M_t$-martingale, then we simply say that $M$ is a submartingale, supermartingale, or martingale, respectively.

Note that if $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ is a filtered probability space and $M_\infty$ is a real random variable such that $\mathbb{E}[|M_\infty|] < \infty$, then

(5.8)  $M_t := \mathbb{E}[M_\infty \mid \mathcal{F}_t]$  $(t \geq 0)$

defines an $\mathcal{F}_t$-martingale. This follows from the facts that $\mathbb{E}\big[|\mathbb{E}[M_\infty \mid \mathcal{F}_t]|\big] \leq \mathbb{E}\big[\mathbb{E}[|M_\infty| \mid \mathcal{F}_t]\big] = \mathbb{E}[|M_\infty|] < \infty$ and $\mathbb{E}\big[\mathbb{E}[M_\infty \mid \mathcal{F}_t] \,\big|\, \mathcal{F}_s\big] = \mathbb{E}[M_\infty \mid \mathcal{F}_s]$ for all $0 \leq s \leq t$. Formula (5.8) defines the stochastic process $M$ uniquely up to modifications, since for each fixed $t$ the conditional expectation is unique up to a.s. equality.

These observations raise a number of questions. Do all $\mathcal{F}_t$-martingales have a last element $M_\infty$ as in (5.8)? Can we find modifications of $M$ with cadlag sample paths? Before we address these questions we first pose another one: we know that the conditional expectation $\mathbb{E}[X \mid \mathcal{F}]$ of a random variable $X$ with respect to a $\sigma$-field $\mathcal{F}$ is continuous in $X$. For example, if $X_n \to X$

in $L^p$-norm for some $1 \leq p < \infty$, then $\mathbb{E}[X_n \mid \mathcal{F}] \to \mathbb{E}[X \mid \mathcal{F}]$ in $L^p$-norm. But how about the continuity of $\mathbb{E}[X \mid \mathcal{F}]$ in the $\sigma$-field $\mathcal{F}$?

Let $(\mathcal{F}_n)_{n \in \mathbb{N}}$ be a sequence of $\sigma$-fields. We say that the $\sigma$-fields $\mathcal{F}_n$ decrease to a limit $\mathcal{F}_\infty$, denoted as $\mathcal{F}_n \downarrow \mathcal{F}_\infty$, if $\mathcal{F}_0 \supset \mathcal{F}_1 \supset \cdots$ and $\mathcal{F}_\infty := \bigcap_n \mathcal{F}_n$. Likewise, we say that the $\sigma$-fields $\mathcal{F}_n$ increase to a limit $\mathcal{F}_\infty$, denoted as $\mathcal{F}_n \uparrow \mathcal{F}_\infty$, if $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots$ and

$\mathcal{F}_\infty := \sigma\big(\bigcup_n \mathcal{F}_n\big)$.

One has the following theorem. (See [Chu74, Theorem 9.4.8], or [Bil86, Theorems 35.5 and 35.7].)

Theorem 5.8 (Continuity of conditional expectation in the $\sigma$-field). Let $X$ be a random variable defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and let $(\mathcal{F}_n)_{n \in \mathbb{N}}$ be a sequence of sub-$\sigma$-fields of $\mathcal{F}$. Assume that $\mathbb{E}[|X|] < \infty$ and that $\mathcal{F}_n \downarrow \mathcal{F}_\infty$ or $\mathcal{F}_n \uparrow \mathcal{F}_\infty$. Then

$\mathbb{E}[X \mid \mathcal{F}_n] \underset{n \to \infty}{\longrightarrow} \mathbb{E}[X \mid \mathcal{F}_\infty]$  a.s. and in $L^1$-norm.

Corollary 5.9 (Filtration enlargement). Let $(\mathcal{F}_t)_{t \geq 0}$ be a filtration. Then every $\mathcal{F}_t$-submartingale with right-continuous sample paths is also an $\mathcal{F}_{t+}$-submartingale.

Proof. By Theorem 5.8 and the right continuity of sample paths, we have

$\mathbb{E}[M_t \mid \mathcal{F}_{s+}] = \lim_{n \to \infty} \mathbb{E}\big[M_t \,\big|\, \mathcal{F}_{s+\frac1n}\big] \geq \lim_{n \to \infty} M_{s+\frac1n} = M_s$  a.s.

Coming back to our earlier questions about martingales, here are two answers.

Theorem 5.10 (Modification with cadlag sample paths). Let $(\mathcal{F}_t)_{t \geq 0}$ be a filtration and let $M$ be an $\mathcal{F}_{t+}$-submartingale. Assume that $t \mapsto \mathbb{E}[M_t]$ is right continuous. Then $M$ has a modification with cadlag sample paths.

This result can be found in [KS91, Theorem 1.3.13]. Note that if $M$ is a martingale, then $\mathbb{E}[M_t]$ does not depend on $t$, so that in this case $t \mapsto \mathbb{E}[M_t]$ is trivially right continuous. The next result can be found in [KS91, Theorem 1.3.15].

Theorem 5.11 (Submartingale convergence). Let $M$ be a submartingale with right-continuous sample paths, and assume that $\sup_{t \geq 0} \mathbb{E}[M_t \vee 0] < \infty$. Then there exists a random variable $M_\infty$ such that $\mathbb{E}[|M_\infty|] < \infty$ and $M_t \underset{t \to \infty}{\longrightarrow} M_\infty$ a.s.

5.4. Stopping times. There is one more result about martingales that is of central importance. Think of a martingale as a fair game of chance. Then formula (5.7) says that the expected gain of a player who stops playing at a fixed time $t$ is zero. But how about players who stop playing at a random time? It turns out that the answer depends on what we mean by a random time. If the information available to the player at time $t$ is $\mathcal{F}_t$, then the decision whether to stop playing should be made on the basis of this information only. This leads to the definition of stopping times.

Let $(\mathcal{F}_t)_{t \geq 0}$ be a filtration. By definition, an $\mathcal{F}_t$-stopping time is a function $\tau : \Omega \to [0,\infty]$ such that the stochastic process $(1_{\{\tau \leq t\}})_{t \geq 0}$ is $\mathcal{F}_t$-adapted. Obviously, this is equivalent to the statement that the event $\{\tau \leq t\}$ (i.e., the set $\{\omega : \tau(\omega) \leq t\}$) is $\mathcal{F}_t$-measurable for each $t \geq 0$. We interpret $\tau$ as a random time with the property that, if $\mathcal{F}_t$ is the information that is available to us at time $t$, then we can at any time $t$ decide whether the stopping time $\tau$ has already occurred.

Lemma 5.12 (Optional times). Let $(\mathcal{F}_t)_{t \geq 0}$ be a filtration on $\Omega$ and let $\tau : \Omega \to [0,\infty]$ be a function. Then $\tau$ is an $\mathcal{F}_{t+}$-stopping time if and only if $\{\tau < t\} \in \mathcal{F}_t$ for all $t > 0$.

Proof. If $\{\tau < s\} \in \mathcal{F}_s$ for all $s > 0$, then for each $t \geq 0$ we can choose $s_n \downarrow t$ to see that $\{\tau \leq t\} = \bigcap_n \{\tau < s_n\} \in \bigcap_{s > t} \mathcal{F}_s = \mathcal{F}_{t+}$. Conversely, if $\{\tau \leq t\} \in \mathcal{F}_{t+}$ for all $t \geq 0$, then for each $s > 0$ we can choose $t_n \uparrow s$ with $t_n < s$ to see that $\{\tau < s\} = \bigcup_n \{\tau \leq t_n\} \in \mathcal{F}_s$.

$\mathcal{F}_{t+}$-stopping times are also called optional times.

Lemma 5.13 (Stopped process). Let $(\mathcal{F}_t)_{t \geq 0}$ be a filtration, let $\tau$ be an $\mathcal{F}_{t+}$-stopping time, and let $X$ be an $\mathcal{F}_t$-progressive stochastic process. Then $(X_{t \wedge \tau})_{t \geq 0}$ is $\mathcal{F}_t$-progressive. If $\tau < \infty$ then $X_\tau$ is measurable.

Proof. The fact that $X$ is progressive means that for each $t \geq 0$, the map $(s, \omega) \mapsto X_s(\omega)$ from $[0,t] \times \Omega$ to $E$ is $\mathcal{B}[0,t] \times \mathcal{F}_t$-measurable. We need to show that $(s, \omega) \mapsto X_{s \wedge \tau(\omega)}(\omega)$ is $\mathcal{B}[0,t] \times \mathcal{F}_t$-measurable. It suffices to show that $(s, \omega) \mapsto s \wedge \tau(\omega)$ is measurable with respect to $\mathcal{B}[0,t] \times \mathcal{F}_t$ and $\mathcal{B}[0,t]$. Then $(s, \omega) \mapsto (s \wedge \tau(\omega), \omega) \mapsto X_{s \wedge \tau(\omega)}(\omega)$ from $[0,t] \times \Omega \to [0,t] \times \Omega \to E$ is measurable with respect to $\mathcal{B}[0,t] \times \mathcal{F}_t$, $\mathcal{B}[0,t] \times \mathcal{F}_t$, and $\mathcal{B}(E)$. Now, for any $0 < u \leq t$, $\{(s,\omega) \in [0,t] \times \Omega : s \wedge \tau(\omega) < u\} = \big([0,u) \times \Omega\big) \cup \big([0,t] \times \{\tau < u\}\big)$, which lies in $\mathcal{B}[0,t] \times \mathcal{F}_t$ since $\{\tau < u\} \in \mathcal{F}_u \subset \mathcal{F}_t$ by Lemma 5.12; this gives the required measurability.

Lemma 5.14 (Operations with stopping times). Let $(\mathcal{F}_t)_{t \geq 0}$ be a filtration.

(1) If $\tau, \sigma$ are $\mathcal{F}_t$-stopping times, then $\tau \wedge \sigma$ is an $\mathcal{F}_t$-stopping time.
(2) If $\tau_n$ are $\mathcal{F}_t$-stopping times, then $\sup_n \tau_n$ is an $\mathcal{F}_t$-stopping time.
(3) If $\tau_n$ are $\mathcal{F}_{t+}$-stopping times such that $\tau_n \uparrow \tau$ and $\tau_n < \tau$ for all $n$, then $\tau$ is an $\mathcal{F}_t$-stopping time.

Proof. To prove (1), note that $\{\tau \wedge \sigma \leq t\} = \{\tau \leq t\} \cup \{\sigma \leq t\} \in \mathcal{F}_t$ for all $t \geq 0$. To prove (2), note that $\{\sup_n \tau_n \leq t\} = \bigcap_n \{\tau_n \leq t\} \in \mathcal{F}_t$ for all $t \geq 0$. To prove (3), finally, note that in this case $\{\tau \leq t\} = \bigcap_n \{\tau_n < t\} \in \mathcal{F}_t$ for all $t \geq 0$.

If $X$ is a stochastic process with values in $E$ and $\Delta \subset E$, then we define the first entrance time of $X$ in $\Delta$ as

(5.9)  $\tau_\Delta := \inf\{t \geq 0 : X_t \in \Delta\}$,

where $\tau_\Delta(\omega) := \infty$ if $\{t \geq 0 : X_t(\omega) \in \Delta\} = \emptyset$. Note that $X_{\tau_\Delta} \in \Delta$ if $\tau_\Delta < \infty$, $X$ has right-continuous sample paths, and $\Delta$ is closed.

Proposition 5.15 (First entrance times). Let $X$ have cadlag sample paths. If $\Delta$ is closed, then $\tau_\Delta$ is an $\mathcal{F}^X_t$-stopping time.

Proof. For each $t \geq 0$, define a map $S_t : \mathcal{D}_E[0,\infty) \to \mathcal{D}_E[0,\infty)$ by

$(S_t(w))_s := w_{s \wedge t}$  $(s \geq 0)$.

Then $S_t(X)$ is the process $X$ stopped at time $t$. We claim that $S_t(X) : \Omega \to \mathcal{D}_E[0,\infty)$ is $\mathcal{F}^X_t$-measurable. This follows from the facts that the Borel-$\sigma$-field on $\mathcal{D}_E[0,\infty)$ is generated by the coordinate projections $(\pi_s)_{s \geq 0}$ (Proposition 1.16) and that $S_t(X)^{-1}(\pi_s^{-1}(A)) = (S_t(X))_s^{-1}(A) = X_{s \wedge t}^{-1}(A) \in \mathcal{F}^X_t$ for each $s \geq 0$ and $A \in \mathcal{B}(E)$. Since $E \setminus \Delta$ is an open subset of $E$ it is Polish, hence the space $\mathcal{D}_{E \setminus \Delta}[0,\infty)$ is a Polish subspace of $\mathcal{D}_E[0,\infty)$, and therefore, by Proposition 1.24 (b), a countable intersection of open subsets of $\mathcal{D}_E[0,\infty)$. In particular, $\mathcal{D}_{E \setminus \Delta}[0,\infty)$ is a measurable subset of $\mathcal{D}_E[0,\infty)$, and therefore $\{\tau_\Delta \leq t\} = \{S_t(X) \notin \mathcal{D}_{E \setminus \Delta}[0,\infty)\} \in \mathcal{F}^X_t$ for each $t \geq 0$.
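For piecewise-constant cadlag paths, the first entrance time (5.9) is trivial to compute path by path. The sketch below does this for a continuous-time simple random walk and the closed set $\Delta = \{x : |x| \geq 5\}$; the model and all parameters are assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def first_entrance_time(jump_times, states, in_delta):
    """First entrance time (5.9) for a piecewise-constant cadlag path given by its
    jump times and the states taken there; returns np.inf if Delta is never entered."""
    for t, x in zip(jump_times, states):
        if in_delta(x):
            return t
    return np.inf

def walk(t_max):
    """Continuous-time simple random walk on Z with jump rate 1, started at 0."""
    t, x, times, states = 0.0, 0, [0.0], [0]
    while True:
        t += rng.exponential(1.0)
        if t > t_max:
            return times, states
        x += rng.choice([-1, 1])
        times.append(t)
        states.append(x)

taus = np.array([first_entrance_time(*walk(200.0), lambda x: abs(x) >= 5)
                 for _ in range(2000)])
print("mean first entrance time of {|x| >= 5}:", taus[np.isfinite(taus)].mean())
```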

The next theorem shows what happens to a player who stops playing at a stopping time τ. For a proof, see for example [KS91, Theorem 1.3.22].

Theorem 5.16 (Optional sampling). Let $(\mathcal{F}_t)_{t \geq 0}$ be a filtration, let $M$ be an $\mathcal{F}_t$-submartingale with right-continuous sample paths, and let $\tau$ be an $\mathcal{F}_t$-stopping time such that $\tau \leq T$ for some $T < \infty$. Then

(5.10)  $\mathbb{E}[M_\tau] \geq M_0$  a.s.

5.5. Applications. The next example gives an application of Theorem 5.11.

Example 5.17 (Convergence of the Wright-Fisher diffusion). Let $X^x$ be a version of the Wright-Fisher diffusion started in $X^x_0 = x \in [0,1]$. Then there exists a random variable $X^x_\infty$ such that $\mathbb{E}[X^x_\infty] = x$ and $\lim_{t \to \infty} X^x_t = X^x_\infty$ a.s.

Proof. It follows from Example 5.2 and (5.2) that $X$ is a nonnegative martingale. Therefore, by Theorem 5.11, there exists a random variable $X_\infty$ such that $X_t \to X_\infty$ a.s. It follows from (5.1) and bounded convergence that $\mathbb{E}[X_\infty] = x$.

Example 5.17 leaves a number of questions open. It is not hard to see that the boundary points $\{0,1\}$ are traps for the Wright-Fisher diffusion, in the sense that $\mathbb{P}^x[X_t = x\ \forall t \geq 0] = 1$ if $x \in \{0,1\}$. Therefore, we ask: is it true that the random variable $X_\infty$ from Example 5.17 takes values in $\{0,1\}$? Does the Wright-Fisher diffusion reach the traps in finite time? In order to answer these questions, we need one more piece of general theory.

Let $(X, (\mathbb{P}^x)_{x \in E})$ be a Feller process on a compact metrizable space $E$ with generator $G$. If $h \in \mathcal{D}(G)$ satisfies $Gh = 0$, then it follows from Lemma 5.1 and formula (5.2) that $(h(X_t))_{t \geq 0}$ is an $\mathcal{F}^X_t$-martingale. Even if a function $f \in \mathcal{D}(G)$ does not satisfy $Gf = 0$, we can still associate a martingale with $f$.

Proposition 5.18 (Martingale problem). Let $X$ be a version of a Feller process with generator $(\mathcal{D}(G), G)$, started in any initial law. Then, for every $f \in \mathcal{D}(G)$, the process $M^f$ given by

(5.11)  $M^f_t := f(X_t) - \int_0^t Gf(X_s)\,\mathrm{d}s$  $(t \geq 0)$

is an $\mathcal{F}^X_t$-martingale.

Proof.
$\mathbb{E}[M^f_t \mid \mathcal{F}_u] = \mathbb{E}[f(X_t) \mid \mathcal{F}_u] - \int_0^u Gf(X_s)\,\mathrm{d}s - \int_u^t \mathbb{E}[Gf(X_s) \mid \mathcal{F}_u]\,\mathrm{d}s$
$= P_{t-u} f(X_u) - \int_0^u Gf(X_s)\,\mathrm{d}s - \int_u^t P_{s-u} Gf(X_u)\,\mathrm{d}s$
$= f(X_u) - \int_0^u Gf(X_s)\,\mathrm{d}s = M^f_u$,

where we have used that

$\int_0^t P_s Gf\,\mathrm{d}s = \int_0^t \tfrac{\partial}{\partial s} P_s f\,\mathrm{d}s = P_t f - f$

by Proposition 3.15.
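The martingale property in Proposition 5.18 can be checked numerically for a jump process, where both $f(X_t)$ and the integral $\int_0^t Gf(X_s)\,\mathrm{d}s$ are exactly computable along a simulated path. The sketch below does this for the Moran model with the concrete choice $f(x) = x^2$ (so $G_X f(x) = x(n-x)$ by (4.16)); the parameters are assumptions for the illustration.

```python
import numpy as np

rng = np.random.default_rng(8)

def moran_with_integral(n, x0, t_end):
    """Simulate the Moran model (4.16) up to time t_end and accumulate
    int_0^t G f(X_s) ds for f(x) = x^2, where G f(x) = x(n - x)."""
    t, x, integral = 0.0, x0, 0.0
    while True:
        rate = x * (n - x)
        if rate == 0.0:
            return x, integral                 # trapped; Gf = 0 from now on
        hold = rng.exponential(1.0 / rate)
        if t + hold > t_end:
            integral += (t_end - t) * rate
            return x, integral
        integral += hold * rate
        t += hold
        x += 1 if rng.random() < 0.5 else -1

n, x0, m = 20, 7, 5000
for t in (0.05, 0.2, 1.0):
    vals = np.array([moran_with_integral(n, x0, t) for _ in range(m)])
    m_f = vals[:, 0] ** 2 - vals[:, 1]          # M^f_t = f(X_t) - int_0^t Gf(X_s) ds
    print("t =", t, " E[M^f_t] approx", m_f.mean(), " (should stay near f(x0) =", x0 ** 2, ")")
```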

The next two examples give applications of Proposition 5.18.

Example 5.19 (Wright-Fisher diffusion converges to traps). Let $X^x$ be a version of the Wright-Fisher diffusion started in $X^x_0 = x \in [0,1]$. Then the random variable $X^x_\infty$ from Example 5.17 is $\{0,1\}$-valued.

Proof. Denote the Wright-Fisher diffusion by $(X, (\mathbb{P}^x)_{x \in E})$. The function $f(x) := x^2$ satisfies $f \in \mathcal{D}(A_{\mathrm{WF}})$ and $A_{\mathrm{WF}} f(x) = x(1-x)$. Therefore, by Proposition 5.18,

$\mathbb{E}^x[X_t^2] = x^2 + \int_0^t \mathbb{E}^x\big[X_s(1 - X_s)\big]\,\mathrm{d}s$  $(t \geq 0)$.

Since $X_t \in [0,1]$ it follows, letting $t \to \infty$, that

$\mathbb{E}^x\Big[\int_0^\infty X_s(1 - X_s)\,\mathrm{d}s\Big] \leq 1$.

In particular, $\int_0^\infty X_s(1 - X_s)\,\mathrm{d}s$ is finite a.s., which is possible only if $X_\infty \in \{0,1\}$ a.s.

Example 5.20 (Wright-Fisher diffusion gets trapped in finite time). Let $X^x$ be a version of the Wright-Fisher diffusion started in $X^x_0 = x \in [0,1]$. Define (using Proposition 5.15) an $\mathcal{F}^X_t$-stopping time $\tau$ by

$\tau := \inf\{t \geq 0 : X^x_t \in \{0,1\}\}$.

Then $\mathbb{E}[\tau] < \infty$.

Proof. Let $(X, (\mathbb{P}^x)_{x \in E})$ be the Wright-Fisher diffusion. The idea of the proof is to show that there exists a continuous function $f : [0,1] \to [0,\infty)$ such that $f(0) = f(1) = 0$ and the process

(5.12)  $M_t := f(X_t) + \int_0^t 1_{(0,1)}(X_s)\,\mathrm{d}s$  $(t \geq 0)$

is an $\mathcal{F}^X_t$-martingale. Let us first explain why we are interested in such a function. If the process in (5.12) is a martingale, then by optional sampling (Theorem 5.16), $\mathbb{E}^x[M_{\tau \wedge t}] = \mathbb{E}^x[M_0]$, hence

$\mathbb{E}^x[\tau \wedge t] = \mathbb{E}^x\Big[\int_0^{\tau \wedge t} 1_{(0,1)}(X_s)\,\mathrm{d}s\Big] = f(x) - \mathbb{E}^x[f(X_{\tau \wedge t})]$  $(t \geq 0)$.

Letting $t \uparrow \infty$ we see that $\mathbb{E}^x[\tau] \leq f(x)$, so $\tau < \infty$ a.s. Since $f$ is zero on $\{0,1\}$ it follows that $\mathbb{E}^x[f(X_{\tau \wedge t})] \to 0$ as $t \uparrow \infty$, so we find that

(5.13)  $\mathbb{E}^x[\tau] = f(x)$  $(x \in [0,1])$.

To get a function $f$ such that (5.12) holds, we choose $0 < \varepsilon_n < \tfrac12$ such that $\varepsilon_n \downarrow 0$, we define

$h_n(x) := \begin{cases} \dfrac{-2}{x(1-x)} & (x \in (\varepsilon_n, 1 - \varepsilon_n)), \\[2mm] \dfrac{-2}{\varepsilon_n(1 - \varepsilon_n)} & (x \in [0, \varepsilon_n] \cup [1 - \varepsilon_n, 1]), \end{cases}$

and we put

$f_n(x) := \int_0^x \mathrm{d}y \int_{1/2}^y \mathrm{d}z\; h_n(z)$  $(x \in [0,1])$.

Then the functions $f_n : [0,1] \to \mathbb{R}$ are continuous, symmetric in the sense that $f_n(x) = f_n(1-x)$, and satisfy $f_n(0) = f_n(1) = 0$. Moreover, we have $f_n \uparrow f$, where

$f(x) := \int_0^x \mathrm{d}y \int_{1/2}^y \mathrm{d}z\; \dfrac{-2}{z(1-z)}$  $(x \in [0,1])$.

To see that this is finite, note that for $y \leq \tfrac12$,

$\int_{1/2}^y \mathrm{d}z\; \dfrac{-2}{z(1-z)} = \int_y^{1/2} \mathrm{d}z\; \dfrac{2}{z(1-z)} \leq \int_y^{1/2} \mathrm{d}z\; \dfrac{4}{z} = 4\big(\log(\tfrac12) - \log(y)\big)$,

which is integrable at zero. The functions $f_n$ satisfy

$A_{\mathrm{WF}} f_n(x) = \tfrac12 x(1-x) h_n(x) \underset{n \to \infty}{\downarrow} -1_{(0,1)}(x)$  $(x \in [0,1])$.

The fact that the process $M$ in (5.12) is a martingale now follows from Proposition 5.18 and Lemma 5.21 below.
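Identity (5.13) can be probed numerically. Evaluating the double integral defining $f$ in the proof gives the closed form $f(x) = -2\big[x\log x + (1-x)\log(1-x)\big]$; this evaluation and the Euler discretization below are part of the illustration, not of the text, and the crude scheme is biased near the boundary, so only rough agreement should be expected.

```python
import numpy as np

rng = np.random.default_rng(9)

def wf_absorption_time(x0, dt, t_max=200.0):
    """Euler scheme for dX = sqrt(X(1-X)) dB; returns the (discretized) first time
    the path is absorbed at 0 or 1."""
    x, t = x0, 0.0
    while 0.0 < x < 1.0 and t < t_max:
        x += np.sqrt(max(x * (1.0 - x), 0.0) * dt) * rng.standard_normal()
        x = min(max(x, 0.0), 1.0)
        t += dt
    return t

def f_closed_form(x):
    # f(x) = -2 [ x log x + (1-x) log(1-x) ], obtained by evaluating the double integral.
    return -2.0 * (x * np.log(x) + (1.0 - x) * np.log(1.0 - x))

x0, m = 0.3, 4000
taus = np.array([wf_absorption_time(x0, 1e-3) for _ in range(m)])
print("Monte Carlo E^x[tau] approx", taus.mean(), "   f(x) =", f_closed_form(x0))
```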

We say that a sequence of bounded real functions $f_n$, defined on a measurable space, converges to a bounded pointwise limit $f$ if $f_n \to f$ pointwise while $\sup_n \|f_n\| < \infty$. We denote this as $f = \mathrm{bp\text{-}lim}_{n \to \infty} f_n$. Recall that the integral is continuous with respect to bounded pointwise convergence. So, if $X_n$ are real-valued random variables and $X_n \to X$ boundedly pointwise, then $\mathbb{E}[X_n] \to \mathbb{E}[X]$.

Lemma 5.21 (Bounded pointwise limits). Let $X$ be a Feller process on a compact metrizable space $E$ and let $G$ be its generator. Let $f_n \in \mathcal{D}(G)$, $f \in \mathcal{C}(E)$, and $g \in B(E)$ be functions such that

$f = \mathrm{bp\text{-}lim}_{n \to \infty} f_n$  and  $g = \mathrm{bp\text{-}lim}_{n \to \infty} Gf_n$.

Then the process $M$ given by

$M_t := f(X_t) - \int_0^t g(X_s)\,\mathrm{d}s$  $(t \geq 0)$

is an $\mathcal{F}^X_t$-martingale.

Proof. We know that the processes

$M^{(n)}_t := f_n(X_t) - \int_0^t Gf_n(X_s)\,\mathrm{d}s$  $(t \geq 0)$

are $\mathcal{F}^X_t$-martingales. In particular, $\mathbb{E}[M^{(n)}_t \mid \mathcal{F}^X_s] = M^{(n)}_s$ a.s. for all $0 \leq s \leq t$. By the definition of the conditional expectation, this is equivalent to the fact that

(1) $M^{(n)}_s$ is $\mathcal{F}^X_s$-measurable,
(2) $\mathbb{E}[M^{(n)}_t 1_A] = \mathbb{E}[M^{(n)}_s 1_A]$ for all $A \in \mathcal{F}^X_s$,

for all $0 \leq s \leq t$. For each fixed $t \geq 0$, we observe that $M_t = \mathrm{bp\text{-}lim}_{n \to \infty} M^{(n)}_t$. It follows that

(1) $M_s$ is $\mathcal{F}^X_s$-measurable,
(2) $\mathbb{E}[M_t 1_A] = \mathbb{E}[M_s 1_A]$ for all $A \in \mathcal{F}^X_s$,

which proves that $\mathbb{E}[M_t \mid \mathcal{F}^X_s] = M_s$ a.s. for all $0 \leq s \leq t$.

5.6. Non-explosion. Using martingales and stopping times, we can complete the proof of Proposition 4.10 started in Section 4.4. Let $\overline{E}$ be the one-point compactification of a locally compact separable metrizable space $E$ and let $(X, (\overline{\mathbb{P}}^x)_{x \in \overline{E}})$ be a Feller process in $\overline{E}$ with generator $(\mathcal{D}(G), G)$. Recall that $(\overline{\mathbb{P}}^x)_{x \in \overline{E}}$ is called non-explosive if

$\overline{\mathbb{P}}^x\{X_t, X_{t-} \neq \infty\ \forall t \geq 0\} = 1 \quad \forall x \neq \infty$.

Proof of Proposition 4.10 (continued). The fact that condition (2) from Proposition 4.10 implies non-implosion has already been proved in Section 4.4. Assume that condition (1) from Proposition 4.10 holds. For each $R > 0$, put

(5.14)  $O_R := \{x \in E : f(x) < R\}$

and define stopping times $\tau_R$ by

(5.15)  $\tau_R := \inf\{t \geq 0 : X_t \in \overline{E} \setminus O_R\}$  $(R > 0)$.

Fix $x \in E$. By Proposition 5.18 and optional stopping, for each $t > 0$,

(5.16)  $\overline{\mathbb{P}}^x\{\tau_R \leq t\} \inf_{y \in \overline{E} \setminus O_R} f_n(y) \leq \mathbb{E}^x\big[f_n(X_{t \wedge \tau_R})\big] = f_n(x) + \mathbb{E}^x\Big[\int_0^{t \wedge \tau_R} Gf_n(X_s)\,\mathrm{d}s\Big] \leq f(x) + t \sup_{y \in O_R} Gf_n(y)$,

where in the last inequality we have used that $f(X_s) < R$ for all $s < \tau_R$. Since $\overline{O_R}$ is compact and $Gf_n$ converges uniformly on compacta to $g$,

(5.17)  $\limsup_{n \to \infty}\ \sup_{x \in O_R} Gf_n(x) \leq \sup_{x \in E} g(x)$.

We claim that moreover

(5.18)  $\inf_{x \in \overline{E} \setminus O_R} f_n(x) \underset{n \to \infty}{\longrightarrow} R$.

Indeed, by our assumptions, the sets $\{x \in \overline{E} \setminus O_R : f_n(x) \leq R - \varepsilon\}$ are compact subsets of $E$, decreasing to the empty set. Therefore, for each $\varepsilon > 0$ there exists an $n$ with $\{x \in \overline{E} \setminus O_R : f_n(x) \leq R - \varepsilon\} = \emptyset$.

Inserting (5.17) and (5.18) into (5.16), we find that

(5.19)  $\overline{\mathbb{P}}^x\{\tau_R \leq t\} \leq R^{-1}\Big(f(x) + t \sup_{x \in E} g(x)\Big)$.

Letting $R \uparrow \infty$ shows that

(5.20)  $\overline{\mathbb{P}}^x\{X_s, X_{s-} \neq \infty\ \forall s \leq t\} = 1$

for each fixed $t > 0$. Letting $t \uparrow \infty$ shows that $(\overline{\mathbb{P}}^x)_{x \in \overline{E}}$ is non-explosive.

6. Convergence of Markov processes

6.1. Convergence in path space. In this section, we discuss the convergence of a sequence of Feller processes to a limiting Feller process. The martingale problem from Proposition 5.18 will play an important role in the proofs. As an application of our main result, we will complete the proof of Theorem 4.2.

Let $\mathcal{G}_n$ be multivalued linear operators on a Banach space $V$, i.e., the $\mathcal{G}_n$ are linear subspaces of $V \times V$. We define the extended limit $\mathrm{ex\,lim}_{n \to \infty} \mathcal{G}_n$ as

(6.1)  $\mathrm{ex\,lim}_{n \to \infty} \mathcal{G}_n := \big\{(f, g) : \exists (f_n, g_n) \in \mathcal{G}_n \text{ s.t. } (f_n, g_n) \underset{n \to \infty}{\longrightarrow} (f, g)\big\}$.

If the $\mathcal{G}_n$ are single-valued, and therefore the graphs of some linear operators $(\mathcal{D}(A_n), A_n)$, and moreover $\mathrm{ex\,lim}_{n \to \infty} \mathcal{G}_n$ is single-valued and the graph of $(\mathcal{D}(A), A)$, then we also write $\mathrm{ex\,lim}_{n \to \infty} A_n = A$.

Exercise 6.1. Show that $\mathrm{ex\,lim}_{n \to \infty} \mathcal{G}_n$ is always a closed linear operator. Show that

(i) $\mathrm{ex\,lim}_{n \to \infty} (\lambda_1 + \lambda_2 \mathcal{G}_n) = \lambda_1 + \lambda_2\, \mathrm{ex\,lim}_{n \to \infty} \mathcal{G}_n$ for all $\lambda_1, \lambda_2 \in \mathbb{R}$, $\lambda_2 \neq 0$.
(ii) $\mathrm{ex\,lim}_{n \to \infty} \mathcal{G}_n^{-1} = \big(\mathrm{ex\,lim}_{n \to \infty} \mathcal{G}_n\big)^{-1}$.
(iii) If $\mathcal{G}_n$ is dissipative for each $n$, then $\mathrm{ex\,lim}_{n \to \infty} \mathcal{G}_n$ is dissipative.

Exercise 6.2. Let $A_n, A$ be bounded linear operators. Show that $Af = \lim_{n \to \infty} A_n f$ for all $f \in V$ implies $A = \mathrm{ex\,lim}_{n \to \infty} A_n$. Hint: Lemma 3.43.

The main result of this section is:

Theorem 6.3 (Convergence of Feller processes). Let $E$ be a compact metrizable space and let $(\mathbb{P}^{(n),x})_{x \in E}$ and $(\mathbb{P}^x)_{x \in E}$ be Feller processes in $E$ with Feller semigroups $(P^{(n)}_t)_{t \geq 0}$ and $(P_t)_{t \geq 0}$ and generators $G_n$ and $G$, respectively. Then the following statements are equivalent:

(a) $\mathrm{ex\,lim}_{n \to \infty} G_n \supset G$.
(b) $\mathrm{ex\,lim}_{n \to \infty} G_n = G$.
(c) $P^{(n)}_t f \underset{n \to \infty}{\longrightarrow} P_t f$ for all $f \in \mathcal{C}(E)$ and $t \geq 0$.
(d) $\mathbb{P}^{(n),\mu_n}\{(X_{t_1}, \ldots, X_{t_m}) \in \cdot\,\} \underset{n \to \infty}{\Longrightarrow} \mathbb{P}^\mu\{(X_{t_1}, \ldots, X_{t_m}) \in \cdot\,\}$ whenever $\mu_n \underset{n \to \infty}{\Longrightarrow} \mu$.
(e) $\mathbb{P}^{(n),\mu_n} \underset{n \to \infty}{\Longrightarrow} \mathbb{P}^\mu$ whenever $\mu_n \underset{n \to \infty}{\Longrightarrow} \mu$.

Condition (a) means that $\mathrm{ex\,lim}_{n \to \infty} G_n$, considered as a multivalued operator, contains $G$. Thus, (a) says that for all $f \in \mathcal{D}(G)$ there exist $f_n \in \mathcal{D}(G_n)$ such that $f_n \to f$ and $G_n f_n \to Gf$. We can reformulate conditions (d) and (e) as follows. Let $X^{(n)}$ and $X$ be random variables with laws $\mathbb{P}^{(n),\mu_n}$ and $\mathbb{P}^\mu$, respectively, i.e., $X^{(n)}$ is a version of the Markov process with semigroup $(P^{(n)}_t)_{t \geq 0}$, started in the initial law $\mathcal{L}(X^{(n)}_0) = \mu_n$, and $X$ is a version of the Markov process with semigroup $(P_t)_{t \geq 0}$, started in the initial law $\mathcal{L}(X_0) = \mu$.

Then condition (d) says that $\mu_n \Rightarrow \mu$ implies that $X^{(n)}$ converges to $X$ in finite dimensional distributions, and (e) says that $\mu_n \Rightarrow \mu$ implies that $\mathcal{L}(X^{(n)}) \Rightarrow \mathcal{L}(X)$, where $\mathcal{L}(X^{(n)})$ and $\mathcal{L}(X)$ are probability measures on the 'path space' $\mathcal{D}_E[0,\infty)$. In this case we say that $X^{(n)}$ converges to $X$ in the sense of weak convergence in path space.

Under weak additional assumptions, weak convergence in path space implies convergence in finite dimensional distributions.

Lemma 6.4 (Convergence of finite dimensional distributions). Let $Y^{(n)}$ and $Y$ be $\mathcal{D}_E[0,\infty)$-valued random variables. Assume that $\mathbb{P}\{Y_t = Y_{t-}\} = 1$ for all $t \geq 0$. Then $\mathcal{L}(Y^{(n)}) \Rightarrow \mathcal{L}(Y)$ implies that $\mathcal{L}(Y^{(n)}_{t_1}, \ldots, Y^{(n)}_{t_k}) \Rightarrow \mathcal{L}(Y_{t_1}, \ldots, Y_{t_k})$ for all $0 \leq t_1 \leq \cdots \leq t_k$.

Proof. See [EK86, Theorem 3.7.8 (a)].

Exercise 6.5. Assume that $Y^{(n)}$ and $Y$ are stochastic processes with sample paths in $\mathcal{C}_E[0,\infty)$. Show that weak convergence in path space (of the $Y^{(n)}$ to $Y$) implies convergence in finite dimensional distributions.

Weak convergence in path space is usually a more powerful statement than convergence in finite dimensional distributions (and more difficult to prove). The next example shows that weak convergence in path space is not implied by convergence of finite dimensional distributions.

(Counter-)Example. For $n \geq 1$, let $X^n$ be the $\{0,1\}$-valued Markov process with infinitesimal matrix (generator)

(6.2)  $A^{(n)} := \begin{pmatrix} -1 & 1 \\ n & -n \end{pmatrix}$

and initial law $\mathbb{P}\{X^n_0 = 0\} = 1$. Recall that then the corresponding semigroup is given by

(6.3)  $T^{(n)}_t f = e^{A^{(n)} t} f = \Big(\mathrm{Id} - \dfrac{1}{n+1} \sum_{k \geq 1} \dfrac{(-(n+1)t)^k}{k!}\, A^{(n)}\Big) f = \Big(\mathrm{Id} - \dfrac{1}{n+1}\big(e^{-(n+1)t} - 1\big) A^{(n)}\Big) f$,

where we have used that $(A^{(n)})^k = (-(n+1))^{k-1} A^{(n)}$. Put $f(0) := 1$ and $f(1) := 0$; then

(6.4)  $\mathbb{P}^{0,(n)}\{X_t = 0\} = T^{(n)}_t f(0) = 1 - \dfrac{1}{n+1}\big(1 - e^{-(n+1)t}\big) \underset{n \to \infty}{\longrightarrow} 1$.

One can iterate the argument to show that the finite dimensional distributions of $X$ under $\mathbb{P}^{0,(n)}$ converge to those of a process that is identically equal to $0$. On the other hand, under $\mathbb{P}^{0,(n)}$ the process a.s. reaches the state $1$, and its first visit $\tau := \inf\{t \geq 0 : X_t \in \{1\}\} < \infty$ satisfies $\mathbb{E}^{0,(n)}[\tau] = 1$ for every $n$. Hence, the sequence $\mathcal{L}(X^n)$ does not converge in the sense of weak convergence on $\mathcal{D}_{\{0,1\}}[0,\infty)$.
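The tension in this counterexample can be made very concrete: the one-time marginals concentrate on $0$, yet every path makes an excursion to $1$ at a time of order one, independently of $n$. The sketch below computes (6.4) and estimates the probability of an excursion before a fixed horizon; the horizon and simulation sizes are assumptions of the illustration.

```python
import numpy as np

rng = np.random.default_rng(10)

def prob_state0(n, t):
    """Formula (6.4): P^{0,(n)}{X_t = 0}."""
    return 1.0 - (1.0 - np.exp(-(n + 1.0) * t)) / (n + 1.0)

def visits_one_before(n, T):
    """Simulate the two-state chain (rate 1 from 0 to 1, rate n from 1 to 0), started in 0,
    and report whether it visits state 1 before time T."""
    t, x = 0.0, 0
    while True:
        t += rng.exponential(1.0 if x == 0 else 1.0 / n)
        if t >= T:
            return False
        x = 1 - x
        if x == 1:
            return True

T = 2.0
for n in (10, 1000):
    print("n =", n,
          " P{X_T = 0} =", prob_state0(n, T),
          " P{path visits 1 before T} approx",
          np.mean([visits_one_before(n, T) for _ in range(20000)]))
```

The second probability stays near $1 - e^{-T}$ for all $n$, which is exactly why the path laws cannot converge to the law of the constant-$0$ path.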

Exercise 6.6. Let Y be a Poisson process with parameter λ, and define

(6.5)  $X^n_t := \tfrac{1}{n}\big(Y_{n^2 t} - \lambda n^2 t\big)$.

Apply Theorem 6.3 to show that $\{X^n\}$ converges in distribution and identify its limit.

The fact that conditions (a), (b) and (c) from Theorem 6.3 are equivalent follows from abstract semigroup theory. We will only prove the easy implication (c) $\Rightarrow$ (b). For a full proof, see [EK86, Theorem 1.6.1].

Proposition 6.7 (Convergence of semigroups). Assume that $(S^{(n)}_t)_{t \geq 0}$ and $(S_t)_{t \geq 0}$ are strongly continuous contraction semigroups on a Banach space $V$, with generators $G_n$ and $G$, respectively. Then the following statements are equivalent:

(a) $\mathrm{ex\,lim}_{n \to \infty} G_n \supset G$.
(b) $\mathrm{ex\,lim}_{n \to \infty} G_n = G$.
(c) $S^{(n)}_t f \underset{n \to \infty}{\longrightarrow} S_t f$ for all $f \in V$ and $t \geq 0$.

Proof. (c) $\Rightarrow$ (b): Fix $\lambda > 0$. By Lemma 3.21, $(\lambda - G)^{-1}$ is a bounded linear operator which is given by

$(\lambda - G)^{-1} f = \int_0^\infty S_t f\, e^{-\lambda t}\,\mathrm{d}t$  $(f \in V)$.

A similar formula holds for $(\lambda - G_n)^{-1}$. Since $S_t$ and $S^{(n)}_t$ are contractions, $\|S_t f - S^{(n)}_t f\| \leq 2\|f\|$, so using bounded convergence

$\big\|(\lambda - G)^{-1} f - (\lambda - G_n)^{-1} f\big\| \leq \int_0^\infty \|S_t f - S^{(n)}_t f\|\, e^{-\lambda t}\,\mathrm{d}t \underset{n \to \infty}{\longrightarrow} 0$.

By Exercise 6.2 this proves that $\mathrm{ex\,lim}_{n \to \infty} (\lambda - G_n)^{-1} = (\lambda - G)^{-1}$. By Exercise 6.1, it follows that $\mathrm{ex\,lim}_{n \to \infty} G_n = G$.

Since (b) $\Rightarrow$ (a) is trivial, to complete the proof it suffices to prove that (a) $\Rightarrow$ (c). This implication is more difficult. One proves that the Yosida approximations $G_\varepsilon$ and $G_{n,\varepsilon}$ of $G$ and $G_n$ satisfy $G_{n,\varepsilon} f \to G_\varepsilon f$ for each $f \in V$ and $\varepsilon > 0$, uses this to derive estimates that are uniform in $\varepsilon$, and then lets $\varepsilon \to 0$.

The main technical tool in the proof of Theorem 6.3 is a tightness criterion for sequences of probability laws on $\mathcal{D}_E[0,\infty)$, which we will not prove. Recall the concept of tightness from Proposition 3.2. To stress the importance of tightness, we note the following fact.

Lemma 6.8 (Application of tightness). Let $Y^{(n)}$ be a sequence of processes with sample paths in $\mathcal{D}_E[0,\infty)$. Assume that the finite dimensional distributions of $Y^{(n)}$ converge and that the laws $\mathcal{L}(Y^{(n)})$ are tight. Then there exists a process $Y$ with sample paths in $\mathcal{D}_E[0,\infty)$ such that $\mathcal{L}(Y^{(n)}) \Rightarrow \mathcal{L}(Y)$.

Proof. The weak limits $\lim_{n \to \infty} \mathcal{L}(Y^{(n)}_{t_1}, \ldots, Y^{(n)}_{t_k})$ form a consistent family in the sense of Kolmogorov's extension theorem, so by the latter there exists an $E$-valued process $Y'$ such that the $Y^{(n)}$ converge to $Y'$ in finite dimensional distributions. Since the laws $\mathcal{L}(Y^{(n)})$ are tight, we can select a convergent subsequence $\mathcal{L}(Y^{(n_m)}) \Rightarrow \mathcal{L}(Y)$. If we can show that all convergent subsequences have the same limit $\mathcal{L}(Y)$, then by the exercise below, the laws $\mathcal{L}(Y^{(n)})$ converge to $\mathcal{L}(Y)$.

For any function $f \in \mathcal{C}(E)$ and $0 \leq t < u$, the map $w \mapsto \int_t^u f(w(s))\,\mathrm{d}s$ from $\mathcal{D}_E[0,\infty)$ to $\mathbb{R}$ is bounded and continuous. (Note that the coordinate projections are not continuous!) Therefore, $\mathcal{L}(Y^{(n_m)}) \Rightarrow \mathcal{L}(Y)$ implies that $\mathbb{E}[\int_t^u f(Y^{(n_m)}_s)\,\mathrm{d}s] \to \mathbb{E}[\int_t^u f(Y_s)\,\mathrm{d}s]$ for each $0 \leq t < u$. Moreover $\mathbb{E}[\int_t^u f(Y^{(n_m)}_s)\,\mathrm{d}s] = \int_t^u \mathbb{E}[f(Y^{(n_m)}_s)]\,\mathrm{d}s \to \int_t^u \mathbb{E}[f(Y'_s)]\,\mathrm{d}s$ by bounded convergence, so by the right-continuity of sample paths

$\mathbb{E}[f(Y_t)] = \lim_{\varepsilon \to 0} \mathbb{E}\Big[\varepsilon^{-1}\int_0^\varepsilon f(Y_{t+s})\,\mathrm{d}s\Big] = \lim_{\varepsilon \to 0} \varepsilon^{-1}\int_0^\varepsilon \mathbb{E}[f(Y'_{t+s})]\,\mathrm{d}s$.

A similar argument shows that

(6.6)  $\mathbb{E}[f_1(Y_{t_1}) \cdots f_k(Y_{t_k})] = \lim_{\varepsilon \to 0} \varepsilon^{-1}\int_0^\varepsilon \mathbb{E}[f_1(Y'_{t_1+s}) \cdots f_k(Y'_{t_k+s})]\,\mathrm{d}s$

for any $f_1, \ldots, f_k \in \mathcal{C}(E)$ and $0 \leq t_1 \leq \cdots \leq t_k$. This clearly determines the finite dimensional distributions of $Y$, and therefore $\mathcal{L}(Y)$, uniquely. (Warning: the finite dimensional distributions of $Y$ and $Y'$ need in general not be the same!)

Exercise 6.9. Let $M$ be a metrizable space and let $(x_n)_{n \geq 1}$ be a sequence in $M$. Assume that the closure of the set $\{x_n : n \geq 1\}$ is compact and that the sequence $(x_n)_{n \geq 1}$ has only one cluster point $x$. Show that $x_n \to x$.

The next theorem relates tightness of probability measures on $\mathcal{D}_E[0,\infty)$ to martingales in the spirit of Proposition 5.18. Below, for any measurable function $h : [0,\infty) \to \mathbb{R}$, $T > 0$, and $p \in [1,\infty]$ we define:

(6.7)  $\|h\|_{p,T} := \begin{cases} \big(\int_0^T |h(t)|^p\,\mathrm{d}t\big)^{1/p} & \text{if } p < \infty, \\ \operatorname*{ess\,sup}_{t \in [0,T]} |h(t)| & \text{if } p = \infty. \end{cases}$

Here the essential supremum is defined as

$\operatorname*{ess\,sup}_{t \in [0,T]} |h(t)| := \inf\{H \geq 0 : |h(t)| \leq H \text{ a.s.}\}$,

where a.s. means almost surely with respect to Lebesgue measure. Thus, $\|h\|_{p,T}$ is just the $L^p$-norm of the function $[0,T] \ni t \mapsto h(t)$ with respect to Lebesgue measure.

Theorem 6.10 (Tightness criterion). Let $E$ be compact and metrizable and let $\{X^{(n)} : n \geq 1\}$ be a sequence of processes with sample paths in $\mathcal{D}_E[0,\infty)$, defined on probability spaces $(\Omega^{(n)}, \mathcal{F}^{(n)}, \mathbb{P}^{(n)})$ and adapted to filtrations $(\mathcal{F}^{(n)}_t)_{t \geq 0}$. Let $\mathcal{D} \subset \mathcal{C}(E)$ be dense and assume that for all $f \in \mathcal{D}$ and $n \geq 1$ there exist $\mathcal{F}^{(n)}_t$-adapted real processes $F^{(n)}$ and $G^{(n)}$ with cadlag sample paths, such that

$M_t := F^{(n)}_t - \int_0^t G^{(n)}_s\,\mathrm{d}s$

is an $\mathcal{F}^{(n)}_t$-martingale, and such that for each $T > 0$,

(6.8)  $\sup_n \mathbb{E}^{(n)}\Big[\sup_{t \in [0,T] \cap \mathbb{Q}} \big|F^{(n)}_t - f(X^{(n)}_t)\big|\Big] < \infty$

and

(6.9)  $\sup_n \mathbb{E}^{(n)}\big[\|G^{(n)}\|_{p,T}\big] < \infty$  for some $p \in (1, \infty]$.

Then the laws $\{\mathcal{L}(X^{(n)}) : n \geq 1\}$ are tight.

Proof. This is a much simplified version of Theorems 3.9.1 and 3.9.4 in [EK86].

Remark. For example, if $X^{(n)}$ is a Feller process with generator $G_n$ and $f_n \in \mathcal{D}(G_n)$, then by Proposition 5.18, $M_t := f_n(X^{(n)}_t) - \int_0^t G_n f_n(X^{(n)}_s)\,\mathrm{d}s$ is an $\mathcal{F}^{X^{(n)}}_t$-martingale. Thus, a typical application of Theorem 6.10 is to take $F^{(n)}_t := f_n(X^{(n)}_t)$ and $G^{(n)}_t := G_n f_n(X^{(n)}_t)$.

Counterexample. Taking $p = 1$ in (6.9) is not sufficient. To see this, for $n \geq 1$ let $X^{(n)}$ be the Markov process with generator (6.2) and initial law $\mathbb{P}\{X^{(n)}_0 = 0\} = 1$. Take for $\mathcal{D}$ the space of all real functions $f$ on $\{0,1\}$, and for such a function put $F^{(n)}_t := f(X^{(n)}_t)$ and $G^{(n)}_t := A^{(n)} f(X^{(n)}_t)$. Then by Proposition 5.18, $F^{(n)}_t - \int_0^t G^{(n)}_s\,\mathrm{d}s$ is an $\mathcal{F}^{X^{(n)}}_t$-martingale, (6.8) is satisfied, and by (6.4)

$\mathbb{E}\big[|G^{(n)}_t|\big] = \mathbb{E}\big[|A^{(n)} f(X^{(n)}_t)|\big] = n\,|f(0) - f(1)|\,\mathbb{P}\{X^{(n)}_t = 1\} + |f(0) - f(1)|\,\mathbb{P}\{X^{(n)}_t = 0\} = |f(0) - f(1)|\Big(1 + \dfrac{n-1}{n+1}\big(1 - e^{-(n+1)t}\big)\Big) \leq 2\,|f(0) - f(1)|$.

This shows that

$\sup_n \mathbb{E}\big[\|G^{(n)}\|_{1,T}\big] = \sup_n \int_0^T \mathbb{E}\big[|G^{(n)}_t|\big]\,\mathrm{d}t \leq 2T\,|f(0) - f(1)| < \infty$,

so (6.9) is satisfied for $p = 1$. Since the $X^{(n)}$ converge in finite dimensional distributions, if the laws $\{\mathcal{L}(X^{(n)}) : n \geq 1\}$ were tight, then $X^{(n)}$ would also converge weakly in path space. We have already seen that this is not the case.

Proof of Theorem 6.3. Conditions (a), (b) and (c) are equivalent by Proposition 6.7. Our next step is to show that (c) is equivalent to (d). Indeed, if (c) holds, then for any $f_1, \ldots, f_k \in \mathcal{C}(E)$ and $0 = t_0 \leq t_1 \leq \cdots \leq t_k$,

$\mathbb{E}^{(n),\mu_n}\big[f_1(X_{t_1}) \cdots f_k(X_{t_k})\big] = \mu_n P^{(n)}_{t_1 - t_0} f_1 \cdots P^{(n)}_{t_k - t_{k-1}} f_k \underset{n \to \infty}{\longrightarrow} \mu P_{t_1 - t_0} f_1 \cdots P_{t_k - t_{k-1}} f_k = \mathbb{E}^\mu\big[f_1(X_{t_1}) \cdots f_k(X_{t_k})\big]$,

where we have used Lemma 3.43. This implies (d). Conversely, if (d) holds, then for any $f \in \mathcal{C}(E)$, $x_n \to x$, and $t \geq 0$,

$P^{(n)}_t f(x_n) = \mathbb{E}^{(n),x_n}[f(X_t)] \underset{n \to \infty}{\longrightarrow} \mathbb{E}^x[f(X_t)] = P_t f(x)$,

which proves that $P^{(n)}_t f$ converges uniformly to $P_t f$ (compare the proof of Proposition 3.7). To complete the proof, it suffices to show that (a) and (d) imply (e) and that (e) implies (b). (Warning: it is not immediately obvious that (e) implies (d), since weak convergence in path space does not in general imply convergence in finite dimensional distributions.)

(a) & (d) $\Rightarrow$ (e): Let $X^{(n)}$ be random variables with laws $\mathbb{P}^{(n),\mu_n}$. We start by showing that the laws $\mathcal{L}(X^{(n)})$ are tight. This is a straightforward application of Theorem 6.10. We choose $\mathcal{D} := \mathcal{D}(G)$, which is dense in $\mathcal{C}(E)$. By (a), for each $f \in \mathcal{D}$ there exist $f_n \in \mathcal{D}(G_n)$ such that $f_n \to f$ and $G_n f_n \to Gf$. Setting $F^{(n)}_t := f_n(X^{(n)}_t)$ and $G^{(n)}_t := G_n f_n(X^{(n)}_t)$ and using Proposition 5.18, we see that (6.8) and (6.9) are satisfied, where in the latter we can take $p = \infty$.

Since the laws $\mathcal{L}(X^{(n)})$ are tight, we can select a convergent subsequence $\mathcal{L}(X^{(n_m)}) \Rightarrow \mathcal{L}(X)$. We are done if we can show that $\mathcal{L}(X) = \mathbb{P}^\mu$ (and hence all weak cluster points are the same). In the same way as in the proof of Lemma 6.8 (see in particular (6.6)), we find that

$\mathbb{E}[f_1(X_{t_1}) \cdots f_k(X_{t_k})] = \lim_{\varepsilon \to 0} \varepsilon^{-1}\int_0^\varepsilon \mathrm{d}s\; \mu P_{t_1 - t_0 + s} f_1 \cdots P_{t_k - t_{k-1} + s} f_k = \mu P_{t_1 - t_0} f_1 \cdots P_{t_k - t_{k-1}} f_k$

for any $f_1, \ldots, f_k \in \mathcal{C}(E)$ and $0 \leq t_1 \leq \cdots \leq t_k$. This proves that $X$ is a version of the Markov process with semigroup $(P_t)_{t \geq 0}$ started in the initial law $\mu$.

(e) $\Rightarrow$ (b): This is similar to the proof of the implication (c) $\Rightarrow$ (b) in Proposition 6.7. Fix $\lambda > 0$. Then

$(\lambda - G)^{-1} f(x) = \mathbb{E}^x\Big[\int_0^\infty f(X_t)\, e^{-\lambda t}\,\mathrm{d}t\Big]$  $(x \in E,\ f \in \mathcal{C}(E))$.

A similar formula holds for $(\lambda - G_n)^{-1}$. Since $w \mapsto \int_0^\infty f(w(t))\, e^{-\lambda t}\,\mathrm{d}t$ from $\mathcal{D}_E[0,\infty)$ to $\mathbb{R}$ is bounded and continuous, $\mathbb{P}^{(n),x_n} \Rightarrow \mathbb{P}^x$ implies that

$(\lambda - G_n)^{-1} f(x_n) \underset{n \to \infty}{\longrightarrow} (\lambda - G)^{-1} f(x)$  $(f \in \mathcal{C}(E),\ x_n, x \in E,\ x_n \to x)$.

This shows that $\|(\lambda - G_n)^{-1} f - (\lambda - G)^{-1} f\| \to 0$. Just as in the proof of Proposition 6.7, this implies that $\mathrm{ex\,lim}_{n \to \infty} G_n = G$.

6.2. Proof of the main result (Theorem 4.2). The proof of Theorem 6.3 has an important corollary.

Corollary 6.11 (Existence of limiting process). Let $E$ be compact and metrizable and let $(P^{(n)}_t)_{t \geq 0}$ and $(P_t)_{t \geq 0}$ be Feller semigroups on $\mathcal{C}(E)$ with generators $G_n$ and $G$, respectively. Assume that $\mathrm{ex\,lim}_{n \to \infty} G_n \supset G$ and that for each $n$ there exists a Markov process $(\mathbb{P}^{(n),x})_{x \in E}$ with semigroup $(P^{(n)}_t)_{t \geq 0}$. Then there exists a Markov process $(\mathbb{P}^x)_{x \in E}$ with semigroup $(P_t)_{t \geq 0}$.

Proof. By Proposition 2.12, there exists for each $x \in E$ an $E$-valued stochastic process $X^x = (X^x_t)_{t \geq 0}$ such that $X^x_0 = x$ and $X^x$ satisfies the equivalent conditions (a)–(c) from Proposition 2.11. We need to show that $X^x$ has a version with cadlag sample paths. Let $X^{(n),x}$ be $\mathcal{D}_E[0,\infty)$-valued random variables with laws $\mathbb{P}^{(n),x}$. Our proof of Theorem 6.3 shows that the laws $\mathcal{L}(X^{(n),x})$ are tight and that each cluster point has the same finite dimensional distributions as $X^x$. It follows that the $X^{(n),x}$ converge weakly in path space and that their limit is a version of $X^x$ with cadlag sample paths.

We will use Corollary 6.11 to complete the proof of Theorem 4.2. All we need to do is to show that a general Feller semigroup can be approximated by ‘easy’ semigroups, for which we know that they correspond to a Markov process.

Proof of Theorem 4.2. Let $E$ be compact and metrizable and let $(P_t)_{t \geq 0}$ be a Feller semigroup on $\mathcal{C}(E)$ with generator $G$. For each $\varepsilon > 0$, let $G_\varepsilon$ denote the Yosida approximation to $G$, defined in (3.91). We claim that $G_\varepsilon$ is the generator of a jump process in the sense of Proposition 4.4 (and hence there exists a Markov process associated with the semigroup generated by $G_\varepsilon$). Indeed, by Lemma 3.21,

$(1 - \varepsilon G)^{-1} f = \int_0^\infty P_t f\; \varepsilon^{-1} e^{-t/\varepsilon}\,\mathrm{d}t$,

so if we define continuous probability kernels $K_\varepsilon$ on $E$ by

$K_\varepsilon(x, A) := \int_0^\infty P_t(x, A)\; \varepsilon^{-1} e^{-t/\varepsilon}\,\mathrm{d}t$  $(x \in E,\ A \in \mathcal{B}(E))$,

then $G_\varepsilon f = \varepsilon^{-1}(K_\varepsilon f - f)$, which shows that $G_\varepsilon$ is the generator of a jump process. Choose $\varepsilon_n \to 0$. Then formula (3.92) implies that $\mathrm{ex\,lim}_{n \to \infty} G_{\varepsilon_n} \supset G$, which by Corollary 6.11 shows that there exists a Markov process $(\mathbb{P}^x)_{x \in E}$ with semigroup $(P_t)_{t \geq 0}$.

7. Strong Markov property

Let $X := (X_t)_{t \geq 0}$, defined on $(\Omega, \mathcal{F}, \mathbb{P})$, be an $E$-valued Markov process with respect to a filtration $(\mathcal{F}_t)_{t \geq 0}$ such that $X$ is $(\mathcal{F}_t)$-progressive (recall Definition 5.3).

Recall that the Markov property says that given the "present", the future is independent of the past. In this section we want to replace the deterministic notion of "present" by a stopping time. Recall the intuitive description of $\mathcal{F}_t$ as the information known to an observer at time $t$. For an $(\mathcal{F}_t)$-stopping time $\tau$, the $\sigma$-algebra $\mathcal{F}_\tau$ defined below should have the same intuitive meaning.

Definition 7.1 ($\sigma$-algebra generated by a stopping time). For an $\mathcal{F}_t$-stopping time $\tau$, put

(7.1)  $\mathcal{F}_\tau := \{A \in \mathcal{F} : A \cap \{\tau \leq t\} \in \mathcal{F}_t\ \forall t \geq 0\}$.

Similarly, $\mathcal{F}_{\tau+}$ is defined by replacing $\mathcal{F}_t$ in (7.1) by $\mathcal{F}_{t+}$.

Exercise 7.2. Fix $t \geq 0$. Show that if $\mathbb{P}\{\tau = t\} = 1$, then $\mathcal{F}_t = \mathcal{F}_\tau$ up to $\mathbb{P}$-zero sets.

We immediately get the following useful properties.

Lemma 7.3. Let $\sigma$ and $\tau$ be $(\mathcal{F}_t)$-stopping times and let $X$ be an $(\mathcal{F}_t)$-progressive $E$-valued process. Then the following hold:

(i) $\mathcal{F}_\tau$ is a $\sigma$-algebra.
(ii) $\sigma \wedge \tau$ is $\mathcal{F}_\tau$-measurable.
(iii) If $\sigma \leq \tau$ then $\mathcal{F}_\sigma \subseteq \mathcal{F}_\tau$.
(iv) $X_\tau$ is $\mathcal{F}_\tau$-measurable.

Proof. (i) Obviously, $\emptyset, \Omega \in \mathcal{F}_\tau$. If $A \in \mathcal{F}_\tau$ then

(7.2)  $A^c \cap \{\tau \leq t\} = \{\tau \leq t\} \setminus \big(A \cap \{\tau \leq t\}\big) \in \mathcal{F}_t$

for all $t \geq 0$, and therefore $A^c \in \mathcal{F}_\tau$. Similarly, if $(A_n)_{n \in \mathbb{N}} \in \mathcal{F}_\tau$ then

(7.3)  $\Big(\bigcup_{n \in \mathbb{N}} A_n\Big) \cap \{\tau \leq t\} = \bigcup_{n \in \mathbb{N}} \big(A_n \cap \{\tau \leq t\}\big) \in \mathcal{F}_t$

for all $t \geq 0$, and hence $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{F}_\tau$.

(ii) For each $c \geq 0$ and $t \geq 0$,

(7.4)  $\{\sigma \wedge \tau \leq c\} \cap \{\tau \leq t\} = \{\sigma \wedge \tau \leq c \wedge t\} \cap \{\tau \leq t\} \in \mathcal{F}_t$.

Hence $\{\sigma \wedge \tau \leq c\} \in \mathcal{F}_\tau$ and $\sigma \wedge \tau$ is $\mathcal{F}_\tau$-measurable.

(iii) Let $A \in \mathcal{F}_\sigma$. Then for all $t \geq 0$,

(7.5)  $A \cap \{\tau \leq t\} = A \cap \{\sigma \leq t\} \cap \{\tau \leq t\} \in \mathcal{F}_t$.

Hence $A \in \mathcal{F}_\tau$.

(iv) Fix $t \geq 0$, and apply (ii) to $\tau := t$ to the effect that $\sigma \wedge t$ is $\mathcal{F}_t$-measurable. $X_{\sigma \wedge t}$ is the composition of the $(\Omega, \mathcal{F}_t)$-$([0,t] \times \Omega, \mathcal{B}([0,t]) \times \mathcal{F}_t)$-measurable mapping which sends $\omega$ to $(\sigma(\omega) \wedge t, \omega)$ with the $([0,t] \times \Omega, \mathcal{B}([0,t]) \times \mathcal{F}_t)$-$(E, \mathcal{B}(E))$-measurable mapping which sends $(s, \omega)$ to $X_s(\omega)$. Notice that for the measurability of the second mapping one uses that $X$ is $(\mathcal{F}_t)$-progressive. As a consequence $X_{\sigma \wedge t}$ is $\mathcal{F}_t$-measurable. Therefore, for all $t \geq 0$ and $\Gamma \in \mathcal{B}(E)$,

(7.6)  $\{X_\sigma \in \Gamma\} \cap \{\sigma \leq t\} = \{X_{\sigma \wedge t} \in \Gamma\} \cap \{\sigma \leq t\} \in \mathcal{F}_t$.

Hence $\{X_\sigma \in \Gamma\} \in \mathcal{F}_\sigma$ for all $\Gamma \in \mathcal{B}(E)$, or equivalently, $X_\sigma$ is $\mathcal{F}_\sigma$-measurable.

We next define the strong Markov property of a Markov process.

Definition 7.4 (Strong Markov property). Let $X := (X_t)_{t \geq 0}$, defined on $(\Omega, \mathcal{F}, \mathbb{P})$, be an $E$-valued Markov process with respect to a filtration $(\mathcal{F}_t)_{t \geq 0}$ such that $X$ is $(\mathcal{F}_t)$-progressive (recall Definition 5.3). Suppose $P_t(x, A)$ is a transition function for $X$, and let $\tau$ be an $(\mathcal{F}_t)$-stopping time with $\tau < \infty$ almost surely.

• $X$ is said to be strong Markov at $\tau$ if

(7.7)  $\mathbb{P}\big[X_{\tau + t} \in A \,\big|\, \mathcal{F}_\tau\big] = P_t(X_\tau, A)$

for all $t \geq 0$ and $A \in \mathcal{B}(E)$.

• $X$ is said to be a strong Markov process with respect to $(\mathcal{F}_t)$ if $X$ is strong Markov at $\tau$ for all $(\mathcal{F}_t)$-stopping times $\tau$ with $\tau < \infty$ almost surely.

(Counter-)Example. A typical counterexample appears once we mix deterministic evolution with random evolution. Consider the $\mathbb{R}$-valued process with the following dynamics:

• if $x \neq 0$, then $X$ grows (deterministically) with unit speed,
• while if $X$ reaches $x = 0$, then it spends there an exponential time with unit parameter.

In formulae, its semigroup is given as

(7.8)  $T_t f(x) := \begin{cases} f(x+t) & \text{if } x \leq 0,\ x + t \leq 0, \\ e^{-(t+x)} f(0) + \int_{-x}^{t} \mathrm{d}u\; e^{-(u+x)} f(t - u) & \text{if } x \leq 0,\ x + t > 0, \\ f(x+t) & \text{if } x > 0. \end{cases}$

It is easy to check that (7.8) indeed gives a Markovian semigroup. To see that the corresponding Markov process does not have the strong Markov property, put

(7.9)  $\sigma := \inf\{t \geq 0 : X_t > 0\}$,

and start the process in $x < 0$ (and thereby ensure that $\sigma < \infty$ a.s.). Since $\{\sigma \geq t\} = \bigcap_{s \in [0,t) \cap \mathbb{Q}} \{X_s \leq 0\} \in \mathcal{F}^X_t$ for all $t \geq 0$, $\sigma$ is an $(\mathcal{F}^X_{t+})$-stopping time. Moreover, since $X$ has right-continuous paths, $X_{\sigma+} = X_\sigma = 0$. At time $\sigma$ the process has just started to move, so $X_{\sigma+t} = t$ and hence $\mathbb{E}[X_{\sigma+t} \mid \mathcal{F}^X_{\sigma+}] = t$ for $t > 0$, whereas the strong Markov property at $\sigma$ (applied to $f(x) = x$) would require this to equal $T_t f(X_\sigma) = T_t f(0) = t - 1 + e^{-t} < t$. Hence $X$ is not strong Markov at $\sigma$.

Lemma 7.5 (Strong Markov property at discrete stopping times). Let $X$, $(\mathcal{F}_t)_{t \geq 0}$, and $P_t(x, A)$ be as in Definition 7.4. Then $X$ is strong Markov at every discrete $(\mathcal{F}_t)$-stopping time, i.e., at every $(\mathcal{F}_t)$-stopping time $\tau$ with $\tau < \infty$ a.s. that takes values in a countable set.

Proof. Let $\tau$ be a discrete $(\mathcal{F}_t)$-stopping time with $\tau < \infty$ a.s. We need to show that for all $B \in \mathcal{F}_\tau$,

(7.10)  $\mathbb{E}\big[f(X_{t+\tau});\, B\big] = \mathbb{E}\Big[\int P_t(X_\tau, \mathrm{d}y) f(y);\, B\Big]$.

By assumption, there are $t_1, t_2, \ldots$ such that $\tau \in \{t_1, t_2, \ldots\}$. Furthermore, if $B \in \mathcal{F}_\tau$ then $B \cap \{\tau = t_k\} \in \mathcal{F}_{t_k}$ for all $k \in \mathbb{N}$, and hence for all $f \in B(E)$ and $t \geq 0$,

(7.11)  $\mathbb{E}\big[f(X_{t+\tau});\, B \cap \{\tau = t_k\}\big] = \mathbb{E}\big[f(X_{t+t_k});\, B \cap \{\tau = t_k\}\big] = \mathbb{E}\Big[\int P_t(X_{t_k}, \mathrm{d}y) f(y);\, B \cap \{\tau = t_k\}\Big] = \mathbb{E}\Big[\int P_t(X_\tau, \mathrm{d}y) f(y);\, B \cap \{\tau = t_k\}\Big]$.

Summing over all $k$ yields (7.10).

The next result states that each stopping time is the limit of a decreasing sequence of discrete stopping times.

Lemma 7.6. Let (F_t)_{t≥0} be a filtration and let τ be an (F_{t+})-stopping time. Then there exists a decreasing sequence (τ_n)_{n∈N} of discrete (F_t)-stopping times such that τ = lim_{n→∞} τ_n.

Proof. Choose for each n ∈ N a partition 0 = t^n_0 < t^n_1 < t^n_2 < ⋯ with t^n_k ↑ ∞ as k → ∞, such that each partition refines the previous one and sup_k (t^n_k − t^n_{k−1}) → 0 as n → ∞ (for example, t^n_k := k 2^{−n}), and set

(7.12)  τ_n := ∑_{k≥1} t^n_k 1_{{t^n_{k−1} ≤ τ < t^n_k}}  on {τ < ∞},   τ_n := ∞ on {τ = ∞}.

Since τ is an (F_{t+})-stopping time, {τ < t} = ⋃_{m∈N} {τ ≤ t − 1/m} ∈ F_t for every t > 0, and hence {τ_n = t^n_k} = {τ < t^n_k} \ {τ < t^n_{k−1}} ∈ F_{t^n_k}, so that each τ_n is a discrete (F_t)-stopping time. Because the partitions are nested, τ ≤ τ_{n+1} ≤ τ_n, and τ_n − τ ≤ sup_k (t^n_k − t^n_{k−1}) → 0 on {τ < ∞}. Hence τ_n ↓ τ. □
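For the concrete dyadic choice t^n_k := k 2^{−n} (an assumption; any refining grids with vanishing mesh work), the approximation in the proof reads τ_n = (⌊2^n τ⌋ + 1) 2^{−n} on {τ < ∞}, and the following short Python sketch (illustration only) checks numerically that τ_n dominates τ, takes only countably many dyadic values, decreases in n, and converges to τ from above.

import numpy as np

rng = np.random.default_rng(2)
tau = rng.exponential(1.0, size=5)              # a sample of finite stopping-time values

prev = np.full_like(tau, np.inf)
for n in range(1, 8):
    tau_n = (np.floor(2**n * tau) + 1) / 2**n   # equals t^n_k on {t^n_{k-1} <= tau < t^n_k}
    assert np.all(tau_n > tau)                  # tau_n dominates tau
    assert np.all(tau_n <= prev)                # (tau_n) decreases in n
    assert np.all(tau_n - tau <= 2.0**-n)       # and converges to tau from above
    prev = tau_n
print(tau, prev)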

Theorem 7.7 (Feller semigroups give strong Markov processes). Let E be locally compact and separable, and let (P_t)_{t≥0} be a Feller semigroup on C_b(E). Then for each probability law ν on E there exists a Markov process X corresponding to (P_t)_{t≥0} with initial law ν and sample paths in D_E([0, ∞)) which is strong Markov with respect to the filtration F_t := F^X_{t+}.

Proof. We already know from Theorem 4.2 (combined with the considerations for locally compact state spaces discussed in Subsection 4.4) that under the above assumptions there is a Markov process X with cadlag paths corresponding to (P_t)_{t≥0} with initial law ν. It remains to verify the strong Markov property.

Assume for the moment that τ is a discrete (F_t)-stopping time with τ < ∞, i.e., τ can be written as

(7.13)  τ := ∑_{n≥1} t_n 1_{{τ = t_n}}

for suitable (t_n)_{n∈N} in [0, ∞). Let A ∈ F_τ, s > 0, and f ∈ Ĉ(E). Then A ∩ {τ = t_n} ∈ F_{t_n} = F^X_{t_n+} ⊆ F^X_{t_n+ε} for all ε > 0 and n ∈ N, so

(7.14)  ∫_{A∩{τ=t_n}} dP f(X_{τ+s}) = ∫_{A∩{τ=t_n}} dP f(X_{t_n+s}) = ∫_{A∩{τ=t_n}} dP (P_{s−ε} f)(X_{t_n+ε})

for all ε ∈ (0, s], where the second equality is the Markov property at time t_n + ε. Since (P_t)_{t≥0} is strongly continuous, P_{s−ε} f → P_s f uniformly as ε ↓ 0, and P_s f is continuous on E for all s ≥ 0 by the Feller property. Moreover, since X has right continuous sample paths, we can let ε ↓ 0 in (7.14) to the effect that it holds for ε = 0 as well. Summing over n, this gives

(7.15)  E[f(X_{τ+s}) | F_τ] = P_s f(X_τ)

for discrete τ.

If τ is an arbitrary (F_t)-stopping time with τ < ∞ a.s., we know from Lemma 7.6 that τ can be written as the decreasing limit of discrete stopping times (τ_n)_{n∈N}. It then follows from the continuity of P_s f and the right continuity of the sample paths that (7.15) holds for τ as well. □
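As a numerical illustration of (7.15) (not from the notes; Brownian motion is used here as a concrete Feller process, and the hitting level, horizon, step size and test function are assumptions chosen for the example), the following Python sketch compares both sides of (7.15) in integrated form over the event {τ ≤ T}, with τ the first time the path reaches level 1: E[f(B_{τ+s}); τ ≤ T] against E[(P_s f)(B_τ); τ ≤ T], where (P_s f)(x) = E[f(x + √s Z)], Z standard normal, is the heat semigroup.

import numpy as np

rng = np.random.default_rng(3)
dt, s, T, n_paths = 1e-3, 0.5, 4.0, 5_000
n_steps, n_extra = int(T / dt), int(s / dt)
f = lambda y: np.exp(-y**2)
Z = rng.normal(size=20_000)                      # used to evaluate (P_s f)(x) by Monte Carlo

lhs, rhs = 0.0, 0.0
for _ in range(n_paths):
    path = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n_steps + n_extra))
    hit = int(np.argmax(path[:n_steps] >= 1.0))  # first index with B >= 1 (0 if never hit)
    if path[hit] < 1.0:
        continue                                 # tau > T on this path
    lhs += f(path[hit + n_extra])                # f(B_{tau + s})
    rhs += f(path[hit] + np.sqrt(s) * Z).mean()  # (P_s f)(B_tau), evaluated by Monte Carlo
print(lhs / n_paths, rhs / n_paths)              # estimates of both sides; they should agree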

Jan Swart, Mathematisches Institut, Universität Erlangen–Nürnberg, Bismarckstraße 1 1/2, 91054 Erlangen, GERMANY
E-mail address: [email protected]

Anita Winter, Mathematisches Institut, Universität Erlangen–Nürnberg, Bismarckstraße 1 1/2, 91054 Erlangen, GERMANY
E-mail address: [email protected]