Chapter 6: Martingale et al.
SECTION 1 gives some examples of martingales, submartingales, and supermartingales. SECTION 2 introduces stopping times and the sigma-fields corresponding to “information available at a random time.” A most important Stopping Time Lemma is proved, extending the martingale properties to processes evaluated at stopping times. SECTION 3 shows that positive supermartingales converge almost surely. SECTION 4 presents a condition under which a submartingale can be written as a difference between a positive martingale and a positive supermartingale (the Krickeberg decomposition). A limit theorem for submartingales then follows. SECTION *5 proves the Krickeberg decomposition. SECTION *6 defines uniform integrability and shows how uniformly integrable martingales are particularly well behaved. SECTION *7 shows that martingale theory works just as well when time is reversed. SECTION *8 uses reverse martingale theory to study exchangeable probability measures on infinite product spaces. The de Finetti representation and the Hewitt-Savage zero-one law are proved.
1. What are they?
The theory of martingales (and submartingales and supermartingales and other related concepts) has had a profound effect on modern probability theory. Whole branches of probability, such as stochastic calculus, rest on martingale foundations. The theory is elegant and powerful: amazing consequences flow from an innocuous assumption regarding conditional expectations. Every serious user of probability needs to know at least the rudiments of martingale theory. A little notation goes a long way in martingale theory. A fixed probability space (Ω, F, P) sits in the background. The key new ingredients are: (i) a subset T of the extended real line R̄;
(ii) a filtration {Ft : t ∈ T }, that is, a collection of sub-sigma-fields of F for which Fs ⊆ Ft if s < t;
(iii) a family of integrable random variables {Xt : t ∈ T } adapted to the filtration, that is, Xt is Ft -measurable for each t in T .
The set T has the interpretation of time, the sigma-field Ft has the interpretation of information available at time t, and Xt denotes some random quantity whose value Xt (ω) is revealed at time t.
<1> Definition. A family of integrable random variables {Xt : t ∈ T } adapted to a filtration {Ft : t ∈ T } is said to be a martingale (for that filtration) if
(MG) Xs = P(Xt | Fs ) almost surely, for all s < t.
Equivalently, the random variables should satisfy
(MG)′ PXs F = PXt F for all F ∈ Fs , all s < t.
Remark. Often the filtration is fixed throughout an argument, or the particular choice of filtration is not important for some assertion about the random variables. In such cases it is easier to talk about a martingale {Xt : t ∈ T } without explicit mention of the filtration. If in doubt, we could always work with the natural filtration, Ft := σ{Xs : s ≤ t}, which takes care of adaptedness, by definition. Analogously, if there is a need to identify the filtration explicitly, it is convenient to speak of a martingale {(Xt , Ft ) : t ∈ T }, and so on.
Property (MG) has the interpretation that Xs is the best predictor for Xt based on the information available at time s. The equivalent formulation (MG)′ is a minor repackaging of the definition of the conditional expectation P(Xt | Fs ). The Fs -measurability of Xs comes as part of the adaptedness assumption. Approximation by simple functions, and a passage to the limit, gives another equivalence,
(MG)″ PXs Z = PXt Z for all Z ∈ Mbdd(Fs ), all s < t,
where Mbdd(Fs ) denotes the set of all bounded, Fs -measurable random variables. The formulations (MG)′ and (MG)″ have the advantage of removing the slippery concept of conditioning on sigma-fields from the definition of a martingale. One could develop much of the basic theory without explicit mention of conditioning, which would have some pedagogic advantages, even though it would obscure one of the important ideas behind the martingale concept. Several of the desirable properties of martingales are shared by families of random variables for which the defining equalities (MG) and (MG)′ are relaxed to inequalities. I find that one of the hardest things to remember about these martingale relatives is which name goes with which direction of the inequality.
<2> Definition. A family of integrable random variables {Xt : t ∈ T } adapted to a filtration {Ft : t ∈ T } is said to be a submartingale (for that filtration) if it satisfies any (and hence all) of the following equivalent conditions:
(subMG) Xs ≤ P(Xt | Fs ) almost surely, for all s < t,
(subMG)′ PXs F ≤ PXt F for all F ∈ Fs , all s < t,
(subMG)″ PXs Z ≤ PXt Z for all Z ∈ M+bdd(Fs ), all s < t.
The family is said to be a supermartingale (for that filtration) if {−Xt : t ∈ T } is a submartingale. That is, the analogous requirements (superMG), (superMG)′, and (superMG)″ reverse the direction of the inequalities.
Remark. It is largely a matter of taste, or convenience of notation for particular applications, whether one works primarily with submartingales or supermartingales.
For most of this Chapter, the index set T will be discrete, either finite or equal to N, the set of positive integers, or equal to one of
N0 := {0} ∪ N or N̄ := N ∪ {∞} or N̄0 := {0} ∪ N ∪ {∞}.
For some purposes it will be useful to have a distinctively labelled first or last element in the index set. For example, if a limit X∞ := limn∈N Xn can be shown to exist, it is natural to ask whether {Xn : n ∈ N̄} also has sub- or supermartingale properties. Of course such a question only makes sense if a corresponding sigma-field F∞ exists. If it is not otherwise defined, I will take F∞ to be the sigma-field σ (∪i<∞Fi ). Continuous time theory, where T is a subinterval of R, tends to be more complicated than discrete time. The difficulties arise, in part, from problems related to management of uncountable families of negligible sets associated with uncountable collections of almost sure equality or inequality assertions. A nontrivial part of the continuous time theory deals with sample path properties, that is, with the behavior of a process Xt (ω) as a function of t for fixed ω or with properties of X as a function of two variables. Such properties are typically derived from probabilistic assertions about finite or countable subfamilies of the {Xt } random variables. An understanding of the discrete-time theory is an essential prerequisite for more ambitious undertakings in continuous time—see Appendix E. For discrete time, the (MG)′ property becomes
PXn F = PXm F for all F ∈ Fn , all n < m.
It suffices to check the equality for m = n + 1, with n ∈ N0 , for then repeated appeals to the special case extend the equality to m = n + 2, then m = n + 3, and so on. A similar simplification applies to submartingales and supermartingales.
<3> Example. Martingales generalize the theory for sums of independent random variables. Let ξ1, ξ2, ... be independent, integrable random variables with Pξn = 0 for n ≥ 1. Define X0 := 0 and Xn := ξ1 + ... + ξn . The sequence {Xn : n ∈ N0} is a martingale with respect to the natural filtration, because for F ∈ Fn−1,
P(Xn − Xn−1)F = (Pξn)(PF) = 0 by independence.
You could write F as a measurable function of X1, ..., Xn−1, or of ξ1, ..., ξn−1, if you prefer to work with random variables.
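As a numerical aside (an editorial sketch, not part of the text), the martingale property of Example <3> can be checked by simulation. The choices below—fair ±1 increments, n = 2, m = 5, and the event F = {X2 > 0}—are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths = 200_000

# Independent mean-zero increments: fair +/-1 coin flips.
xi = rng.choice([-1, 1], size=(n_paths, 5))
X = np.cumsum(xi, axis=1)          # X[:, k] holds X_{k+1} = xi_1 + ... + xi_{k+1}

# Check (MG)': P X_n F = P X_m F for F in F_n.
# Take n = 2, m = 5, and F = {X_2 > 0}, which is F_2-measurable.
F = X[:, 1] > 0                    # X[:, 1] is X_2
lhs = np.mean(X[:, 1] * F)         # estimates P X_2 F  (exactly 1/2 here)
rhs = np.mean(X[:, 4] * F)         # estimates P X_5 F
print(lhs, rhs)                    # the two averages nearly agree
```

The exact common value is P X_2 {X_2 > 0} = 2 · P{X_2 = 2} = 1/2, so both Monte Carlo averages should sit near 0.5.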
<4> Example. Let {Xn : n ∈ N0} be a martingale and let Ψ be a convex function for which each Ψ(Xn) is integrable. Then {Ψ(Xn) : n ∈ N0} is a submartingale: the required almost sure inequality, P(Ψ(Xn) | Fn−1) ≥ Ψ(Xn−1), is a direct application of the conditional expectation form of Jensen’s inequality. The companion result for submartingales is: if the convex function Ψ is increasing, if {Xn} is a submartingale, and if each Ψ(Xn) is integrable, then {Ψ(Xn) : n ∈ N0} is a submartingale, because
P(Ψ(Xn) | Fn−1) ≥ Ψ(P(Xn | Fn−1)) ≥ Ψ(Xn−1) almost surely.
Two good examples to remember: if {Xn} is a martingale and each Xn is square integrable then {Xn²} is a submartingale; and if {Xn} is a submartingale then {Xn⁺} is also a submartingale.
<5> Example. Let {Xn : n ∈ N0} be a martingale written as a sum of increments, Xn := X0 + ξ1 + ... + ξn . Not surprisingly, the {ξi } are called martingale differences. Each ξn is integrable and P(ξn | Fn−1) = 0 almost surely, for n ∈ N. A new martingale can be built by weighting the increments using predictable functions {Hn : n ∈ N}, meaning that each Hn should be an Fn−1-measurable random variable, a more stringent requirement than adaptedness. The value of the weight becomes known before time n; it is known before it gets applied to the next increment.
If we assume that each Hnξn is integrable then the sequence
Yn := X0 + H1ξ1 + ...+ Hnξn is both integrable and adapted. It is a martingale, because
PHi ξi F = P(Xi − Xi−1)(Hi F), which equals zero by a simple generalization of (MG)″. (Use Dominated Convergence to accommodate integrable Z.) If {Xn : n ∈ N0} is just a submartingale, a similar argument shows that the new sequence is also a submartingale, provided the predictable weights are also nonnegative.
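To make Example <5> concrete, here is a simulation sketch (my own illustration, assuming fair ±1 increments) of a predictable weighting scheme: the classical doubling strategy. Each Hn is computed from ξ1, ..., ξn−1 only, so it is predictable, and the transformed sequence still has mean zero—no betting system turns a fair game into a favorable one:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, N = 200_000, 5

xi = rng.choice([-1, 1], size=(n_paths, N))   # fair-game increments, P xi_n = 0

# Predictable weights: double the stake after every loss, stop after the
# first win. H_n depends only on xi_1, ..., xi_{n-1}, hence F_{n-1}-measurable.
H = np.zeros((n_paths, N))
stake = np.ones(n_paths)
alive = np.ones(n_paths, dtype=bool)          # still waiting for a first win
for n in range(N):
    H[:, n] = stake * alive
    won = xi[:, n] == 1
    stake = np.where(alive & ~won, 2 * stake, stake)
    alive = alive & ~won

Y = np.sum(H * xi, axis=1)                    # Y_N = H_1 xi_1 + ... + H_N xi_N  (X_0 = 0)
print(Y.mean())                               # near 0: P Y_N = P Y_0
```

Each path ends either with a net gain of 1 (a win occurred within N rounds) or a loss of 2^N − 1 = 31; the two outcomes balance so that P Y_N = 0.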
<6> Example. Suppose X is an integrable random variable and {Ft : t ∈ T } is a filtration. Define Xt := P(X | Ft ). Then the family {Xt : t ∈ T } is a martingale with respect to the filtration, because for s < t,
P(Xt F) = P(X F) if F ∈ Ft
= P(Xs F) if F ∈ Fs (We have just reproved the formula for conditioning on nested sigma-fields.)
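On a finite sample space the assertion of Example <6> can be verified exactly, since conditioning on a sigma-field generated by a partition is just averaging over its blocks. The sketch below (the space, the partitions, and the helper `cond_exp` are my own illustrative choices) checks that conditioning X2 = P(X | F2) on the coarser field F1 recovers X1 = P(X | F1):

```python
import numpy as np

# Finite sample space: Omega = {0,...,7} with uniform P. The filtration is
# given by refining partitions: F_1 groups outcomes by omega // 4 (2 blocks),
# F_2 by omega // 2 (4 blocks), so F_1 is a sub-sigma-field of F_2.
omega = np.arange(8)
X = omega.astype(float)                      # an integrable random variable

def cond_exp(Z, blocks):
    """P(Z | sigma-field generated by a partition): average over each block."""
    out = np.empty_like(Z)
    for b in np.unique(blocks):
        out[blocks == b] = Z[blocks == b].mean()
    return out

X1 = cond_exp(X, omega // 4)                 # X_1 = P(X | F_1)
X2 = cond_exp(X, omega // 2)                 # X_2 = P(X | F_2)

# Martingale property / nested conditioning: P(X_2 | F_1) = X_1 pointwise.
print(cond_exp(X2, omega // 4))
print(X1)
```

Here X1 takes the value 1.5 on {0,1,2,3} and 5.5 on {4,5,6,7}, and averaging X2 over those same blocks reproduces exactly those values.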
<7> Example. Every sequence {Xn : n ∈ N0} of integrable random variables adapted to a filtration {Fn : n ∈ N0} can be broken into a sum of a martingale plus a sequence of accumulated conditional expectations. To establish this fact, consider the increments ξn := Xn − Xn−1. Each ξn is integrable, but it need not have zero conditional expectation given Fn−1, the property that characterizes martingale differences. Extraction of the martingale component is merely a matter of recentering the increments to zero conditional expectations. Define ηn := P(ξn | Fn−1) and
Mn := X0 + (ξ1 − η1) + ...+ (ξn − ηn)
An := η1 + ...+ ηn.
Then Xn = Mn + An, with {Mn} a martingale and {An} a predictable sequence.
Often {An} will have some nice behavior, perhaps due to the smoothing involved in the taking of a conditional expectation, or perhaps due to some other special property of the {Xn}. For example, if {Xn} were a submartingale the ηi would all be nonnegative (almost surely) and {An} would be an increasing sequence of random variables. Such properties are useful for establishing limit theory and inequalities—see Example <18> for an illustration of the general method.
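For a concrete instance of this decomposition (an editorial sketch, not from the text): take Xn = Sn² for a fair ±1 random walk Sn. The increments are ξn = 2Sn−1εn + 1, so ηn = P(ξn | Fn−1) = 1 exactly, giving the increasing predictable part An = n and the martingale part Mn = Sn² − n. The simulation checks that each Mn has mean zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, N = 200_000, 10

eps = rng.choice([-1, 1], size=(n_paths, N))
S = np.cumsum(eps, axis=1)                 # fair random walk, a martingale

# Submartingale X_n = S_n^2 with increments xi_n = 2 S_{n-1} eps_n + 1,
# so eta_n = P(xi_n | F_{n-1}) = 1: the compensator is A_n = n.
X = S ** 2
A = np.arange(1, N + 1)                    # increasing, predictable (here deterministic)
M = X - A                                  # the martingale part M_n = S_n^2 - n

print(M.mean(axis=0))                      # each column near P M_n = 0
```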
Remark. The representation of a submartingale as a martingale plus an increasing, predictable process is sometimes called the Doob decomposition. The corresponding representation for continuous time, which is exceedingly difficult to establish, is called the Doob-Meyer decomposition.
2. Stopping times
The martingale property requires equalities PXs F = PXt F, for s < t and F ∈ Fs . Much of the power of the theory comes from the fact that analogous inequalities hold when s and t are replaced by certain types of random times. To make sense of the broader assertion, we need to define objects such as Fτ and Xτ for random times τ.
<8> Definition. A random variable τ taking values in T̄ := T ∪ {∞} is called a stopping time for a filtration {Ft : t ∈ T } if {τ ≤ t} ∈ Ft for each t in T .
In discrete time, with T = N0, the defining property is equivalent to
{τ = n} ∈ Fn for each n in N0,
because {τ ≤ n} = ∪i≤n {τ = i} and {τ = n} = {τ ≤ n}{τ ≤ n − 1}ᶜ.
<9> Example. Let {Xn : n ∈ N0} be adapted to a filtration {Fn : n ∈ N0}, and let B be a Borel subset of R. Define τ(ω) := inf{n : Xn(ω) ∈ B}, with the interpretation that the infimum of the empty set equals +∞. That is, τ(ω) = +∞ if Xn(ω) ∉ B for all n. The extended-real-valued random variable τ is a stopping time because
{τ ≤ n} = ∪i≤n {Xi ∈ B} ∈ Fn for n ∈ N0.
It is called the first hitting time of the set B. Do you see why it is convenient to allow stopping times to take the value +∞?
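Example <9> translates directly into code. The sketch below (function name and sample path are my own illustrative choices) computes the first hitting time of a set B along one sample path, returning +∞ when the path never enters B:

```python
import math

def first_hitting_time(path, B):
    """tau = inf{n : X_n in B}, with the infimum of the empty set equal to +infinity.

    `path` lists X_0, X_1, ... for a single omega; `B` is a predicate playing
    the role of the Borel set B.
    """
    for n, x in enumerate(path):
        if B(x):
            return n
    return math.inf                                  # the path never enters B

walk = [0, 1, 0, -1, -2, -1]                         # one sample path X_0, ..., X_5
print(first_hitting_time(walk, lambda x: x <= -2))   # -> 4
print(first_hitting_time(walk, lambda x: x >= 3))    # -> inf
```

The second call shows why the value +∞ is convenient: without it the "random time" would be undefined on paths that never reach B.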
If Fi corresponds to the information available up to time i, how should we define a sigma-field Fτ to correspond to information available up to a random time τ? Intuitively, on the part of Ω where τ = i the sets in the sigma-field Fτ should be the same as the sets in the sigma-field Fi . That is, we could hope that
{F{τ = i} : F ∈ Fτ } = {F{τ = i} : F ∈ Fi } for each i.
These equalities would be suitable as a definition of Fτ in discrete time; we could define Fτ to consist of all those F in F for which
<10> F{τ = i}∈Fi for all i ∈ N0. For continuous time such a definition could become vacuous if all the sets {τ = t} were negligible, as sometimes happens. Instead, it is better to work with a definition that makes sense in both discrete and continuous time, and which is equivalent to <10> in discrete time.
<11> Definition. Let τ be a stopping time for a filtration {Ft : t ∈ T }, taking values in T̄ := T ∪ {∞}. If the sigma-field F∞ is not already defined, take it to be σ (∪t∈T Ft ). The pre-τ sigma-field Fτ is defined to consist of all F for which F{τ ≤ t} ∈ Ft for all t ∈ T̄.
The class Fτ would not be a sigma-field if τ were not a stopping time: the property Ω ∈ Fτ requires {τ ≤ t} ∈ Ft for all t.
Remark. Notice that Fτ ⊆ F∞ (because F{τ ≤ ∞} ∈ F∞ if F ∈ Fτ ), with equality when τ ≡ ∞. More generally, if τ takes a constant value, t, then Fτ = Ft . It would be very awkward if we had to distinguish between random variables taking constant values and the constants themselves.
<12> Example. The stopping time τ is measurable with respect to Fτ , because, for each α ∈ R+ and t ∈ T ,
{τ ≤ α}{τ ≤ t} = {τ ≤ α ∧ t} ∈ Fα∧t ⊆ Ft .
That is, {τ ≤ α} ∈ Fτ for all α ∈ R+, from which the Fτ -measurability follows by the usual generating class argument. It would be counterintuitive if the information corresponding to the sigma-field Fτ did not include the value taken by τ itself.
<13> Example. Suppose σ and τ are both stopping times, for which σ ≤ τ always. Then Fσ ⊆ Fτ because
F{τ ≤ t} = F{σ ≤ t}{τ ≤ t} for all t ∈ T ,
and both sets on the right-hand side are Ft -measurable if F ∈ Fσ .
<14> Exercise. Show that a random variable Z is Fτ -measurable if and only if Z{τ ≤ t} is Ft -measurable for all t in T .
Solution: For necessity, write Z as a pointwise limit of Fτ -measurable simple functions Zn , then note that each Zn{τ ≤ t} is a linear combination of indicator functions of Ft -measurable sets. For sufficiency, it is enough to show that {Z > α} ∈ Fτ and {Z < −α} ∈ Fτ , for each α ∈ R+. For the first requirement, note that {Z > α}{τ ≤ t} = {Z{τ ≤ t} > α}, which belongs to Ft for each t, because Z{τ ≤ t} is assumed to be Ft -measurable. Thus {Z > α} ∈ Fτ . Argue similarly for the other requirement.
The definition of Xτ is almost straightforward. Given random variables {Xt : t ∈ T } and a stopping time τ, we should define Xτ as the function taking the value Xt (ω) when τ(ω) = t. If τ takes only values in T there is no problem. However, a slight embarrassment would occur when τ(ω) = ∞ if ∞ were not a point of T , for then X∞(ω) need not be defined. In the happy situation when there is a natural candidate for X∞, the embarrassment disappears with little fuss; otherwise it is wiser to avoid the difficulty altogether by working only with the random variable Xτ {τ < ∞}, which takes the value zero when τ is infinite. Measurability of Xτ {τ < ∞}, even with respect to the sigma-field F, requires further assumptions about the {Xt } for continuous time. For discrete time the task is much easier. For example, if {Xn : n ∈ N0} is adapted to a filtration {Fn : n ∈ N0}, and τ is a stopping time for that filtration, then
Xτ {τ < ∞}{τ ≤ t} = Σi∈N0 Xi {i = τ ≤ t}.
For i > t the ith summand is zero; for i ≤ t it equals Xi {τ = i}, which is Fi -measurable. The Fτ -measurability of Xτ {τ < ∞} then follows by Exercise <14>.
The next Exercise illustrates the use of stopping times and the σ -fields they define. The discussion does not directly involve martingales, but they are lurking in the background.
<15> Exercise. A deck of 52 cards (26 reds, 26 blacks) is dealt out one card at a time, face up. Once, and only once, you will be allowed to predict that the next card will be red. What strategy will maximize your probability of predicting correctly?
Solution: Write Ri for the event {ith card is red}. Assume all permutations of the deck are equally likely initially. Write Fn for the σ -field generated by R1, ..., Rn . A strategy corresponds to a stopping time τ that takes values in {0, 1, ..., 51}: you should try to maximize PRτ+1. Surprisingly, PRτ+1 = 1/2 for all such stopping rules. The intuitive explanation is that you should always be indifferent, given that you have observed cards 1, 2, ..., τ, between choosing card τ + 1 or choosing card 52. That is, it should be true that P(Rτ+1 | Fτ ) = P(R52 | Fτ ) almost surely; or, equivalently, that PRτ+1 F = PR52 F for all F ∈ Fτ ; or, equivalently, that
PRk+1 F{τ = k}=PR52 F{τ = k} for all F ∈ Fτ and k = 0, 1,...,51.
We could then deduce that PRτ+1 = PR52 = 1/2. Of course, we only need the case F = Ω, but I’ll carry along the general F as an illustration of technique while proving the assertion in the last display. By definition of Fτ ,
F{τ = k} = F{τ ≤ k − 1}ᶜ{τ ≤ k} ∈ Fk .
That is, F{τ = k} must be of the form {(R1, ..., Rk ) ∈ B} for some Borel subset B of Rᵏ. Symmetry of the joint distribution of R1, ..., R52 implies that the random vector (R1, ..., Rk , Rk+1) has the same distribution as the random vector (R1, ..., Rk , R52), whence
PRk+1{(R1,...,Rk ) ∈ B}=PR52{(R1,...,Rk ) ∈ B}. See Section 8 for more about symmetry and martingale properties.
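The surprising conclusion of Exercise <15>—every stopping rule succeeds with probability exactly 1/2—is easy to check by simulation. In this sketch (my own; the two strategies are illustrative choices) one rule predicts on the very first card and another waits until the reds remaining strictly outnumber the blacks:

```python
import numpy as np

rng = np.random.default_rng(3)
n_games = 20_000

def play(deck, strategy):
    """Deal cards; when `strategy` says stop, predict red for the next card.

    `strategy(seen)` sees only the cards dealt so far, so the resulting
    stopping time is adapted; a prediction is forced at position 51.
    Returns 1 for a correct prediction (next card red), 0 otherwise.
    """
    for n in range(52):
        if n == 51 or strategy(deck[:n]):
            return deck[n]               # predict "red" for card n + 1
    raise AssertionError("unreachable")

def majority_red_left(seen):
    # Stop as soon as the reds remaining strictly outnumber the blacks.
    reds_left = 26 - int(sum(seen))
    return reds_left > (52 - len(seen)) - reds_left

wins = {"predict card 1": 0, "wait for red majority": 0}
for _ in range(n_games):
    deck = rng.permutation(np.repeat([1, 0], 26))   # 1 = red, 0 = black
    wins["predict card 1"] += play(deck, lambda seen: True)
    wins["wait for red majority"] += play(deck, majority_red_left)

for name, w in wins.items():
    print(name, w / n_games)             # both near 1/2, as the Exercise asserts
```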
The hidden martingale in the previous Exercise is Xn , the proportion of red cards remaining in the deck after n cards have been dealt. You could check the martingale property by first verifying that P(Rn+1 | Fn ) = Xn (an equality that is obvious if one thinks in terms of conditional distributions), then calculating
(52 − n − 1)P(Xn+1 | Fn ) = P((52 − n)Xn − Rn+1 | Fn ) = (52 − n)Xn − P(Rn+1 | Fn ).
The problem then asks for the stopping time to maximize
PRτ+1 = Σ0≤i≤51 P(Ri+1{τ = i})
= Σ0≤i≤51 P(Xi {τ = i}) because {τ = i} ∈ Fi
= PXτ .
The martingale property tells us that PX0 = PXi for i = 1, ..., 51. If we could extend the equality to random i, by showing that PXτ = PX0 , then the surprising conclusion from the Exercise would follow. Clearly it would be useful if we could always assert that PXσ = PXτ for every martingale, and every pair of stopping times. Unfortunately (or should I say fortunately?) the result is not true without some extra assumptions. The simplest and most useful case concerns finite time sets. If σ takes values in a finite set T , and if each Xt is integrable, then |Xσ | ≤ Σt∈T |Xt |, which eliminates any integrability difficulties. For an infinite index set, the integrability of Xσ is not automatic.
<16> Stopping Time Lemma. Suppose σ and τ are stopping times for a filtration {Ft : t ∈ T }, with T finite. Suppose both stopping times take only values in T . Let F be a set in Fσ for which σ(ω) ≤ τ(ω) when ω ∈ F. If {Xt : t ∈ T } is a submartingale, then PXσ F ≤ PXτ F. For supermartingales, the inequality is reversed. For martingales, the inequality becomes an equality.
Proof. Consider only the submartingale case. For simplicity of notation, suppose T = {0, 1, ..., N}. Write each Xn as a sum of increments, Xn = X0 + ξ1 + ... + ξn . The inequality σ ≤ τ, on F, lets us write
Xτ F − Xσ F = (X0 F + Σ1≤i≤N {i ≤ τ}Fξi ) − (X0 F + Σ1≤i≤N {i ≤ σ }Fξi )
= Σ1≤i≤N {σ < i ≤ τ}Fξi
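The conclusion of the Stopping Time Lemma (equality for martingales) can also be checked numerically. In this sketch (my own, with illustrative stopping times) σ is the first time a fair walk on T = {0, 1, ..., 10} reaches level 2, capped at N = 10, and τ ≡ N; both PXσ and PXτ come out near PX0 = 0:

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, N = 200_000, 10

xi = rng.choice([-1, 1], size=(n_paths, N))
# Paths X_0 = 0, X_1, ..., X_N of a fair random walk (a martingale).
X = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(xi, axis=1)], axis=1)

# sigma = first time the walk reaches level 2, capped at N; tau = N.
# Both take values in the finite set T = {0, 1, ..., N}, and sigma <= tau.
hit = X >= 2
sigma = np.where(hit.any(axis=1), hit.argmax(axis=1), N)
tau = np.full(n_paths, N)

X_sigma = X[np.arange(n_paths), sigma]     # X evaluated at the stopping times
X_tau = X[np.arange(n_paths), tau]
print(X_sigma.mean(), X_tau.mean())        # both near P X_0 = 0
```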