Probability Theory II
Spring 2016
Peter Orbanz

Contents

Chapter 1. Martingales
  1.1. Martingales indexed by partially ordered sets
  1.2. Martingales from adapted processes
  1.3. Stopping times and optional sampling
  1.4. Tail bounds for martingales
  1.5. Notions of convergence for martingales
  1.6. Uniform integrability
  1.7. Convergence of martingales
  1.8. Application: The 0-1 law of Kolmogorov
  1.9. Continuous-time martingales
  1.10. Application: The Pólya urn
  1.11. Application: The Radon-Nikodym theorem

Chapter 2. Measures on nice spaces
  2.1. Topology review
  2.2. Metric and metrizable spaces
  2.3. Regularity of measures
  2.4. Weak convergence
  2.5. Polish spaces and Borel spaces
  2.6. The space of probability measures
  2.7. Compactness and local compactness
  2.8. Tightness
  2.9. Some additional useful facts on Polish spaces

Chapter 3. Conditioning
  3.1. A closer look at the definition
  3.2. Conditioning on σ-algebras
  3.3. Conditional distributions given σ-algebras
  3.4. Working with pairs of random variables
  3.5. Conditional independence
  3.6. Application: Sufficient statistics
  3.7. Conditional densities

Chapter 4. Pushing forward and pulling back
  4.1. Integration with respect to image measures
  4.2. Pullback measures

Chapter 5. Stochastic processes
  5.1. Constructing spaces of mappings
  5.2. Extension theorems
  5.3. Processes with regular paths
  5.4. Brownian motion
  5.5. Markov processes
  5.6. Processes with stationary and independent increments
  5.7. Gaussian processes
  5.8. The Poisson process

Bibliography
  Main references
  Other references
Index

CHAPTER 1
Martingales

The basic limit theorems of probability, such as the elementary laws of large numbers and central limit theorems, establish that certain averages of independent variables converge to their expected values.
A sequence of such averages is a random sequence, but it completely derandomizes in the limit, and this is usually a direct consequence of independence. For more complicated processes (typically, if the variables are stochastically dependent), the limit is not a constant, but is itself random. In general, random limits are very hard to handle mathematically, since we have to precisely quantify the effect of dependencies and control the aggregating randomness as we get further into the sequence. It turns out that it is possible to control dependencies and randomness if a process (X_n)_{n∈ℕ} has the simple property

    E[X_n | X_m] =_a.s. X_m    for all m ≤ n,

which is called the martingale property. From this innocuous identity, we can derive a number of results which are so powerful, and so widely applicable, that they make martingales one of the fundamental tools of probability theory. Many of these results still hold if we use another index set than ℕ, condition on more general events than the value of X_m, or weaken the equality above to ≤ or ≥. We will see all this in detail in this chapter, but for now, I would like you to absorb that martingales provide tools for working with random limits. They are not the only such tools, but there are few others.

Notation. We assume throughout that all random variables are defined on a single abstract probability space (Ω, 𝒜, P). Any random variable X is a measurable map X: Ω → 𝒳 into some measurable space (𝒳, 𝒜_𝒳). Elements of Ω are always denoted ω. Think of any ω as a possible "state of the universe"; a random variable X picks out some limited aspect X(ω) of ω (the outcome of a coin flip, say, or the path a stochastic process takes). The law of X is the image measure of P under X, and generically denoted X(P) =: ℒ(X). The σ-algebra generated by X is denoted σ(X) := X⁻¹𝒜_𝒳 ⊂ 𝒜. Keep in mind that these conventions imply the conditional expectation of a real-valued random variable X: Ω → ℝ is a random
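To make the martingale property above concrete: the simple symmetric random walk, i.e. the partial sums of i.i.d. ±1 steps, satisfies E[X_n | X_m] = X_m exactly, and this can be checked numerically. The following is a minimal Python sketch (illustrative only, not part of the notes): it simulates many paths, groups them by the value of X_m, and compares the conditional average of X_n within each group to the shared value of X_m.

```python
import random
from collections import defaultdict

random.seed(0)
m, n, trials = 5, 20, 100_000

groups = defaultdict(list)   # value of X_m -> observed values of X_n
for _ in range(trials):
    s, path = 0, []
    for _ in range(n):
        s += random.choice((-1, 1))   # i.i.d. +/-1 increments
        path.append(s)                # partial sums X_1, ..., X_n
    groups[path[m - 1]].append(path[n - 1])

# Martingale property E[X_n | X_m] = X_m: within each group of paths
# sharing the same X_m, the empirical mean of X_n should match that
# value up to Monte Carlo error.
for xm in sorted(groups):
    avg = sum(groups[xm]) / len(groups[xm])
    print(f"X_m = {xm:+d}:  mean of X_n ~ {avg:+.3f}")
```

Each printed conditional average sits close to the corresponding value of X_m, even though the unconditional mean of X_n carries no such information.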
variable E[X | 𝒞]: Ω → ℝ, for any σ-algebra 𝒞 ⊂ 𝒜.

1.1. Martingales indexed by partially ordered sets

The most common types of martingales are processes indexed by values in ℕ (so-called "discrete-time martingales") and in ℝ₊ ("continuous-time martingales"). However, martingales can be defined much more generally for index sets that need not be totally ordered, and we will later on prove the fundamental martingale convergence results for such general index sets.

Partially ordered index sets. Let T be a set. Recall that a binary relation ⪯ on T is called a partial order if it is
(1) reflexive: s ⪯ s for every s ∈ T.
(2) antisymmetric: If s ⪯ t and t ⪯ s, then s = t.
(3) transitive: If s ⪯ t and t ⪯ u, then s ⪯ u.
In general, a partially ordered set may contain elements that are not comparable, i.e. some s, t for which neither s ⪯ t nor t ⪯ s (hence "partial"). If all pairs of elements are comparable, the partial order is called a total order.

We need partially ordered index sets in various contexts, including martingales and the construction of stochastic processes. We have to be careful, though: using arbitrary partially ordered sets can lead to all kinds of pathologies. Roughly speaking, the problem is that a partially ordered set can decompose into subsets between which elements cannot be compared at all, as if we were indexing arbitrarily by picking indices from completely unrelated index sets. For instance, a partially ordered set could contain two sequences s₁ ⪯ s₂ ⪯ s₃ ⪯ ... and t₁ ⪯ t₂ ⪯ t₃ ⪯ ... of elements which both grow larger and larger in terms of the partial order, but whose elements are completely incomparable between the sequences. To avoid such pathologies, we impose an extra condition:

    If s, t ∈ T, there exists u ∈ T such that s ⪯ u and t ⪯ u.    (1.1)

A partially ordered set (T, ⪯) which satisfies (1.1) is called a directed set.

1.1 Example. Some directed sets:
(a) The set of subsets of an arbitrary set, ordered by inclusion.
(b) The set of finite subsets of an infinite set, ordered by inclusion.
(c) The set of positive definite n × n matrices over ℝ, in the Löwner partial order.
(d) Obviously, any totally ordered set (such as ℕ or ℝ in the standard order). /

Just as we can index a family of variables by ℕ and obtain a sequence, we can more generally index it by a directed set; the generalization of a sequence so obtained is called a net. To make this notion precise, let 𝒳 be a set. Recall that, formally, an (infinite) sequence in 𝒳 is a mapping ℕ → 𝒳, that is, each index i is mapped to the sequence element x_i. We usually denote such a sequence as (x_i)_{i∈ℕ}, or more concisely as (x_i).

1.2 Definition. Let (T, ⪯) be a directed set. A net in a set 𝒳 is a function x: T → 𝒳, and we write x_t := x(t) and denote the net as (x_t)_{t∈T}. /

Clearly, the net is a sequence if (T, ⪯) is specifically the totally ordered set (ℕ, ≤). Just like sequences, nets may converge to a limit:

1.3 Definition. A net (x_t)_{t∈T} is said to converge to a point x if, for every open neighborhood U of x, there exists an index t₀ ∈ T such that

    x_t ∈ U whenever t ⪰ t₀.    (1.2)
/

Nets play an important role in real and functional analysis: to establish certain properties in spaces more general than ℝ^d, we may have to demand that every net satisfying certain properties converges (not just every sequence). Sequences have stronger properties than nets; for example, in any topological space, the set consisting of all elements of a convergent sequence and its limit is a compact set. The same need not be true for a net.

Filtrations and martingales. Let (T, ⪯) be a directed set. A filtration is a family ℱ = (ℱ_t)_{t∈T} of σ-algebras ℱ_t ⊂ 𝒜, indexed by the elements of T, that satisfy

    s ⪯ t  ⟹  ℱ_s ⊂ ℱ_t.    (1.3)

The index set T = ℕ is often referred to as discrete time; similarly, T = ℝ₊ is called continuous time. The filtration property states that each σ-algebra ℱ_s contains all preceding ones.
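Condition (1.1) and Definition 1.3 can both be illustrated on example (b), the finite subsets of ℕ ordered by inclusion. The sketch below (Python, purely illustrative; the particular sets and the net x_F = Σ_{n∈F} 2^{-n} are made up for this example) checks directedness via unions and exhibits a net that converges to 2 in the sense of (1.2).

```python
# Directed set: finite subsets of {0, 1, 2, ...}, ordered by inclusion.
# For any two finite sets s and t, the union is again finite and is an
# upper bound of both -- exactly condition (1.1).
s, t = frozenset({1, 2, 7}), frozenset({2, 5})
u = s | t
assert s <= u and t <= u          # s and t both lie below u ...
assert not (s <= t or t <= s)     # ... although s and t are incomparable

# A net on this directed set: x_F = sum of 2^(-n) over n in F.
def x(F):
    return sum(2.0 ** -n for n in F)

# The net converges to 2 (Definition 1.3): given eps > 0, choose the
# index t0 = {0, ..., N} with 2^(-N) < eps; every finite F with
# t0 <= F (inclusion) then satisfies |x(F) - 2| < eps, because the
# terms missing from F sum to less than 2^(-N).
eps = 1e-3
t0 = frozenset(range(12))         # 2**(-11) < 1e-3
for F in (t0, t0 | {20}, t0 | {15, 33}):
    assert t0 <= F and abs(x(F) - 2.0) < eps
print("x(t0) =", x(t0))
```

Note that no enumeration of the index set is needed: convergence is phrased entirely through the partial order, which is what makes the definition work for non-sequential index sets.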
For a filtration ℱ, there is also a uniquely determined smallest σ-algebra which contains all σ-algebras in ℱ, namely

    ℱ_∞ := σ( ⋃_{s∈T} ℱ_s ).    (1.4)

Now, let (T, ⪯) be a partially ordered set and ℱ = (ℱ_s)_{s∈T} a filtration. We call a family (X_s)_{s∈T} of random variables adapted to ℱ if X_s is ℱ_s-measurable for every s. Clearly, every random sequence or random net (X_s)_{s∈T} is adapted to the filtration defined by

    ℱ_t := σ( ⋃_{s⪯t} σ(X_s) ),    (1.5)

which is called the canonical filtration of (X_s). An adapted family (X_s, ℱ_s)_{s∈T} is called a martingale if (i) each variable X_s is real-valued and integrable and (ii) it satisfies

    X_s =_a.s. E[X_t | ℱ_s] whenever s ⪯ t.    (1.6)

Note that (1.6) can be expressed equivalently as

    ∀A ∈ ℱ_s:  ∫_A X_s dP = ∫_A X_t dP  whenever s ⪯ t.    (1.7)

If (X_s, ℱ_s)_{s∈T} satisfies (1.6) only with equality weakened to ≤ (i.e. X_s ≤ E[X_t | ℱ_s]), it is called a submartingale; for ≥, it is called a supermartingale. Intuitively, the martingale property (1.6) says the following: think of the indices s and t as times.
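As a small sanity check on the sub/supermartingale definitions, here is an illustrative Python sketch (not part of the notes; T = ℕ with the canonical filtration of the walk is assumed). A simple symmetric random walk X is a martingale, so by Jensen's inequality |X| is a submartingale: among paths sharing a value of X_m, the average of |X_n| should be at least |X_m|, typically strictly larger.

```python
import random
from collections import defaultdict

random.seed(1)
m, n, trials = 4, 16, 100_000

groups = defaultdict(list)   # value of X_m -> observed values of |X_n|
for _ in range(trials):
    s, path = 0, []
    for _ in range(n):
        s += random.choice((-1, 1))
        path.append(s)
    groups[path[m - 1]].append(abs(path[n - 1]))

# Submartingale inequality E[|X_n| | F_m] >= |X_m| (Jensen applied to
# the martingale X). For the walk, conditioning on F_m reduces to
# conditioning on the current value X_m.
for xm in sorted(groups):
    avg = sum(groups[xm]) / len(groups[xm])
    print(f"X_m = {xm:+d}:  mean of |X_n| ~ {avg:.3f}  (|X_m| = {abs(xm)})")
```

The inequality is strict here because the walk can still cross zero between times m and n; a supermartingale example is obtained symmetrically, e.g. from -|X|.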