
Conditioning and Markov properties

Anders Rønn-Nielsen
Ernst Hansen

Department of Mathematical Sciences
University of Copenhagen
Universitetsparken 5
DK-2100 Copenhagen

Copyright 2014 Anders Rønn-Nielsen & Ernst Hansen
ISBN 978-87-7078-980-6

Contents

Preface

1 Conditional distributions
  1.1 Markov kernels
  1.2 Integration of Markov kernels
  1.3 Properties of the integration measure
  1.4 Conditional distributions
  1.5 Existence of conditional distributions
  1.6 Exercises

2 Conditional distributions: Transformations and moments
  2.1 Transformations of conditional distributions
  2.2 Conditional moments
  2.3 Exercises

3 Conditional independence
  3.1 Conditional probabilities given a σ–algebra
  3.2 Conditionally independent events
  3.3 Conditionally independent σ–algebras
  3.4 Shifting information around
  3.5 Conditionally independent random variables
  3.6 Exercises

4 Markov chains
  4.1 The fundamental Markov property
  4.2 The strong Markov property
  4.3 Homogeneity
  4.4 An integration formula for a homogeneous Markov chain
  4.5 The Chapman–Kolmogorov equations
  4.6 Stationary distributions
  4.7 Exercises

5 Ergodic theory for Markov chains on general state spaces
  5.1 Convergence of transition probabilities
  5.2 Transition probabilities with densities
  5.3 Asymptotic stability
  5.4 Minorisation
  5.5 The drift criterion
  5.6 Exercises

6 An introduction to Bayesian networks
  6.1 Introduction
  6.2 Directed graphs
  6.3 Moral graphs and separation
  6.4 Bayesian networks
  6.5 Global Markov property

A Supplementary material
  A.1 Measurable spaces
  A.2 Random variables and conditional expectations

B Hints for exercises
  B.1 Hints for chapter 1
  B.2 Hints for chapter 2
  B.3 Hints for chapter 3
  B.4 Hints for chapter 4
  B.5 Hints for chapter 5

Preface

The present lecture notes are intended for the course "Beting". The purpose is to give a detailed probabilistic introduction to conditional distributions and to show how this concept is used to define and describe Markov chains and Bayesian networks.

Chapters 1–4 are mainly based on different sets of lecture notes written by Ernst Hansen, adapted to suit students with the knowledge obtained in the courses MI, VidSand1, and VidSand2. Parts of these chapters also contain material inspired by lecture notes written by Martin Jacobsen and Søren Tolver Jensen. Chapter 5 is an adapted version of a set of lecture notes written by Søren Tolver Jensen and Martin Jacobsen, which were themselves inspired by earlier notes by Søren Feodor Nielsen. Chapter 6 contains some standard results on Bayesian networks, though both formulations and proofs are stated in the rather general framework of chapters 3 and 4.

There are exercises at the end of each chapter and hints to selected exercises at the end of the notes. Some exercises are adapted versions of exercises and exam exercises from previous lecture notes. Others are inspired by examples and results collected from a large number of monographs on Markov chains, stochastic simulation, and probability theory in general.

I am grateful to both the students and the teaching assistants from the last two years, Ketil Biering Tvermosegaard and Daniele Cappelletti, who have contributed to the notes by identifying mistakes and suggesting improvements.
Anders Rønn-Nielsen
København, 2014

Chapter 1

Conditional distributions

Let (X, E) and (Y, K) be two measurable spaces. In this chapter we shall discuss the relation between measures on the product space (X × Y, E ⊗ K) and measures on the two marginal spaces (X, E) and (Y, K). More precisely, we will see that measures on the product space can be constructed from measures on the two marginal spaces. A particularly simple example is a product measure µ ⊗ ν, where µ and ν are measures on (X, E) and (Y, K) respectively.

The two factors X and Y will not enter the setup symmetrically in the more general construction.

1.1 Markov kernels

Definition 1.1.1. An (X, E)–Markov kernel on (Y, K) is a family of probability measures (P_x)_{x∈X} on (Y, K), indexed by the points in X, such that the map

\[
x \mapsto P_x(B)
\]

is E–B–measurable for every fixed B ∈ K.

Theorem 1.1.2. Let (X, E) and (Y, K) be measurable spaces, let ν be a σ–finite measure on (Y, K), and let f ∈ M⁺(X × Y, E ⊗ K) have the property that

\[
\int f(x, y) \, d\nu(y) = 1 \quad \text{for all } x \in X.
\]

Then (P_x)_{x∈X} given by

\[
P_x(B) = \int_B f(x, y) \, d\nu(y) \quad \text{for all } B \in K,\ x \in X
\]

is an (X, E)–Markov kernel on (Y, K).

Proof. For each fixed set B ∈ K we need to argue that

\[
x \mapsto \int 1_{X \times B}(x, y) \, f(x, y) \, d\nu(y)
\]

is an E–measurable function. This is a direct consequence of EH Theorem 8.7 applied to the function 1_{X×B} f.
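A minimal numerical sketch of the construction in Theorem 1.1.2, assuming X = Y = R, ν equal to the Lebesgue measure, and f(x, y) chosen as the N(x, 1) density evaluated in y, so that P_x becomes the normal distribution with mean x. The concrete choice of f, the interval B, and the use of scipy are illustrative assumptions, not part of the theorem.

```python
# Sketch of Theorem 1.1.2: a Markov kernel defined through a density f(x, y)
# with respect to nu.  Illustrative assumptions: X = Y = R, nu = Lebesgue
# measure, f(x, y) = N(x, 1)-density in y.
import numpy as np
from scipy import integrate, stats

def f(x, y):
    # For every fixed x, the function y -> f(x, y) integrates to 1 in y.
    return stats.norm.pdf(y, loc=x, scale=1.0)

def P(x, a, b):
    # The kernel evaluated on an interval B = (a, b): P_x(B) = int_B f(x, y) dnu(y).
    value, _ = integrate.quad(lambda y: f(x, y), a, b)
    return value

# The normalisation required in the theorem: int f(x, y) dnu(y) = 1 for all x.
total, _ = integrate.quad(lambda y: f(2.0, y), -np.inf, np.inf)
print(total)                       # approximately 1.0

# The map x -> P_x(B) for B = [0, inf); here it is even continuous in x,
# so in particular E-B-measurable.
for x in (-1.0, 0.0, 1.0):
    print(x, P(x, 0.0, np.inf))
```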
Lemma 1.1.3. If (Y, K) has an intersection–stable generating system D, then (P_x)_{x∈X} is an (X, E)–Markov kernel on (Y, K) provided that the map

\[
x \mapsto P_x(D)
\]

is E–B–measurable for every fixed D ∈ D.

Proof. Define

\[
H = \{ F \in K : x \mapsto P_x(F) \text{ is } E\text{–}B\text{–measurable} \}.
\]

It is easily checked that H is a Dynkin class. Since D ⊆ H, we have H = K.

Lemma 1.1.4. Let (P_x)_{x∈X} be an (X, E)–Markov kernel on (Y, K). For each G ∈ E ⊗ K the map

\[
x \mapsto P_x(G^x)
\]

is E–B–measurable.

Proof. Let

\[
H = \{ G \in E \otimes K : x \mapsto P_x(G^x) \text{ is } E\text{–}B\text{–measurable} \}
\]

and consider a product set A × B ∈ E ⊗ K. Then

\[
(A \times B)^x =
\begin{cases}
\emptyset & \text{if } x \notin A \\
B & \text{if } x \in A
\end{cases}
\]

such that

\[
P_x\big((A \times B)^x\big) =
\begin{cases}
0 & \text{if } x \notin A \\
P_x(B) & \text{if } x \in A
\end{cases}
\;=\; 1_A(x) \cdot P_x(B).
\]

This is a product of two E–B–measurable functions, hence it is E–B–measurable. Thereby we conclude that H contains all product sets, and since these form an intersection–stable generating system for E ⊗ K, we have H = E ⊗ K, provided we can show that H is a Dynkin class:

We already have X × Y ∈ H – it is a product set! Assume that G_1 ⊆ G_2 are two sets in H. Then obviously also G_1^x ⊆ G_2^x for all x ∈ X, and

\[
(G_2 \setminus G_1)^x = G_2^x \setminus G_1^x.
\]

Then

\[
P_x\big((G_2 \setminus G_1)^x\big) = P_x(G_2^x) - P_x(G_1^x),
\]

which is a difference between two measurable functions. Hence G_2 \ G_1 ∈ H.

Finally, assume that G_1 ⊆ G_2 ⊆ ⋯ is an increasing sequence of H–sets. Similarly to above we have G_1^x ⊆ G_2^x ⊆ ⋯ and

\[
\Big( \bigcup_{n=1}^{\infty} G_n \Big)^{x} = \bigcup_{n=1}^{\infty} G_n^x.
\]

Then

\[
P_x\Big( \Big( \bigcup_{n=1}^{\infty} G_n \Big)^{x} \Big)
= P_x\Big( \bigcup_{n=1}^{\infty} G_n^x \Big)
= \lim_{n \to \infty} P_x(G_n^x).
\]

This limit is E–B–measurable, since each of the functions x ↦ P_x(G_n^x) is measurable. Hence ⋃_{n=1}^{∞} G_n ∈ H.

1.2 Integration of Markov kernels

Theorem 1.2.1. Let µ be a probability measure on (X, E) and let (P_x)_{x∈X} be an (X, E)–Markov kernel on (Y, K). There exists a uniquely determined probability measure λ on (X × Y, E ⊗ K) satisfying

\[
\lambda(A \times B) = \int_A P_x(B) \, d\mu(x)
\]

for all A ∈ E and B ∈ K.

The probability measure λ constructed in Theorem 1.2.1 is called the integration of (P_x)_{x∈X} with respect to µ. The interpretation is that λ describes an experiment on X × Y that is performed in two steps: the first step is drawing x ∈ X according to µ; the second step is drawing y ∈ Y according to a probability measure P_x that is determined by x.
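The two–step description can be turned directly into a simulation recipe. A minimal sketch, assuming µ = N(0, 1) on X = R and P_x = N(x, 1) on Y = R (both choices, and the use of numpy, are illustrative): pairs produced this way are draws from the integrated measure λ.

```python
# Simulation of the two-step experiment behind Theorem 1.2.1.
# Illustrative assumptions: mu = N(0, 1), P_x = N(x, 1).
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_lambda(n):
    x = rng.normal(loc=0.0, scale=1.0, size=n)   # step 1: draw x from mu
    y = rng.normal(loc=x, scale=1.0)             # step 2: draw y from P_x, one y per x
    return x, y

x, y = sample_lambda(200_000)

# The X-marginal of lambda is mu (cf. Corollary 1.2.2 below) ...
print(np.mean(x <= 0.0))                 # approximately 0.5
# ... and the Y-marginal is the mixture; with these choices it is N(0, 2).
print(np.mean(y <= 0.0), np.var(y))      # approximately 0.5 and 2.0
```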
Proof. The uniqueness follows, since λ is determined on all product sets and these form an intersection–stable generating system for E ⊗ K.

In order to prove the existence, we define

\[
\lambda(G) = \int P_x(G^x) \, d\mu(x).
\]

For each G ∈ E ⊗ K the integrand is measurable according to Lemma 1.1.4. It is furthermore non–negative, so that λ(G) is well–defined with values in [0, ∞]. Now let G_1, G_2, … be a sequence of disjoint sets in E ⊗ K. Then for each x ∈ X the sets G_1^x, G_2^x, … are disjoint as well. Hence

\[
\lambda\Big( \bigcup_{n=1}^{\infty} G_n \Big)
= \int P_x\Big( \Big( \bigcup_{n=1}^{\infty} G_n \Big)^{x} \Big) \, d\mu(x)
= \int \sum_{n=1}^{\infty} P_x(G_n^x) \, d\mu(x)
= \sum_{n=1}^{\infty} \lambda(G_n).
\]

In the second equality we have used that each P_x is a measure, and in the third equality we have used monotone convergence to interchange integration and summation. From this we have that λ is a measure. And since

\[
\lambda(X \times Y) = \int P_x\big((X \times Y)^x\big) \, d\mu(x) = \int P_x(Y) \, d\mu(x) = \int 1 \, d\mu(x) = 1,
\]

we obtain that λ is actually a probability measure. Finally, it follows that

\[
\lambda(A \times B) = \int P_x\big((A \times B)^x\big) \, d\mu(x) = \int 1_A(x) \, P_x(B) \, d\mu(x) = \int_A P_x(B) \, d\mu(x)
\]

for all A ∈ E and B ∈ K.

Corollary 1.2.2. Let µ be a probability measure on (X, E) and let (P_x)_{x∈X} be an (X, E)–Markov kernel on (Y, K). Let λ be the integration of (P_x)_{x∈X} with respect to µ. Then λ satisfies

\[
\lambda(A \times Y) = \mu(A) \quad \text{for all } A \in E,
\]
\[
\lambda(X \times B) = \int P_x(B) \, d\mu(x) \quad \text{for all } B \in K.
\]

Proof. The second statement is obvious. For the first, just note that P_x(Y) = 1 for all x ∈ X.

The probability measure on (Y, K) defined by B ↦ λ(X × B) is called the mixture of the Markov kernel with respect to µ.
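When X and Y are finite, the mixture is simply a µ–weighted average of the probability vectors P_x. A minimal sketch with invented numbers (the state spaces and the concrete probabilities are illustrative assumptions):

```python
# The mixture over a finite state space is a weighted average of the kernel rows.
import numpy as np

mu = np.array([0.25, 0.75])         # a probability measure on X = {0, 1}
P  = np.array([[0.5, 0.3, 0.2],     # P_0: a probability measure on Y = {0, 1, 2}
               [0.1, 0.1, 0.8]])    # P_1: another probability measure on Y

mixture = mu @ P                    # the measure B -> int P_x(B) dmu(x) as a vector
print(mixture, mixture.sum())       # [0.2  0.15 0.65], total mass 1.0
```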
Example 1.2.3. Let µ be a probability measure on (X, E) and let ν be a probability measure on (Y, K). Define P_x = ν for all x ∈ X. Then, trivially, (P_x)_{x∈X} is an X–Markov kernel on Y. Let λ be the integration of this kernel with respect to µ. Then for all A ∈ E and B ∈ K

\[
\lambda(A \times B) = \int_A \nu(B) \, d\mu(x) = \mu(A) \cdot \nu(B).
\]

The only measure satisfying this property is the product measure µ ⊗ ν, so λ = µ ⊗ ν. Hence a product measure is a particularly simple example of a measure constructed by integrating a Markov kernel. ◦

Example 1.2.4. Let µ be the Poisson distribution with parameter λ. For each x ∈ N_0 we define P_x to be the binomial distribution with parameters (x, p). Then it is seen that (P_x)_{x∈N_0} is an N_0–Markov kernel on N_0.

Let ξ be the mixture of (P_x)_{x∈N_0} with respect to µ. This must be a probability measure on N_0 and is thereby given by its probability function q. For n ∈ N_0 we obtain

\[
\begin{aligned}
q(n) &= \sum_{k=n}^{\infty} \binom{k}{n} p^n (1-p)^{k-n} \, \frac{\lambda^k}{k!} \, e^{-\lambda} \\
     &= \frac{(\lambda p)^n}{n!} \sum_{k=n}^{\infty} \frac{\big((1-p)\lambda\big)^{k-n}}{(k-n)!} \, e^{-\lambda} \\
     &= \frac{(\lambda p)^n}{n!} \, e^{-\lambda} e^{(1-p)\lambda} \\
     &= \frac{(\lambda p)^n}{n!} \, e^{-\lambda p}.
\end{aligned}
\]

Hence the mixture ξ is seen to be the Poisson distribution with parameter λp. ◦
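Example 1.2.4 is the classical thinning property of the Poisson distribution, and the identity is easy to check by simulation. A minimal Monte Carlo sketch (the parameter values λ = 6 and p = 0.3 and the use of numpy are illustrative assumptions): draw x from the Poisson(λ) distribution, then n from the Binomial(x, p) distribution, and compare the empirical frequencies of n with the Poisson(λp) probability function.

```python
# Monte Carlo check of Example 1.2.4 (thinning of a Poisson distribution).
# Illustrative assumptions: lambda = 6.0, p = 0.3.
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(seed=1)
lam, p, n_samples = 6.0, 0.3, 200_000

x = rng.poisson(lam=lam, size=n_samples)   # step 1: x ~ Poisson(lambda)
n = rng.binomial(n=x, p=p)                 # step 2: n ~ Binomial(x, p) given x

def poisson_pmf(k, mean):
    return mean**k * exp(-mean) / factorial(k)

# The empirical distribution of n should agree with Poisson(lambda * p) = Poisson(1.8).
for k in range(6):
    print(k, round(float(np.mean(n == k)), 4), round(poisson_pmf(k, lam * p), 4))
```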
Theorem 1.2.5 (Uniqueness of integration). Suppose that (Y, K) has a countable generating system that is intersection–stable. Let µ and µ̃ be two probability measures on (X, E), and assume that (P_x)_{x∈X} and (P̃_x)_{x∈X} are two (X, E)–Markov kernels on (Y, K). Let λ be the integration of (P_x)_{x∈X} with respect to µ, and let λ̃ be the integration of (P̃_x)_{x∈X} with respect to µ̃. Define

\[
E_0 = \{ x \in X : P_x = \tilde{P}_x \}.
\]

Then λ = λ̃ if and only if µ = µ̃ and µ(E_0) = 1.

Proof. Let (B_n)_{n∈N} be a countable, intersection–stable generating system for (Y, K). Then

\[
E_0 = \bigcap_{n=1}^{\infty} \{ x \in X : P_x(B_n) = \tilde{P}_x(B_n) \},
\]

from which we can conclude that E_0 ∈ E.

Assume that µ = µ̃ and µ(E_0) = 1. Then for all A ∈ E and B ∈ K we have

\[
\lambda(A \times B) = \int_A P_x(B) \, d\mu(x) = \int_A \tilde{P}_x(B) \, d\tilde{\mu}(x) = \tilde{\lambda}(A \times B),
\]

since µ = µ̃ and P_x(B) = P̃_x(B) for µ–almost all x ∈ X.