<<

arXiv:2104.11547v2 [math.ST] 27 Aug 2021 Keywords: MSC: 2020 inlMro enl,cuaiy rpia oes globa models, graphical causality, kernels, Markov tional 0 [email protected] rlt.Ti ilte loalwu oso h anrlsof rules main the show to us f allow property also Markov then global will varia directed input This non-stochastic a erality. for prove allow condit that to transitional models used how graphical show be then can will dence we application, ancillarity. an and As adequacy sufficiency, expr like to concepts able tical be will independence k conditional Markov Transitional conditional (regular) spaces. of exc uniqueness probabiliti essential rules, transition a for and separoid As theorem the disintegration spaces. of measurable a universal versions prove re and desired right analytic all and standard, satisfies left on it that of shown terms be in will it and general indepen in conditional Transitional independence. form ditional some and non-stochas an independence and conditional strengthen stochastic variables tended) random generalize, (conditional) will of constructions notions These transi and spaces ables. transition introduce we this uetertcgenerality. theoretic sure dist interventional and observational relating calculus, edvlp h rmwr ftastoa odtoa inde conditional transitional of framework the develope We 29,60A05. 62A99, xeddcniinlidpnec,aymti eaoda separoid asymmetric independence, conditional Extended r niinlConditional Transitional Independence nvriyo Amsterdam of University ISineLb AMLab Lab, AI4Science nomtc Institute Informatics arc Forré Patrick h Netherlands The 1 akvproperty. Markov l iuin,i uhmea- such in ributions, ec sasymmetric is dence s ..teexistence the i.e. es, inlrno vari- random tional s lsia statis- classical ess lsi toggen- strong in bles i aibe,(ex- variables, tic ffntoa con- functional of nf previous unify d eac relations levance res nthose on ernels, rprto we preparation p symmetry, ept oa indepen- ional edne For pendence. causal/ rcausal or im,condi- xioms, do - Contents

1. Introduction 5

2. Transitional 10 2.1. Transition /Markov Kernels ...... 10 2.2. Constructing Transition Probabilities from Others ...... 12 2.3. Null Sets w.r.t. Transition Probabilities ...... 14 2.4. Transition Probability Spaces ...... 14 2.5. TransitionalRandomVariables ...... 15 2.6. Ordering the Class of Transitional Random Variables ...... 17 2.7. Disintegration of Transition Probabilities ...... 18

3. Transitional Conditional Independence 20 3.1. Definition of Transitional Conditional Independence ...... 20 3.2. Transitional Conditional Independence for Random Variables ...... 22 3.3. Transitional Conditional Independence for Deterministic Variables . . . . 22 3.4. Separoid Rules for Transitional Conditional Independence...... 23

4. Applications to Statistical Theory 25 4.1. Ancillarity, Sufficiency, Adequacy ...... 26 4.2. InvariantReductions ...... 27 4.3. Reparameterizing Transitional Random Variables ...... 27 4.4. PropensityScore ...... 28 4.5. AWeakLikelihoodPrinciple...... 29 4.6. BayesianStatistics ...... 29

5. Graph Theory 30 5.1. Conditional Directed Mixed Graphs (CDMGs) ...... 31 5.2. Sigma-SeparationinCDMGs...... 32 5.3. SeparoidRulesforSigma-Separation ...... 33

6. Applications to Graphical Models 35 6.1. CausalBayesianNetworks ...... 35 6.2. Global Markov Property for Causal Bayesian Networks ...... 36

7. Discussion 38 7.1. Extensions to Universal Measurable Spaces ...... 38 7.2. Comparison to Other Notions of Extended Conditional Independence . . 38 7.3. Conclusion...... 39

Acknowledgments 40

References 40

2 Supplementary Material 48

A. Symmetric Separoids and Asymmetric Separoids 48 A.1. CompatibilityRelations ...... 49 A.2. AsymmetricSeparoids ...... 50 A.3. Join-Semi-Lattices from Asymmetric Separoids ...... 54 A.4. Relations to Symmetric Separoids ...... 56

B. Measure Theory 63 B.1. Completion and Universal Completion of a Sigma-Algebra ...... 63 B.2. Analytic Subsets of a Measurable Space ...... 66 B.3. Countably Generated and Countably Separated Sigma-Algebras . . . . . 67 B.4. Standard, Analytic and Universal Measurable Spaces ...... 71 B.5. Perfect Measures and Perfect Measurable Spaces ...... 72 B.6. The Space of Probability Measures ...... 73 B.7. Measurable Selection Theorem ...... 80 B.8. Measurable Extension Theorems ...... 81

C. Proofs - Disintegration of Transition Probabilities 84 C.1. Definition of Regular Conditional Markov Kernels ...... 84 C.2. Essential Uniqueness of Regular Conditional Markov Kernels ...... 84 C.3. Existence of Regular Conditional Markov Kernels ...... 86 C.4. Consistency of Regular Conditional Markov Kernels ...... 94

D. Proofs - Join-Semi-Lattice Rules for Transitional Random Variables 96

E. Proofs - Separoid Rules for Transitional Conditional Independence 100 E.1. Core Separoid Rules for Transitional Conditional Independence . . . . . 102 E.2. Derived Separoid Rules for Transitional Conditional Independence . . . . 109

F. Proofs - Applications to Statistical Theory 111

G. Proofs - Reparameterization of Transitional Random Variables 116

H. Operations on Graphs 123

I. Proofs - Separoid Rules for Sigma-Separation 125 I.1. Core Separoid Rules for Sigma-Separation ...... 126 I.2. Further Separoid Rules for Sigma-Separation ...... 128 I.3. Derived Separoid Rules for Sigma-Separation ...... 129 I.4. Symmetrized Sigma-Separation ...... 129

J. Proofs - Global Markov Property 131

K. Causal Models 138 K.1. Causal Bayesian Networks - More General ...... 138

3 K.2. Operations on Causal Bayesian Networks ...... 140 K.3. Global Markov Property for More General Causal Models ...... 141 K.4.Causal/Do-Calculus...... 142 K.5. Backdoor Covariate Adjustment Formula ...... 147

L. Comparison to Other Notions of Conditional Independence 148 L.1. Variation Conditional Independence ...... 149 L.2. Transitional Conditional Independence for Random Variables ...... 151 L.3. Transitional Conditional Independence for Deterministic Variables . . . . 152 L.4. Equivalent Formulations of Transitional Conditional Independence . . . . 153 L.5. The Extended Conditional Independence ...... 155 L.6. Symmetric Extended Conditional Independence ...... 157 L.7. Extended Conditional Independence for Families of Probability Distribu- tions...... 157 1. Introduction

1. Introduction

Conditional independence nowadays is a widely used concept in , probability theory and machine learning, e.g. see [Bis06, Mur12], especially in the areas of prob- abilistic graphical models and causality, see [DL93, Lau96, SGS00, Pea09, KF09, PJS17, Daw02, RS02, Ric03, ARS09, CMKR12, ER14, Eva16, Eva18, MMC20, BFPM21, GVP90] and many more. Already in its invention paper, see [Daw79a], a strong motivation for (further) developing conditional independence was the ability to express statistical concepts like sufficiency, adequacy and ancillarity, etc., in terms of conditional indepen- dence. For example, an ancillary statistic SpXq w.r.t. model PpX|Θq, see [Fis25,Bas64], is a function S of the data X that has the same PpSpXqq under any chosen model parameters Θ “ θ. The goal is then to formalize this equivalently as a (conditional) independence relation: SpXq KKPpX|Θq Θ. This comes with two chal- lenges. First, in the non-Bayesian setting, the parameters Θ of the model PpX|Θq are not considered random variables, and thus the usual stochastic (conditional) indepen- dence cannot express such concepts in its vanilla form. Second, the dependence relation between X,SpXq and Θ then becomes asymmetric, with deterministic input variables Θ and stochastic output variables X and SpXq. These two points similarly hold true for the notions of sufficiency, adequacy, etc. So any extension of conditional independence that aims at capturing such concepts equivalently needs to embrace and incorporate the discussed asymmetry. Over time several definitions of extensions of conditional independence have been pro- posed and studied, see [Daw79a,Daw79b,Daw80,Daw98,Daw01a,GR01,CD17a,RERS17, FM20, Fri20], each coming with a different focus, motivation and intended characteris- tic. It has become apparent that an extended definition of conditional independence should satisfy many meaningful properties. Arguably, it should: (i) be able to express classical statistical concepts like sufficiency, adequacy, ancillarity, etc., (ii) embrace and anticipate the inherent asymmetry in the dependency between the stochastic (output) and non-stochastic (input) parts of such variables, (iii) formally relate to other forms of conditional independence, e.g. to stochastic conditional independence, but also to versions of functional or variation conditional independence, (iv) work for large classes of measurable spaces, (v) work for large classes and combinations of random and non- stochastic variables, (vi) work for “conditional” (random) variables (which also need a proper definition first), (vii) work for large classes of (transition) probability distri- butions, e.g. for non-discrete variables that don’t even have densities or are mixtures of discrete and absolute continuous probability distributions, etc., (viii) satisfy reason- able relevance relations and rules, e.g. as many of the separoid rules, see [Daw01a], as possible, (ix) give rise to meaningful factorizations of the used (transition) probability distributions, (x) lead to global Markov properties for conditional probabilistic (causal) graphical models, (xi) be as simple as possible. All mentioned extensions of conditional independence, see [CD17a, RERS17, FM20, Fri20], lack on some of those points. For instance, the notions of conditional independence from [RERS17, Fri20] are symmet- ric. They thus cannot equivalently express ancillarity, sufficiency, adequacy, etc., as discussed. While the extended conditional independence from [CD17a] does not satisfy

5 1. Introduction enough separoid rules for the use in graphical models, the notion from [FM20] does not provide meaningful factorizations of the involved (transition) probabilities. Our pro- posed notion of transitional conditional independence will satisfy all of the mentioned points in arguably most possible generality. We will discuss this further in Appendix L in the Supplementary Material [For21].

Contributions Instead of giving an ad hoc definition of extended conditional inde- pendence we go back to the roots of measure theoretic probability and first provide the proper framework of “conditional” versions of random variables, probability spaces, null sets, etc., which we will call transitional random variables and transition probability spaces, etc.. We prefer the word transitional over conditional to stress that it is related to transition probabilities (Markov kernels) and that there is no conditioning operation involved that would come from some joint with a joint probability distribution. Transitional probability spaces are products W ˆ T of measurable spaces that contain the domain T and codomain W of a fixed transition probability/Markov kernel, which we will suggestively write similarly to conditional distributions: PpW |T q or KpW |T q, etc. Transitional random variables X are then measurable functions X : W ˆ T Ñ X on such transition probability spaces, generalizing the notion of random variables X : W Ñ X . In fact, we will allow, more generally, for probabilistic maps X : W ˆT 99K X . Transitional random variables X can be thought of as “conditional” random variables, stochastic processes or random fields pXtqtPT where we are not interested in their joint distribution like PppXtqtPT q, but in the functional dependence of their (push-forward) transition probability PpXt|T “ tq on their (“parameter”) inputs t. They will already generalize and unify the notion of random variables and deterministic, non-stochastic variables and formalize “conditional” random variables. We will then define that the transitional X is transitionally condition- ally independent from the transitional random variable Y given the transitional random variable Z w.r.t. PpW |T q if there exists a Markov kernel QpX|Zq such that: PpX,Y,Z|T q“ QpX|Zqb PpY,Z|T q, where PpY,Z|T q is the marginal Markov kernel of the joint push-forward Markov kernel PpX,Y,Z|T q of PpW |T q and where b denotes the product of Markov kernels. In symbols we will write: X KK Y | Z. PpW |T q This asymmetric notion of conditional independence will resolve all mentioned points above: It is arguably simple, comes with a meaningful factorization and can be defined for all measurable spaces. Furthermore, transitional conditional independence satisfies left and right versions of the separoid rules, see [Daw01a], except Symmetry. For two of the rules to hold it is required that the transitional random variables have as codomains standard, analytic or just universal measurable spaces. We also give criteria when Sym- metry holds. Note that the full of separoid rules was not possible to prove with the

6 1. Introduction technically more involved definition of extended conditional independence from [CD17a] even for standard measurable spaces. The reason that those rules need transitional random variables with standard, ana- lytic or universally measurable spaces as codomains is that they rely on the existence of a certain factorization of the involved Markov kernels. For this we first prove the disintegration of transition probabilities/Markov kernels KpX,Y |T q in two transitional random variables that have as codomains standard, analytic or universally measurable spaces. In other words, we will show that there exists a (regular) conditional Markov kernel KpX|Y, T q such that:

KpX,Y |T q“ KpX|Y, T qb KpY |T q, where KpY |T q is the marginal Markov kernel of KpX,Y |T q and b denotes the product of Markov kernels. The difficulty is to arrive at a conditional Markov kernel that is a in X for each value of Y and T and that, at the same time, is jointly measurable in pY, T q, and not just measurable in one variable when the other variable is fixed. This is the reason that we need to restrict ourselves to measurable spaces that come with some topological underpinning and built-in countability properties, like the mentioned standard, analytic and universal measurable spaces. Our results extend well known results for probability measures, see [Kle20, Rao05], to transition probabilities. One can also argue that one can push the disintegration of Markov kernels (a bit, but) not much further than universal measurable spaces. One of the most general disinte- gration result for probability distributions requires (countably) compact approximating classes, see [Pac78, Fre15] 452I. Also in [Fad85] it was shown that perfect measures are necessary for the existence of disintegrations. Furthermore, to achive measurablity usu- ally a countability assumption for the used σ-algebra is needed, like countably generated and/or countably separated, see [Fad85, BD75, BRN63, Fre15] 452. This shows that if one does not want to specialize to specific probability measures or make strong topolog- ical assumptions one cannot go much further than universal measurable spaces for the disintegration of probability measures let alone Markov kernels. Transitional conditional independence will imply the usual notion of conditional inde- pendence for random variables (via the corner case where T “ t˚u, the one-point space), but also the other notions of extended conditional independence, see [CD17a, RERS17, FM20]. Transitional conditional independence can express the statistical concepts of ancillary, sufficient and adequate statistic, see [Fis22, Fis25], SpXq for statistical model PpW |Θq and transitional random variables X (and Y ) via:

1. Ancillarity: SpXq KK Θ. PpW |Θq

2. Sufficiency: X KK Θ | SpXq. PpW |Θq

3. Adequacy: X KK Θ,Y | SpXq. PpW |Θq

7 1. Introduction

Transitional conditional independence can also encode deterministic functional rela- tions. For example, if F is a function on a product space, F : T1 ˆ T2 Ñ F, with pt1, t2q ÞÑ F pt1, t2q. Then F is a function of t1 alone, if and only if:

F KK T2 | T1, PpW |T1,T2q where Ti are the canonical projections onto factors Ti, i “ 1, 2, and F is viewed as transitional random variable on W ˆ T1 ˆ T2. Similar to the extended conditional in- dependence from [CD17a] transitional conditional independence only captures corner cases of variation conditional independence. Nonetheless, we will show a formal analogy between variation conditional independence and transitional conditional independence, which one could use to combine these two notions with a logical “and”. This combination will preserve the relevant separoid rules, thus leading to a desired combination of both. As an application of transitional conditional independence and its separoid rules we show that “conditional”/transitional graphical models like causal Bayesian networks or structural causal models (with latent and input variables) will follow a directed global Markov property, i.e. they entail transitional conditional independence relations that are graphically encoded via σ-separation, a generalization of d-separation and m-separation, see [Pea09,Ric03,FM17,FM18,FM20]. The proof relies on the fact that both σ-separation and transitional conditional independence follow analogous asymmetric separoid rules. We will formalize the notion of asymmetric separoids and study when we recover the usual notion of (symmetric) separoids. Finally, we can use the global Markov property and apply it to an extended causal that also contains soft intervention variables to gain a graphical cri- terion about when one can remove hard interventions or replace them with condition- ing operations, see [Pea09]. Transitional conditional independence then automatically presents us with meaningful factorizations and Markov kernels. Such rules are of im- portance for the identification of causal effects, see [Pea09]. In practice such rules can give us guidelines about when one can replace costly randomized control trials with observational studies and also how to estimate the corresponding causal effects.

v1 v2 v3

v4

v5 v6 v7 v8

Figure 1: Causal Bayesian network with input nodes v1, v2 and output nodes v3,...,v8. The graph allows us to read off the transitional conditional independencies: X7 KK X1 | X2 and X7 KK X1,X5 | X2,X4,X6, etc.

8 1. Introduction

Overview In Section 2 we will develop transitional probability theory built on the notion of Markov kernels/transition probabilities. We introduce the notions of transition probability spaces, transitional random variables and null-sets, etc. We also go over typical constructions for Markov kernels like marginalization, product, composition, etc. Our main theorem of this section will be concerned with the existence of (regular) conditional Markov kernels. In Section 3 we will define transitional conditional independence for general transi- tional random variables. We then demonstrate its meaning in the two corner cases: random variables and deterministic maps. Our main result of this section will be to show that transitional conditional independence satisfies all left and right versions of the separoid rules. In Section 4 we will show how transitional conditional independence can express clas- sical statistical concepts like ancillarity, sufficiency, adequacy, etc. We also demonstrate what it can say about propensity scores, likelihoods and Bayesian statistics. In Section 5 we will review some graph theory, which is needed for the section after. The main result will be the introduction of the notion of σ-separation and that it also satisfies all the asymmetric separoid rules in total analogy to transitional conditional independence. In Section 6, as the most striking application of transitional conditional independence, we will introduce the notion of causal Bayesian networks that allow for (non-stochastic) input variables. The main theorem will be that such causal Bayesian networks will sat- isfy a directed global Markov property, relating its graphical structure to transitional conditional independence relations. Since the notion of transitional conditional indepen- dence is stronger than any of the other notion of (extended) conditional independence, our result is thus the strongest form of a global Markov property for such graphical models proven so far, while using the same assumptions. Finally, in Section 7 we will discuss our findings and also indicate how to extend all concepts from this paper to universal measurable spaces, which generalize standard and analytic measurable spaces.

All of the proofs of the above can be found in the corresponding appendices in the Supplementary Material [For21].

Of own interest is Appendix A in the Supplementary Material [For21], where we introduce and formalize the notion of τ-κ-separoids, an asymmetric version of separoids. In two theorems we will show how τ-κ-separoids can be constructed from symmetric ones and vice versa, demonstrating the close relationship between these two concepts. In Appendix K in the Supplementary Material [For21] we will extend the theory of causal Bayesian networks and relax its definition to also allow for latent variables. Then we introduce hard interventions, soft interventions and marginization for them. We also show how to extend the global Markov property to more general probabilistic graphical models. Finally, we demonstrate how the global Markov property can be used to get a measure theoretic clean proof of the 3 rules of do-calculus and the most important adjustment formulas.

9 2. Transitional Probability Theory

In Appendix L in the Supplementary Material [For21] we have a more detailed com- parison of transitional conditional independence to other notions of conditional inde- pendence. In particular, we compare to “the” extendend conditional independence, to symmetric extendend conditional independence, to extendend conditional independence based on families of probability distributions and also to variation conditional indepen- dence.

Notations We will use curly letters like W, T , X , Y, Z to indicate measurable spaces. We implicitly assume that they are endowed with a fixed σ-algebra, which we will denote by BW , BT , etc., if needed. If we say that D Ď W is a measurable subset we will mean D P BW . We will, unless stated otherwise, always assume that topological spaces like RD, r0, 1s, R¯ :“ r´8, `8s, etc., are endowed with their Borel σ-algebra. Similarly, we will assume that product spaces like W ˆ T carry the product σ-algebra. For the space of probability measures PpWq on W we will use the smallest σ-algebra such that all evaluations maps jD : PpWqÑr0, 1s given by jDpPq :“ PpDq for D P BW are measurable. Maps will usually be denoted by capital letters X, Y , Z in correspondence to their codomains X , Y, Z, resp. We use bold font letters K, P, X, etc., to indicate probability distributions or Markov kernels. Later we will use G to denote graphs. If we say that X : W Ñ X is a measurable we implicitely assume that W and X are ˚ ´1 measurable spaces and that BW Ě X BX :“ tX pAq | A P BX u. A standard measurable space (standard Borel space) is a measurable space X whose measurable sets B P BX are the Borel sets of a complete separable metric d on X , e.g. RD or Z. We call X (or BX ) countably generated if BX “ σpEq for a countable subset E Ď BX . We will also treat analytic and universal measurable spaces. For simplicity of presentation we will treat only (standard) measurable spaces in the main paper and refer for the corresponding but technically more demanding results for analytic and universal measurable spaces to Appendix B in the Supplementary Material [For21].

2. Transitional Probability Theory

2.1. Transition Probabilities/Markov Kernels Here we will review the notion of transition probabilities, also known as Markov kernels. We mainly introduce our suggestive notations, which make the later theory more intu- itive. In more abstract terms, we give here an explicit description of many constructions that also appear in the Kleisli of the Giry monad, see [Law62, Gir82, Kle65].

Definition 2.1 (Markov kernel). Let T , W be measurable spaces. A Markov kernel or transition probability from T to W is - per definition - a map:

K : BW ˆ T Ñr0, 1s, pD, tq ÞÑ KpD|tq, such that:

10 2. Transitional Probability Theory

1. For each t P T the mapping:

BW Ñr0, 1s,D ÞÑ KpD|tq

is a probability measure (i.e. normalized and countably additive).

2. For each D P BW the mapping:

T Ñr0, 1s, t ÞÑ KpD|tq

is measurable.

Notation 2.2 (Markov kernel). We will most of the time use the dashed arrow 99K to W instead of a usual arrow Ñ on other spaces to indicate Markov kernels:

K : T 99K W, pD, tq ÞÑ KpD|tq.

Furthermore, we will often use suggestive notations as follows:

KpW |T q : T 99K W, pD, tq ÞÑ KpW P D|T “ tq :“ KpD|tq.

We also use the following notations. For fixed D P BW the map:

KpW P D|T q : T Ñr0, 1s, t ÞÑ KpW P D|T “ tq, and for fixed t P T the map:

KpW |T “ tq : BW Ñr0, 1s,D ÞÑ KpW P D|T “ tq.

We might also use the same notation as above to represent the Markov kernel as a measurable probabilistic map:

KpW |T q : T Ñ PpWq, t ÞÑ KpW |T “ tq.

Here W and T are considered suggestive symbols only, but one could give W the meaning of the (identity or) projection map prW onto W. From the point on we also have a map T mapping to T the notation becomes ambiguous: KpW |T q could also mean KpW |T q where we plugged in T for t in “T “ t”, similar to conditional expectations ErW |T s, but the meaning should become clear from the context.

Remark 2.3 (Markov kernels generalize probability distributions).

1. Every probability distribution PpW q P PpWq can be considered as a constant Markov kernel from T to W via:

KpW |T q : T 99K W, pD, tq ÞÑ KpW P D|T “ tq :“ PpW P Dq.

11 2. Transitional Probability Theory

2. Every Markov kernel from the one-point space: T “ ˚ :“ t˚u to W:

KpW |T q : ˚ 99K W, pD, ˚q ÞÑ KpW P D|T “ ˚q,

defines a unique probability distribution PpW q P PpWq given via:

PpW P Dq :“ KpW P D|T “ ˚q.

So we can identify probability distributions on W with Markov kernels ˚ 99K W. Remark 2.4 (Markov kernels generalize deterministic maps). Consider a measurable ˜ ˜ map X : T Ñ X . Then we can turn X into a Markov kernel δX˜ pX|T q via: 99K ˜ δX˜ pX|T q : T X , pA, tq ÞÑ δX˜ pX P A|T “ tq :“ ½ApXptqq, which puts 100% We will often also use the notation without the dummy variable X: ˜ δpX|T q :“ δX˜ pX|T q.

2.2. Constructing Transition Probabilities from Others In the following we will demonstrate how one can construct new Markov kernels from others. The constructions include marginalization, product, composition, push-forward. Later an own subsection is dedicated to conditioning. Note that the measurability of those constructions is either clear or can be proven using Dynkin’s π-λ theorem, see [Kle20] Thm. 1.19, also see [Bog07] Thm. 10.7.2. Definition 2.5 (Marginalizing Markov kernels). Let

KpX,Y |T q : T 99K X ˆ Y be a Markov kernel in two variables. We can then define the marginal Markov kernels as follows: KpX|T q : T 99K X , pA, tq ÞÑ KpX P A, Y P Y|T “ tq, and: KpY |T q : T 99K Y, pB, tq ÞÑ KpX P X ,Y P B|T “ tq. Definition 2.6 (Product of Markov kernels). Consider two Markov kernels:

QpZ|Y, W, T q : Y ˆ W ˆ T 99K Z, KpW, U|T,Xq : T ˆ X 99K W ˆ U.

Then we define the product Markov kernel:

QpZ|Y, W, T qb KpW, U|T,Xq : Y ˆ T ˆ X 99K Z ˆ W ˆ U, using measurable sets E Ď Z ˆ W ˆ U via: pE, py, t, xqq ÞÑ

½Epz,w,uq QpZ P dz|Y “ y, W “ w, T “ tq KppW, Uq P dpw,uq|T “ t, X “ xq, ż ż where the inner integration is over z P Z and the outer integration over pw,uq P W ˆ U.

12 2. Transitional Probability Theory

Definition 2.7 (Composition of Markov kernels). Consider two Markov kernels: QpZ|Y, W, T q : Y ˆ W ˆ T 99K Z, KpW, U|T,Xq : T ˆ X 99K W ˆ U. Then we define their composition: QpZ|Y, W, T q˝ KpW, U|T,Xq : Y ˆ T ˆ X 99K Z, using measurable sets C Ď Z via: pC, py, t, xqq ÞÑ

QpZ P C|Y “ y, W “ w, T “ tq KpW P dw|T “ t, X “ xq. ż Note that we implicitely marginalized U out, i.e. in the composition we integrate over all variables (here: W and U) from the right hand Markov kernel. Remark 2.8. 1. It is clear from the Definitions 2.7, 2.6 and 2.5 that the composition: QpZ|Y, W, T q˝ KpW, U|T,Xq is the Z-marginal of the product: QpZ|Y, W, T qb KpW, U|T,Xq.

2. Both, products and compositions, are each associative, but clearly not commutative in general. 3. If the left Markov kernel QpZ|Y, T q has no dependence in the second arguments w.r.t. to a first argument of the right Markov kernel KpW, U|T,Xq, i.e. no W in the above terms, then they commute by Fubini’s theorem: QpZ|Y, T qb KpW, U|T,Xq“ KpW, U|T,Xqb QpZ|Y, T q. Remark 2.9 (Composition of deterministic Markov kernels). Consider measurable maps: X˜ : T Ñ X , Z˜ : X Ñ Z, and their composition Z˜ ˝ X˜. Then the composition of the corresponding Markov kernels satisfies:

δZ˜˝X˜ pZ|T q“ δZ˜pZ|Xq˝ δX˜ pX|T q,

˜ ˜ ½ where δZ˜pZ P C|X “ xq :“ ½C pZpxqq and δX˜ pX P A|T “ tq :“ ApXptqq. So the composition of Markov kernels extends the composition of maps. Definition 2.10 (Push-forward Markov kernel w.r.t. measurable maps). Consider a Markov kernel KpW |T q : T 99K W and a measurable map: X : W ˆ T Ñ X . Then we define the push-forward Markov kernel:

X˚KpW |T q :“ KpXpW, T q|T q :“ KpX|T q : T 99K X , of KpW |T q w.r.t. X via: pA, tq ÞÑ ´1 KpX P A|T “ tq :“ KpW P Xt pAq|T “ tq, where: ´1 ´1 Xt pAq“ X pAqt :“ tw P W | Xpw, tq P Au.

13 2. Transitional Probability Theory

Remark 2.11. We can also write push-forwards as compositions:

KpX|T q“ δpX|W, T q˝ KpW |T q, ½ where: δpX P A|W “ w, T “ tq :“ ½ApXpw, tqq “ X´1pAqpw, tq. In this sense composi- tions of Markov kernels generalize push-forward Markov kernels.

Definition 2.12 (Push-forward Markov kernel w.r.t. another Markov kernel). Consider Markov kernels KpW |T q : T 99K W and XpX|W, T q : W ˆ T 99K X . Then we define the push-forward Markov kernel as the composition:

KpX|T q :“ XpX|W, T q˝ KpW |T q : T 99K X .

Remark 2.13. Any Markov kernel KpW |T q : T 99K W can always be extended to include the canonical projection map T “ prT : W ˆ T Ñ T via:

KpW, T |T q : T 99K W ˆ T , pE, tq ÞÑ

KppW, T q P E|T “ tq“ KpW P Et|T “ tq, where Et “ tw P W |pw, tq P Eu. Using Definition 2.6, we can also write this as:

KpW, T |T q“ KpW |T qb δpT |T q“ δpT |T qb KpW |T q,

where δpT P D|T “ tq :“ ½Dptq for measurable D Ď T and t P T .

2.3. Null Sets w.r.t. Transition Probabilities Definition 2.14 (Null sets w.r.t. transition probabilities). Let KpW |T q : T 99K W be a transition probability. A subset M Ď W ˆ T will be called a KpW |T q-null set if every section/fibre Mt :“ tw P W |pw, tq P Mu is a KpW |T “ tq-null set, i.e. there exist measurable Nt Ď W with Mt Ď Nt and KpW P Nt|T “ tq“ 0, for every t P T .

We are usually interested in measurable null sets. The notion of null sets w.r.t. transition probabilities generalizes the notion of null sets in probability spaces, which can be recovered by taking T “ t˚u, the one-point space.

2.4. Transition Probability Spaces We will now give the definition of a transition probability space, which will generalize the notion of probability spaces.

Definition 2.15 (Transition probability space). Consider measurable spaces T and W and a Markov kernel/transition probability:

KpW |T q : T 99K W, pD, tq ÞÑ KpW P D|T “ tq.

14 2. Transitional Probability Theory

We then call the tuple: pW ˆ T , KpW |T qq a transition probability space. It naturally comes with the canonical projection map:

T : W ˆ T Ñ T , T pw, tq :“ t, and the Markov kernel: KpW, T |T q :“ KpW |T qbδpT |T q, which then satisfies KpT |T q“ δpT |T q.

The notion of transition probability space generalizes the notion of probability spaces, which can be recovered by taking T “ ˚, the one-point space.

2.5. Transitional Random Variables In this subsection we will introduce the notion of transitional random variables, which will generalize the usual notion of random variables, formalizes what one could call “conditional” random variables. Furthermore, we start from a bit more general point of view as we not only allow for (deterministic) measurable maps, but also for stochastic maps, which again will be formalized as Markov kernels.

Definition 2.16 (Transitional random variables). If pW ˆ T , KpW |T qq is a transition probability space then a transitional random variable is a Markov kernel:

X “ XpX|W, T q : W ˆ T 99K X to any other measurable space X . It will come with its push-forward Markov kernel:

KpX|T q :“ XpX|W, T q˝ KpW |T q.

Remark 2.17. 1. If pW ˆ T , KpW |T qq is a transition probability space then any measurable map X : WˆT Ñ X induces a transitional random variable δpX|W, T q given by:

δpX P A|W “ w, T “ tq :“ ½ApXpw, tqq. By slight abuse of notation we will call X itself a transitional random variable as well, by actually referring to δpX|W, T q. Transitional random variables of this form will be of the main focus in the following.

2. A transitional random variable X : W ˆ T Ñ X can be considered as a family of random variables measurably parameterized by t P T . For t P T we have the measurable maps:

Xt : W Ñ X , w ÞÑ Xtpwq :“ Xpw, tq,

each of which can be considered a random variable on the probability space pW, KpW |T “ tqq. Note that in this setting we are not modelling the joint distribution of pXtqtPT , but rather how the individual distribution of Xt depends on and varies with t P T .

15 2. Transitional Probability Theory

3. Note that by going from transition probability space pW ˆ T , KpW |T qq to the one pX ˆ T , KpX|T qq for a transitional random variable XpX|W, T q the projection map: X : X ˆ T Ñ X , px, tq ÞÑ x, can be considered a transitional random variable of the form δpX|X, T q. So with only slight loss of generality one can replace a general transitional random variable X by one of the form δpX|W, T q. More will be said in Theorem 4.5.

4. The notion of transitional random variables generalizes the notion of random vari- ables and formalizes what one could call a (probabilistic) “conditional” random variable. Note that the Markov kernel can be given without any conditioning oper- ation.

5. Transitional random variables can model probabilistic programs. For each user chosen input T “ t a random input w „ KpW |T “ tq is drawn. Then the input pw, tq is presented to the probabilistic program X and an output is sampled x1 „ XpX|W “ w, T “ tq. Using Markov kernels to represent transitional random variables allows for random noise inside the program that generates the output x1. So even when presented with the same input pw, tq again another output x2 ‰ x1 might be drawn. So XpX|W “ w, T “ tq models the output distribution for fixed input pw, tq. Certainly, if one has no insight into the input sampling proceedure KpW |T q one might only be interested in the push-forward: KpX|T q, which directly describes the output distribution for each user chosen input T “ t.

6. If we want to model a deterministic variable with no stochasticity we could consider transitional random variables of the form δϕpX|T q that do not depend on the W - argument. Example 2.18 (Special transitional random variables of importance). Let pWˆT , KpW |T qq be a transitional probability space. Then we denote by: 1. T the canonical projection onto T :

T :“ prT : W ˆ T Ñ T , pw, tq ÞÑ T pw, tq :“ t.

We also put: 99K T “ TpT |W, T q“ δpT |W, T q : W ˆ T T , pD, pw, tqq ÞÑ ½Dptq.

2. ˚ the constant transitional random variable:

˚ : W ˆ T Ñ ˚,, pw, tq ÞÑ ˚,

where ˚ :“ t˚u is the one-point space. We also use the same symbol ˚ to denote the Markov kernel:

δ˚ : W ˆ T 99K ˚, pD, ˚q ÞÑ ½Dp˚q.

16 2. Transitional Probability Theory

2.6. Ordering the Class of Transitional Random Variables We now introduce several comparison relations between transitional random variables. All them model to some degree that one variable X is a (deterministic or stochastic) measurable function of another one Y (up to some form of null set). Notation 2.19. Let pW ˆ T , KpW |T qq be a transitional probability space and X, Y, Z be transitional random variables with joint Markov kernel:

KpX,Y,Z|T q :“pXpX|W, T qb YpY |W, T qb ZpZ|W, T qq ˝ KpW |T q.

We put: 1. X À Y , for transitional random variables of the form X “ δpX|W, T q, Y “ δpY |W, T q, if there exists a measurable map ϕ : Y Ñ X such that X “ ϕ ˝ Y .

2. X ÀK Y if there exists a measurable map ϕ : Y Ñ X such that:

KpX,Y |T q“ δϕpX|Y qb KpY |T q,

where KpY |T q is the marginal of KpX,Y |T q.

˚ 99K 3. X ÀK Y if there exists a Markov kernel QpX|Y q : Y X such that: KpX,Y |T q“ QpX|Y qb KpY |T q.

Remark 2.20. 1. We have the implications:

˚ X À Y ùñ X ÀK Y ùñ X ÀK Y.

2. The relation ÀK will be the most crucial one in the following.

3. Note that for general X we do not even have reflexivity: X ÀK X. Indeed, e.g. the distribution of the product of two standard normal distributions does not look like the graph of a function.

4. We also do not have anti-symmetry, i.e. we can not conclude from: X ÀK Y ÀK X that then X “ Y holds, since such variables might differ on some null-set.

5. We can fix the anti-symmetry by going over to almost-sure anti-symmetry, i.e. by defining: X «K Y : ðñ X ÀK Y ÀK X.

The relation ÀK satisfies the following rules, from which “product extension” is the ˚ most important one. The lack of this rule for ÀK is the reason we will stick to ÀK later on. Theorem 2.21. Let pW ˆ T , KpW |T qq be a transitional probability space and X, Y, Z, U be transitional random variables. The relatation ÀK satisfies the following rules:

17 2. Transitional Probability Theory

1. Almost-sure anti-symmetry: X ÀK Y ÀK X ùñ : X «K Y.

2. Transitivity: X ÀK Y ÀK Z ùñ X ÀK Z.

3. Bottom element: δ˚ ÀK X.

4. Product stays bounded: pX ÀK Zq^pY ÀK Zq ùñ X b Y ÀK Z.

5. Product extension: X ÀK Y ùñ X ÀK Y b Z.

6. Product compatibility: pX ÀK Zq^pY ÀK Uq ùñ X b Y ÀK Z b U.

Furthermore, the relation ÀK turns the sub-class of all transitional random variables on pW ˆ T , KpW |T qq of the form X “ δpX|W, T q, where X : W ˆ T Ñ X is a measurable map, and where X may vary, into a bounded join-semi-lattice up to almost-sure anti- symmetry «K with join b and bottom element δ˚. Note that the mentioned sub-class might not be a set, but all the properties of a bounded join-semi-lattice can be proven. The proofs will be given in Theorem D.10 in Appendix D in the Supplementary Material [For21].

2.7. Disintegration of Transition Probabilities In this subsection we present results about the existence of (regular) conditional Markov kernels. Since we will factorize a joint Markov kernel into a marginal part and a condi- tional part such procedures are also called disintegration. First, we will talk about the essential uniqueness of such factorizations and then existence. For proofs see Appendix C in the Supplementary Material [For21]. After the first public preprint of this paper it was brought to the author’s attention that similar results have been provided for stan- dard measurable spaces in [Kal17] Thm. 1.25 and analytic measurable spaces in [BM20]. The results for universal measurable spaces, see Appendix C in the Supplementary Ma- terial [For21], still seem to be new.

Definition 2.22 (Conditional Markov kernels). Consider a Markov kernel

KpX,Y |Zq : Z 99K X ˆ Y and its marginal KpY |Zq. A (regular) conditional Markov kernel of KpX,Y |Zq condi- tioned on Y given Z is a Markov kernel:

KpX|Y,Zq : Y ˆ Z 99K X such that: KpX,Y |Zq“ KpX|Y,Zqb KpY |Zq.

18 2. Transitional Probability Theory

Lemma 2.23 (Essential uniqueness of conditional Markov kernels). If we have Markov kernels:

PpX,Y,Zq, QpX|Y,Zq : Y ˆ Z 99K X , and KpY |Zq : Z 99K Y, between any measurable spaces X , Y, Z such that:

PpX|Y,Zqb KpY |Zq“ QpX|Y,Zqb KpY |Zq, then for every A P BX the set:

NA :“ tpy, zq P Y ˆ Z | PpX P A|Y “ y,Z “ zq‰ QpX P A|Y “ y,Z “ zqu is a measurable KpY |Zq-null set. B N : N If, furthermore, X is countably generated then also “ APBX A is a measurable KpY |Zq-null set. Ť Theorem 2.24 (Existence of conditional Markov kernels). Let KpX,Y |Zq : Z 99K X ˆ Y be a Markov kernel. Furthermore, assume one of the following: 1. X is standard and Y is countably generated.

2. X ÀK pY,Zq.

3. Y ÀK Z.

˚ Then X ÀK pY,Zq, i.e. there exists a (regular) conditional Markov kernel KpX|Y,Zq of KpX,Y |Zq. Remark 2.25. In Appendix C in the Supplementary Material [For21] we will prove an analogous result with the weaker assumption of X being only an analytic or even just a universal measurable space. The price one has to pay is that the conditional Markov kernel will come with a weaker measurability property. It will only be measurable w.r.t. the σ-algebra generated by analytic subsets, universally measurable subsets, resp. The following corrollary is a well known result for probability measures (see [Kle20, Rao05]): Corollary 2.26 ( distributions). Let X and Y be random vari- ables on probability space pW, PpW qq with standard measurable spaces X , Y, resp., as codomains. Then there always exist regular1 conditional probability distributions PpX|Y q and PpY |Xq satisfying:

PpX,Y q“ PpX|Y qb PpY q, PpX,Y q“ PpY |Xqb PpXq.

Furthermore, these conditional probability distributions are essentially unique in the strong sense of Lemma 2.23. 1The word “regular” refers to the fact that the conditional probabilities are Markov kernels as defined above.

19 3. Transitional Conditional Independence

3. Transitional Conditional Independence

3.1. Definition of Transitional Conditional Independence In this section we will introduce the notion of transitional conditional independence for transitional random variables. It generalizes prior notions of (extended) conditional inde- pendence, see [Daw79a,Daw80,Daw01a,CD17a,RERS17,FM20], and it unifies stochastic conditional independence and some form of functional conditional independence. A com- parison with other notions of (extended) conditional independence from the literature will be done in Appendix L in the Supplementary Material [For21].

Definition 3.1 (Transitional conditional independence). Let pW ˆ T , KpW |T qq be a transition probability space with Markov kernel:

KpW |T q : T 99K W.

Consider transitional random variables: X : W ˆ T 99K X and Y : W ˆ T 99K Y and Z : W ˆ T 99K Z. The joint push-forward Markov kernel is then given by:

KpX,Y,Z|T q :“pXpX|W, T qb YpY |W, T qb ZpZ|W, T qq ˝ KpW |T q.

We say that X is (transitionally) independent of Y conditioned on Z w.r.t. K “ KpW |T q, in symbols: X KK Y | Z, KpW |T q if there exists a Markov kernel:

QpX|Zq : Z 99K X , such that:

KpX,Y,Z|T q“ QpX|Zqb KpY,Z|T q, (1) where KpY,Z|T q is the marginal of KpX,Y,Z|T q. We use the following notations for the following special case:

X KK Y : ðñ X KK Y | δ˚. KpW |T q KpW |T q

For transitional random variables of the forms X “ δpX|W, T q, Y “ δpY |W, T q, Z “ δpZ|W, T q, where X : W ˆ T Ñ X , Y : W ˆ T Ñ Y, Z : W ˆ T Ñ Z, etc., are measurable maps, we might also just write X, Y , Z, instead of X, Y, Z, in those relations KK. E.g. we would write:

X KK Y | Z : ðñ δpX|W, T q KK δpY |W, T q | δpZ|W, T q. KpW |T q KpW |T q

Several remarks are in order.

20 3. Transitional Conditional Independence

Remark 3.2. If a candidate QpX|Zq is found then for the Equation 1 to hold it is sufficient to check that for all t P T and all measurable A Ď X , B Ď Y, C Ď Z one has that:

KpX P A, Y P B,Z P C|T “ tq“ QpX P A|Z “ zq KpY P dy,Z P dz|T “ tq. żC żB Remark 3.3 (Essential uniqueness). The Markov kernel QpX|Zq appearing in the con- ditional independence X KKKpW |T q Y | Z in Definition 3.1 is then a version of a conditional Markov kernel KpX|Y,Z,T q and is thus essentially unique (up to KpZ|T q-null set) in the sense of Lemma 2.23.

Notation 3.4. The Markov kernel QpX|Zq appearing in the conditional independence X KKKpW |T q Y | Z is essentially unique as remarked in 3.3 and we can suggestively write it as: ✟ KpX| Y,Z, T q :“ KpX|✟T,Y,Z✟ q :“ QpX|Zq, or similarly with crossed variables in different order. So we have in case of X KKKpW |T q Y | Z:

KpX,Y,Z|T q“ KpX| Y,Z, T qb KpY,Z|T q.

This notation indicates that KpX| Y,Z, T q is a version of the conditional Markov kernel KpX|Y,Z,T q, but does not (directly) depend on the arguments of Y and T .

Remark 3.5 (Conditional independence includes conditional independence from T). It is easy to see from the Definition 3.1 that we have the equivalence:

X KK Y | Z ðñ X KK T b Y | Z. KpW |T q KpW |T q

The heuristic of why we automatically declare independence from T as well is that we do not require Z to be in some sense “orthogonal” to or “functionally independent” of T , like other notion of conditional independence require. Z can be a direct function of T . In this way, i.e. by carefully exploiting the interplay between the second and third argument of KKK, we can re-introduce dependence on T through Z. This is what makes the (asymmetric) notion of transitional conditional independence so flexible.

Remark 3.6 (Existence of conditional Markov kernels expressed as conditional inde- pendence). Let X, Y be transitional random variables on transition probability space pW ˆ T , KpW |T qq. Then we can express the existence of a conditional Markov kernel KpX|Y, T q of KpX,Y |T q equivalently in one of the following equivalent statements:

˚ X ÀK Y b T ðñ X KK δ˚ | Y b T ðñ X KK T | Y b T. KpW |T q KpW |T q

Note that for standard measurable space X and countably generated Y the above state- ments always hold by Theorem 2.24.

21 3. Transitional Conditional Independence

3.2. Transitional Conditional Independence for Random Variables Remark 3.7 (Transitional conditional independence for random variables). If we trans- late transitional conditional independence to random variables X,Y,Z on a probability space pW, PpW qq, i.e. taking T “ t˚u, then we arrive at:

X KK Y | Z ðñ DQpX|Zq : PpX,Y,Zq“ QpX|Zqb PpY,Zq. (2) PpW q

Such a QpX|Zq would then clearly be a regular version of both, PpX|Y,Zq and PpX|Zq. It thus directly implies what we will call weak conditional independence:

ω ½ X KK Y | Z : ðñ @A P BX : Er½ApXq|Y,Zs“ Er ApXq|Zs PpW q-a.s., (3) PpW q which makes use of the conditional expectations for each A, which exist for all mea- surable spaces, in contrast to regular conditional probability distributions. Both notions of conditional independence can be defined for all measurable spaces. Transitional con- ditional independence incorporates the existence of such a PpX|Zq and a factorization of the joint directly into its definition. Certainly, if a regular version of PpX|Zq does not even exist the variables are declared (transitionally) conditionally dependent. But in case PpX|Zq exists, e.g. for standard measurable X and Z by Theorem 2.24, then both notions of conditional independence are equivalent. So the choice of which notion to pick depends on how much meaning one finds in the extistence of such a regular version PpX|Zq and a factorization. In the applications to causal graphical models, where one wants to connect and work with many different subsystems, the existence of such condi- tional Markov kernels is crucial, because otherwise those subsystems might not even be well-defined. Even though, one might argue that asking to check for the existence of a regular ver- sions of PpX|Zq seems like an unnessary burden, from the point on we prove how the existence of such Markov kernels can be inherited through the (asymmetric) separoid rules, see Theorem 3.11, or can be guaranteed just through graphical criteria, see the global Markov property in Theorem 6.3, it will turn out to be very usuful to get such regular conditional Markov kernels (almost) for free.

3.3. Transitional Conditional Independence for Deterministic Variables We now demonstrate how transitional conditional independence behaves when applied to the other corner case of deterministic functions that contain no stochasticity.

Theorem 3.8 (Transitional conditional independence for deterministic variables). Let F : T Ñ F and H : T Ñ H be measurable maps and F a standard measurable space. We now consider them as (deterministic) transitional random variables on the transition probability space pW ˆT , KpW |T qq. Let Y : W ˆT Ñ Y be another transitional random

22 3. Transitional Conditional Independence variable. Then the following statements are equivalent:

1. F KK Y | H. KpW |T q 2. There exists a measurable function ϕ : H Ñ F such that F “ ϕ ˝ H.

Remark 3.9. 1. Note that the second statement is independent of Y .

2. The first direction can we weakened by only assuming BF to separate the points of F to get a map ϕ : HpT qÑ F such that F “ ϕ ˝ H.

3. Theorem 3.8 shows how transitional conditional independence can express certain functional conditional (in)dependences. It also shows its (restricted) relation to variation conditional independence, see [CD17a, CD17b].

4. The full equivalence in Theorem 3.8 for standard F needs Kuratowski’s extension theorem for standard measurable spaces (see [Kec95] 12.2). Versions for analytic and universal measurable spaces can be found in Appendix B.8 in the Supplemen- tary Material [For21]. The proof of Theorem 3.8 can be found in Theorem L.7 in Appendix L.3 in the Supplementary Material [For21].

Example 3.10. If, for example, T “ T1 ˆ T2 and Ti : W ˆ T1 ˆ T2 Ñ Ti the canonical projection onto Ti, then F is a function in two variables pt1, t2q. We then have:

F KK T1 | T2, KpW |T q if and only if F - as a function - is only dependent on the argument t2 (and not on t1).

3.4. Separoid Rules for Transitional Conditional Independence In the following we will list all the left and right versions of the separoid rules (see [Daw01a] for the symmetric versions or Appendix A in the Supplementary Material [For21]) that hold for transitional conditional independence. Note that almost all of these work for all measurable spaces. Some of the rules, especially Left Weak Union, require the codomains of the transitional random variables to be standard or countably generated. This is required for the existence of a (regular) conditional Markov kernel, see Theorem 2.24. For those rules one can weaken the assumptions to analytic or even universal measurable spaces if one is willing to accept Markov kernels that are only universally measurable, see Theorem C.6 and Theorem C.7 in Appendix C.3 in the Supplementary Material [For21]. Formally we will show that the class of transitional random variables with standard (or universal) measurable spaces as codomains together with transitional conditional independence will form what we will call a T -˚-separoid, see Definition A.4 in Appendix A in the Supplementary Material [For21], an asymmetric notion of separoid. Note that

23 3. Transitional Conditional Independence these rules could not been proven in this amplitude for other versions of extendend conditional independence, see [CD17a]. The proofs for these separoid rules for transitional random variables will be given in Appendix E in the Supplementary Material [For21].

Theorem 3.11 (Separoid rules for transitional conditional independence). Consider a transition probability space pW ˆ T , KpW |T qq and transitional random variables X : W ˆ T 99K X and Y : W ˆ T 99K Y and Z : W ˆ T 99K Z and U : W ˆ T 99K U. Then the ternary relation KK“ KKKpW |T q satisfies the following rules: a) Extended Left Redundancy E.3:

X ÀK Z ùñ X KK Y | Z.

b) T-Restricted Right Redundancy E.5 (for X standard, Z countably generated)2:

X KK δ˚|Z b T always holds. c) Left Decomposition E.6:

X b U KK Y | Z ùñ U KK Y | Z.

d) Right Decomposition E.7:

X KK Y b U | Z ùñ X KK U | Z.

e) T-Inverted Right Decomposition E.8:

X KK Y | Z ùñ X KK T b Y | Z.

f) Left Weak Union E.9 (for X standard, U countably generated)2:

X b U KK Y | Z ùñ X KK Y | U b Z.

g) Right Weak Union E.12:

X KK Y b U | Z ùñ X KK Y | U b Z.

h) Left Contraction E.13:

pX KK Y | U b Zq^pU KK Y | Zq ùñ X b U KK Y | Z.

i) Right Contraction E.14:

pX KK Y | U b Zq^pX KK U | Zq ùñ X KK Y b U | Z.

2(Only) T -Restricted Right Redundancy, Left Weak Union and T -Restricted Symmetry need the existence of conditional Markov kernels. That is the reason we assume standard and countably generated measurable spaces there. If one is only interested in universal measurability one can instead assume the weaker assumption of universal and universally countably generated measurable spaces, resp.

24 4. Applications to Statistical Theory

j) Right Cross Contraction E.15: pX KK Y | U b Zq^pU KK X | Zq ùñ X KK Y b U | Z. k) Flipped Left Cross Contraction E.16: pX KK Y | U b Zq^pY KK U | Zq ùñ Y KK X b U | Z. Remark 3.12. In particular, we have the equivalence:

pX KK Y b U | Zq ðñ pX KK Y | U b Zq ^ pX KK U | Zq, and for standard measurable spaces: pX b U KK Y | Zq ðñ pX KK Y | U b Zq ^ pU KK Y | Zq.

Corollary 3.13 (Symmetry). Let the setting be like in Theorem 3.11. We then have: l) Restricted Symmetry E.20:

pX KK Y | Zq^pY KK δ˚ | Zq ùñ Y KK X | Z. m) T-Restricted Symmetry E.21 (for Y standard, Z countably generated)2: X KK Y | Z b T ùñ Y KK X | Z b T. n) Symmetry E.22 (for Y standard, Z countably generated, T “ t˚u)2: X KK Y | Z ùñ Y KK X | Z. Corollary 3.14. Let the setting be like in Theorem 3.11. We then have: o) Inverted Left Decomposition E.23:

pX KK Y | Zq^pU ÀK X b Zq ùñ X b U KK Y | Z. p) T-Extended Inverted Right Decomposition E.24:

pX KK Y | Zq^pU ÀK T b Y b Zq ùñ X KK T b Y b U | Z. q) Equivalent Exchange E.25:

1 1 pX KKK Y | Zq^pZ «K Z q ùñ X KKK Y | Z .

1 1 1 r) Full Equivalent Exchange E.26: If pX «K Xq^pY «K Yq^pZ «K Zq then:

1 1 1 X KKK Y | Z ðñ X KKK Y | Z .

4. Applications to Statistical Theory

In the following we will collect some illustrative applications of transitional conditional independence.

25 4. Applications to Statistical Theory

4.1. Ancillarity, Sufficiency, Adequacy In this subsection we want to relate the concepts of acillarity, sufficiency and adequacy, see [Fis22,Fis25,Bas59,Bas64,Daw75,GRF10], to transitional conditional independence.

Example 4.1 (Certain statistics expressed as conditional independence). Let PpW |Θq be a statistical model, considered as a Markov kernel F 99K W. Let X and Y be two transitional random variables w.r.t. PpW |Θq. A statistic of X is a measurable map S : X Ñ S, which we consider as the transitional random variable S À X given via:

S : W ˆ F Ñ S, pw, θq ÞÑ SpXpw, θqq.

1. Ancillarity. S is an ancillary statistic of X w.r.t. Θ if and only if:

S KK Θ. PpW |Θq

This means that every parameter Θ “ θ induces the same distribution for S:

PpS|Θ “ θq“ PpS| Θ q.

2. Sufficiency. S is a sufficient statistic of X w.r.t. Θ if and only if:

X KK Θ | S. PpW |Θq

This means that there is a Markov kernel PpX|S, Θ q, not dependent on Θ, such that: PpX,S|Θq“ PpX|S, Θ qb PpS|Θq. So X only “interacts” with the parameters Θ through S.

3. Adequacy. S is an adequate statistic of X for Y w.r.t. Θ if and only if:

X KK Θ,Y | S. PpW |Θq

This means we have a factorization:

PpX,Y,S|Θq“ PpX| Y , S, Θ qb PpY,S|Θq,

for some Markov kernel PpX| Y , S, Θ q, only dependent on S. This means that all information of X about the (parameters and/or) labels Y are fully captured already by S.

Now we want to show that the classical Fisher-Neyman factorization criterion for suffi- ciency (see [Fis22,Ney36,HS49,Bur61]) is in line with the reformulation of sufficiency as a transitional conditional independence. The proof is given in Theorem F.1 in Appendix F in the Supplementary Material [For21].

26 4. Applications to Statistical Theory

Theorem 4.2 (Fisher-Neyman). Consider a statistical model, witten as a Markov ker- nel, PpX|Θq. Assume that we can write it in the following form:

PpX P A|Θ “ θq“ hpxq ¨ gpSpxq; θq dµpxq, żA where X takes values in a standard measurable space X , S : X Ñ S is measurable, S countably generated, µ is a measure on X , g, h ě 0 are measurable and h is µ-integrable (e.g. exponential families are of such form). Then S is a sufficient statistic of X for PpX|Θq: X KK Θ | S. PpX|Θq This is one direction of the Fisher-Neyman factorization theorem for sufficiency. Our definition of transitional conditional independence generalizes the factorization theorem to Markov kernels (per definition) without the necessity of densities and/or reference measures.

4.2. Invariant Reductions Example 4.3 (Invariant reduction). Let PpX|Θq be a statistical Model, given as a Markov kernel. Assume that we are only interested in a certain quantity of the param- eters Γ “ ΓpΘq, considered as a measurable function in Θ. For the estimation of Γ we then might only need parts of the information encoded in the data X. An invariant reduc- tion of PpX|Θq w.r.t. Γ, see [HWG65], is then a measurable function U : X Ñ U such that PpUpXq|Θq only depends on Γ. We can rephrase this as the transitional conditional independence: U KK Θ | Γ, PpX|Θq and the occuring Markov kernel PpU|Γ, Θ q would give the correct model to further work with.

4.3. Reparameterizing Transitional Random Variables

We want to generalize two somewhat related well-known results, see [Č82], from random variables to transitional random variables:

1.) It has been known since [Dar53] that for a real-valued random variable X with continuous cumulative distribution function F and quantile function R = F⁻¹, the variable E := F(X) is uniformly distributed on [0, 1] and R(E) = X a.s.

2.) It is known that for random variables X and Z with a well-behaved joint distribution P(X, Z) there exists a random variable E that is independent of Z and a measurable map g such that X = g(E, Z) a.s.

To establish such results for transitional random variables we will use the following constructions.


Definition 4.4. Let X be a transitional random variable with values in 𝒳 = ℝ̄ = [−∞, +∞] on a transition probability space (𝒲 × 𝒵, K(W | Z)). We then define the interpolated transitional cumulative distribution function (itcdf) of X as:

F(x; u | z) := K(X < x | Z = z) + u · K(X = x | Z = z),  with u ∈ [0, 1],

and the ("transitional") quantile function (tqf) of X as:

R(e | z) := inf { x̃ ∈ ℝ̄ | F(x̃; 1 | z) ≥ e },  for e ∈ [0, 1].

Theorem 4.5. Let (𝒲 × 𝒵, K(W | Z)) be any transition probability space, X a transitional random variable with values in a standard measurable space 𝒳, and ι : 𝒳 ↪ ℝ̄ a fixed embedding onto a Borel subset of ℝ̄ (i.e. w.l.o.g. 𝒳 = ℝ̄). Let K(U) be the uniform distribution on 𝒰 := [0, 1]. We put 𝒲̄ := 𝒰 × 𝒲, W̄ := (U, W) and K(W̄ | Z) := K(U) ⊗ K(W | Z). We then consider the transitional random variables X, U, Z, E on the transition probability space (𝒲̄ × 𝒵, K(W̄ | Z)), where:

E := F(X; U | Z) : 𝒲̄ × 𝒵 → ℰ := [0, 1],

and F is the itcdf of X from Definition 4.4. Then we have the transitional independence:

E ⫫_{K(W̄|Z)} Z,  with K(E | ✁Z) the uniform distribution on ℰ = [0, 1],

and:

X = R(E | Z)  K(W̄ | Z)-a.s.,

where R is the tqf of X from Definition 4.4. The proof of this theorem can be found in Theorem G.4 in Appendix G in the Supplementary Material [For21].
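The following is a minimal numerical sketch of Theorem 4.5 (our own illustration under simplifying assumptions: a fixed discrete distribution, and the conditioning variable Z suppressed for brevity):

```python
import numpy as np

# E = F(X; U) with U ~ Uniform[0,1] independent of X should be uniform,
# and the quantile function R should recover X from E almost surely.
rng = np.random.default_rng(0)
support, probs = np.array([0, 1, 2]), np.array([0.2, 0.5, 0.3])
cum = np.cumsum(probs)                       # F(x; 1) = P(X <= x) on the support
x = rng.choice(support, size=100_000, p=probs)
u = rng.random(x.size)
idx = np.searchsorted(support, x)
f_less = np.concatenate([[0.0], cum[:-1]])   # P(X < x) per support point
e = f_less[idx] + u * probs[idx]             # itcdf of Definition 4.4
print(np.round(np.quantile(e, [0.25, 0.5, 0.75]), 2))  # ~ [0.25, 0.5, 0.75]
x_back = support[np.searchsorted(cum, e)]    # tqf: inf{x | F(x;1) >= e}
print((x_back == x).all())                   # True (ties have probability 0)
```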

4.4. Propensity Score

For a random variable X and a binary random variable Y ∈ {0, 1} the propensity score is e(x) := P(Y = 1 | X = x). It is the "smallest" statistic of X such that Y ⫫ X | e(X) (see [RR83]). This is one of the core concepts of causal inference in the potential outcome framework. We now claim that the above can be generalized to arbitrary (non-binary) Y with Markov kernel P(Y | X), even when no distribution for X is specified.

Theorem 4.6 (Propensity score). Let P(Y | X) be a Markov kernel. We define the propensity of x ∈ 𝒳 w.r.t. P(Y | X) as:

E(x) := P(Y | X = x) ∈ 𝒫(𝒴).

Then E : 𝒳 → ℰ := 𝒫(𝒴) is measurable, E ≼ X, and we have:

Y ⫫_{P(Y|X)} X | E.


Furthermore, if S : 𝒳 → 𝒮 is another measurable map (S ≼ X) with:

Y ⫫_{P(Y|X)} X | S,

then: E ≼ S ≼ X. So E is in this sense the smallest statistic of X for which the above conditional independence holds.

The proof is given in Theorem F.2 in Appendix F in the Supplementary Material [For21].
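The following sketch (ours; the propensity values and the four-point space for X are made up for illustration) checks the defining property of the propensity score for binary Y by simulation: within each stratum of E(X), the law of Y no longer varies with x.

```python
import numpy as np

# Hedged illustration of Theorem 4.6 for binary Y: conditioning on the
# propensity E(x) = P(Y=1 | X=x) renders Y independent of X.  X takes
# 4 values but only 2 distinct propensities, so E(X) is a true reduction.
rng = np.random.default_rng(0)
e_of_x = np.array([0.3, 0.3, 0.8, 0.8])       # x in {0,1,2,3}, two strata
x = rng.integers(0, 4, size=400_000)
y = rng.random(x.size) < e_of_x[x]
for e in (0.3, 0.8):
    stratum = [np.round(y[x == k].mean(), 3) for k in np.flatnonzero(e_of_x == e)]
    print(e, stratum)                          # all entries ~ e, independent of x
```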

4.5. A Weak Likelihood Principle

The same Theorem 4.6, applied to a statistical model P(X | Θ), gives a weak form of the likelihood principle. For the history of the likelihood principle and discussions see [SBC+62, Fis22, Hac65, Edw74, Edw92, Roy97, Bir62, Jay03, May14, Eva13, Gan15].

Theorem 4.7 (A weak likelihood principle). Let P(X | Θ) be a statistical model. Define the likelihood function as: L(θ) := P(X | Θ = θ) ∈ 𝒫(𝒳). We then have the transitional conditional independence:

X ⫫_{P(X|Θ)} Θ | L.

Furthermore, any other measurable map S of the parameters Θ with X ⫫_{P(X|Θ)} Θ | S satisfies: L ≼ S ≼ Θ.

In this sense the likelihood captures all information of the parameters Θ about the data X, and it does so most efficiently.

4.6. Bayesian Statistics

Let P(X | Θ) be a statistical model (for simplicity between standard measurable spaces) and P(Θ | Π) a prior with hyperparameters Π = π. Then by the standard Bayesian setting we have a joint (transition) probability distribution:

P(X, Θ | Π) := P(X | Θ) ⊗ P(Θ | Π).

A conditional Markov kernel gives us the posterior (transitional) probability distributions: P(Θ | X, Π), which is unique up to P(X | Π)-null sets. We now define the transitional random variable:

Z(x, π) := P(Θ | X = x, Π = π),

which gives us a joint (transition) probability distribution: P(X, Θ, Z | Π). The following result then formalizes the basic idea that the posterior (given via Z) most efficiently incorporates all information from the data (X) about the state of the parameters (Θ) as soon as a prior (Π) is specified.

Theorem 4.8 (Bayesian statistics). With the above notations we have the conditional independence:

Θ ⫫_{P(X,Θ|Π)} X | Z.

Furthermore, if S is another deterministic measurable function in (X, Π) such that:

Θ ⫫_{P(X,Θ|Π)} X | S,

then: Z ≼ S  P(X | Π)-a.s.

The proof of this theorem is given in Theorem F.3 in Appendix F in the Supplementary Material [For21].
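As a small conjugate illustration (ours, under the assumption of a Beta-Bernoulli model, not the paper's general measure-theoretic setting), the posterior map Z is a deterministic function of data and prior, and data sets with equal Z are equally informative about Θ:

```python
# Hedged illustration of Theorem 4.8: in a Beta(a, b)-Bernoulli model the
# posterior map Z(x) = P(Theta | X = x) is parameterized by the updated
# Beta parameters; two data sets with the same Z yield the same
# conditional law of Theta.
def posterior(a, b, x):
    s, n = sum(x), len(x)
    return (a + s, b + n - s)          # Beta(a + s, b + n - s)

print(posterior(1, 1, [1, 0, 1]))      # (3, 2)
print(posterior(1, 1, [0, 1, 1]))      # (3, 2): same Z, hence same posterior
```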

Theorem 4.9 (A weak likelihood principle for Bayesian statistics). We also have the transitional conditional independence with L(θ) := P(X | Θ = θ):

X ⫫_{P(X,Θ|Π)} Θ, Π | L.

Furthermore, if 𝒳 is countably generated, then any other measurable map S in Θ with

X ⫫_{P(X,Θ|Π)} Θ, Π | S

satisfies: L ≼ S ≼ Θ  P(Θ | Π)-a.s.

The proof of this theorem is given in Theorem F.4 in Appendix F in the Supplementary Material [For21].

5. Graph Theory

In this section we introduce a few graph theoretic notions that are required to apply transitional conditional independence to graphical models in the next section. Of importance, also in its own right, is the notion of σ-separation, a generalization of d/m/m*-separation, see [Pea09, Ric03, FM17, FM18, FM20], to graphs that allow for cycles and input nodes. We define σ-separation in such a way that it forms a J-∅-separoid, see Definition A.4 in Appendix A in the Supplementary Material [For21]. The reason is that we want to match these separoid rules to the ones for transitional conditional independence in Theorem 3.11.


[Figure 2: Conditional Acyclic Directed Mixed Graph (CADMG) on nodes v1, …, v8.]

5.1. Conditional Directed Mixed Graphs (CDMGs)

Definition 5.1 (Conditional directed mixed graphs (CDMGs)). A conditional directed mixed graph (CDMG) G = (J, V, E, L) consists of two (disjoint) sets of vertices/nodes, the set of input nodes J and the set of output nodes V, and two (disjoint) sets of edges:

1. E ⊆ {w → v | w ∈ J ∪ V, v ∈ V}, the set of directed edges,

2. L ⊆ {v1 ↔ v2 | v1, v2 ∈ V, v1 ≠ v2}, the set of bi-directed edges,

with: v1 ↔ v2 ∈ L ⟹ v2 ↔ v1 ∈ L.

So, per definition, there won't be any arrowheads pointing at input nodes j ∈ J. We drop "mixed" from the name if L = ∅ (CDG), "conditional" if J = ∅ (DMG), or both (DG).

Notation 5.2. For a CDMG we will suggestively write: G(V | do(J)) := (J, V, E, L) = G, where the sets of edges E and L are left implicit. We will also write: v ∈ G to mean v ∈ J ∪ V; v1 ← v2 ∈ G to mean v2 → v1 ∈ E; v1 ↔ v2 ∈ G to mean v1 ↔ v2 ∈ L; v1 *→ v2 ∈ G to mean v1 → v2 ∈ G ∨ v1 ↔ v2 ∈ G; etc.

Definition 5.3 (Walks). Let G = G(V | do(J)) be a CDMG and v, w ∈ G.

1. A walk from v to w in G is a finite sequence of nodes and edges

v = v0 *–* v1 *–* ⋯ *–* vn−1 *–* vn = w

in G for some n ≥ 0, where each *–* stands for an edge of G of arbitrary type and orientation, i.e. such that for every k = 1, …, n we have vk−1 *–* vk ∈ G, and with v0 = v and vn = w.


In the definition of a walk, the same node may appear several times. The trivial walk consisting of a single node v0 ∈ G is also allowed (if v = w).

2. A directed walk from v to w in G is a walk of the form:

v = v0 → v1 → ⋯ → vn−1 → vn = w,

for some n ≥ 0, where all arrowheads point in the direction of w and no arrowheads point back.

Definition 5.4. For a CDMG G = G(V | do(J)) and v ∈ G we define the sets:

1. Parents: Pa^G(v) := {w ∈ G | w → v ∈ G},

2. Children: Ch^G(v) := {w ∈ G | v → w ∈ G},

3. Ancestors: Anc^G(v) := {w ∈ G | ∃ directed walk in G: w → ⋯ → v},

4. Descendants: Desc^G(v) := {w ∈ G | ∃ directed walk in G: v → ⋯ → w},

5. Strongly connected component: Sc^G(v) := Anc^G(v) ∩ Desc^G(v) ∋ v.

We extend these notions to sets A ⊆ J ∪ V by taking unions, e.g. Anc^G(A) := ⋃_{v∈A} Anc^G(v).

Definition 5.5 (Acyclicity). A CDMG G = G(V | do(J)) is called acyclic if every directed walk from any node v ∈ G to itself is trivial, i.e. Sc^G(v) = {v} and Pa^G(v) ∌ v for all v ∈ G.
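For concreteness, here is a minimal sketch (ours) of Definition 5.4, computing Desc^G, Anc^G and Sc^G by graph reachability; the edge set and node names are illustrative:

```python
def desc(edges, v):
    """Desc^G(v): nodes reachable from v by a directed walk
    (v itself included, since trivial walks are allowed)."""
    out, stack = {v}, [v]
    while stack:
        cur = stack.pop()
        for (a, b) in edges:
            if a == cur and b not in out:
                out.add(b)
                stack.append(b)
    return out

def sc(edges, v):
    """Sc^G(v) = Anc^G(v) ∩ Desc^G(v), per Definition 5.4."""
    anc = desc({(b, a) for (a, b) in edges}, v)  # ancestors = reachability in the reversed graph
    return anc & desc(edges, v)

# A 2-cycle: the strongly connected component of 'a' is the whole cycle.
print(sc({("a", "b"), ("b", "a"), ("b", "c")}, "a"))   # {'a', 'b'}
```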

Definition 5.6 (Topological order). Let G = G(V | do(J)) be a CDMG. A topological order of G is a total order < on J ∪ V such that for all v, w ∈ G:

v ∈ Pa^G(w) ⟹ v < w.

Remark 5.7. A CDMG G(V | do(J)) is acyclic if and only if it has a topological order.
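One direction of Remark 5.7 is algorithmic: Kahn's algorithm either produces a topological order or certifies a directed cycle. A minimal sketch of ours, with illustrative names:

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: returns a topological order of the directed part
    of a CDMG if one exists, else None (a None result certifies a cycle,
    cf. Remark 5.7)."""
    indeg = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in children[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return order if len(order) == len(nodes) else None

print(topological_order("jvw", {("j", "v"), ("v", "w")}))  # ['j', 'v', 'w']
```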

5.2. Sigma-Separation in CDMGs

Here we generalize the notion of σ-separation, an extension of d/m/m*-separation, see [Pea09, Ric03, FM17, FM18, FM20], to graphs that allow for cycles and input nodes.

Definition 5.8 (σ-blocked walks). Let G = G(V | do(J)) be a CDMG, C ⊆ J ∪ V a subset of nodes and π a walk in G(V | do(J)): π = (v0 ⋯ vn).

1. We say that the walk π is C-σ-blocked or σ-blocked by C if either:

a) v0 ∈ C or vn ∈ C, or:

b) there are two adjacent edges in π of one of the following forms:


left chain: vk−1 ← vk ← vk+1 with vk ∈ C ∧ vk ∉ Sc^G(vk−1),
right chain: vk−1 → vk → vk+1 with vk ∈ C ∧ vk ∉ Sc^G(vk+1),
fork: vk−1 ← vk → vk+1 with vk ∈ C ∧ vk ∉ Sc^G(vk−1) ∩ Sc^G(vk+1),
collider: vk−1 *→ vk ←* vk+1 (both arrowheads at vk) with vk ∉ C.

2. We say that the walk π is C-σ-open if it is not C-σ-blocked.

Definition 5.9 (σ-separation). Let G = G(V | do(J)) be a CDMG and A, B, C ⊆ J ∪ V (not necessarily disjoint) subsets of nodes. We then say that:

1. A is σ-separated from B given C in G, in symbols:

A ⊥^σ_{G(V | do(J))} B | C,

if every walk from a node in A to a node in J ∪ B is σ-blocked by C.³

2. Otherwise we write: A ⊥̸^σ_{G(V | do(J))} B | C.

3. Special case: A ⊥^σ_{G(V | do(J))} B :⟺ A ⊥^σ_{G(V | do(J))} B | ∅.

Remark 5.10. If G = G(V | do(J)) is acyclic, then Sc^G(v) = {v} and v ∉ Pa^G(v) for all v ∈ G, and the additional conditions in "σ-blocked" for the non-colliders are of the form vk ∉ Sc^G(vk±1) = {vk±1}. These are then automatically satisfied, because vk ∈ Pa^G(vk±1) ∌ vk±1 for the relevant other node vk±1, and thus vk ≠ vk±1. So in the acyclic case the additional conditions involving Sc^G(vk±1) can be dropped. This shows that in the acyclic case σ-separation is equivalent to d/m/m*-separation, see [Pea09, Ric03]. It turns out that in the non-acyclic case σ-separation is the better-suited concept, see [FM17, FM18, FM20].
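For the acyclic case of Remark 5.10, the separation check is effectively classical d-separation, and the search over (possibly node-repeating) walks becomes finite once one tracks only the pairs (node, mark of the incoming edge). A minimal sketch of ours for CDGs (directed edges only; the caller is assumed to pass J ∪ B as the second argument):

```python
from collections import deque

def sigma_separated(edges, A, B, C):
    """Walk-based separation in an *acyclic* CDG (Definitions 5.8/5.9
    with Sc^G(v) = {v}).  edges: set of pairs (u, v) meaning u -> v."""
    nodes = {x for e in edges for x in e} | set(A) | set(B) | set(C)
    nbrs = {v: [] for v in nodes}
    for u, v in edges:
        nbrs[u].append((v, "tail", "head"))   # leave u along u -> v
        nbrs[v].append((u, "head", "tail"))   # leave v along u -> v, backwards
    start, targets = set(A) - set(C), set(B) - set(C)
    if start & targets:
        return False                          # trivial open walk
    queue = deque((w, m_w) for a in start for (w, m_a, m_w) in nbrs[a])
    seen = set()
    while queue:
        v, in_mark = queue.popleft()
        if (v, in_mark) in seen:
            continue
        seen.add((v, in_mark))
        if v in targets:
            return False                      # found a C-sigma-open walk
        for w, m_v, m_w in nbrs[v]:
            collider = (in_mark == "head" and m_v == "head")
            if (v in C) if collider else (v not in C):  # Definition 5.8 b)
                queue.append((w, m_w))
    return True

# j -> x -> y with x observed: j and y are separated given {x}.
print(sigma_separated({("j", "x"), ("x", "y")}, {"j"}, {"y"}, {"x"}))  # True
```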

5.3. Separoid Rules for Sigma-Separation

Here we collect the formal rules that σ-separation satisfies. Note that these rules match the rules for transitional conditional independence in Theorem 3.11. The proofs will be given in Appendix I in the Supplementary Material [For21]. We formally show that the subsets of J ∪ V of a CDMG G(V | do(J)), together with the relations =, ⊆, ⊥^σ, the operation ∪ and the element ∅, form a J-∅-separoid, see Definition A.4 in Appendix A in the Supplementary Material [For21].

Theorem 5.11 (Separoid rules for σ-separation). Let G = G(V | do(J)) be a CDMG and A, B, C, D ⊆ J ∪ V subsets of nodes. Then the ternary relation ⊥ := ⊥^σ_{G(V | do(J))} satisfies the following rules:

a) Extended Left Redundancy I.1:

A ⊆ C ⟹ A ⊥ B | C.

³Note the inclusion of J here. This is done to obtain asymmetric separoid rules similar to those for transitional conditional independence, and thus a nicer-looking global Markov property later on.


b) J-Restricted Right Redundancy I.2:

A ⊥ ∅ | C ∪ J always holds.

c) Left Decomposition I.3:

A ∪ D ⊥ B | C ⟹ D ⊥ B | C.

d) Right Decomposition I.4:

A ⊥ B ∪ D | C ⟹ A ⊥ D | C.

e) J-Inverted Right Decomposition I.5:

A ⊥ B | C ⟹ A ⊥ J ∪ B | C.

f) Left Weak Union I.6:

A ∪ D ⊥ B | C ⟹ A ⊥ B | D ∪ C.

g) Right Weak Union I.7:

A ⊥ B ∪ D | C ⟹ A ⊥ B | D ∪ C.

h) Left Contraction I.8:

(A ⊥ B | D ∪ C) ∧ (D ⊥ B | C) ⟹ A ∪ D ⊥ B | C.

i) Right Contraction I.9:

(A ⊥ B | D ∪ C) ∧ (A ⊥ D | C) ⟹ A ⊥ B ∪ D | C.

j) Right Cross Contraction I.10:

(A ⊥ B | D ∪ C) ∧ (D ⊥ A | C) ⟹ A ⊥ B ∪ D | C.

k) Flipped Left Cross Contraction I.11:

(A ⊥ B | D ∪ C) ∧ (B ⊥ D | C) ⟹ B ⊥ A ∪ D | C.

Remark 5.12. In particular, we have the equivalences:

(A ⊥ B ∪ D | C) ⟺ (A ⊥ B | D ∪ C) ∧ (A ⊥ D | C),

(A ∪ D ⊥ B | C) ⟺ (A ⊥ B | D ∪ C) ∧ (D ⊥ B | C).

Remark 5.13 (Symmetry). Let the assumptions be as in Theorem 5.11. We also have the following rules:

l) Restricted Symmetry I.16:

(A ⊥ B | C) ∧ (B ⊥ ∅ | C) ⟹ B ⊥ A | C.


m) J-Restricted Symmetry I.17:

A ⊥ B | C ∪ J ⟹ B ⊥ A | C ∪ J.

n) Symmetry I.18: If J = ∅ then:

A ⊥ B | C ⟹ B ⊥ A | C.

Lemma 5.14 (More separoid-like rules). Let the assumptions be as in Theorem 5.11. We also have the following rules:

o) Left Composition I.12:

(A ⊥ B | C) ∧ (D ⊥ B | C) ⟹ A ∪ D ⊥ B | C.

p) Right Composition I.13:

(A ⊥ B | C) ∧ (A ⊥ D | C) ⟹ A ⊥ B ∪ D | C.

q) Left Intersection I.14: If A ∩ D = ∅ then:

(A ⊥ B | D ∪ C) ∧ (D ⊥ B | A ∪ C) ⟹ A ∪ D ⊥ B | C.

r) Right Intersection I.15: If B ∩ D = ∅ then:

(A ⊥ B | D ∪ C) ∧ (A ⊥ D | B ∪ C) ⟹ A ⊥ B ∪ D | C.

s) More Redundancies I.19:

A ⊥ B | C ⟺ (A∖C) ⊥ (B∖C) | C ⟺ A ∪ C ⊥ J ∪ B ∪ C | C.

6. Applications to Graphical Models

In this section we want to relate transitional conditional independence to σ-separation in graphs. This will be done in Theorem 6.3 about the global Markov property for causal Bayesian networks.

6.1. Causal Bayesian Networks

We now introduce a definition of causal Bayesian networks that allows for non-stochastic input variables.

Definition 6.1 (Causal Bayesian network). A causal Bayesian network (CBN) M consists of:

1. a (finite) conditional directed acyclic graph (CDAG): G = G(V | do(J)),

2. input variables X_j, j ∈ J, and (stochastic) output variables X_v, v ∈ V,


3. a measurable space 𝒳_v for every v ∈ J ∪̇ V, where 𝒳_v is standard if v ∈ V,⁴

4. a Markov kernel, suggestively written as:

P_v(X_v | do(X_{Pa^G(v)})) : 𝒳_{Pa^G(v)} ⇢ 𝒳_v,

(A, x_{Pa^G(v)}) ↦ P_v(X_v ∈ A | do(X_{Pa^G(v)} = x_{Pa^G(v)})),

for every v ∈ V, where we write for D ⊆ J ∪̇ V:

𝒳_D := ∏_{v∈D} 𝒳_v,  𝒳_∅ := * = {*},

X_D := (X_v)_{v∈D},  X_∅ := *,

x_D := (x_v)_{v∈D},  x_∅ := *.

By abuse of notation, we denote the causal Bayesian network as:

M(V | do(J)) = (G(V | do(J)), (P_v(X_v | do(X_{Pa^G(v)})))_{v∈V}).

Definition 6.2. Any CBN M = M(V | do(J)) comes with its joint Markov kernel:

P(X_V | do(X_J)) : 𝒳_J ⇢ 𝒳_V, given by:

P(X_V | do(X_J)) := ⨂_{v∈V} P_v(X_v | do(X_{Pa^G(v)})),

where the product ⊗ is taken in reverse order of a fixed topological order <, i.e. children appear only to the left of their parents in the product. Note that, by Remark 2.8 about associativity and (restricted) commutativity of the product, the joint Markov kernel does not depend on the chosen topological order.
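Operationally, the joint Markov kernel of Definition 6.2 can be realized by ancestral sampling: draw each X_v from its kernel once its parents are available. A minimal sketch of ours (the kernel signature and the toy CBN j → v → w are illustrative assumptions):

```python
import numpy as np

def joint_kernel_sample(kernels, order, x_J, rng):
    """One draw from P(X_V | do(X_J = x_J)): sample each node from its
    kernel P_v(X_v | do(X_{Pa(v)})) along a topological order, matching
    the reverse-ordered kernel product of Definition 6.2.
    kernels[v] is a function (values_of_parents_dict, rng) -> sample."""
    x = dict(x_J)                  # input variables are held fixed
    for v in order:                # children sampled after their parents
        x[v] = kernels[v](x, rng)
    return x

kernels = {
    "v": lambda x, rng: x["j"] + rng.normal(),
    "w": lambda x, rng: 2.0 * x["v"] + rng.normal(),
}
rng = np.random.default_rng(0)
print(joint_kernel_sample(kernels, ["v", "w"], {"j": 1.0}, rng))
```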

6.2. Global Markov Property for Causal Bayesian Networks

We now turn to probably the most striking application of transitional conditional independence: the global Markov property for causal Bayesian networks that allow for (non-stochastic) input variables. The global Markov property relates the graphical structure G(V | do(J)) to transitional conditional independence relations between the corresponding transitional random variables X_A. Checking the graph for σ-separation relations will then automatically imply the existence of a regular conditional Markov kernel that does not depend on the specified variables, which can be stochastic or not. This will be the first time the global Markov property is proven in this generality of measure theoretic probability, in the presence of input variables and with such a strong

⁴We could also work under the weaker assumption of universal measurable spaces if we replaced all mentions of measurability with the weaker notion of universal measurability.

notion of conditional independence. Direct generalizations to causal Bayesian networks or structural causal models that allow for latent confounders, cycles, selection bias and (non-stochastic) input variables are immediate, see [FM17, FM18, FM20, Ric03, Eva16, Eva18, RERS17], etc., because all those cases can be reduced to a CBN in one form or another, and only the graphical separation criteria need to be adjusted. In Appendix K in the Supplementary Material [For21] we will show how the global Markov property together with transitional conditional independence gives us a measure theoretically clean proof of the main rules of do-calculus, see [Pea09], in this generality. The proof of the global Markov property follows similar arguments as used in [LDLL90, Ver93, Ric03, FM17, FM18, RERS17], namely chaining together the separoid rules for conditional independence (see Theorem 3.11) and the ones for d/m/σ-separation (see Theorem 5.11) in an inductive way. The main difference here is that we never rely on the Symmetry property, but instead use the left and right versions of the separoid rules separately. Note, again, that the validity of those separoid rules in this vast generality is a non-trivial result, see Theorem 3.11, and was only known in corner cases for other notions of extended conditional independence, rendering those less useful for applications to graphical models. The full proof of the global Markov property is given in Appendix J in the Supplementary Material [For21].

Theorem 6.3 (Global Markov property for causal Bayesian networks). Consider a causal Bayesian network M(V | do(J)) with graph G(V | do(J)) and joint Markov kernel P(X_V | do(X_J)). Then for all A, B, C ⊆ J ∪̇ V (not necessarily disjoint) we have the implication:

A ⊥^σ_{G(V | do(J))} B | C ⟹ X_A ⫫_{P(X_V | do(X_J))} X_B | X_C.

Recall that we have, per definition, an implicit dependence on J, X_J, resp., in the second argument on each side.

Remark 6.4. The global Markov property says that already checking the graphical criterion A ⊥^σ_{G(V | do(J))} B | C is enough to obtain the existence of a Markov kernel, suggestively written as:

P(X_A | ✁X_B, X_{C∩V}, do(X_{C∩J}), ✁do(X_J)),

such that:

P(X_A, X_B, X_C | do(X_J)) = P(X_A | ✁X_B, X_{C∩V}, do(X_{C∩J}), ✁do(X_J)) ⊗ P(X_B, X_C | do(X_J)).

The full power of the global Markov property is unleashed when applied to the causal Bayesian network that arises from augmentation with further soft intervention variables and hard interventions, etc. It will then automatically relate many of the marginal conditional interventional Markov kernels of extended and sub-CBNs to each other, see Appendix K in the Supplementary Material [For21] for a glimpse of the possibilities.

Example 6.5. Let A ⊆ G = G(V | do(J)) be an ancestral subset, i.e. A = ⋃_{v∈A} Anc^G(v). Then we have: (V ∩ A) ⊥^σ_{G(V | do(J))} (J∖A) | J ∩ A, and thus, by the global Markov property, a Markov kernel P(X_{V∩A} | do(X_{J∩A})) such that:

P(X_A | do(X_J)) = P(X_{V∩A} | do(X_{J∩A})) ⊗ P(X_{J∩A} | do(X_J)).


So for ancestral subsets we can work with only the input variables from J ∩ A and ignore the ones from J∖A, which is in correspondence with our expectations about ancestral causal relations.

7. Discussion

7.1. Extensions to Universal Measurable Spaces

Remark 7.1. We presented the theory of the main paper in terms of general measurable spaces wherever possible. Some results needed more restrictive assumptions like standard measurable spaces. These restrictions can be weakened if one replaces standard with universal measurable spaces and measurability with the weaker notion of universal measurability. All theory has been developed for this more general setting in Appendix B in the Supplementary Material [For21]. If 𝒳 is a measurable space we denote by 𝒳• = (𝒳, (ℬ_𝒳)•) its universal completion, where (ℬ_𝒳)• is the intersection of all completions. A map f : 𝒳 → 𝒴 is universally measurable if f : 𝒳• → 𝒴 is measurable. We then advocate the use of transition probability spaces (𝒲 × 𝒯, K(W | T)), with universally measurable K(W | T) : 𝒯• ⇢ 𝒲, and the following class of transitional random variables:

𝒞 := {X : (𝒲 × 𝒯)• → 𝒳 universally measurable | 𝒳 a universal measurable space},

where a universal measurable space, see Corollary B.34, is, up to universal completion, countably generated and countably separated with perfect probability measures. The class 𝒞 together with ≼_{K,•} and X ∨ Y := (X, Y) forms a join-semi-lattice up to ≈_{K,•}-antisymmetry,⁵ see Appendix D in the Supplementary Material [For21], and together with ⫫_{K,•} a T-*-separoid,⁶ see Appendix E in the Supplementary Material [For21]. Furthermore, all •-regular conditional Markov kernels K(X | Y, Z) : (𝒴 × 𝒵)• ⇢ 𝒳 exist for K(X, Y | Z) when 𝒳, 𝒴 are universal measurable spaces, see Theorem C.7, and are strongly consistent along universally measurable maps, see Theorem C.13. This is our contribution to making the theory applicable in as much generality as possible.

7.2. Comparison to Other Notions of Extended Conditional Independence

Remark 7.2. Several other notions of extended conditional independence can be found in the literature, see e.g. [CD17a, RERS17, FM20], and there is also variation conditional independence. We elaborate in more detail in Appendix L in the Supplementary Material [For21]. In overview, it can be said that variation conditional independence, a non-probabilistic, set-theoretic notion of conditional independence, only shares formal

⁵We define: X ≼_{K,•} Y if there is a universally measurable φ : 𝒴• → 𝒳 such that K(X, Y | T) = δ_φ(X | Y) ⊗ K(Y | T).
⁶We define: X ⫫_{K,•} Y | Z if there exists a universally measurable Markov kernel Q(X | Z) : 𝒵• ⇢ 𝒳 such that K(X, Y, Z | T) = Q(X | Z) ⊗ K(Y, Z | T).

similarities with the other notions of extended conditional independence. The correspondence would be to replace the space of probability measures 𝒫(𝒳) with the power set 2^𝒳, see Example L.6. The three other notions of extended conditional independence from [CD17a, RERS17, FM20] are weaker than transitional conditional independence, i.e. whenever they are defined, transitional conditional independence implies the other extended conditional independence. In contrast to [CD17a], transitional conditional independence satisfies all (asymmetric) separoid axioms, where two of them need standard or universal measurable spaces, see Theorem 3.11, making it suitable for use in graphical models, see the global Markov property, Theorem 6.3. In contrast to [RERS17], transitional conditional independence is an asymmetric notion, directly propagating the asymmetry between "input and output variables", making it able to equivalently express classical statistical concepts like ancillarity, sufficiency, adequacy, etc. Furthermore, we developed the theory of transitional conditional independence in the more general context of (universal) measurable spaces, transition probability spaces and transitional random variables. In contrast to [FM20], transitional conditional independence provides one with the existence of certain Markov kernels and factorizations. This gives a much stronger conclusion in the global Markov property in graphical models, while starting from the same assumptions, see Theorem 6.3. More will be said in Appendix L in the Supplementary Material [For21].

7.3. Conclusion

We developed the theory of transition probability spaces, transitional random variables and transitional conditional independence. These concepts are best behaved if the underlying spaces have properties similar to standard measurable spaces. To extend the theory as far as possible we studied, developed and generalized the theory of universal measurable spaces. Furthermore, we proved the disintegration of transition probabilities, i.e. the existence of regular conditional Markov kernels, on standard, analytic and universal measurable spaces. Transitional conditional independence was defined as an asymmetric notion of (ir)relevance relations. We developed the theory of asymmetric separoids and showed that transitional conditional independence and the graphical notion of σ-separation satisfy all those asymmetric separoid rules. We then showed how to relate those notions in graphical models and proved a global Markov property for them. We also showed in the Supplementary Material [For21] how the global Markov property implies the validity of the three main rules of the causal do-calculus in measure theoretic generality and indicated how to apply it to further causal concepts. We then compared transitional conditional independence to other notions of extended conditional independence and showed that it is stronger than all of those. We also showed that transitional conditional independence can express classical statistical concepts like ancillarity, sufficiency, adequacy and invariant reductions, etc. We

also demonstrated what it can say about Bayesian statistics, the likelihood principle, propensity scores, etc. Finally, we want to stress the simplicity of the definition of transitional conditional independence:

X ⫫_K Y | Z :⟺ ∃ Q(X|Z) : K(X, Y, Z | T) = Q(X|Z) ⊗ K(Y, Z | T).

Acknowledgments

This work was partially supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no 639466). The author wants to express his gratitude towards Joris M. Mooij for many inspiring discussions and his constant support.

References

[ARS09] R. Ayesha Ali, Thomas S. Richardson, and Peter Spirtes, Markov equivalence for ancestral graphs, The Annals of Statistics 37 (2009), no. 5B, 2808–2837.

[Bas59] Debabrata Basu, The Family of Ancillary Statistics, Sankhyā (1959), 247–256.

[Bas64] , Recovery of Ancillary Information, Sankhyā, Series A (1964), no. 26, 3–16.

[BD75] David Blackwell and Lester E. Dubins, On Existence and Non-Existence of Proper, Regular, Conditional Distributions, The Annals of Probability (1975), 741–752.

[BFPM21] Stephan Bongers, Patrick Forré, Jonas Peters, and Joris M. Mooij, Foundations of Structural Causal Models with Cycles and Latent Variables, https://arxiv.org/abs/1611.06221, accepted to The Annals of Statistics (2021).

[Bir62] Allan Birnbaum, On the Foundations of Statistical Inference, Journal of the American Statistical Association 57 (1962), no. 298, 269–306.

[Bis06] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

[BM20] Vladimir I. Bogachev and Ilya I. Malofeev, Kantorovich problems and condi- tional measures depending on a parameter, Journal of Mathematical Analysis and Applications 486 (2020), no. 1.

[Bog07] Vladimir I. Bogachev, Measure Theory, vol. 1+2, Springer, 2007.


[BRN63] David Blackwell and Czesław Ryll-Nardzewski, Non-Existence of Everywhere Proper Conditional Distributions, The Annals of Mathematical Statistics 34 (1963), no. 1, 223–225.

[BTP14] Elias Bareinboim, Jin Tian, and Judea Pearl, Recovering from Selection Bias in Causal and Statistical Inference, Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28, 2014.

[Bur61] D. L. Burkholder, Sufficiency in the undominated case, The Annals of Math- ematical Statistics (1961), 1191–1200.

[CB17] Juan Correa and Elias Bareinboim, Causal Effect Identification by Adjust- ment under Confounding and Selection Biases, Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, 2017.

[CD17a] Panayiota Constantinou and A. Philip Dawid, Extended Conditional Independence and Applications in Causal Inference, The Annals of Statistics (2017), 2618–2653.

[CD17b] , Supplement to “Extended Conditional Independence and Applica- tions in Causal Inference”, DOI:10.1214/16-AOS1537SUPP (2017).

[CJ19] Kenta Cho and Bart Jacobs, Disintegration and Bayesian Inversion via String Diagrams, Mathematical Structures in Computer Science 29 (2019), no. 7, 938–971.

[CMKR12] Diego Colombo, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson, Learning high-dimensional directed acyclic graphs with latent and selection variables, The Annals of Statistics (2012), 294–321.

[Dar53] George Darmois, Analyse générale des liaisons stochastiques: étude particulière de l'analyse factorielle linéaire, Revue de l'Institut international de statistique (1953), 2–8.

[Daw75] A. Philip Dawid, On the Concepts of Sufficiency and Ancillarity in the Pres- ence of Nuisance Parameters, Journal of the Royal Statistical Society: Series B (Methodological) 37 (1975), no. 2, 248–258.

[Daw79a] , Conditional Independence in Statistical Theory, Journal of the Royal Statistical Society: Series B (Methodological) 41 (1979), no. 1, 1–15.

[Daw79b] , Some Misleading Arguments Involving Conditional Independence, Journal of the Royal Statistical Society: Series B (Methodological) 41 (1979), no. 2, 249–252.

[Daw80] , Conditional Independence for Statistical Operations, The Annals of Statistics (1980), 598–617.


[Daw98] , Conditional Independence, Encyclopedia of statistical sciences, up- date 2 (1998), 146–153.

[Daw01a] , Separoids: a Mathematical Framework for Conditional Independence and Irrelevance, Annals of Mathematics and Artificial Intelligence 32 (2001), no. 1-4, 335–372.

[Daw01b] , Some Variations on Variation Independence, International Work- shop on Artificial Intelligence and Statistics, Proceedings of Machine Learn- ing Research, 2001, pp. 83–86.

[Daw02] , Influence diagrams for causal modelling and inference, International Statistical Review 70 (2002), 161–189.

[DL93] A. Philip Dawid and Steffen L. Lauritzen, Hyper Markov Laws in the Statis- tical Analysis of Decomposable Graphical Models, The Annals of Statistics (1993), 1272–1317.

[DM83] Claude Dellacherie and Paul-André Meyer, Probability and Potential B. The- ory of Martingales, North-Holland Mathematics Studies, no. 72, Elsevier Science, 1983, Translated from French by J. P. Wilson.

[Edw74] Anthony W. F. Edwards, The History of Likelihood, International Statistical Review/Revue Internationale de Statistique (1974), 9–15.

[Edw92] , Likelihood, 2nd ed., John Hopkins University Press, Baltimore, 1992.

[ER14] Robin J. Evans and Thomas S. Richardson, Markovian Acyclic Directed Mixed Graphs for Discrete Data, The Annals of Statistics (2014), 1452–1482.

[Eva13] Michael Evans, What does the proof of Birnbaum’s theorem prove?, Elec- tronic Journal of Statistics 7 (2013), 2645–2655.

[Eva16] Robin J. Evans, Graphs for Margins of Bayesian Networks, Scandinavian Journal of Statistics 43 (2016), no. 3, 625–648.

[Eva18] , Margins of discrete Bayesian networks, The Annals of Statistics 46 (2018), no. 6A, 2623–2656.

[Fad85] Arnold M. Faden, The Existence of Regular Conditional Probabilities: Nec- essary and Sufficient Conditions, The Annals of Probability 13 (1985), no. 1, 288–298.

[Fis22] Ronald Aylmer Fisher, On the Mathematical Foundations of Theoretical Statistics, Philosophical Transactions of the Royal Society of London. Se- ries A, Containing Papers of a Mathematical or Physical Character 222 (1922), no. 594-604, 309–368.


[Fis25] , Theory of Statistical Estimation, Mathematical Proceedings of the Cambridge Philosophical Society, vol. 22, Cambridge University Press, 1925, pp. 700–725.

[FM17] Patrick Forré and Joris M. Mooij, Markov Properties for Graphical Models with Cycles and Latent Variables, https://arxiv.org/abs/1710.08775 (2017).

[FM18] , Constraint-based Causal Discovery for Non-linear Structural Causal Models with Cycles and Latent Confounders, Proceedings of the 34th Annual Conference on Uncertainty in Artificial Intelligence (UAI-2018), 2018.

[FM20] , Causal Calculus in the Presence of Cycles, Latent Confounders and Selection Bias, Proceedings of the 35th Annual Conference on Uncertainty in Artificial Intelligence (UAI-2019), vol. 115, Proceedings of Machine Learning Research (PMLR), 2020, pp. 71–80.

[For21] Patrick Forré, Supplement to “Transitional Conditional Independence”, https://arxiv.org/abs/2104.11547 (2021).

[Fre15] David H. Fremlin, Measure Theory, vol. 1-6, Torres Fremlin, 2000-2015, https://www1.essex.ac.uk/maths/people/fremlin/mt.htm.

[Fri20] Tobias Fritz, A synthetic approach to Markov kernels, conditional indepen- dence and theorems on sufficient statistics, Advances in Mathematics 370 (2020).

[Gan15] Greg Gandenberger, A new proof of the likelihood principle, British Journal for the Philosophy of Science 66 (2015), no. 3, 475–503.

[Gir82] Michèle Giry, A Categorical Approach to Probability Theory, Categorical Aspects of Topology and Analysis, Springer, 1982, pp. 68–85.

[GP95] David Galles and Judea Pearl, Testing Identifiability of Causal Effects, Pro- ceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI-1995), Morgan Kaufmann Publishers Inc., 1995, pp. 185–195.

[GR01] Richard D. Gill and James M. Robins, Causal Inference for Complex Longi- tudinal Data: The Continuous Case, The Annals of Statistics (2001), 1785– 1811.

[GRF10] Malay Ghosh, N. Reid, and D. A. S. Fraser, Ancillary Statistics: A Review, Statistica Sinica (2010), 1309–1332.

[GVP90] Dan Geiger, Thomas Verma, and Judea Pearl, Identifying independence in Bayesian networks, Networks 20 (1990), no. 5, 507–534.

[Hac65] Ian Hacking, Logic of Statistical Inference, Cambridge University Press, 1965.


[HK99] Petr Holický and Ondřej F. K. Kalenda, Descriptive properties of spaces of measures, Bulletin of the Polish Academy of Sciences. Mathematics 47 (1999), no. 1, 37–51.

[HS49] Paul R. Halmos and Leonard J. Savage, Application of the Radon-Nikodym theorem to the theory of sufficient statistics, The Annals of Mathematical Statistics 20 (1949), no. 2, 225–241.

[HV06] Yimin Huang and Marco Valtorta, Pearl's Calculus of Intervention is Complete, Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI-2006), AUAI Press, 2006, pp. 217–224.

[HV08] , On the Completeness of an Identifiability Algorithm for Semi- Markovian Models, Annals of Mathematics and Artificial Intelligence 54 (2008), no. 4, 363–408.

[HWG65] William Jackson Hall, Robert A. Wijsman, and Jayanta K. Ghosh, The Relationship between Sufficiency and Invariance with Applications in Sequential Analysis, The Annals of Mathematical Statistics 36 (1965), no. 2, 575–614.

[ITF21] Maximilian Ilse, Jakub M. Tomczak, and Patrick Forré, Selecting Data Augmentation for Simulating Interventions, International Conference on Machine Learning, Proceedings of Machine Learning Research, 2021, pp. 4555–4562.

[Jay03] Edwin T. Jaynes, Probability Theory: The Logic of Science, Cambridge Uni- versity Press, 2003.

[Kal17] Olav Kallenberg, Random Measures, Theory and Applications, Probability Theory and Stochastic Modelling, vol. 77, Springer, 2017.

[Kec95] Alexander S. Kechris, Classical Descriptive Set Theory, Graduate Texts in Mathematics, vol. 156, Springer-Verlag, New York, 1995.

[KF09] Daphne Koller and Nir Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, 2009.

[Kle65] Heinrich Kleisli, Every Standard Construction is Induced by a Pair of Adjoint Functors, Proceedings of the American Mathematical Society 16 (1965), no. 3, 544–546.

[Kle20] Achim Klenke, Probability Theory - A Comprehensive Course, 3rd ed., Uni- versitext, Springer, 2020.

[Lau96] Steffen L. Lauritzen, Graphical Models, vol. 17, Clarendon Press, 1996.

[Law62] F. William Lawvere, The Category of Probabilistic Mappings, https://ncatlab.org/nlab/files/lawvereprobability1962.pdf (1962).


[LDLL90] S. L. Lauritzen, A. P. Dawid, B. N. Larsen, and H.-G. Leimer, Independence properties of directed Markov fields, Networks 20 (1990), no. 5, 491–505.

[May14] Deborah G. Mayo, On the Birnbaum argument for the strong likelihood prin- ciple, Statistical Science (2014), 227–239.

[MMC20] Joris M. Mooij, Sara Magliacane, and Tom Claassen, Joint Causal Inference from Multiple Contexts, Journal of Machine Learning Research 21 (2020), no. 99, 1–108.

[Mur12] Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.

[Ney36] Jerzy Neyman, Su un teorema concernente le cosiddette statistiche suffici- enti, Istituto Italiano degli Attuari, 1936.

[Pac78] Jan K. Pachl, Disintegration and Compact Measures, Mathematica Scandi- navica (1978), 157–168.

[Pea93a] Judea Pearl, Aspects of graphical models connected with causality, In Pro- ceedings of the 49th Session of the International Statistical Institute, 1993, pp. 391–401.

[Pea93b] , Comment: Graphical Models, Causality, And Intervention, Statis- tical Science 8 (1993), no. 3, 266–269.

[Pea09] , Causality: Models, Reasoning, and Inference, 2nd ed., Cambridge University Press, 2009.

[PJS17] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf, Elements of Causal Inference: Foundation and Learning Algorithms, MIT Press, 2017.

[PP85] Judea Pearl and Azaria Paz, Graphoids: A Graph-based Logic for Reasoning about Relevance Relations, University of California (Los Angeles). Computer Science Department, 1985.

[PP14] , Confounding Equivalence in Causal Inference, Journal of Causal Inference 2 (2014), no. 1.

[PTKM15] Emilija Perković, Johannes Textor, Markus Kalisch, and Marloes H. Maathuis, A Complete Generalized Adjustment Criterion, Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence (UAI- 2015), AUAI Press, 2015, pp. 682–691.

[Rao05] Malempati Madhusudana Rao, Conditional Measures and Applications, 2nd ed., vol. 271, CRC Press, 2005.


[RERS17] Thomas S. Richardson, Robin J. Evans, James M. Robins, and Ilya Shpitser, Nested Markov Properties for Acyclic Directed Mixed Graphs, https://arxiv.org/abs/1701.06686 (2017).

[Res77] Paul Ressel, Some Continuity and Measurability Results on Spaces of Mea- sures, Mathematica Scandinavica 40 (1977), no. 1, 69–78.

[Ric03] Thomas S. Richardson, Markov Properties for Acyclic Directed Mixed Graphs, Scandinavian Journal of Statistics 30 (2003), no. 1, 145–157.

[Roy97] Richard Royall, Statistical Evidence: a Likelihood Paradigm, vol. 71, CRC press, 1997.

[RR83] Paul R. Rosenbaum and Donald B. Rubin, The central role of the propensity score in observational studies for causal effects, Biometrika 70 (1983), 41–55.

[RS02] Thomas S. Richardson and Peter Spirtes, Ancestral graph Markov models, The Annals of Statistics 30 (2002), no. 4, 962–1030.

[SBC`62] Leonard J. Savage, George Barnard, Jerome Cornfield, Irwin Bross, I. J. Good, D. V. Lindley, C. W. Clunies-Ross, John W. Pratt, Howard Levene, Thomas Goldman, et al., On the Foundations of Statistical Inference: Dis- cussion, Journal of the American Statistical Association 57 (1962), no. 298, 307–326.

[Sch74] Laurent Schwartz, Radon Measures on Arbitrary Topological Spaces and Cylindrical Measures, Oxford University Press, Bombay, 1974.

[SGS00] Peter Spirtes, Clark Glymour, and Richard Scheines, Causation, Prediction, and Search, 2nd ed., MIT Press, Cambridge, MA, 2000.

[SP06] Ilya Shpitser and Judea Pearl, Identification of Joint Interventional Dis- tributions in Recursive Semi-Markovian Causal Models, Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, AAAI Press, 2006, pp. 1219–1226.

[Spo94] Wolfgang Spohn, On the Properties of Conditional Independence, Patrick Suppes: Scientific Philosopher, Springer, 1994, pp. 173–196.

[SVR10] Ilya Shpitser, Tyler J. VanderWeele, and James M. Robins, On the Valid- ity of Covariate Adjustment for Estimating Causal Effects, Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI- 2010), AUAI Press, 2010, pp. 527–536.

[Tia02] Jin Tian, Studies in causal reasoning and learning, Ph.D. thesis, University of California, Los Angeles, 2002.


[Tia04] , Identifying Conditional Causal Effects, Proceedings of the 20th Con- ference on Uncertainty in Artificial Intelligence (UAI-2004), AUAI Press, 2004, pp. 561–568.

[TP02] Jin Tian and Judea Pearl, A General Identification Condition for Causal Effects, Eighteenth National Conference on Artificial Intelligence, American Association for Artificial Intelligence, 2002, pp. 567–573.

[Č82] Nikolai Nikolaevich Čencov, Statistical Decision Rules and Optimal Inference, Translations of Mathematical Monographs, vol. 53, American Mathematical Society, 1982, translated from Russian.

[Ver93] Tom S. Verma, Graphical Aspects of Causal Models, Tech. Report R-191, Computer Science Department, University of California, Los Angeles, 1993.

[YM76] Marc Yor and Paul-André Meyer, Sur la théorie de la prédiction, et le problème de décomposition des tribus 𝓕°ₜ₊, Séminaire de Probabilités X, Université de Strasbourg, Lecture Notes in Mathematics, no. 511, Springer, 1976, pp. 104–117.


Supplementary Material

A. Symmetric Separoids and Asymmetric Separoids

This section aims at formalizing a notion of "asymmetric separoids". Separoids were introduced in [Daw01a] as a way to express (ir)relevance relations. Similar ideas and the notion of graphoids were also investigated in [PP85, Spo94]. "Asymmetric separoids" naturally arise from symmetric ones when one fixes parts of the second and/or the third argument. Depending on its behaviour in the second and third argument we will call such a ternary relation ⫫ a τ-κ-separoid. Our notion of transitional conditional independence, see Definition 3.1, will form a T-*-separoid, see Appendices D and E. Also σ-separation, an extension of d-separation to graphs that allow for (cycles and) external "input" nodes, will form a J-∅-separoid, see Section 5 and Appendix I. The global Markov property for probabilistic graphical models, which relates the conditional independence structure of the graph in terms of σ-separation to transitional conditional independence of the corresponding transitional random variables, can then be interpreted as a τ-κ-separoid homomorphism. The global Markov property is presented in Section 6.2 and the proof is given in Appendix J.

Since the separoid axioms from [Daw01a] are not logically independent, i.e. parts of the rules can be deduced from the others, we have to investigate which of the rules generalize in which form. This is the reason we give a slightly different definition of separoids than the one presented in [Daw01a]. We also do not require the order on the space to fully form a (bounded) join-semi-lattice. We will mainly relax the requirement that the elements be reflexive. The reason is that under these relaxed conditions one still has the corresponding separoid rules. This is helpful when one works with probabilistic maps (Markov kernels) in contrast to deterministic maps. To make the distinction between our version of separoids and the one from [Daw01a] clearer, we will stick to the names symmetric separoid and τ-κ-separoid in this paper. We then study the relation between symmetric separoids and τ-κ-separoids. The main findings are Theorem A.16 and Theorem A.17. The first, A.16, states, as discussed above, that if we start with a symmetric separoid and fix parts of the second and/or third argument we arrive at a τ-κ-separoid. The other one, A.17, shows that if we symmetrize a τ-κ-separoid with the logical "or" we get a symmetric separoid. One can certainly argue about which of the separoid rules should be included in the notion of τ-κ-separoid and which not. The guiding principle of our definition was that it should: (i) generalize the notion of (symmetric) separoids and recover it, (ii) be strong enough to prove the global Markov property, see Theorem 6.3 and Appendix J.1, (iii) be weak enough such that we can verify these rules for both transitional conditional independence, see Appendices D and E, and σ-separation in graphs with input nodes, see Section 5 and Appendix I, (iv) be adjusted in such a way that we can prove the two Theorems A.16 and A.17 mentioned above, to get a close relationship between symmetric separoids and τ-κ-separoids. This is, for instance, the reason we included four Contraction rules, which

are all needed to prove Theorem A.17. These then also give us Restricted Symmetry 1. The obstruction to becoming symmetric then gives a clear motivation for τ-Restricted Right Redundancy 2, allowing ⊥-κ-separoids to become symmetric separoids. A look at the proof of the global Markov property in Appendix J.1 shows that all the other separoid rules are needed, thus motivating the minimal set of rules that we require.

A.1. Compatibility Relations

Remark A.1. In the following let Ω be a class that is endowed with an equivalence relation ≅. We then further consider a commutative and associative binary operation ∨ on (the equivalence classes of) Ω, i.e. we require that for all α, β, γ, ⋯ ∈ Ω we have:

1. (α ≅ α′) ∧ (β ≅ β′) ⟹ (α ∨ β) ≅ (α′ ∨ β′).

2. (α ∨ β) ≅ (β ∨ α).

3. ((α ∨ β) ∨ γ) ≅ (α ∨ (β ∨ γ)).

We will then also consider an element ⊥ ∈ Ω (a so-called bottom element) satisfying for all α ∈ Ω:

4. ⊥ ∨ α ≅ α.

From this it is clear that if ⊥′ is another bottom element we have: ⊥ ≅ ⊥′. Furthermore, we consider a ternary relation ⫫ on (the equivalence classes of) Ω:

5. (α ⫫ β | γ) ∧ (α ≅ α′) ∧ (β ≅ β′) ∧ (γ ≅ γ′) ⟹ α′ ⫫ β′ | γ′.

More properties of ⫫ will be fixed in the definitions later on. At one point we will also have another binary relation ≤ on Ω. We will require that for all α, β, γ, ⋯ ∈ Ω we have:

6. (α ≤ β) ∧ (α ≅ α′) ∧ (β ≅ β′) ⟹ (α′ ≤ β′).

7. α ≤ β ≤ γ ⟹ α ≤ γ.

8. α ≤ β ⟹ α ≤ β ∨ γ.

9. (α ≤ γ) ∧ (β ≤ γ) ⟹ α ∨ β ≤ γ.

10. ⊥ ≤ α.

Besides the ≅-compatibility of ≤, most of these rules are actually not needed later on, but all binary relations ≤ we encounter will satisfy them. We also include them here because of the following result.


Lemma A.2. Let (Ω, ≅, ∨, ≤, ⊥) satisfy all the occurring compatibility relations from Remark A.1. We also put for α, β ∈ Ω:

α ≂ β :⟺ α ≤ β ≤ α.

Furthermore, we define the class of reflexive elements:

Ω̃ := {α ∈ Ω | α ≤ α}.

Then Ω̃ forms a bounded join-semi-lattice with partial order ≤, join ∨ and bottom element ⊥ (up to ≂-antisymmetry and set-theoretic considerations).

Proof. Let α, β, γ, ⋯ ∈ Ω̃.

1. Reflexivity: α ≤ α holds by definition for α ∈ Ω̃.

2. Anti-symmetry: α ≤ β ≤ α implies α ≂ β per definition.

3. Transitivity: α ≤ β ≤ γ implies α ≤ γ by assumption.

4. Join is an upper bound: From α ≤ α we get α ≤ α ∨ β. Similarly, from β ≤ β we get β ≤ α ∨ β.

5. Join is the smallest upper bound: α ≤ γ and β ≤ γ imply α ∨ β ≤ γ by assumption.

6. Closedness under join: α ≤ α and β ≤ β imply α ∨ β ≤ α ∨ β, because the join γ := α ∨ β is an upper bound of α and β and it is the smallest one, thus: α ∨ β ≤ α ∨ β.

7. The bottom element is neutral: α ≂ ⊥ ∨ α. This holds since ⊥ ∨ α ≅ α and α ≤ α for α ∈ Ω̃, and ≤ is compatible with ≅.

8. The bottom element is reflexive: ⊥ ≤ ⊥ by assumption.

As in any bounded join-semi-lattice we also have:

9. Idempotence: α ≂ α ∨ α for α ∈ Ω̃.

A.2. Asymmetric Separoids

Definition A.3 (τ-pre-separoid). Let τ ∈ Ω be fixed. We call (Ω, ≅, ∨, ⫫, ⊥) a τ-pre-separoid if all occurring compatibility relations from Remark A.1 hold and if the following rules hold for all α, β, γ, λ ∈ Ω (a computational toy check is given after the list of rules below):

1. Left Redundancy:

⊥ ⫫ β | γ.


2. τ-Restricted Right Redundancy:

α ⫫ ⊥ | γ ∨ τ.

3. τ-Inverted Right Decomposition:

α ⫫ β | γ ⟹ α ⫫ τ ∨ β | γ.

4. Left Decomposition:

α ∨ λ ⫫ β | γ ⟹ λ ⫫ β | γ.

5. Right Decomposition:

α ⫫ β ∨ λ | γ ⟹ α ⫫ λ | γ.

6. Left Weak Union:

α ∨ λ ⫫ β | γ ⟹ α ⫫ β | λ ∨ γ.

7. Right Weak Union:

α ⫫ β ∨ λ | γ ⟹ α ⫫ β | λ ∨ γ.

8. Left Contraction:

(α ⫫ β | λ ∨ γ) ∧ (λ ⫫ β | γ) ⟹ α ∨ λ ⫫ β | γ.

9. Right Contraction:

(α ⫫ β | λ ∨ γ) ∧ (α ⫫ λ | γ) ⟹ α ⫫ β ∨ λ | γ.

10. Right Cross Contraction:

(α ⫫ β | λ ∨ γ) ∧ (λ ⫫ α | γ) ⟹ α ⫫ β ∨ λ | γ.

11. Flipped Left Cross Contraction:

(α ⫫ β | λ ∨ γ) ∧ (β ⫫ λ | γ) ⟹ β ⫫ α ∨ λ | γ.
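For intuition, the rules above can be machine-checked on a toy model: a brute-force sketch of ours over subsets of a small ground set, with α ⫫ β | γ :⟺ α ∩ β ⊆ γ, ∨ = ∪ and bottom element ∅, which satisfies the τ-pre-separoid rules for τ = ∅; we spot-check two of them:

```python
from itertools import combinations, product

# Brute-force sanity check (ours, not from the paper): verify Left Weak
# Union 6 and Left Contraction 8 for the set-theoretic toy relation
# alpha ⫫ beta | gamma :<=> alpha & beta <= gamma on all subsets.
ground = (0, 1, 2)
subsets = [frozenset(c) for r in range(4) for c in combinations(ground, r)]

def ind(a, b, c):
    return (a & b) <= c

for a, b, c, lam in product(subsets, repeat=4):
    assert not ind(a | lam, b, c) or ind(a, b, lam | c)                       # rule 6
    assert not (ind(a, b, lam | c) and ind(lam, b, c)) or ind(a | lam, b, c)  # rule 8
print("checked", len(subsets) ** 4, "quadruples")
```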

Definition A.4 (τ-κ-separoid). Let τ, κ ∈ Ω be fixed and ≤ a binary relation on Ω. We then call (Ω, ≅, ≤, ∨, ⫫, ⊥) a τ-κ-separoid if (Ω, ≅, ∨, ⫫, ⊥) is a τ-pre-separoid, if ≤ satisfies the compatibility relations from Remark A.1, and if the following rule holds for all α, β, γ ∈ Ω:

1. κ-Extended Left Redundancy:

α ≤ κ ∨ γ ⟹ α ⫫ β | γ.

Remark A.5. Let (Ω, ≅, ∨, ⫫, ⊥) be a τ-pre-separoid.


1. We can define the following binary relation ≤′ on Ω:

α ≤′ γ :⟺ ∀β ∈ Ω: α ⫫ β | γ.

One can show, see Theorem A.13, that ≤′ is the weakest binary relation ≤ on Ω that turns (Ω, ≅, ≤, ∨, ⫫, ⊥) into a τ-⊥-separoid.

2. Let ≤ be another binary relation on the τ-pre-separoid Ω satisfying the compatibility relations from Remark A.1, and let κ ∈ Ω. We can then define a new binary relation ≤_κ via: α ≤_κ γ :⟺ α ≤ κ ∨ γ. For (Ω, ≅, ≤, ∨, ⫫, ⊥) to be a τ-κ-separoid we require the validity of κ-Extended Left Redundancy 1, which in the above terms can be equivalently expressed as the following implication between binary relations, for α, γ ∈ Ω:

α ≤_κ γ ⟹ α ≤′ γ.

Lemma A.6. Let (⫫_i)_{i∈I} be a family of ternary relations on Ω. For every i ∈ I let (Ω, ≅, ≤, ∨, ⫫_i, ⊥) be a τ-pre-separoid, a τ-κ-separoid, resp. Then we can define for α, β, γ ∈ Ω:

α ⫫^∧ β | γ :⟺ ∀i ∈ I: α ⫫_i β | γ.

Then (Ω, ≅, ∨, ≤, ⫫^∧, ⊥) is also a τ-pre-separoid, a τ-κ-separoid, resp.

Proof. This can easily be checked via the definitions.

In the following we will derive additional rules that hold in any τ-κ-separoid (Ω, ≅, ≤, ∨, ⫫, ⊥). Note that, of the compatibility rules from Remark A.1, only the ≅-compatibility is needed for ≤, in addition to κ-Extended Left Redundancy 1.

Lemma A.7 (κ-Extended Inverted Left Decomposition). Let (Ω, ≅, ≤, ∨, ⫫, ⊥) be a τ-κ-separoid. Then we have the implication:

(α ⫫ β | γ) ∧ (λ ≤ α ∨ κ ∨ γ) ⟹ α ∨ λ ⫫ β | γ.

Proof. We have the implications:

λ ≤ α ∨ κ ∨ γ ⟹ (by κ-Extended Left Redundancy 1) λ ⫫ β | α ∨ γ,

and together with (α ⫫ β | γ), ⟹ (by Left Contraction 8) α ∨ λ ⫫ β | γ.

Lemma A.8 (τ-κ-Extended Inverted Right Decomposition). Let (Ω, ≅, ≤, ∨, ⫫, ⊥) be a τ-κ-separoid. Then we have the implication:

(α ⫫ β | γ) ∧ (λ ≤ τ ∨ β ∨ κ ∨ γ) ⟹ α ⫫ τ ∨ β ∨ λ | γ.


Proof. We have the implications:

α ⫫ β | γ ⟹ (by τ-Inverted Right Decomposition 3) α ⫫ τ ∨ β | γ,

λ ≤ τ ∨ β ∨ κ ∨ γ ⟹ (by κ-Extended Left Redundancy 1) λ ⫫ α | τ ∨ β ∨ γ,

both together ⟹ (by Flipped Left Cross Contraction 11) α ⫫ τ ∨ β ∨ λ | γ.

Lemma A.9 (κ-Equivalent Exchange). Let (Ω, ≅, ≤, ∨, ⫫, ⊥) be a τ-κ-separoid. Then we have the implication:

(α ⫫ β | γ) ∧ (γ ≤ κ ∨ γ′) ∧ (γ′ ≤ κ ∨ γ) ⟹ α ⫫ β | γ′.

Proof. We have the implications:

γ′ ≤ κ ∨ γ ⟹ (by κ-Extended Left Redundancy 1) γ′ ⫫ α ∨ β | γ

⟹ (by Right Weak Union 7) γ′ ⫫ α | β ∨ γ, (a)

(a) ∧ (α ⫫ β | γ) ⟹ (by Flipped Left Cross Contraction 11) α ⫫ β ∨ γ′ | γ

⟹ (by Right Weak Union 7) α ⫫ β | γ′ ∨ γ, (b)

γ ≤ κ ∨ γ′ ⟹ (by κ-Extended Left Redundancy 1) γ ⫫ β | γ′, (c)

(b) ∧ (c) ⟹ (by Left Contraction 8) α ∨ γ ⫫ β | γ′

⟹ (by Left Decomposition 4) α ⫫ β | γ′.

Remark A.10. Note that if ≤ satisfies the following compatibility rule with ∨ for α, β ∈ Ω: α ≤ β ⟹ α ≤ κ ∨ β, then γ ≤ γ′ ≤ γ will imply the condition from Lemma A.9:

(γ ≤ κ ∨ γ′) ∧ (γ′ ≤ κ ∨ γ).

Lemma A.11 (Full κ-Equivalent Exchange). Let (Ω, ≅, ≤, ∨, ⫫, ⊥) be a τ-κ-separoid. If α′ ≤ κ ∨ α, β′ ≤ κ ∨ β, γ ≤ κ ∨ γ′ and γ′ ≤ κ ∨ γ, then we have:

α ⫫ β | γ ⟹ α′ ⫫ β′ | γ′.


Proof. We have the implications:

α′ ≤ κ ∨ α ⟹ (by κ-Extended Left Redundancy 1) α′ ⫫ β ∨ γ | α

⟹ (by Right Weak Union 7) α′ ⫫ β | α ∨ γ,

together with (α ⫫ β | γ) ⟹ (by Left Contraction 8) α ∨ α′ ⫫ β | γ

⟹ (by Left Decomposition 4) α′ ⫫ β | γ,

β′ ≤ κ ∨ β ⟹ (by κ-Extended Left Redundancy 1) β′ ⫫ α′ ∨ γ | β

⟹ (by Right Weak Union 7) β′ ⫫ α′ | β ∨ γ,

together with (α′ ⫫ β | γ) ⟹ (by Flipped Left Cross Contraction 11) α′ ⫫ β ∨ β′ | γ

⟹ (by Right Decomposition 5) α′ ⫫ β′ | γ,

(α′ ⫫ β′ | γ) ∧ (γ ≤ κ ∨ γ′) ∧ (γ′ ≤ κ ∨ γ) ⟹ (by κ-Equivalent Exchange A.9) α′ ⫫ β′ | γ′.

Lemma A.12 (Symmetry). Let (Ω, ≅, ∨, ⫫, ⊥) be a τ-pre-separoid (e.g. a τ-κ-separoid). Then we have the following implications:

1. Restricted Symmetry:

(α ⫫ β | γ) ∧ (β ⫫ ⊥ | γ) ⟹ β ⫫ α | γ.

2. τ-Restricted Symmetry:

α ⫫ β | γ ∨ τ ⟹ β ⫫ α | γ ∨ τ.

3. Symmetry: If τ ≅ ⊥ then:

α ⫫ β | γ ⟹ β ⫫ α | γ.

Proof. Restricted Symmetry 1 follows from Flipped Left Cross Contraction 11 with λ = ⊥. τ-Restricted Symmetry 2 follows by replacing γ with γ ∨ τ and using τ-Restricted Right Redundancy 2. Symmetry 3 follows from τ-Restricted Symmetry 2 with τ ≅ ⊥.

A.3. Join-Semi-Lattices from Asymmetric Separoids

Theorem A.13. Let (Ω, ≅, ∨, ⫫, ⊥) be a τ-pre-separoid and let ≤′ be defined for α, γ ∈ Ω by:

α ≤′ γ :⟺ ∀β ∈ Ω: α ⫫ β | γ.


We also put: α ≂′ γ :⟺ α ≤′ γ ≤′ α. Then for all α, β, γ, ⋯ ∈ Ω we have the following rules:

1. Anti-symmetry: α ≤′ β ≤′ α ⟹ α ≂′ β.

2. Transitivity: α ≤′ β ≤′ γ ⟹ α ≤′ γ.

3. Product extension: α ≤′ β ⟹ α ≤′ β ∨ γ.

4. Product stays bounded: (α ≤′ γ) ∧ (β ≤′ γ) ⟹ (α ∨ β) ≤′ γ.

5. Product compatibility: (α ≤′ γ) ∧ (β ≤′ λ) ⟹ (α ∨ β) ≤′ (γ ∨ λ).

6. Bottom element: ⊥ ≤′ γ.

7. ⊥-Extended Left Redundancy: α ≤′ γ ⟹ α ⫫ β | γ.

Thus, (Ω, ≅, ≤′, ∨, ⫫, ⊥) forms a τ-⊥-separoid. Furthermore, if we define the class of reflexive elements:

Ω′ := {α ∈ Ω | α ≤′ α},

then Ω′ is a bounded join-semi-lattice with partial order ≤′, join ∨ and bottom element ⊥ (up to ≂′-antisymmetry and set-theoretical considerations), τ ∈ Ω′, and (Ω′, ≅, ≤′, ∨, ⫫, ⊥) also forms a τ-⊥-separoid.

Proof. 1. Anti-symmetry holds per definition of ≂′.

2. Transitivity: For λ ∈ Ω we get:

α ≤′ β ⟹ α ⫫ λ ∨ γ | β ⟹ (by Right Weak Union 7) α ⫫ λ | β ∨ γ,
β ≤′ γ ⟹ β ⫫ λ | γ,
both together ⟹ (by Left Contraction 8) α ∨ β ⫫ λ | γ

⟹ (by Left Decomposition 4) α ⫫ λ | γ.

Since this holds for all λ ∈ Ω we get: α ≤′ γ.

3. Product extension: For λ ∈ Ω we get:

α ≤′ β ⟹ α ⫫ λ ∨ γ | β ⟹ (by Right Weak Union 7) α ⫫ λ | β ∨ γ.

Since this holds for all λ ∈ Ω we get: α ≤′ β ∨ γ.


4. Product stays bounded: For λ ∈ Ω we get:

α ≤′ γ ⟹ α ⫫ λ ∨ β | γ ⟹ (by Right Weak Union 7) α ⫫ λ | β ∨ γ,
β ≤′ γ ⟹ β ⫫ λ | γ,
both together ⟹ (by Left Contraction 8) α ∨ β ⫫ λ | γ.

Since this holds for all λ ∈ Ω we get: α ∨ β ≤′ γ.

5. Product compatibility:

α ≤′ γ ⟹ (by Product extension) α ≤′ γ ∨ λ,

β ≤′ λ ⟹ (by Product extension) β ≤′ γ ∨ λ,

both together ⟹ (by Product stays bounded) α ∨ β ≤′ γ ∨ λ.

6. Bottom element: Left Redundancy 1, i.e. ⊥ ⫫ β | γ for all β, implies ⊥ ≤′ γ.

7. ⊥-Extended Left Redundancy 1 holds per definition of ≤′.

8. τ is reflexive: For β ∈ Ω we have:

⟹ (by τ-Restricted Right Redundancy 2) (τ ⫫ ⊥ | β ∨ τ) ∧ (β ⫫ ⊥ | τ)

⟹ (by τ-Inverted Right Decomposition 3) (τ ⫫ ⊥ | β ∨ τ) ∧ (β ⫫ τ | τ)

⟹ (by Right Cross Contraction 10) τ ⫫ β | τ.

Since this holds for all β ∈ Ω we get the claim: τ ≤′ τ and thus τ ∈ Ω′.

Note that Ω′ is closed under ∨ by Product compatibility. That Ω′ forms a bounded join-semi-lattice follows from the rules above and Lemma A.2. Since τ ∈ Ω′ it is clear that Ω′ also forms a τ-⊥-separoid.

A.4. Relations to Symmetric Separoids

Definition A.14 (Symmetric separoid). We call (Ω, ≅, ≤, ∨, ⫫, ⊥) a symmetric separoid if all occurring compatibility relations from Remark A.1 hold and if the following rules hold for all α, β, γ, λ ∈ Ω:

1. (Unrestricted) Symmetry:

α ⫫ β | γ ⟹ β ⫫ α | γ.

2. (Extended Left) Redundancy:


α ≤ γ ⟹ α ⫫ β | γ.

3. (Left) Decomposition: α ∨ λ ⫫ β | γ ⟹ λ ⫫ β | γ.

4. (Left) Weak Union: α ∨ λ ⫫ β | γ ⟹ α ⫫ β | λ ∨ γ.

5. (Left) Contraction: (α ⫫ β | λ ∨ γ) ∧ (λ ⫫ β | γ) ⟹ α ∨ λ ⫫ β | γ.

Remark A.15.

1. Every symmetric separoid is a ⊥-⊥-separoid.

2. Every ⊥-κ-separoid is a symmetric separoid by Lemma A.12.

3. Let (Ω, ≅, ≤, ∨, ⫫, ⊥) be a τ-κ-separoid. Then (Ω, ≅, ≤_κ, ∨, ⫫, ⊥) is a τ-⊥-separoid, where ≤_κ is given via:

α ≤_κ β :⟺ α ≤ κ ∨ β.

4. So every ⊥-κ-separoid is a (symmetric) ⊥-⊥-separoid, using either ≤, ≤_κ or even ≤′.

The following is the first main theorem of this section. One may first consider the case where τ₁ and κ₁ are both equal to ⊥, i.e. the case of a symmetric separoid. The theorem then shows how one can turn a symmetric separoid into a τ-κ-separoid by partially fixing the second and/or third argument of ⫫.

Theorem A.16. Let (Ω, ≅, ≤, ∨, ⫫, ⊥) be a τ₁-κ₁-separoid and τ₂, κ₂ ∈ Ω further elements such that τ₂ ≤ κ₁ ∨ τ₂. For α, β, γ, λ ∈ Ω we define the ternary relation:

α KK β | γ : ðñ α KK τ2 _ β | κ2 _ γ. pτ2|κ2q

Then pΩ, –, !, _, KK , q is a pτ1 _ τ2q-pκ1 _ κ2q-separoid. pτ2|κ2q

Proof. We put τ := τ1∨τ2 and κ := κ1∨κ2.

1.' κ-Extended Left Redundancy: α ≲ κ∨γ ⟹ α ⫫_(τ2|κ2) β | γ.
Proof:
α ≲ κ∨γ ⟹ α ≲ κ1∨(κ2∨γ)
⟹ (κ1-Extended Left Redundancy 1) α ⫫ τ2∨β | κ2∨γ ⟹ α ⫫_(τ2|κ2) β | γ.

1. Left Redundancy: ⊥ ⫫_(τ2|κ2) β | γ.
Proof:
⟹ (Left Redundancy 1) ⊥ ⫫ τ2∨β | κ2∨γ ⟹ ⊥ ⫫_(τ2|κ2) β | γ.

2. τ-Restricted Right Redundancy: α ⫫_(τ2|κ2) ⊥ | γ∨τ.
Proof:
⟹ (τ1-Restricted Right Redundancy 2) α ⫫ ⊥ | κ2∨γ∨τ2∨τ1,
τ2 ≲ κ1∨τ2 ⟹ (κ1-Extended Left Redundancy 1) τ2 ⫫ α∨⊥∨κ2∨γ∨τ1 | τ2
⟹ (Right Decomposition 5) τ2 ⫫ α | ⊥∨κ2∨γ∨τ2∨τ1,
both ⟹ (Flipped Left Cross Contraction 11) α ⫫ τ2∨⊥ | κ2∨γ∨τ2∨τ1 ⟹ α ⫫_(τ2|κ2) ⊥ | γ∨τ.

3. τ-Inverted Right Decomposition: α ⫫_(τ2|κ2) β | γ ⟹ α ⫫_(τ2|κ2) τ∨β | γ.
Proof:
α ⫫_(τ2|κ2) β | γ ⟹ α ⫫ τ2∨β | κ2∨γ
⟹ (τ1-Inverted Right Decomposition 3) α ⫫ τ2∨τ1∨β | κ2∨γ,
τ2 ≲ κ1∨τ2 ⟹ (κ1-Extended Left Redundancy 1) τ2 ⫫ α∨τ1∨β∨κ2∨γ | τ2
⟹ (Right Decomposition 5) τ2 ⫫ α | τ2∨τ1∨β∨κ2∨γ,
both ⟹ (Flipped Left Cross Contraction 11) α ⫫ τ2∨τ1∨τ2∨β | κ2∨γ ⟹ α ⫫_(τ2|κ2) τ∨β | γ.

4. Left Decomposition: α∨λ ⫫_(τ2|κ2) β | γ ⟹ λ ⫫_(τ2|κ2) β | γ.
Proof:
α∨λ ⫫_(τ2|κ2) β | γ ⟹ α∨λ ⫫ τ2∨β | κ2∨γ
⟹ (Left Decomposition 4) λ ⫫ τ2∨β | κ2∨γ ⟹ λ ⫫_(τ2|κ2) β | γ.

5. Right Decomposition: α ⫫_(τ2|κ2) β∨λ | γ ⟹ α ⫫_(τ2|κ2) λ | γ.
Proof:
α ⫫_(τ2|κ2) β∨λ | γ ⟹ α ⫫ τ2∨β∨λ | κ2∨γ
⟹ (Right Decomposition 5) α ⫫ τ2∨λ | κ2∨γ ⟹ α ⫫_(τ2|κ2) λ | γ.

6. Left Weak Union: α∨λ ⫫_(τ2|κ2) β | γ ⟹ α ⫫_(τ2|κ2) β | λ∨γ.
Proof:
α∨λ ⫫_(τ2|κ2) β | γ ⟹ α∨λ ⫫ τ2∨β | κ2∨γ
⟹ (Left Weak Union 6) α ⫫ τ2∨β | κ2∨λ∨γ ⟹ α ⫫_(τ2|κ2) β | λ∨γ.

7. Right Weak Union: α ⫫_(τ2|κ2) β∨λ | γ ⟹ α ⫫_(τ2|κ2) β | λ∨γ.
Proof:
α ⫫_(τ2|κ2) β∨λ | γ ⟹ α ⫫ τ2∨β∨λ | κ2∨γ
⟹ (Right Weak Union 7) α ⫫ τ2∨β | κ2∨λ∨γ ⟹ α ⫫_(τ2|κ2) β | λ∨γ.

8. Left Contraction: (α ⫫_(τ2|κ2) β | λ∨γ) ∧ (λ ⫫_(τ2|κ2) β | γ) ⟹ α∨λ ⫫_(τ2|κ2) β | γ.
Proof:
α ⫫_(τ2|κ2) β | λ∨γ ⟹ α ⫫ τ2∨β | κ2∨λ∨γ,
λ ⫫_(τ2|κ2) β | γ ⟹ λ ⫫ τ2∨β | κ2∨γ,
both together ⟹ (Left Contraction 8) α∨λ ⫫ τ2∨β | κ2∨γ ⟹ α∨λ ⫫_(τ2|κ2) β | γ.

9. Right Contraction: (α ⫫_(τ2|κ2) β | λ∨γ) ∧ (α ⫫_(τ2|κ2) λ | γ) ⟹ α ⫫_(τ2|κ2) β∨λ | γ.
Proof:
α ⫫_(τ2|κ2) β | λ∨γ ⟹ α ⫫ τ2∨β | κ2∨λ∨γ,
α ⫫_(τ2|κ2) λ | γ ⟹ α ⫫ τ2∨λ | κ2∨γ ⟹ (Right Decomposition 5) α ⫫ λ | κ2∨γ,
both together ⟹ (Right Contraction 9) α ⫫ τ2∨β∨λ | κ2∨γ ⟹ α ⫫_(τ2|κ2) β∨λ | γ.

10. Right Cross Contraction: (α ⫫_(τ2|κ2) β | λ∨γ) ∧ (λ ⫫_(τ2|κ2) α | γ) ⟹ α ⫫_(τ2|κ2) β∨λ | γ.
Proof:
α ⫫_(τ2|κ2) β | λ∨γ ⟹ α ⫫ τ2∨β | κ2∨λ∨γ,
λ ⫫_(τ2|κ2) α | γ ⟹ λ ⫫ τ2∨α | κ2∨γ ⟹ (Right Decomposition 5) λ ⫫ α | κ2∨γ,
both together ⟹ (Right Cross Contraction 10) α ⫫ τ2∨β∨λ | κ2∨γ ⟹ α ⫫_(τ2|κ2) β∨λ | γ.

11. Flipped Left Cross Contraction: (α ⫫_(τ2|κ2) β | λ∨γ) ∧ (β ⫫_(τ2|κ2) λ | γ) ⟹ β ⫫_(τ2|κ2) α∨λ | γ.
Proof:
α ⫫_(τ2|κ2) β | λ∨γ ⟹ α ⫫ τ2∨β | κ2∨λ∨γ ⟹ (Right Weak Union 7) α ⫫ β | κ2∨τ2∨λ∨γ,
β ⫫_(τ2|κ2) λ | γ ⟹ β ⫫ τ2∨λ | κ2∨γ,
both ⟹ (Flipped Left Cross Contraction 11) β ⫫ τ2∨α∨λ | κ2∨γ ⟹ β ⫫_(τ2|κ2) α∨λ | γ.
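To make the fixing construction of Theorem A.16 concrete, the following is a minimal runnable sketch in the simplest symmetric special case, where Ω consists of finite sets of variable names, ∨ is set union, and ⫫ is ordinary stochastic conditional independence of a fixed discrete joint distribution; all names (joint, ci, shift) and the toy distribution are ours, for illustration only, and this is not the paper's own construction.

```python
# Toy instance of Theorem A.16 (illustration only, not the paper's construction):
# Omega = finite sets of variable names, join = set union, and the ternary
# relation is ordinary conditional independence of a discrete joint distribution.
from itertools import product

VARS = ("a", "b", "c")  # hypothetical variable names
VALS = (0, 1)

def joint(x):
    """Hypothetical joint p(a, b, c): a, b dependent, c an independent fair coin."""
    a, b, c = x
    p_ab = [[0.3, 0.2], [0.1, 0.4]]
    return p_ab[a][b] * 0.5

def marg(names, assign):
    """Probability that the variables in `names` take the values in `assign`."""
    return sum(joint(x) for x in product(VALS, repeat=len(VARS))
               if all(x[VARS.index(v)] == assign[v] for v in names))

def ci(alpha, beta, gamma):
    """alpha independent of beta given gamma, via factorization of the joint."""
    for x in product(VALS, repeat=len(VARS)):
        g = {v: x[VARS.index(v)] for v in gamma}
        ag = {**g, **{v: x[VARS.index(v)] for v in alpha}}
        bg = {**g, **{v: x[VARS.index(v)] for v in beta}}
        lhs = marg(alpha | beta | gamma, {**ag, **bg}) * marg(gamma, g)
        rhs = marg(alpha | gamma, ag) * marg(beta | gamma, bg)
        if abs(lhs - rhs) > 1e-12:
            return False
    return True

def shift(ci_rel, tau2, kappa2):
    """Theorem A.16: fix tau2 into the 2nd and kappa2 into the 3rd argument."""
    return lambda alpha, beta, gamma: ci_rel(alpha, beta | tau2, gamma | kappa2)

ci2 = shift(ci, tau2=frozenset({"c"}), kappa2=frozenset())
print(ci2(frozenset({"a"}), frozenset(), frozenset({"b"})))  # tests a CI c | b: True
```

In this fully symmetric toy model the shifted relation is again symmetric; the point of the theorem is that the same recipe applies in the asymmetric setting, with τ2 and κ2 as distinguished elements fixed into the right and conditioning slots.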

We now directly present the second main theorem of this section, showing the close relation between symmetric and τ-κ-separoids. The theorem states that if one symmetrizes a τ-κ-separoid via the logical "or" one obtains a symmetric separoid. Note that all four contraction rules are needed to make this work.

Theorem A.17. Let (Ω, ≅, ≲, ∨, ⫫, ⊥) be a τ-κ-separoid. We then define for α, β, γ ∈ Ω the symmetrized ternary relation:

α ⫫^∨ β | γ :⟺ (α ⫫ β | γ) ∨ (β ⫫ α | γ),

where the ∨ in the middle of the right-hand side means the logical OR. We further define:

α ≲_κ β :⟺ α ≲ κ∨β.

Then (Ω, ≅, ≲_κ, ∨, ⫫^∨, ⊥) is a (symmetric) ⊥-⊥-separoid.

Proof. 1. (Unrestricted) Symmetry:

α ⫫^∨ β | γ ⟹ β ⫫^∨ α | γ.

Proof. Clear by construction.

2. (Extended Left) Redundancy:

α ≲_κ γ ⟹ α ⫫^∨ β | γ.

Proof.
α ≲_κ γ ⟹ α ≲ κ∨γ ⟹ (κ-Extended Left Redundancy 1) α ⫫ β | γ
⟹ (α ⫫ β | γ) ∨ (β ⫫ α | γ) ⟹ α ⫫^∨ β | γ.

3. (Left) Decomposition:

α∨λ ⫫^∨ β | γ ⟹ λ ⫫^∨ β | γ.

Proof.
α∨λ ⫫^∨ β | γ ⟹ (α∨λ ⫫ β | γ) ∨ (β ⫫ α∨λ | γ),
α∨λ ⫫ β | γ ⟹ (Left Decomposition 4) λ ⫫ β | γ,
β ⫫ α∨λ | γ ⟹ (Right Decomposition 5) β ⫫ λ | γ,
both together ⟹ (λ ⫫ β | γ) ∨ (β ⫫ λ | γ) ⟹ λ ⫫^∨ β | γ.

4. (Left) Weak Union:

α∨λ ⫫^∨ β | γ ⟹ α ⫫^∨ β | λ∨γ.

Proof.
α∨λ ⫫^∨ β | γ ⟹ (α∨λ ⫫ β | γ) ∨ (β ⫫ α∨λ | γ),
α∨λ ⫫ β | γ ⟹ (Left Weak Union 6) α ⫫ β | λ∨γ,
β ⫫ α∨λ | γ ⟹ (Right Weak Union 7) β ⫫ α | λ∨γ,
both together ⟹ (α ⫫ β | λ∨γ) ∨ (β ⫫ α | λ∨γ) ⟹ α ⫫^∨ β | λ∨γ.

5. (Left) Contraction:

(α ⫫^∨ β | λ∨γ) ∧ (λ ⫫^∨ β | γ) ⟹ α∨λ ⫫^∨ β | γ.

Proof.
(α ⫫^∨ β | λ∨γ) ∧ (λ ⫫^∨ β | γ)
⟹ ((α ⫫ β | λ∨γ) ∨ (β ⫫ α | λ∨γ)) ∧ ((λ ⫫ β | γ) ∨ (β ⫫ λ | γ)),
(α ⫫ β | λ∨γ) ∧ (λ ⫫ β | γ) ⟹ (Left Contraction 8) α∨λ ⫫ β | γ,
(α ⫫ β | λ∨γ) ∧ (β ⫫ λ | γ) ⟹ (Flipped Left Cross Contraction 11) β ⫫ α∨λ | γ,
(β ⫫ α | λ∨γ) ∧ (λ ⫫ β | γ) ⟹ (Right Cross Contraction 10) β ⫫ α∨λ | γ,
(β ⫫ α | λ∨γ) ∧ (β ⫫ λ | γ) ⟹ (Right Contraction 9) β ⫫ α∨λ | γ,
all together ⟹ (α∨λ ⫫ β | γ) ∨ (β ⫫ α∨λ | γ) ⟹ α∨λ ⫫^∨ β | γ.
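Continuing the toy sketch given after Theorem A.16, the symmetrization of Theorem A.17 is just a pointwise logical OR over the two argument orders (again our illustrative names, not the paper's formalism):

```python
# Theorem A.17 in the toy model: symmetrize by the logical OR (names are ours).
def symmetrize(ci_rel):
    return lambda alpha, beta, gamma: (ci_rel(alpha, beta, gamma)
                                       or ci_rel(beta, alpha, gamma))

ci_sym = symmetrize(ci2)  # with ci2 from the sketch after Theorem A.16
```

For a genuinely asymmetric ⫫, such as transitional conditional independence with non-stochastic inputs, the two disjuncts can differ, which is exactly why the proof above needs all four contraction rules to cover the four possible case combinations.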

B. Measure Theory

In this section we review the measure theory that is needed to prove the disintegration theorem for Markov kernels in more generality. Since already the existence of disintegrations or regular conditional probability distributions in general depends on the existence of countably compact approximating classes (see Pachl's theorem in [Fre15] 452I and remarks 452J, 452K), we need to dive into topological measure theory and descriptive set theory. To short-cut this endeavor (i.e. to skip Souslin-F, K-analytic, analytic spaces, etc.) we here directly define Souslin's S-operator on the level of measurable spaces (without the need for complicated combinatorics and topological considerations in the statements) by only defining it on σ-algebras, and we cite or rephrase the relevant results from [Kec95], [Bog07], [Fre15], etc. directly for measurable spaces. We also introduce the notion of (or generalize from separable metric spaces to) "universal measurable spaces", a measure-space analogue of Radon spaces (plus some countability conditions). Universal measurable spaces will have a property that allows for general disintegrations. We cannot expect to get a general existence result for (regular) conditional Markov kernels in spaces far beyond these universal measurable spaces, but we will get stronger versions for the more restrictive standard and analytic measurable spaces.

B.1. Completion and Universal Completion of a Sigma-Algebra

Definition B.1 (Completion of a σ-algebra). Let X be a measurable space and P(X) the space of probability measures on X or, more precisely, on its σ-algebra B_X.

1. For μ ∈ P(X) consider the set of μ-null sets of X:

I_μ := {M ⊆ X | ∃N ∈ B_X : M ⊆ N, μ(N) = 0}.

I_μ is a σ-ideal, i.e. it contains ∅ and is closed under subsets and countable unions.

2. The μ-completion of B_X is the following σ-algebra:

(B_X)_μ := {A △ M | A ∈ B_X, M ∈ I_μ}.

We use the notation: A △ M := (A ∪ M) ∖ (A ∩ M). Note that μ can uniquely be extended to (B_X)_μ.

3. We will write X_μ to refer to X equipped with the σ-algebra (B_X)_μ.

Definition B.2 (Universal completion of a σ-algebra). Let X be a measurable space.

1. The universal completion of the σ-algebra B_X is the following σ-algebra:

(B_X)_• := ⋂_{ν ∈ P(X)} (B_X)_ν,

i.e. the intersection of all ν-completions of B_X.

2. The sets A ∈ (B_X)_• will be called the universally measurable subsets of X.

3. We will write X_• to refer to X equipped with the σ-algebra (B_X)_•.

4. We call X, B_X, resp., universally complete if B_X = (B_X)_•.

Remark B.3. For a measurable space X and μ ∈ P(X) we clearly have the following inclusions: B_X ⊆ (B_X)_• ⊆ (B_X)_μ.

Notation B.4. Let X be a set and E ⊆ 2^X a non-empty set of subsets of X. We then write:

(E)_• := (σ(E))_• := ⋂_{ν ∈ P(σ(E))} (σ(E))_ν

for the smallest universally complete σ-algebra containing E.
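For orientation, a standard example: for X = [0, 1] with Borel σ-algebra B_X and Lebesgue measure λ, the completion (B_X)_λ is the Lebesgue σ-algebra, while (B_X)_• is the σ-algebra of universally measurable sets, and both inclusions B_X ⊆ (B_X)_• ⊆ (B_X)_λ are strict: every analytic set is universally measurable (cf. Lemma B.10 below) while not every analytic set is Borel, and suitable subsets of λ-null sets are Lebesgue measurable but fail to be measurable w.r.t. other Borel probability measures, hence are not universally measurable.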

Definition B.5. Let f : X → Y be a (not necessarily measurable) map between measurable spaces. f will be called universally measurable if it is (B_X)_•-(B_Y)_•-measurable.

Lemma B.6. Let X and Y be measurable spaces and μ ∈ P(X). Let f : X → Y be a (not necessarily measurable) map. We then have:

1. If f is B_X-B_Y-measurable then it is also (B_X)_μ-(B_Y)_{f_*μ}-measurable, where f_*μ is the push-forward measure of μ.

2. If f is B_X-B_Y-measurable then it is also (B_X)_•-(B_Y)_•-measurable.

3. f is (B_X)_•-B_Y-measurable if and only if it is (B_X)_•-(B_Y)_•-measurable.

4. If f is (B_X)_•-(B_Y)_•-measurable then it is also (B_X)_μ-(B_Y)_{f_*μ}-measurable.

Proof. 1.) If B ∈ (B_Y)_{f_*μ} then B = C △ M with C ∈ B_Y and M ⊆ N with μ(f^{-1}(N)) = 0 for some N ∈ B_Y. Then f^{-1}(B) = f^{-1}(C) △ f^{-1}(M) with f^{-1}(C) ∈ B_X, f^{-1}(M) ⊆ f^{-1}(N) ∈ B_X and μ(f^{-1}(N)) = 0. So f^{-1}(B) ∈ (B_X)_μ.

2.) This follows from the previous point by noting that:

(B_Y)_• = ⋂_{ν ∈ P(Y)} (B_Y)_ν ⊆ ⋂_{μ ∈ P(X)} (B_Y)_{f_*μ}.

3.) Clear from point 2. 4.) Same as point 1, with (B_X)_• ⊆ (B_X)_μ.

Lemma B.7. Let f : X → Y be a map and E ⊆ 2^Y a non-empty set. Then we have the equalities:

σ(f*E) = f*(σ(E)), and (f*((E)_•))_• = (f*E)_•.

Proof. Recall: f*E := {f^{-1}(E) | E ∈ E}. We clearly have the inclusions:

f*E ⊆ f*(σ(E)) := {f^{-1}(B) | B ∈ σ(E)}.

Since the latter is a σ-algebra we get:

f*E ⊆ σ(f*E) ⊆ f*(σ(E)).

Then the σ-algebra:

f_*σ(f*E) := {C ⊆ Y | f^{-1}(C) ∈ σ(f*E)}

contains E, since clearly f^{-1}(E) ∈ σ(f*E) for E ∈ E. It then follows that:

σ(E) ⊆ f_*σ(f*E),

from which we get: f*(σ(E)) ⊆ σ(f*E), thus equality. For the second claim first consider the inclusions:

f*E ⊆ σ(f*E) ⊆ (f*E)_•.

Since σ(f*E) = f*(σ(E)) we also get:

f*(σ(E)) ⊆ (f*E)_•.

So f is (f*E)_•-σ(E)-measurable and thus (f*E)_•-(E)_•-measurable, from which follows:

f*((E)_•) ⊆ (f*E)_•,

which directly implies:

(f*((E)_•))_• ⊆ (f*E)_•.

So we have the chain of inclusions:

f*E ⊆ σ(f*E) = f*(σ(E)) ⊆ f*((E)_•) ⊆ (f*((E)_•))_• ⊆ (f*E)_•,

showing that (f*((E)_•))_• is universally complete and contains f*E, which shows:

(f*((E)_•))_• = (f*E)_•.

B.2. Analytic Subsets of a Measurable Space

Definition B.8 (Analytic subsets, see [Fre15] 421H, 423E). Let X be a measurable space and let ℝ be equipped with its Borel σ-algebra.

1. A subset A ⊆ X will be called B_X-analytic, Souslin-B_X or just analytic (if the context is clear) if A is the projection of a measurable set C ∈ B_ℝ ⊗ B_X of the product space ℝ × X:

A = pr_X(C) := {x ∈ X | ∃u ∈ ℝ : (u, x) ∈ C}.

2. The set of analytic subsets of X is then abbreviated by:

S(B_X) := {pr_X(C) | C ∈ B_ℝ ⊗ B_X}.

Note that B_X ⊆ S(B_X) and that S(B_X) is closed under countable unions and countable intersections, but in general NOT w.r.t. complements (see [Fre15] 421E).

3. We denote the σ-algebra generated by the analytic subsets by: σS(B_X).

4. We will write X_S to refer to X equipped with the σ-algebra σS(B_X).

Lemma B.9 (See [Fre15] 421C (c)+(d)). Let f : (X, B_X) → (Y, B_Y) be a measurable map between any measurable spaces.

1. If B ∈ S(B_Y) then f^{-1}(B) ∈ S(B_X).

2. If f is surjective and A ∈ S(f*B_Y), then f(A) ∈ S(B_Y).

Note that we use: f*B_Y := {f^{-1}(B) ⊆ X | B ∈ B_Y}.

Lemma B.10 (See [Fre15] 431A, 421D). For any measurable space (X, B_X) and probability measure μ ∈ P(X) we have the inclusion of σ-algebras:

B_X ⊆ σS(B_X) ⊆ (B_X)_• ⊆ (B_X)_μ.

Furthermore, we have the equalities:

S((B_X)_•) = (B_X)_•, and S((B_X)_μ) = (B_X)_μ.

Lemma B.11. Let (X, B_X) be a measurable space. Then the identity maps:

id : (X, (B_X)_•) → (X, σS(B_X)) → (X, B_X)

are measurable. They induce measurable bijective restriction maps:

res : P(X_•) → P(X_S) → P(X), μ ↦ μ|,

where we endow the spaces of probability measures each with the smallest σ-algebra such that the evaluation maps:

j_A : P(X, B) → [0, 1], j_A(ν) := ν(A),

are measurable for all A ∈ B, where we use B = (B_X)_•, B = σS(B_X), B = B_X, resp. Note that the inverses of the bijective maps might not be measurable in general.

Lemma B.12. Let X and Y be measurable spaces, μ ∈ P(X) and f : X → Y be a map.

1. If f is B_X-B_Y-measurable then it is also σS(B_X)-σS(B_Y)-measurable.

2. If f is σS(B_X)-σS(B_Y)-measurable then it is also (B_X)_•-(B_Y)_•-measurable.

3. If f is (B_X)_•-(B_Y)_•-measurable then it is also (B_X)_μ-(B_Y)_{f_*μ}-measurable.

B.3. Countably Generated and Countably Separated Sigma-Algebras

Definition B.13 ((Universally) countably generated σ-algebras). Let (X, B_X) be a measurable space. Then X or B_X are called:

1. countably generated if there exists a countable E ⊆ B_X such that σ(E) = B_X;

2. universally countably generated if there exists a countable E ⊆ (B_X)_• such that (E)_• = (B_X)_•.

Clearly, the first implies the second.

Theorem B.14 (See [Bog07] Thm. 6.5.5). Let X be a measurable space. The following are equivalent:

1. X is countably generated.

2. There exists a measurable map f : X → [0, 1] such that B_X = f*B_[0,1].

Here B_[0,1] is the Borel σ-algebra of [0, 1]. We denote f*B_Y := {f^{-1}(C) | C ∈ B_Y}.

Proof. If B_X = σ({A_n | n ∈ ℕ}) then:

f := Σ_{n ∈ ℕ} 3^{-n} · 𝟙_{A_n} : X → [0, 1]

does the trick. The reverse is clear.

Corollary B.15. Let X be a measurable space. The following are equivalent:

1. X is universally countably generated.

2. There exists a universally measurable map f : X → [0, 1] such that (B_X)_• = (f*B_[0,1])_•.

Proof. Assume that X is universally countably generated. Let (B_X)_• = (E)_• with countable E ⊆ (B_X)_•. Then by Theorem B.14 there exists a measurable map:

f : (X, σ(E)) → ([0, 1], B_[0,1])

such that f*B_[0,1] = σ(E) ⊆ (B_X)_•. It thus follows that: (B_X)_• = (σ(E))_• = (f*B_[0,1])_•. This shows the claim.

For the reverse implication now assume that (B_X)_• = (f*B_[0,1])_•. Since B_[0,1] is countably generated, so is f*B_[0,1], and we get the claim.

Remark B.16 (Universally generated σ-algebras). The result from above could inspire another, seemingly more general, definition: a measurable space X could be called universally generated if there exists a universally measurable map f : X → [0, 1] such that:

(B_X)_• = (f*((B_[0,1])_•))_•.

Since by Lemma B.7 we have:

(f*((B_[0,1])_•))_• = (f*B_[0,1])_•,

this definition is equivalent to X being universally countably generated. We thus get nothing new.

Definition B.17 (Separating the points). Let X be a set and E ⊆ 2^X a subset. We then say that E separates the points of X if for all x1, x2 ∈ X with x1 ≠ x2 there exists an E ∈ E with: x1 ∈ E and x2 ∉ E, or x1 ∉ E and x2 ∈ E.

Remark B.18 (See Lem. B.39). Let X be a measurable space. Then the map:

δ : X → P(X), x ↦ δ_x,

is measurable with B_X = δ*B_{P(X)}. Furthermore, it is injective if and only if B_X separates the points of X.

Lemma B.19. Let X be a set and E ⊆ 2^X a subset. Then the following statements are equivalent:

1. E separates the points of X.

2. σ(E) separates the points of X.

3. (E)_• separates the points of X.

Proof. Since E ⊆ σ(E) ⊆ (E)_• we have 1. ⟹ 2. ⟹ 3. Now assume that (E)_• separates the points of X. Let X be endowed with B_X := σ(E) and put X_• = (X, (E)_•). We then have a commutative square of measurable maps: the horizontal maps are δ : X_• → P(X_•) and δ : X → P(X), and the vertical maps are the identity X_• → X and the restriction map P(X_•) → P(X). Now the vertical maps are bijective. By Lemma B.39 the upper δ is injective because (E)_• separates the points of X. This then implies that the lower δ is also injective, which again by Lemma B.39 implies that σ(E) separates the points of X. If now E would not separate the points of X, say x1 ≠ x2 are not separated, then the σ-algebra:

{A ∈ B_X | {x1, x2} ⊆ A ∨ {x1, x2} ⊆ A^c}

contains E and thus σ(E), in contradiction to the assumption. Thus already E must separate the points of X.

Definition B.20 ((Universally) countably separated σ-algebras). Let (X, B_X) be a measurable space. Then X or B_X are called:

1. countably separated if there exists a countable E ⊆ B_X that separates the points of X;

2. universally countably separated if there exists a countable E ⊆ (B_X)_• that separates the points of X.

Clearly, the first implies the second.

Theorem B.21 (See [Bog07] Thm. 6.5.7). Let X be a measurable space and ∆_X := {(x, x) ∈ X × X | x ∈ X} ⊆ X × X its diagonal. The following are equivalent:

1. X is countably separated.

2. There exists an injective measurable map f : X → [0, 1].

3. ∆_X ∈ B_X ⊗ B_X.

Proof. If E = {A_n | n ∈ ℕ} ⊆ B_X separates the points of X then:

f := Σ_{n ∈ ℕ} 3^{-n} · 𝟙_{A_n} : X → [0, 1]

is injective and measurable. For the rest see [Bog07] Thm. 6.5.7.

Corollary B.22. Let X be a measurable space. The following are equivalent:

1. X is universally countably separated.

2. There exists an injective universally measurable map f : X → [0, 1].

3. ∆_X ∈ (B_X)_• ⊗ (B_X)_•.

Proof. This follows directly from Theorem B.21 and Definition B.20. For this note that we have the equivalence between (B_X)_•-B_Y-measurability and (B_X)_•-(B_Y)_•-measurability.

This result inspires another definition:

Definition B.23 (Universally separated σ-algebras). A measurable space X will be called universally separated if ∆_X ∈ (B_X ⊗ B_X)_•.

Lemma B.24. Let X be a measurable space.

1. If X is countably separated then we have {x} ∈ B_X for all x ∈ X.

2. If X is universally (countably) separated then we have {x} ∈ (B_X)_• for all x ∈ X.

Proof. Let ∆_X := {(x, x) ∈ X × X | x ∈ X} ⊆ X × X be the diagonal of X. Fix x0 ∈ X. For any σ-algebra B on X the identity map id : X → X and the constant map c_{x0} : X → X with c_{x0}(x) := x0 are B-B-measurable. This implies that the map:

ι_{x0} : X → X × X, x ↦ (x, x0),

is B-(B ⊗ B)-measurable and thus (B)_•-(B ⊗ B)_•-measurable. This implies that {x0} = ι_{x0}^{-1}(∆_X) ∈ B if ∆_X ∈ B ⊗ B, and {x0} = ι_{x0}^{-1}(∆_X) ∈ (B)_• if ∆_X ∈ (B ⊗ B)_•. Each of the cases is implied by the assumptions via Theorem B.21, Corollary B.22 and Definition B.23.

Corollary B.25. Let X be a measurable space. The following are equivalent:

1. X is countably generated and countably separated.

2. There exists an injective measurable map f : X → [0, 1] such that B_X = f*B_[0,1].

Proof. If B_X = σ(E) and F ⊆ B_X separates the points of X, with countable E, F, then with E ∪ F = {A_n | n ∈ ℕ} ⊆ B_X and

f := Σ_{n ∈ ℕ} 3^{-n} · 𝟙_{A_n} : X → [0, 1]

we get the claim. The reverse is clear.

Corollary B.26. Let X be a measurable space. The following are equivalent:

1. X is universally countably generated and universally countably separated.

2. There exists an injective universally measurable map f : X → [0, 1] such that (B_X)_• = (f*B_[0,1])_•.

Proof. Assume the first. If (B_X)_• = (E)_• and F separates the points of X, with countable E, F ⊆ (B_X)_•, then put G := E ∪ F = {A_n | n ∈ ℕ}. Again using:

f := Σ_{n ∈ ℕ} 3^{-n} · 𝟙_{A_n} : X → [0, 1]

gives us an injective σ(G)-B_[0,1]-measurable map f with f*B_[0,1] = σ(G) ⊆ (B_X)_•. From this it follows that (B_X)_• = (σ(G))_• = (f*B_[0,1])_•. This shows the claim. The reverse direction is clear from Corollaries B.22 and B.15.

B.4. Standard, Analytic and Universal Measurable Spaces

Definition B.27. Let ([0, 1], B_[0,1]) be the unit interval with its Borel σ-algebra. A measurable space (X, B_X) will be called:

1. standard (Borel) if (X, B_X) ≅ (Y, B_[0,1]|_Y) with a Y ∈ B_[0,1];

2. analytic if (X, B_X) ≅ (Y, B_[0,1]|_Y) with a Y ∈ S(B_[0,1]);

3. countably perfect if (X, B_X) ≅ (Y, B_[0,1]|_Y) with a Y ∈ (B_[0,1])_•;

4. universal if (X, (B_X)_•) ≅ (Y, (B_[0,1]|_Y)_•) with a Y ∈ (B_[0,1])_•.

Here "≅" means "measurably isomorphic" (i.e. via a measurable map with a measurable inverse) and we will in the following write B|_X for the induced Borel (sub-)σ-algebra B_[0,1]|_Y in (B_X)_• (for universal measurable spaces). In the first cases of course: B_X = B|_X.

Remark B.28. 1. The inclusions B_[0,1] ⊆ S(B_[0,1]) ⊆ (B_[0,1])_• show that every standard measurable space is also an analytic measurable space, every analytic measurable space is countably perfect, and (the universal completion of) any countably perfect measurable space is a universal measurable space.

2. Note that in the definition of universal measurable spaces we have (B_X)_• = (B|_X)_•, but we do not need to have B_X = B|_X. Since the definition only depends on the universal completion we can, nevertheless, always replace B_X with B|_X when needed, which usually has the better properties.

Lemma B.29. 1. Any countable product of standard, analytic, countably perfect or universal measurable spaces, resp., is of the same type.

2. If (X, B_X) is a standard, analytic, countably perfect or universal measurable space then B|_X is countably generated and countably separated (cp. [Bog07] 6.5.8).

Proof. This directly follows from the Borel isomorphism [0, 1]^ℕ ≅ [0, 1] (see [Fre15] 424C).

Example B.30 (Standard measurable spaces, see [Bog07] 6.8.8, [Fre15] 424C). Any Polish space X, i.e. a separable completely metrizable space (e.g. ℝ^k, [0, 1], ℕ^ℕ, any topological manifold, any countable simplicial complex, etc.), is a standard measurable space with its Borel σ-algebra B_X.

Example B.31 (Analytic measurable spaces, see [Bog07] 6.7.4). Any Souslin space X, i.e. a Hausdorff space that is the continuous image of a Polish space, is an analytic measurable space with its Borel σ-algebra B_X.

B.5. Perfect Measures and Perfect Measurable Spaces

Definition B.32 (Perfect measures and perfect measurable spaces). Let X be a measurable space. Then we call:

1. μ ∈ P(X) perfect if for all measurable maps f : X → ℝ we have that:

f_*((B_X)_μ) = (B_ℝ)_{f_*μ}.

In particular: f(X) ∈ (B_ℝ)_{f_*μ}.

2. X perfect if every μ ∈ P(X) is perfect.

Note that we define the push-forward σ-algebra of B along a map f : X → Y as:

f_*B := {C ⊆ Y | f^{-1}(C) ∈ B}.

Theorem B.33 (See [Bog07] 7.5.7 (i), 7.5.7 (iii), 6.5.8, 7.5.6 (iv)). Let X be a measurable space. Then the following statements are equivalent:

1. X is countably perfect.

2. X is countably generated, countably separated and perfect.

3. X is countably generated, countably separated and every probability measure μ ∈ P(X) has a countably compact approximating class (see [Fre15] 451A or [Bog07] §1.12(ii) for precise definitions).

4. X is countably generated, countably separated and there exists a (countably) compact class that approximates every probability measure μ ∈ P(X).

Corollary B.34. Let X be a measurable space. Then the following are equivalent:

1. X is a universal measurable space.

2. X is universally countably generated, universally countably separated and perfect.

3. X is universally countably generated, universally countably separated and every probability measure μ ∈ P(X) has a countably compact approximating class.

4. X is universally countably generated, universally countably separated and there exists a (countably) compact class that approximates every probability measure μ ∈ P(X).

Example B.35. 1. Every universally measurable subset X ∈ (B_Y)_• of an analytic or standard measurable space Y is a countably perfect space when equipped with the subspace σ-algebra of measurable sets: B_Y|_X.

2. Every universally measurable subset X ∈ (B_Y)_• of an analytic or standard measurable space Y is a countably perfect space when equipped with the subspace σ-algebra of universally measurable sets: ((B_Y)_•)|_X.

Example B.36. Consider a Radon space X, i.e. a Hausdorff space where every Borel probability measure is inner regular w.r.t. the (closed) compact subsets. Let B_X be the Borel σ-algebra. A countable network for the topology is a countable set E ⊆ 2^X such that every open set O can be written as:

O = ⋃_{A ∈ E, A ⊆ O} A.

Then we have the following:

1. If X has a countable network E ⊆ B_X then X is a countably perfect measurable space.

2. If X has a countable network E ⊆ (B_X)_• then X is a universal measurable space.

Proof. The first statement is similar to the second. The second statement follows from Corollary B.34. For this let T_X be the set of open sets. By assumption every open set A ∈ T_X is the (countable) union of elements of E ⊆ (B_X)_•. So we have T_X ⊆ σ(E) and thus:

B_X = σ(T_X) ⊆ σ(E) ⊆ (B_X)_•.

Since X is Hausdorff, T_X separates points, and so does E. Furthermore, on Radon spaces every Borel measure is Radon, and in general every Radon measure is perfect (see [Bog07] 7.5.10(i)). Alternatively, the compact sets build a compact approximating class for every Borel measure. So (X, B_X) is a universal measurable space. Also see [Fre15] 451M for a slightly more general situation.

B.6. The Space of Probability Measures

Remark B.37 (The σ-algebra on the space of probability measures). Recall that for any measurable space X we endow the space of probability measures P(X) with the smallest σ-algebra B_{P(X)} such that all the evaluation maps:

j_A : P(X) → [0, 1], j_A(ν) := ν(A),

are measurable for A ∈ B_X.

Lemma B.38. Let (X, B_X) be a measurable space and B_X = σ(E) with a set E that is closed under pairwise intersections. Then we have:

B_{P(X)} = σ({j_A^{-1}((α, 1]) | A ∈ E, α ∈ ℚ}).

Note that:

j_A^{-1}((α, 1]) = {μ ∈ P(X) | μ(A) > α}.

Proof. Let B_E := σ({j_A*B_[0,1] | A ∈ E}). The system:

D := {A ∈ B_X | j_A = (μ ↦ μ(A)) is B_E-B_[0,1]-measurable}

is a Dynkin system that contains the π-system E. So B_X = σ(E) ⊆ D by Dynkin's lemma (see [Kle20] Thm. 1.19). So all j_A are B_E-B_[0,1]-measurable. So by definition of B_{P(X)} as the smallest σ-algebra for which all j_A are measurable we get:

σ({j_A*B_[0,1] | A ∈ B_X}) = B_{P(X)} ⊆ B_E = σ({j_A*B_[0,1] | A ∈ E}).

Thus equality: B_{P(X)} = B_E. Since B_[0,1] = σ(F) with the countable π-system F := {(α, 1] | α ∈ ℚ} we get:

B_{P(X)} = σ({j_A^{-1}((α, 1]) | A ∈ E, α ∈ ℚ}).
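For a finite X the objects above are concrete: P(X) is a simplex, j_A sums coordinates, and the sets {j_A > α} for A in a generating π-system already pin down the whole σ-algebra B_{P(X)}. A small sketch of j_A, with our own toy encoding of measures as dicts:

```python
# Toy illustration of the evaluation maps j_A on a finite space (our encoding):
# a probability measure on X = {0, 1, 2} is a dict x -> prob, and j_A(mu) is
# the mass of A; sets of the form {mu : j_A(mu) > alpha} generate B_P(X).
def j(A, mu):
    return sum(p for x, p in mu.items() if x in A)

mu = {0: 0.2, 1: 0.5, 2: 0.3}
print(j({0, 1}, mu))        # 0.7
print(j({0, 1}, mu) > 0.5)  # mu lies in the generator set {j_{0,1} > 1/2}
```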

Lemma B.39. Let X, Y, Z be measurable spaces.

1. If f : X → Y is measurable then the induced map:

f_* : P(X) → P(Y), μ ↦ f_*μ = (B ↦ μ(f^{-1}(B))),

is measurable as well.

2. The map:

δ : X → P(X), x ↦ δ_x,

is measurable and δ*B_{P(X)} = B_X. δ is injective iff B_X separates points.

3. The map:

P(X) × P(Y) → P(X × Y), (μ, ν) ↦ μ ⊗ ν,

is measurable.

4. If g : X × Y → Z is measurable then the map:

P(X) × Y → P(Z), (μ, y) ↦ g_*^y μ := g_*(μ ⊗ δ_y) = (C ↦ μ({x ∈ X | g(x, y) ∈ C})),

is measurable as well.

Proof. 1.) For every B ∈ B_Y we have:

(j_B ∘ f_*)(μ) = μ(f^{-1}(B)) = j_{f^{-1}(B)}(μ).

Since the latter is measurable in μ by definition, the former is as well. Since this holds for all B ∈ B_Y, the measurability of f_* is shown.

2.) Let A ∈ B_X; then:

(j_A ∘ δ)(x) = j_A(δ_x) = 𝟙_A(x)

is measurable in x. So x ↦ δ_x is measurable.

3.) For A ∈ B_X and B ∈ B_Y the composition of maps:

P(X) × P(Y) →^{j_A × j_B} [0, 1] × [0, 1] →^{·} [0, 1], (μ, ν) ↦ μ(A) · ν(B),

is clearly measurable. Since this map agrees with the following map:

P(X) × P(Y) → P(X × Y) →^{j_{A×B}} [0, 1], (μ, ν) ↦ μ(A) · ν(B),

we have the measurability of the latter map as well. Now let:

D := {D ∈ B_X ⊗ B_Y | (μ, ν) ↦ (μ ⊗ ν)(D) is measurable}.

By the above arguments we see that D contains the system E := {A × B | A ∈ B_X, B ∈ B_Y}, which is stable under finite intersections. Since the μ ⊗ ν are probability measures, it is easily verified that with D ∈ D also D^c ∈ D, and with D_n ∈ D, n ∈ ℕ, pairwise disjoint, also ⋃_{n ∈ ℕ} D_n ∈ D. So D is a Dynkin system. By Dynkin's lemma (see [Kle20] Thm. 1.19) we see that:

B_X ⊗ B_Y = σ(E) ⊆ D.

This means that for every D ∈ B_X ⊗ B_Y the composition of maps:

P(X) × P(Y) → P(X × Y) →^{j_D} [0, 1], (μ, ν) ↦ (μ ⊗ ν)(D),

is measurable. So the map:

P(X) × P(Y) → P(X × Y), (μ, ν) ↦ μ ⊗ ν,

is measurable by definition of the induced σ-algebras.

4.) If g : X × Y → Z is measurable and C ∈ B_Z then by the previous points we get the measurability of the composition:

P(X) × Y →^{id × δ} P(X) × P(Y) →^{⊗} P(X × Y) →^{g_*} P(Z) →^{j_C} [0, 1],

which maps (μ, y) ↦ (g_*(μ ⊗ δ_y))(C). Furthermore, we have:

g^{-1}(C)^y = {x ∈ X | g(x, y) ∈ C} = (g^y)^{-1}(C).

Furthermore, for D ∈ B_X ⊗ B_Y we have:

(μ ⊗ δ_y)(D) = μ(D^y),

which again by Dynkin's lemma is enough to test on sets D = A × B, where it is obvious:

(μ ⊗ δ_y)(A × B) = μ(A) · 𝟙_B(y) = μ((A × B)^y).

Together we get:

j_C(g_*^y μ) = (g_*^y μ)(C) = μ(g^{-1}(C)^y) = (μ ⊗ δ_y)(g^{-1}(C)) = j_C(g_*(μ ⊗ δ_y)).

Since the latter is measurable in (μ, y) by the above arguments, the former is as well. Since this holds for all C ∈ B_Z we have shown the measurability of the claimed map (μ, y) ↦ g_*^y μ.

Lemma B.40. If X is countably generated then P(X) is countably generated and countably separated.

Proof. If B_X = σ(E) with countable E, then the algebra generated by E is countable and closed under finite intersections, so w.l.o.g. E is an algebra. Then E × ℚ is countable and we have by Lemma B.38:

B_{P(X)} = σ({j_A^{-1}((α, 1]) | A ∈ E, α ∈ ℚ}).

This shows that P(X) is countably generated. Now consider μ, ν ∈ P(X) with μ ≠ ν. Assume that μ(A) = ν(A) for all A ∈ E. Then define:

D := {A ∈ B_X | μ(A) = ν(A)}.

Then D is a Dynkin system that contains the π-system E. So B_X = σ(E) ⊆ D by Dynkin's lemma (see [Kle20] Thm. 1.19). But this means μ and ν agree on all A ∈ B_X and are thus equal, in contradiction to the assumption. So there must be an A ∈ E such that ν(A) ≠ μ(A). So there exists an α ∈ ℚ with either ν(A) > α > μ(A) or ν(A) < α < μ(A). In the first case ν ∈ j_A^{-1}((α, 1]) and μ ∉ j_A^{-1}((α, 1]), and in the second case ν ∉ j_A^{-1}((α, 1]) and μ ∈ j_A^{-1}((α, 1]). So the above system also separates the points of P(X). This shows the claim.

Theorem B.41 (See [Res77] Thm. 3). Let X be an analytic measurable space.

1. If A ∈ σS(B_X) then j_A is σS(B_{P(X)})-B_[0,1]-measurable.

2. If A ∈ S(B_X) then j_A^{-1}((α, 1]) ∈ S(B_{P(X)}) for every α ∈ ℝ.

Theorem B.42 (See [Res77] Thm. 4). Let X be any measurable space.

1. If A ∈ (B_X)_• then j_A is (B_{P(X)})_•-B_[0,1]-measurable.

2. We have the inclusions: B_{P(X)} ⊆ B_{P(X_•)} ⊆ (B_{P(X)})_• = (B_{P(X_•)})_•.

Proof. The first statement is due to [Res77] Thm. 4. We reproduce the proof here. Let A ∈ (B_X)_•. For the claim we need to show that j_A^{-1}((α, 1]) ∈ (B_{P(X)})_• for all α ∈ ℝ. So we fix α ∈ ℝ and ρ ∈ P(P(X)). It is thus enough to show that j_A^{-1}((α, 1]) ∈ (B_{P(X)})_ρ. Now define ν ∈ P(X) for B ∈ B_X as follows:

ν(B) := ∫_{P(X)} μ(B) dρ(μ) = ∫_{P(X)} j_B(μ) dρ(μ).

Since A ∈ (B_X)_• ⊆ (B_X)_ν there exist B1, B2 ∈ B_X such that:

B1 ⊆ A ⊆ B2, ν(B2 ∖ B1) = 0.

The last condition shows:

0 = ν(B2 ∖ B1) = ∫ j_{B2∖B1}(μ) dρ(μ).

Since j_{B2∖B1}(μ) ≥ 0 and j_{B2∖B1} : P(X) → ℝ is measurable, we get that:

C := j_{B2∖B1}^{-1}((0, 1]) = {μ ∈ P(X) | j_{B2∖B1}(μ) > 0} ∈ B_{P(X)} and ρ(C) = 0.

We now have the inclusions:

j_{B1}^{-1}((α, 1]) ⊆ j_A^{-1}((α, 1]) ⊆ j_{B2}^{-1}((α, 1]),

with j_{Bi}^{-1}((α, 1]) ∈ B_{P(X)}, i = 1, 2, and:

j_{B2}^{-1}((α, 1]) ∖ j_{B1}^{-1}((α, 1]) ⊆ j_{B2∖B1}^{-1}((0, 1]) = C.

Indeed, if μ(B2) > α and μ(B1) ≤ α, then:

μ(B2 ∖ B1) = μ(B2) − μ(B1) > α − α = 0.

Since we have ρ(C) = 0 we also have:

0 ≤ ρ(j_{B2}^{-1}((α, 1]) ∖ j_{B1}^{-1}((α, 1])) ≤ ρ(C) = 0.

This shows that j_A^{-1}((α, 1]) ∈ (B_{P(X)})_ρ, which shows the first claim, as α and ρ were arbitrary.

Now consider the second statement.

The inclusion B_{P(X)} ⊆ B_{P(X_•)} is clear since B_X ⊆ (B_X)_•.

The inclusion B_{P(X_•)} ⊆ (B_{P(X)})_• follows from the first point, since B_{P(X_•)} is per definition the smallest σ-algebra such that all j_A for A ∈ (B_X)_• are measurable.

The equality (B_{P(X)})_• = (B_{P(X_•)})_• follows from the sandwich:

B_{P(X)} ⊆ B_{P(X_•)} ⊆ (B_{P(X)})_•.

Corollary B.43. Let X be universally countably generated; then P(X) is universally countably generated and universally countably separated.

Proof. By assumption we have a countably generated σ-algebra E ⊆ (B_X)_• such that (E)_• = (B_X)_•. By Lemma B.40 we have that P(X, E) is countably generated and countably separated. If we now apply Theorem B.42 twice, once to B_X and once to E, we get the equalities:

(B_{P(X,E)})_• = (B_{P(X,(E)_•)})_• = (B_{P(X,(B_X)_•)})_• = (B_{P(X,B_X)})_•.

So B_{P(X,E)} is a countably generated and countably separated σ-algebra that universally generates (B_{P(X,B_X)})_•. This shows the claim.

Theorem B.44 (See [Sch74] Appendix §5 Thm. 7+8). Let X be a standard measurable space, an analytic measurable space, resp.; then the space of probability measures P(X) is also a standard measurable space, an analytic measurable space, resp. (in the σ-algebra from this section).

Lemma B.45. Let f : X → Y be a measurable map between measurable spaces with B_X = f*B_Y. Then the map:

f_* : P(X) → P(Y), μ ↦ f_*μ,

is injective, measurable and satisfies: B_{P(X)} = (f_*)*B_{P(Y)}.

Proof. f_* is measurable since for any B ∈ B_Y we have:

j_B(f_*μ) = j_{f^{-1}(B)}(μ),

and j_{f^{-1}(B)} is measurable since f^{-1}(B) ∈ B_X.

f_* is injective. Indeed, if f_*μ = f_*ν and A ∈ B_X = f*B_Y is of the form A = f^{-1}(B) with B ∈ B_Y, then:

μ(A) = μ(f^{-1}(B)) = (f_*μ)(B) = (f_*ν)(B) = ν(f^{-1}(B)) = ν(A),

which implies μ = ν.

The measurability of f_* shows: B_{P(X)} ⊇ (f_*)*B_{P(Y)}.

For the reverse, consider generators j_A^{-1}((α, 1]) ∈ B_{P(X)} for α ∈ ℝ and A = f^{-1}(B) ∈ B_X = f*B_Y. We then get:

j_A^{-1}((α, 1]) = {μ ∈ P(X) | μ(A) > α}
= {μ ∈ P(X) | μ(f^{-1}(B)) > α}
= {μ ∈ P(X) | (f_*μ)(B) > α}
= {μ ∈ P(X) | j_B(f_*μ) ∈ (α, 1]}
= {μ ∈ P(X) | f_*μ ∈ j_B^{-1}((α, 1])}
= (f_*)^{-1}(j_B^{-1}((α, 1]))
∈ (f_*)*B_{P(Y)}.

This implies:

B_{P(X)} = σ({j_A^{-1}((α, 1]) | α ∈ ℝ, A ∈ B_X}) ⊆ (f_*)*B_{P(Y)}.

Theorem B.46. Let Y be an analytic measurable space and X ∈ B_Y, X ∈ S(B_Y), resp. Then the inclusion map ι : (X, B_Y|_X) ↪ (Y, B_Y) induces a canonical measurable isomorphism:

ι_* : P(B_Y|_X) ≅ {ν ∈ P(B_Y) | ν(X) = 1} ⊆ P(B_Y).

Furthermore, P(B_Y|_X) ∈ B_{P(B_Y)}, P(B_Y|_X) ∈ S(B_{P(B_Y)}), resp.

Proof. W.l.o.g. assume B|_Y = B_Y. The bijection can be found in [Fre15] 437Q, 434F(c). The additional claims follow from [HK99], [Bog07] 8.10.28.

Lemma B.47. Let f : X → Y be a measurable injective map between countably perfect measurable spaces with B_X = f*B_Y. Then f(X) ∈ (B_Y)_• and f_* induces an identification:

f_* : P(X) ≅ {ν ∈ P(Y) | ν(f(X)) = 1},

where the latter set lies in (B_{P(Y)})_•.

Proof. Since by definition we can embed Y into [0, 1], we can for the first statement w.l.o.g. assume Y = [0, 1]. Since X is perfect, by [Bog07] Thm. 7.5.7 (i) we get the claim: f(X) ∈ (B_Y)_•.

Since f(X) ∈ (B_Y)_• we can evaluate at f(X). By Theorem B.42 the evaluation map j_{f(X)} is (B_{P(Y)})_•-B_[0,1]-measurable. So we get that:

P_1 := {ν ∈ P(Y) | ν(f(X)) = 1} = j_{f(X)}^{-1}({1}) ∈ (B_{P(Y)})_•.

By Lemma B.45 we know that f_* is injective, measurable and satisfies B_{P(X)} = (f_*)*B_{P(Y)}. It is clear that f_*P(X) ⊆ P_1.

If now ν ∈ P_1 then we define for A ∈ B_X = f*B_Y:

μ(A) := ν(B ∩ f(X)) for A = f^{-1}(B), B ∈ B_Y.

μ is well-defined: if B1, B2 ∈ B_Y with f^{-1}(B1) = A = f^{-1}(B2) then, because of the injectivity of f, we get B1 ∩ f(X) = B2 ∩ f(X). Also note that B ∩ f(X) ∈ (B_Y)_• can be measured by ν. It is clear that μ is a probability measure and satisfies f_*μ = ν:

(f_*μ)(B) = μ(f^{-1}(B)) = ν(B ∩ f(X)) = ν(B ∩ f(X)) + ν(B ∩ f(X)^c) = ν(B),

where the second summand vanishes since ν(f(X)) = 1.

Corollary B.48. Let X be a countably perfect measurable space; then the space of probability measures P(X) is also a countably perfect measurable space.

Proof. X by definition can be embedded as a universally measurable set into Y = [0, 1]. By Lemma B.47 we get that P(X) is measurably isomorphic to a universally measurable subset of P(Y). By Theorem B.44, P(Y) is a standard measurable space. A universally measurable subset of a standard measurable space with the induced σ-algebra is countably perfect (see [Bog07] Thm. 7.5.7 (i)).

Corollary B.49. Let X be a universal measurable space; then the space of probability measures P(X) is a universal measurable space.

Proof. By definition there is a countably generated and countably separated σ-algebra B|_X ⊆ (B_X)_• such that (X, B|_X) is countably perfect and (B|_X)_• = (B_X)_•. By Corollary B.48 the space P(X, B|_X) is countably perfect. By Theorem B.42 we also have the equalities:

(B_{P(X,B|_X)})_• = (B_{P(X,(B|_X)_•)})_• = (B_{P(X,(B_X)_•)})_• = (B_{P(X,B_X)})_•.

So (B_{P(X,B_X)})_• is the universal completion of the σ-algebra of a countably perfect space. This shows that P(X) is a universal measurable space.

B.7. Measurable Selection Theorem

Lemma B.50 (See [Fre15] 423G). Let (X, B_X) be an analytic measurable space, (Y, B_Y) a countably separated measurable space (e.g. another analytic measurable space), and let f : (X, B_X) → (Y, B_Y) be a measurable map. We then have:

A ∈ S(B_X) ⟹ f(A) ∈ S(B_Y).

We also have that the graph satisfies Γ_f ∈ S(B_X ⊗ B_Y).

Proof. See [Fre15] 423G for analytic measurable spaces. If (Y, B_Y) is countably separated we find an injective measurable map ι : (Y, B_Y) ↪ ([0, 1], B_[0,1]) by [Bog07] Thm. 6.5.7. Then by the previous argument ι(f(A)) ∈ S(B_[0,1]) and because of the injectivity f(A) = ι^{-1}(ι(f(A))) ∈ S(B_Y).

Theorem B.51 (Measurable selection theorem, [Bog07] Thm. 6.9.12). Let (X, B_X) be an analytic measurable space and (Y, B_Y) be any measurable space. Let D ∈ S(B_X ⊗ B_Y). Then pr_Y(D) ∈ S(B_Y) and there exists a σS(B_Y|_{pr_Y(D)})-B_X-measurable map:

g : pr_Y(D) → X,

such that (g(y), y) ∈ D ⊆ X × Y for every y ∈ pr_Y(D).

Corollary B.52 (σS(B)-measurable right inverse). Let (X, B_X) be an analytic measurable space and (Y, B_Y) a countably separated measurable space (e.g. another analytic measurable space) and let f : (X, B_X) → (Y, B_Y) be a measurable map. Then there exists a σS(B_Y|_{f(X)})-B_X-measurable map g : f(X) → X such that f(g(y)) = y for all y ∈ f(X).

Proof. This follows directly from the measurable selection theorem B.51 with D = Γ_f ∈ S(B_X ⊗ B_Y) (and Lemma B.50).

The following corollary highlights the power of working in the category of analytic measurable spaces.

Corollary B.53 (Maps between analytic measurable spaces). Let f : (X, B_X) → (Y, B_Y) be a measurable map between analytic measurable spaces. Then f(X) ∈ S(B_Y), (f(X), B_Y|_{f(X)}) is an analytic measurable space and f factors into a surjective and an injective measurable map:

f : (X, B_X) ↠ (f(X), B_Y|_{f(X)}) ↪ (Y, B_Y).

Furthermore, the induced maps also give a factorization into a surjective and an injective measurable map of analytic measurable spaces (w.r.t. the induced σ-algebras):

f_* : (P(B_X), B_{P(B_X)}) ↠ (P(B_Y|_{f(X)}), B_{P(B_Y|_{f(X)})}) ↪ (P(B_Y), B_{P(B_Y)}),

with P(B_Y|_{f(X)}) ∈ S(B_{P(B_Y)}). We get the identification of analytic measurable spaces:

P(B_Y|_{f(X)}) = {ν ∈ P(B_Y) | ν(Y ∖ f(X)) = 0}.

This shows the equality of sets:

{f_*μ | μ ∈ P(B_X)} = {ν ∈ P(B_Y) | ν(Y ∖ f(X)) = 0}.

B.8. Measurable Extension Theorems

Theorem B.54 (Kuratowski extension theorem for standard measurable spaces, see [Kec95] 12.2). Let (X, B_X) be any measurable space, W ⊆ X any subset endowed with the subspace σ-algebra B_X|_W, and (Y, B_Y) be a standard measurable space. Let f : (W, B_X|_W) → (Y, B_Y) be a measurable map. Then there exists a measurable map:

F : (X, B_X) → (Y, B_Y)

such that the restriction to W equals f, i.e. F|_W = f. In short: there exists an F such that the triangle

(W, B_X|_W) ↪ (X, B_X) →^F (Y, B_Y)

commutes with f : (W, B_X|_W) → (Y, B_Y).

Corollary B.55 (Extensions of Kuratowski's extension theorem). Let (X, B_X) be any measurable space, W ⊆ X any subset endowed with the subspace σ-algebra B_X|_W, and (Y, B_Y) be an analytic measurable space, a countably perfect measurable space, resp. Let f : (W, B_X|_W) → (Y, B_Y) be a measurable map. Then there exists a measurable map:

F : (X, σS(B_X)) → (Y, σS(B_Y)),

a measurable map:

F : (X, (B_X)_•) → (Y, (B_Y)_•), resp.,

such that the restriction to W equals f, i.e. F|_W = f.

Proof. Since (Y, B_Y) is analytic, countably perfect, resp., we have a standard measurable space Z = [0, 1] and a B_Y-B_Z-measurable injective map j : Y ↪ Z with B_Y = j*B_Z and j(Y) ∈ σS(B_Z), j(Y) ∈ (B_Z)_•, resp. By Kuratowski's extension theorem B.54 we now have a measurable extension G : (X, B_X) → (Z, B_Z) of j ∘ f, i.e. G ∘ ι = j ∘ f for the inclusion ι : W ↪ X.

Now let C := G^{-1}(j(Y)) and y0 ∈ Y be an arbitrary point. Then C ∈ σS(B_X), C ∈ (B_X)_•, resp., and we define F : X → Y by:

F|_C : C → Y, x ↦ j^{-1}(G(x)),
F|_{C^c} : C^c → Y, x ↦ y0.

For B ⊆ Y we have:

F^{-1}(B) = (F|_C^{-1}(B) ∩ C) ∪ (F|_{C^c}^{-1}(B) ∩ C^c).

It is thus clear that F is σS(B_X)-σS(B_Y)-measurable, (B_X)_•-(B_Y)_•-measurable, resp. Since G(w) = j(f(w)) ∈ j(Y) we have that w ∈ C and F(w) = j^{-1}(G(w)) = f(w) for all w ∈ W.

Corollary B.56 (Kuratowski's extension theorem for universal measurable spaces). Let (X, B_X) be any measurable space, W ⊆ X any subset and (Y, B_Y) be a universal measurable space. Let f : (W, (B_X)_•|_W) → (Y, B_Y) be a measurable map. Then there exists a universally measurable map:

F : (X, (B_X)_•) → (Y, (B_Y)_•)

such that the restriction to W equals f, i.e. F|_W = f.

Proof. Since (Y, B_Y) is universal we have a universal embedding into a standard measurable space (Z, B_Z) (in fact [0, 1]), i.e. a (B_Y)_•-(B_Z)_•-measurable injective map j : Y ↪ Z with (B_Y)_• = j*(B_Z)_• and j(Y) ∈ (B_Z)_•.

Let B|_Y := j*B_Z. We consider f*B|_Y ⊆ (B_X)_•|_W. Then we can pick for every A ∈ f*B|_Y an Â ∈ (B_X)_• such that A = Â ∩ W. We then consider the sub-σ-algebra:

B̂_X := σ({Â | A ∈ f*B|_Y}) ⊆ (B_X)_•.

It is clear that B̂_X|_W = f*B|_Y and that j ∘ f is f*B|_Y-B_Z-measurable. By Kuratowski's extension theorem B.54 we now have a measurable extension of j ∘ f : (W, f*B|_Y) → (Z, B_Z) to (X, B̂_X):

G : (X, B̂_X) → (Z, B_Z),

i.e. G ∘ ι = j ∘ f for the inclusion ι : W ↪ X.

Now let C := G^{-1}(j(Y)) and y0 ∈ Y be an arbitrary point. Then C ∈ (B̂_X)_• ⊆ (B_X)_• and we define Ĝ : X → Z by:

Ĝ|_C = G|_C : C → Z, x ↦ G(x),
Ĝ|_{C^c} : C^c → Z, x ↦ j(y0).

For C' ⊆ Z we have:

Ĝ^{-1}(C') = (G|_C^{-1}(C') ∩ C) ∪ (Ĝ|_{C^c}^{-1}(C') ∩ C^c),

where the second part equals ∅ or C^c. It is thus clear that Ĝ is (B_X)_•-(B_Z)_•-measurable. Furthermore, Ĝ(X) ⊆ j(Y), where j is injective. So we can define F := j^{-1} ∘ Ĝ. Then F is (B_X)_•-(B_Y)_•-measurable because j ∘ F = Ĝ is (B_X)_•-(B_Z)_•-measurable and (B_Y)_• = j*(B_Z)_•. Finally, for w ∈ W we have j(f(w)) ∈ j(Y), so:

F(w) = j^{-1}(G(w)) = j^{-1}(j(f(w))) = f(w).

C. Proofs - Disintegration of Transition Probabilities

Here we will prove the existence and essential uniqueness of regular conditional Markov kernels for standard, analytic and universal measurable spaces. For classical results about disintegrations of probability distributions also see [Kle20, Rao05, Fre15, Bog07]. After the first public preprint of this paper it was brought to the author's attention that similar results have been provided for standard measurable spaces in [Kal17] Thm. 1.25 and for analytic measurable spaces in [BM20]. The disintegration results for universal measurable spaces in Theorem C.7 and its generalization in Corollary C.8 still seem to be new.

C.1. Definition of Regular Conditional Markov Kernels

Definition C.1 (Regular conditional Markov kernel). Let (X, B_X), (Y, B_Y), (Z, B_Z) be measurable spaces and:

K(X, Y | Z) : (Z, B_Z) ⇢ (X × Y, B_X ⊗ B_Y)

be a Markov kernel in two variables, and

K(Y | Z) : (Z, B_Z) ⇢ (Y, B_Y), (B, z) ↦ K(X ∈ X, Y ∈ B | Z = z),

the marginal Markov kernel. A regular (σS-regular, •-regular, resp.) conditional Markov kernel of K(X, Y | Z) conditioned on Y given Z is a Markov kernel:

K(X | Y, Z) : (Y × Z, B_Y ⊗ B_Z) ⇢ (X, B_X),
K(X | Y, Z) : (Y × Z, σS(B_Y ⊗ B_Z)) ⇢ (X, B_X),
K(X | Y, Z) : (Y × Z, (B_Y ⊗ B_Z)_•) ⇢ (X, B_X), resp.,

such that:

K(X, Y | Z) = K(X | Y, Z) ⊗ K(Y | Z).
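In the finite discrete case the content of Definition C.1 reduces to elementary conditional probability tables: K(X = x | Y = y, Z = z) = K(x, y | z)/K(y | z) wherever K(y | z) > 0, with an arbitrary choice elsewhere. A minimal sketch, with our own toy encoding of kernels as nested dicts rather than the paper's formalism:

```python
# Discrete toy version of Definition C.1 (our own encoding, for illustration):
# a kernel K(x, y | z) is a dict z -> {(x, y): prob}; disintegrate it into the
# marginal K(y | z) and a conditional K(x | y, z) with K = K(x|y,z) * K(y|z).
from collections import defaultdict

X_VALS, Y_VALS = (0, 1), ("u", "v")
K_xy_z = {  # hypothetical input kernel
    0: {(0, "u"): 0.2, (1, "u"): 0.3, (0, "v"): 0.5},
    1: {(0, "u"): 0.6, (1, "v"): 0.4},
}

def disintegrate(K):
    K_y_z, K_x_yz = {}, {}
    for z, table in K.items():
        m = defaultdict(float)
        for (x, y), p in table.items():
            m[y] += p
        K_y_z[z] = {y: m[y] for y in Y_VALS}
        for y in Y_VALS:
            if m[y] > 0:
                K_x_yz[(y, z)] = {x: table.get((x, y), 0.0) / m[y] for x in X_VALS}
            else:
                # off the support of K(Y|Z=z) any choice works; cf. Lemma C.2
                K_x_yz[(y, z)] = {x: 1.0 / len(X_VALS) for x in X_VALS}
    return K_x_yz, K_y_z

K_x_yz, K_y_z = disintegrate(K_xy_z)
# verify the defining equation K(x, y | z) = K(x | y, z) * K(y | z):
for z, table in K_xy_z.items():
    for x in X_VALS:
        for y in Y_VALS:
            assert abs(table.get((x, y), 0.0)
                       - K_x_yz[(y, z)][x] * K_y_z[z][y]) < 1e-12
```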

C.2. Essential Uniqueness of Regular Conditional Markov Kernels

Lemma C.2 (Essential uniqueness). Consider Markov kernels:

P(X | Y, Z), Q(X | Y, Z) : Y × Z ⇢ X, and K(Y | Z) : Z ⇢ Y,

with any measurable spaces X, Y, Z such that:

P(X | Y, Z) ⊗ K(Y | Z) = Q(X | Y, Z) ⊗ K(Y | Z).

We then have the following statements.

1. For every A ∈ B_X the set:

N_A := {(y, z) ∈ Y × Z | P(X ∈ A | Y = y, Z = z) ≠ Q(X ∈ A | Y = y, Z = z)}

is a K(Y | Z)-null set with N_A ∈ B_Y ⊗ B_Z.

2. If B_X is countably generated then N := ⋃_{A ∈ B_X} N_A is a K(Y | Z)-null set with N ∈ B_Y ⊗ B_Z.

3. If (B_X)_• is universally countably generated then Ñ := ⋃_{A ∈ (B_X)_•} N_A is a K(Y | Z)-null set with Ñ ∈ (B_Y ⊗ B_Z)_•.

Analogous results hold for σS(B_Y ⊗ B_Z)-measurable or (B_Y ⊗ B_Z)_•-measurable P and Q.

Proof. For A ∈ (B_X)_• we put N_A^> := h^{-1}(D^>), where D^> := {(x1, x2) | x1, x2 ∈ ℝ, x1 > x2} ∈ B_{ℝ²} and h is given as the following composition of maps:

h : Y × Z →^{(P,Q)} P(X) × P(X) →^{j_A × j_A} ℝ × ℝ.

We define N_A^< := h^{-1}(D^<) similarly. Then N_A = N_A^> ∪ N_A^<.

1.) If A ∈ B_X then the evaluation map j_A is measurable by definition of the σ-algebra on P(X). So N_A^>, N_A^< ∈ B_Y ⊗ B_Z. By assumption we have:

P(X | Y, Z) ⊗ K(Y | Z) = Q(X | Y, Z) ⊗ K(Y | Z).

Evaluating both sides on A × N_A^> thus gives the same value, so their difference equals 0:

0 = ∫ 𝟙_{N_A^>}(y, z) · (P(X ∈ A | Y = y, Z = z) − Q(X ∈ A | Y = y, Z = z)) K(Y ∈ dy | Z = z),

where the integrand is:

𝟙_{N_A^>}(y, z) · (P(X ∈ A | Y = y, Z = z) − Q(X ∈ A | Y = y, Z = z)) ≥ 0.

This implies that N_A^> must be a K(Y | Z)-null set in B_Y ⊗ B_Z. By symmetry we get that also N_A^< is a K(Y | Z)-null set in B_Y ⊗ B_Z, and thus N_A = N_A^> ∪ N_A^< is a K(Y | Z)-null set in B_Y ⊗ B_Z as well.

2.) If now B_X is countably generated then B_X = σ(A) with a countable set A that is closed under finite intersections. One then puts M := ⋃_{A ∈ A} N_A, which is, as a countable union of measurable K(Y | Z)-null sets, a measurable K(Y | Z)-null set. Then one can define:

D := {A ∈ (B_X)_• | ∀(y, z) ∈ M^c : P(X ∈ A | Y = y, Z = z) = Q(X ∈ A | Y = y, Z = z)}.

One easily sees that D is closed under complements and countable disjoint unions, and that X ∈ D. This shows that D is a Dynkin system (aka λ-system). Furthermore, we have A ⊆ D and A is closed under finite intersections. By Dynkin's lemma we get that:

B_X = σ(A) ⊆ D.

If now (y, z) ∈ N = ⋃_{A ∈ B_X} N_A then there is an A ∈ B_X such that:

P(X ∈ A | Y = y, Z = z) ≠ Q(X ∈ A | Y = y, Z = z).

This implies (y, z) ∈ M since A ∈ D (otherwise we had equality above). Since this holds for all (y, z) we get:

N = ⋃_{A ∈ B_X} N_A ⊆ M = ⋃_{A ∈ A} N_A ⊆ N,

thus equality. This shows that N = M is a measurable K(Y | Z)-null set.

3.) If now (B_X)_• is universally countably generated then (B_X)_• = (A)_• with a countable set A ⊆ (B_X)_• that is closed under finite intersections. For A ∈ (B_X)_• the map j_A is universally measurable by Theorem B.42. This implies that N_A ∈ (B_Y ⊗ B_Z)_• and M := ⋃_{A ∈ A} N_A ∈ (B_Y ⊗ B_Z)_• are K(Y | Z)-null sets. By the same arguments as above we get σ(A) ⊆ D. This shows that for (y, z) ∈ M^c we have:

∀A ∈ σ(A) : P(X ∈ A | Y = y, Z = z) = Q(X ∈ A | Y = y, Z = z).

Since every probability measure extends uniquely to the universal completion we even get for those points (y, z) ∈ M^c:

∀A ∈ (A)_• : P(X ∈ A | Y = y, Z = z) = Q(X ∈ A | Y = y, Z = z).

This then shows that even: (B_X)_• = (A)_• ⊆ D. As before we get that:

Ñ = ⋃_{A ∈ (B_X)_•} N_A = M ∈ (B_Y ⊗ B_Z)_•,

and thus that Ñ is a K(Y | Z)-null set in (B_Y ⊗ B_Z)_•.

C.3. Existence of Regular Conditional Markov Kernels

Remark C.3 (Existence of conditional Markov kernels). If K(X, Y | Z) is a Markov kernel then we want K(X | Y, Z) such that:

K(X, Y | Z) = K(X | Y, Z) ⊗ K(Y | Z)

holds. The heuristic here is to find something like a Radon-Nikodym derivative:

K(X ∈ A | Y = y, Z = z) = (K(X ∈ A, Y ∈ dy | Z = z) / K(Y ∈ dy | Z = z))(y),

in such a way that it is still a probability measure in X and jointly measurable in (y, z). To achieve measurability from the start we could restrict to X = Y = ℝ (or ℝ̄ or [0, 1], etc.) and make use of the Besicovitch derivation theorem, see [Fre15] 472D. For K(Y | Z)-almost-all (y, z) ∈ Y × Z and all x ∈ X we could construct:

K(X ≤ x | Y = y, Z = z) = inf_{m ∈ ℕ} liminf_{n ∈ ℕ} K(X ≤ ⌈x⌉_m, Y ∈ B_{1/n}(y) | Z = z) / K(Y ∈ B_{1/n}(y) | Z = z).

An alternative, which we will follow below, is to use Doob's derivation theorem for X = ℝ (or ℝ̄ or [0, 1], etc.) and countably generated Y, see [DM83] Thm. 58, [Kle20] Ex. 11.17, or Theorem C.4 below. This would yield, for K(Y | Z)-almost-all (y, z) ∈ Y × Z and all x ∈ X:

K(X ≤ x | Y = y, Z = z) = inf_{m ∈ ℕ} liminf_{n ∈ ℕ} Σ_{B ∈ E_n, K(Y ∈ B | Z = z) > 0} (K(X ≤ ⌈x⌉_m, Y ∈ B | Z = z) / K(Y ∈ B | Z = z)) · 𝟙_B(y),

where x < ⌈x⌉_m := ⌊mx + 1⌋/m ≤ x + 1/m for m ∈ ℕ, and B_Y = σ((B_n)_{n ∈ ℕ}) and:

E_0 := {Y}, E_n := {D ∖ B_n, D ∩ B_n | D ∈ E_{n−1}} ∖ {∅}, n ∈ ℕ.

In both approaches, on the remaining points (y, z), which lie inside the K(Y | Z)-null set, we can take a somewhat arbitrary choice, e.g. we can put:

K(X ≤ x | Y = y, Z = z) := K(X ≤ x | Z = z).
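The ratio heuristic of this remark can be illustrated numerically; the following sketch (our Monte Carlo stand-in, with a made-up toy model and no Z variable) estimates K(X ≤ x | Y = y) by the ratio of joint to marginal mass on shrinking dyadic bins around y, mimicking the refining partitions E_n:

```python
# Numerical illustration of the derivation heuristic in Remark C.3 (our sketch):
# approximate K(X <= x | Y = y) by K(X <= x, Y in B) / K(Y in B) on shrinking
# dyadic bins B containing y, here with both masses estimated from samples.
import random

random.seed(0)
samples = []
for _ in range(200_000):  # toy model: Y uniform on [0,1], X = Y + Gaussian noise
    y = random.random()
    x = min(1.0, max(0.0, y + random.gauss(0.0, 0.1)))
    samples.append((x, y))

def cond_cdf(x, y, n):
    """Estimate of P(X <= x | Y in the dyadic bin of size 2^-n containing y)."""
    lo = int(y * 2 ** n) / 2 ** n
    hi = lo + 2.0 ** -n
    in_bin = [xs for (xs, ys) in samples if lo <= ys < hi]
    return sum(xs <= x for xs in in_bin) / len(in_bin) if in_bin else float("nan")

for n in (1, 3, 5, 7):  # finer bins: the estimates settle near the true value
    print(n, round(cond_cdf(0.5, 0.5, n), 3))
```

As in the remark, on bins of vanishing K(Y ∈ B)-mass the ratio is undefined and one falls back to an arbitrary choice.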

Theorem C.4 (Doob's derivation theorem, see [DM83] Thm. 58, [Kle20] Ex. 11.17). Let Y and Z be measurable spaces with B_Y countably generated.⁷ Consider two non-negative finite transition measures:

Q(Y | Z), P(Y | Z) : Z → M_+(Y),

where M_+(Y) is the set of non-negative finite measures on Y. Furthermore, we assume for every z ∈ Z that Q(Y | Z = z) is absolutely continuous w.r.t. P(Y | Z = z). Then there exists a (jointly) measurable function:

g : Y × Z → ℝ_{≥0},

such that g is the Radon-Nikodym derivative of Q(Y | Z) w.r.t. P(Y | Z) (for all Z = z simultaneously):

g(y, z) = (Q(Y ∈ dy | Z = z) / P(Y ∈ dy | Z = z))(y).

The latter means that for all B ∈ B_Y and all z ∈ Z we have:

Q(Y ∈ B | Z = z) = ∫_B g(y, z) P(Y ∈ dy | Z = z).

⁷The assumption of being countably generated is necessary according to [DM83] Thm. 58, referring to [YM76] Thm. 3.

Proof sketch. Since B_Y is countably generated: B_Y = σ((B_n)_{n ∈ ℕ}). We then inductively put:

E_0 := {Y}, E_n := {D ∖ B_n, D ∩ B_n | D ∈ E_{n−1}} ∖ {∅}, n ∈ ℕ.

Then the following sequence of (jointly) measurable functions g_n : Y × Z → ℝ_{≥0}, defined by:

g_n(y, z) := Σ_{B ∈ E_n, P(Y ∈ B | Z = z) > 0} (Q(Y ∈ B | Z = z) / P(Y ∈ B | Z = z)) · 𝟙_B(y),

is a uniformly integrable martingale w.r.t. (σ(E_n))_{n ∈ ℕ} under P(Y | Z = z) for every fixed z ∈ Z. The proof requires the absolute continuity of Q(Y | Z) w.r.t. P(Y | Z). So the limit g := lim_{n ∈ ℕ} g_n exists in ℝ_{≥0} P(Y | Z)-almost-surely, and the convergence also holds in L¹ by the martingale convergence theorem, see [Kle20] Thm. 11.7. As a pointwise limit of jointly measurable functions, g is jointly measurable. The L¹-convergence and the martingale property then imply that g is the wanted Radon-Nikodym derivative.

Proposition C.5 (Existence of regular conditional Markov kernels for the unit interval). Let X, Y, Z be measurable spaces where X = [0, 1] and Y is countably generated. Let

K(X, Y | Z) : Z ⇢ X × Y

be a Markov kernel in two variables. Then a (regular) conditional Markov kernel conditioned on Y given Z:

K(X | Y, Z) : Y × Z ⇢ X,

exists.

Proof. For fixed A ∈ B_X and all z ∈ Z we have a finite measure K(X ∈ A, Y | Z = z) in Y, which is absolutely continuous w.r.t. the marginal K(Y | Z = z). Since also B_Y is countably generated, by Doob's derivation theorem, see Theorem C.4, we get a (jointly) measurable map:

g_A : Y × Z → ℝ_{≥0},

such that for all z ∈ Z and B ∈ B_Y:

K(X ∈ A, Y ∈ B | Z = z) = ∫_B g_A(y, z) K(Y ∈ dy | Z = z).

For x ∈ X we will define: G(x | y, z) := g_{[0,x]}(y, z). As a next step we want to modify G(x | y, z) such that it becomes a cumulative distribution function in x, i.e. it corresponds to a probability distribution on X. For this define X_ℚ := X ∩ ℚ, which is countable and dense in X. First note that:

S := {(y, z) ∈ Y × Z | G(1 | y, z) ≠ 1}

is a measurable K(Y | Z)-null set. Then, for every pair x1 < x2 in X_ℚ consider:

E_{(x1,x2)} := {(y, z) | G(x1 | y, z) > G(x2 | y, z)} ∈ B_Y ⊗ B_Z.

Since we have the (in)equalities:

∫ 𝟙_{E_{(x1,x2),z}}(y) · G(x1 | y, z) K(Y ∈ dy | Z = z)
= K(X ≤ x1, Y ∈ E_{(x1,x2),z} | Z = z)
≤ K(X ≤ x2, Y ∈ E_{(x1,x2),z} | Z = z)   (as x1 < x2)
= ∫ 𝟙_{E_{(x1,x2),z}}(y) · G(x2 | y, z) K(Y ∈ dy | Z = z)
≤ ∫ 𝟙_{E_{(x1,x2),z}}(y) · G(x1 | y, z) K(Y ∈ dy | Z = z)   (as G(x2 | y, z) < G(x1 | y, z) on E_{(x1,x2),z}),

we necessarily have K(Y ∈ E_{(x1,x2),z} | Z = z) = 0 for every z ∈ Z. Then E := S ∪ ⋃_{x1<x2 ∈ X_ℚ} E_{(x1,x2)} is also a K(Y | Z)-null set in B_Y ⊗ B_Z. Now for x ∈ X_ℚ we can define:

D_x := {(y, z) | G(x | y, z) < inf_{n ∈ ℕ} G(x + 1/n | y, z)} ∈ B_Y ⊗ B_Z.

By the dominated convergence theorem (see [Kle20] Cor. 6.26) we get:

∫ 𝟙_{D_{x,z}}(y) · G(x | y, z) K(Y ∈ dy | Z = z)
≤ ∫ 𝟙_{D_{x,z}}(y) · inf_{n ∈ ℕ} G(x + 1/n | y, z) K(Y ∈ dy | Z = z)
= inf_{n ∈ ℕ} ∫ 𝟙_{D_{x,z}}(y) · G(x + 1/n | y, z) K(Y ∈ dy | Z = z)
= inf_{n ∈ ℕ} K(X ≤ x + 1/n, Y ∈ D_{x,z} | Z = z)
= K(X ≤ x, Y ∈ D_{x,z} | Z = z)
= ∫ 𝟙_{D_{x,z}}(y) · G(x | y, z) K(Y ∈ dy | Z = z).

This shows that K(Y ∈ D_{x,z} | Z = z) = 0 for all z ∈ Z. So D := E ∪ ⋃_{x ∈ X_ℚ} D_x is again a K(Y | Z)-null set in B_Y ⊗ B_Z.

So far, we got that G, when restricted to X_ℚ × D^c, is jointly measurable in (y, z) for fixed x, and monotone non-decreasing and continuous from above in x for fixed (y, z), with G(1 | y, z) = 1. We now aim to extend G to X × Y × Z. For x ∈ X = [0, 1] and m ∈ ℕ put ⌈x⌉_m := min(1, ⌊mx + 1⌋/m). Then ⌈x⌉_m ∈ [0, 1] ∩ ℚ = X_ℚ. The map x ↦ ⌈x⌉_m is measurable and for x ∈ [0, 1) we have:

x < ⌈x⌉_m ≤ x + 1/m.

So ⌈1⌉_m = 1 and ⌈x⌉_m ∈ X_ℚ converges to x ∈ X, x ≠ 1, from above for m → ∞. We then define for all (x, y, z) ∈ X × Y × Z:

F(x | y, z) := inf_{m ∈ ℕ} {G(⌈x⌉_m | y, z)} · 𝟙_{D^c}(y, z) + K(X ≤ x | Z = z) · 𝟙_D(y, z).

It is clear that F is again jointly measurable in (y, z) for fixed x and agrees with G on X_ℚ × D^c by construction. As a monotone approximation from above it is clearly continuous from above, monotone non-decreasing and satisfies F(1 | y, z) = 1 for all (y, z). So for fixed (y, z) now F(· | y, z) corresponds to a probability distribution K(X | Y = y, Z = z) on B_X, uniquely given by the defining relations on sets [0, x]:

F(x | y, z) =: K(X ≤ x | Y = y, Z = z), for all x ∈ X.

Now define D ⊆ B_X as the set of all A ∈ B_X that satisfy:

1. the map (y, z) ↦ K(X ∈ A | Y = y, Z = z) is (B_Y ⊗ B_Z)-B_ℝ-measurable, and:

2. for all z ∈ Z and B ∈ B_Y the following equation holds:

K(X ∈ A, Y ∈ B | Z = z) = ∫ 𝟙_B(y) · K(X ∈ A | Y = y, Z = z) K(Y ∈ dy | Z = z).

Since K(X ∈ ·, Y ∈ B | Z = z) and K(X ∈ · | Y = y, Z = z) are (finite) measures in X, the system D is closed under countable disjoint unions and complements and contains X = [0, 1]. So D is a Dynkin system. We already know that for x ∈ X_ℚ the map (y, z) ↦ K(X ≤ x | Y = y, Z = z) = F(x | y, z) is measurable. Since for x ∈ X_ℚ and every B ∈ B_Y, z ∈ Z, we have:

𝟙_B(y) · K(X ≤ x | Y = y, Z = z) = 𝟙_B(y) · G(x | y, z)

up to the K(Y | Z = z)-null set D_z, we already get for those x ∈ X_ℚ:

K(X ≤ x, Y ∈ B | Z = z) = ∫ 𝟙_B(y) · K(X ≤ x | Y = y, Z = z) K(Y ∈ dy | Z = z).

This shows that E := {[0, x] | x ∈ X_ℚ} ⊆ D. Since E is closed under finite intersections, Dynkin's lemma (see [Kle20] Thm. 1.19) implies:

B_X = σ(E) ⊆ D.

This shows that the two conditions hold for all A ∈ B_X and thus that K(X | Y, Z) is the desired regular conditional Markov kernel.

Theorem C.6 (Existence of regular conditional Markov kernels). Let X be a standard measurable space, an analytic measurable space, a countably perfect measurable space, resp., let Y be a countably generated measurable space and let Z be any measurable space. Let

K(X, Y | Z) : Z ⇢ X × Y

be a Markov kernel in two variables. Then there exists a regular, σS-regular, •-regular, resp., conditional Markov kernel K(X | Y, Z) conditioned on Y given Z.

Proof. Since X is standard, analytic, countably perfect, resp., we find an injective measurable map:

φ : (X, B_X) ↪ ([0, 1], B_[0,1]) =: (X', B_{X'})

that induces a measurable isomorphism (X, B_X) ≅ (φ(X), B_{X'}|_{φ(X)}) with φ(X) ∈ B_{X'} (φ(X) ∈ S(B_{X'}), φ(X) ∈ (B_{X'})_•, resp.). So we can consider the push-forward Markov kernel K(X', Y | Z) := K(φ(X), Y | Z):

K(X', Y | Z) : (Z, B_Z) ⇢^{K(X,Y|Z)} (X × Y, B_X ⊗ B_Y) →^{φ × id_Y} (X' × Y, B_{X'} ⊗ B_Y).

Since X' = [0, 1] and B_Y is countably generated we can apply Prp. C.5 and then get the regular conditional Markov kernel K(X' | Y, Z):

K(X' | Y, Z) : (Y × Z, B_Y ⊗ B_Z) ⇢ (X', B_{X'}).

If we put A' := X' ∖ φ(X) we have A' ∈ B_{X'} (A' ∈ σS(B_{X'}), A' ∈ (B_{X'})_•, resp.). So K(X' ∈ A' | Y = y, Z = z) is well-defined for every (y, z) ∈ Y × Z. Consider the set:

D := {(y, z) ∈ Y × Z | K(X' ∈ A' | Y = y, Z = z) > 0}.

We first show that D ∈ B_Y ⊗ B_Z (D ∈ σS(B_Y ⊗ B_Z), D ∈ (B_Y ⊗ B_Z)_•, resp.). We now consider the Markov kernel K(X' | Y, Z) as the measurable map:

K(X' | Y, Z) : (Y × Z, B_Y ⊗ B_Z) → (P(B_{X'}), B_{P(B_{X'})}).

Now consider the map:

j_{A'} : P(B_{X'}) → [0, 1], μ ↦ μ(A').

By Theorems B.42 and B.41 the map j_{A'} is measurable w.r.t. B_{P(B_{X'})} (w.r.t. σS(B_{P(B_{X'})}), w.r.t. (B_{P(B_{X'})})_•, resp.). Then the composition:

H : (Y × Z, B_Y ⊗ B_Z) →^{K(X'|Y,Z)} (P(B_{X'}), B_{P(B_{X'})}) →^{j_{A'}} ([0, 1], B_[0,1])

is measurable w.r.t. B_Y ⊗ B_Z (w.r.t. σS(B_Y ⊗ B_Z), w.r.t. (B_Y ⊗ B_Z)_•, resp.). It follows that D = H^{-1}((0, 1]) ∈ B_Y ⊗ B_Z (D ∈ σS(B_Y ⊗ B_Z), D ∈ (B_Y ⊗ B_Z)_•, resp.). Since y ↦ (y, z) is B_Y-(B_Y ⊗ B_Z)-measurable it follows that D_z ∈ (B_Y)_•. So we can evaluate K(Y ∈ D_z | Z = z) = K((Y, Z) ∈ D | Z = z) for every z ∈ Z, and the map:

Z →^{K(Y,Z|Z)} P(B_Y ⊗ B_Z) →^{j_D} [0, 1], z ↦ K(Y ∈ D_z | Z = z),

is measurable w.r.t. B_Z (w.r.t. σS(B_Z), w.r.t. (B_Z)_•, resp.), again by Theorems B.42 and B.41. So we can integrate:

0 = K(φ(X) ∈ A', Y ∈ D_z | Z = z) = ∫ 𝟙_{D_z}(y) · K(X' ∈ A' | Y = y, Z = z) K(dy | Z = z).

It follows that for all z ∈ Z we have: K(Y ∈ D_z | Z = z) = 0.

For A ∈ B_X let Ã ∈ B_{X'} be such that A = φ^{-1}(Ã). Since φ is injective we have φ(A) = Ã ∩ φ(X) ∈ B_{X'} (in σS(B_{X'}), in (B_{X'})_•, resp.). So we can define:

K(X ∈ A | Y = y, Z = z) := K(X' ∈ φ(A) | Y = y, Z = z) · 𝟙_{D^c}(y, z) + K_0(X ∈ A) · 𝟙_D(y, z),

with any probability distribution K_0. So we get the Markov kernel:

K(X | Y, Z) : (Y × Z, B_Y ⊗ B_Z) ⇢ (X, B_X)

(with σS(B_Y ⊗ B_Z)-measurability, (B_Y ⊗ B_Z)_•-measurability, resp.), again by Theorems B.42 and B.41. Furthermore, we have for all A ∈ B_X, B ∈ B_Y and z ∈ Z:

K(X ∈ A, Y ∈ B | Z = z) = ∫ 𝟙_B(y) · K(X ∈ A | Y = y, Z = z) K(Y ∈ dy | Z = z).

This shows the claim.

The following two results now directly generalize the results from [Fad85] Thm. 6 from probability distributions to Markov kernels.

Theorem C.7 (Existence of ‚-regular conditional Markov kernels). Let X be a universal measurable space, Y be a universally countably generated measurable space and Z be any measurable space. Let KpX,Y |Zq : Z 99K X ˆ Y, be a Markov kernel in two variables. Then there exists a ‚-regular conditional Markov kernel KpX|Y,Zq conditioned on Y given Z.

Proof. By assumption we have that: pBY q‚ “ pB|Y q‚ with a countably generated σ- 1 algebra B|Y . We put Y “pY, B|Y q. We also have that: pBX q‚ “pB|X q‚ with a countably 1 generated and countably separated σ-algebra B|X such that X “pX , B|X q is countably perfect. Then the Markov kernel:

KpX,Y |Zq : Z 99K X ˆ Y, induces the Markov kernel on the universal completions:

KpX,Y |Zq : Z‚ 99K X‚ ˆ Y‚, and then restricts to: 1 1 KpX,Y |Zq : Z‚ 99K X ˆ Y . By Theorem C.6 we then have a ‚-regular conditional Markov kernel:

1 1 KpX|Y,Zq : pY ˆ Z‚q‚ 99K X , such that: KpX,Y |Zq“ KpX|Y,Zqb KpY |Zq. 1 Note that: pY ˆ Z‚q‚ “ pY ˆ Zq‚. Since pBY b BZ q‚-B|X -measurability is equivalent to pBY b BZ q‚-pB|X q‚-measurability and pB|X q‚ “ pBX q‚, we get the wanted ‚-regular conditional Markov kernel:

KpX|Y,Zq : pY ˆ Zq‚ 99K X .

92 C. Proofs - Disintegration of Transition Probabilities

Corollary C.8 (Existence of ‚-regular conditional Markov kernels). Let X and Y be universally countably generated measurable spaces, where X is perfect, and Z be any measurable space. Let KpX,Y |Zq : Z 99K X ˆ Y, be a Markov kernel in two variables. Then there exists a ‚-regular conditional Markov kernel KpX|Y,Zq conditioned on Y given Z.

Proof. By assumption we have pBX q‚ “ pEq‚ with countable E. Define the equivalence ˜ relation: x1 „ x2 iff x1 and x2 can’t be separated by pBX q‚. Then X :“ X {„ is countably generated with pBX˜q‚ “pBX q‚ “pEq‚, countably separated (since E separated the points, see Lemma B.19) and perfect, thus a universal measurable space. Furthermore, we can ˜ ˜ identify: PpX‚ ˆ Yq “ PpX‚ ˆ Yq and PpX‚q “ PpX‚q. We can then apply Theorem C.7 to get the result for X˜ and thus for X .

The above results inspire for a new definition:

Definition C.9 (Universal disintegration space). A measurable space X is called uni- versal disintegration space if for every universally countably generated measurable space Y, every measurable space Z and every Markov kernel

KpX,Y |Zq : Z 99K X ˆ Y, there exists a ‚-regular conditional Markov kernel conditioned on Y given Z:

KpX|Y,Zq : pY ˆ Zq‚ 99K X .

Remark C.10. All the results above show that every standard, analytic, countably per- fect and every universal measurable space or even every perfect universally countably generated measure space is a universal disintegration space.

Lemma C.11. Let X , Y, Z be measurable spaces and:

KpX,Y |Zq : Z 99K X ˆ Y, a Markov kernel in two variables. Assume one of the following:

1. X ÀK pY,Zq.

2. Y ÀK Z. then there exists a regular conditional Markov kernel conditioned on Y given Z.

Proof. 1.) If X ÀK pY,Zq then there exists a measurable map ϕ : Y ˆ Z Ñ X such that: KpX,Y |Zq“ δϕpX|Y,Zqb KpY |Zq.

93 C. Proofs - Disintegration of Transition Probabilities

So KpX|Y,Zq :“ δϕpX|Y,Zq is a regular conditional Markov kernel. 2.) If Y ÀK Z then there exists a measurable map ϕ : Z Ñ Y such that:

KpX,Y |Zq“ δϕpY |Zqb KpX|Zq

“ KpX|Zqb δϕpY |Zq “ KpX|Zqb KpY |Zq.

So KpX|Y,Zq :“ KpX|Zq is a regular conditional Markov kernel.

Remark C.12. To get further existence results for ‚-regular conditional Markov kernels one could use the following slightly weaker conditions:

1. X ÀK,‚ pY,Zq, i.e. if there exists a universally measurable map ϕ : Y ˆ Z Ñ X such that: KpX,Y |Zq“ δϕpX|Y,Zqb KpY |Zq.

2. Y ÀK,‚ Z, i.e. if there exists a universally measurable map ϕ : Z Ñ Y such that:

KpX,Y |Zq“ δϕpY |Zqb KpX|Zq.

The proof would follow the same lines as in Lemma C.11.

C.4. Consistency of Regular Conditional Markov Kernels We now show that regular conditional Markov kernels are conistent when coming from a measurable map. We follow similar arguments as used in [Fre15] 452E, 452G, adapted to our case.

Theorem C.13 (Regular conditional Markov kernels are consistent with maps). Let KpX|Zq : Z 99K X be a Markov kernel and f : X ˆ Z Ñ Y be a measurable map, Y :“ fpX,Zq and KpX,Y |Zq the joint (push-forward) Markov kernel. Assume that we have a regular conditional Markov kernel QpX|Y,Zq, which then satisfies:

KpX,Y |Zq“ QpX|Y,Zqb KpY |Zq.

1. Consistency: For every B P BY we have:

´1 QpX P fz pBq|Y “ y,Z “ zq“ ½Bpyq

for all py, zq outside a KpY |Zq-null set in BY b BZ (dependent on B).

2. Strong consistency: If Y is (universally) countably separated, then we have:

´1 QpX P fz pyq|Y “ y,Z “ zq“ 1,

for all py, zq outside a KpY |Zq-null set in BY b BZ , in pBY b BZ q‚, resp.

94 C. Proofs - Disintegration of Transition Probabilities

Proof. 1.) Fix B P BY and put:

ą ´1 NB :“ py, zq P Y ˆ Z | QpX P fz pBq|Y “ y,Z “ zqą ½Bpyq P BY b BZ .

ă ( Similarly we define NB . We then get:

½Bpyq KpY P dy|Z “ zq N ą ż B,z ą “ KpY P B X NB,z|Z “ zq ´1 ą “ KpX P fz pBq,Y P NB,z|Z “ zq “ QpX P f ´1pBq|Y “ y,Z “ zq KpY P dy|Z “ zq. N ą ż B,z ą ă This shows that NB,z is a KpY |Zq-null set. Similarly we get that NB,z is a KpY |Zq-null set. This shows the first claim. 2.) Since Y is countably separated there exists a countable algebra E Ď BY separating the points of Y. For E P E we define:

´1 NE :“ py, zq P E ˆ Z |, QpX P fz pEq|Y “ y,Z “ zq‰ 1 P BY b BZ .

By the first point we know that NE is a KpY |Zq-null set. Since E is( countable then also

N :“ EPE NE is a KpY |Zq-null set in BY b BZ . Since E separates points and is closed under complements we get for all y P Y: Ť tyu“ E, yPEPE č and thus for all z P Z: ´1 c ´1 c fz pyq “ fz pEq . yPEPE ď Then for py, zq P N c we get:

´1 c ´1 c 0 ď QpX P fz pyq |Y “ y,Z “ zqď QpX P fz pEq |Y “ y,Z “ zq“ 0. yPEPE ÿ This shows that for py, zq P N c we have:

´1 QpX P fz pyq|Y “ y,Z “ zq“ 1.

This shows the second claim. Remark C.14. For σS-regular or ‚-regular conditional Markov kernels in Theorem C.13 similar statements hold, but with the null sets lying in σS pBY b BZ q, in pBY b BZ q‚, resp. For the ‚-regular conditional Markov kernels in Theorem C.13 one only needs universally measurable f and for the second claim it is enough to ask for universally countably separated Y.

95 D. Proofs - Join-Semi-Lattice Rules for Transitional Random Variables

Remark C.15 (Proper regular conditional Markov kernels). Consider the setting in Theorem C.13. One then could, in analogy to probability measures, call a conditional Markov kernel QpX|Y,Zq a proper regular conditional Markov kernel if the strong con- ´1 sistency property holds for all py, zq P Y ˆ Z with fz pyq ‰ H and not just up to some KpY |Zq-null set N Ď Y ˆ Z. To go from a strongly consistent to a proper regular conditional Markov kernel one could try to find a conditional measurable selection map g : D Ñ X defined on the subset D :“ fpX ˆ ZqX N of Y ˆ Z such that for all py, zq P D: fpgpy, zq, zq“ y. ´1 Then we would have: gpy, zq P tx P X | fpx, zq“ yu“ fz pyq and thus: ´1 δgpX P fz pyq|Y “ y,Z “ zq“ 1. for such py, zq P D. Then one could define: QpX|Y “ y,Z “ zq for py, zq R D, KpX|Y “ y,Z “ yq :“ δ pX|Y “ y,Z “ zq for py, zq P D. " g Then clearly we would have:

´1 KpX P fz pyq|Y “ y,Z “ zq“ 1, ´1 for all py, zq with fz pyq ‰ H. Finding such a g or such a disintegration, already for probability measures, is not always possible, not even for standard measurable spaces, without weakening the measurability properties, see measurable selection theorems in [Bog07,Fre15,BD75] and the discussions there. In the unconditional case, for probability measures on analytic spaces, one can at least find a σS-measurable right inverse g to f. It is unclear to the author if such results would generalize to the needed conditional versions for analytic spaces or if there are universally measurable selection theorems (in the unconditional case for universal measurable spaces).

D. Proofs - Join-Semi-Lattice Rules for Transitional Random Variables

In this section we will collect properties of the relation ÀK introduced in the main paper in Notations 2.19. For this let pW ˆ T , KpW |T qq be a transition probability space and X : W ˆ T 99K X and Y : W ˆ T 99K Y and Z : W ˆ T 99K Z and U : W ˆ T 99K U be transitional random variables, i.e. Markov kernels. We put: KpX,Y,Z,U|T q :“pXpX|W, T qb YpY |W, T qb ZpZ|W, T qb UpU|W, T qq ˝ KpW |T q.

The relation ÀK will be a main ingredient to show that transitional conditional in- dependence, see Definition 3.1, forms a T -˚-separoid, see Definition A.4 and Appendix E. According to Remark A.1 in Appendix A we also need to check the compatibility of ÀK with the equivalence relation, –, of isomorphisms of measurable spaces. This will be done in Appendix E in Lemma E.2.

96 D. Proofs - Join-Semi-Lattice Rules for Transitional Random Variables

Remark D.1. 1. In general we do not have: X ÀK X for arbitrary Markov kernels.

2. In general we do not have anti-symmetry, i.e. that:

X ÀK Y ÀK X ùñ X “ Y.

Notation D.2. Recall that we write:

1. X ÀK Y if there exists a measurable map ϕ : Y Ñ X such that:

KpX,Y |T q“ δϕpX|Y qb KpY |T q.

We further define:

2. X «K Y : ðñ X ÀK Y ÀK X.

The next Lemma D.3 is crucial for most of following results where ÀK is involved. ˚ Note that a similar result for ÀK would not hold, i.e. where δϕpY |Z, T q would be replaced by an arbitrary Markov kernel YpY |Z, T q.

Lemma D.3. Consider a Markov kernel KpX,Y,Z|T q such that the marginal factorizes:

KpY,Z|T q“ δϕpY |Z, T qb KpZ|T q, with a measurable map ϕ : Z ˆ T Ñ Y. Then the joint Markov kernel also factorizes:

KpX,Y,Z|T q“ δϕpY |Z, T qb KpX,Z|T q.

Proof. By assumption the set:

M :“ tpy,z,tq P Y ˆ Z ˆ T | y ‰ ϕpz, tqu is a KpY,Z|T q-null set. Indeed:

“ KppY,Zq P Mt|T “ tq

“pδϕpY |Z, T “ tqb KpZ|T “ tqqpMtq

“ δϕpY P Mz,t|Z “ z, T “ tq KpZ P dz|T “ tq ż

“ ½Mz,t pϕpz, tqq KpZ P dz|T “ tq ż

“ ½ty|y‰ϕpz,tqupϕpz, tqq KpZ P dz|T “ tq ż

“ ½ϕpz,tq‰ϕpz,tq KpZ P dz|T “ tq ż “ 0.

97 D. Proofs - Join-Semi-Lattice Rules for Transitional Random Variables

Then N :“ X ˆ Y ˆ M is a KpX, Y,Y,Z˜ |T q-null set, where Y˜ “ ϕpZ, T q. This implies:

pδϕpY |Z, T qb KpX,Z|T qqpA ˆ B ˆ C, tq “ KpX P A, ϕpZ, T q P B,Z P C|T “ tq “ KpX P A, ϕpZ, T q P B,Y P Y,Z P C|T “ tq “N KpX P A, ϕpZ, T q P Y,Y P B,Z P C|T “ tq “ KpX P A, Y P B,Z P C|T “ tq. This shows the claim. Lemma D.4 (Product extension). We have the implication:

X ÀK Y ùñ X ÀK Y b Z. Proof. By assumption we have:

KpX,Y |T q“ δϕpX|Y qb KpY |T q Lemma D.3 ùùùùùùùùñ KpX,Y,Z|T q“ δϕpX|Y qb KpY,Z|T q.

Lemma D.5 (Bottom element). We always have:

δ˚ ÀK X. Proof. We have:

Kp˚,X|T q“ δ˚ b KpX|T q.

Lemma D.6 (Transitivity). We have the implication:

X ÀK Y ÀK Z ùñ X ÀK Z. Proof. We have:

KpY,Z|T q“ δψpY |Zqb KpZ|T q,

KpX,Y |T q“ δϕpX|Y qb KpY |T q Lemma D.3 ùùùùùùùùñ KpX,Y,Z|T q“ δϕpX|Y qb KpY,Z|T q

“ δϕpX|Y qb δψpY |Zqb KpZ|T q

ùñ KpX,Z|T q“pδϕpX|Y q˝ δψpY |Zqq b KpZ|T q

“ δϕ˝ψpX|Zqb KpZ|T q

98 D. Proofs - Join-Semi-Lattice Rules for Transitional Random Variables

Lemma D.7 (Restricted reflexivity). If X : W ˆ T Ñ X is a measurable map and X “ δpX|W, T q. Then we have: X ÀK X.

Proof. Let X1, X2 be copies of X. Then using ϕ :“ id : X Ñ X gives:

δidpX1|X2qb KpX2|T q.

Lemma D.8 (Product stays bounded). We have the implication:

X ÀK Z ^ Y ÀK Z ùñ X b Y ÀK Z.

Proof. By the assumptions we have:

KpY,Z|T q“ δψpY |Zqb KpZ|T q,

KpX,Z|T q“ δϕpX|Zqb KpZ|T q Lemma D.3 ùùùùùùùùñ KpX,Y,Z|T q“ δϕpX|Zqb KpY,Z|T q

“ δϕpX|Zqb δψpY |Zqb KpZ|T q

“ δϕˆψpX,Y |Zqb KpZ|T q.

Lemma D.9 (Product compatibility). We have the implication:

X ÀK Z ^ Y ÀK U ùñ X b Y ÀK Z b U.

Proof. X ÀK Z implies X ÀK Z b U by Lemma D.4. Similarly, Y ÀK U implies Y ÀK Z b U. By Lemma D.8 we then get the claim: X b Y ÀK Z b U. Theorem D.10 (The bounded join-semi-lattice of transitional random variables). Let pW ˆ T , KpW |T qq be a transition probability space and X : W ˆ T Ñ X and Y : W ˆ T Ñ Y and Z : W ˆT Ñ Z be transitional random variables. We put: X :“ δpX|W, T q and Y :“ δpY |W, T q and Z :“ δpZ|W, T q and:

KpX,Y,Z|T q :“pXpX|W, T qb YpY |W, T qb ZpZ|W, T qq ˝ KpW |T q.

We then have:

1. Reflexivity: X ÀK X.

2. Transitivity: X ÀK Y ÀK Z ùñ X ÀK Z. 3. Almost-sure anti-symmetry (per definition):

X ÀK Y ÀK X ùñ : X «K Y.

99 E. Proofs - Separoid Rules for Transitional Conditional Independence

4. Closed under join: X b Y “ δpX,Y |W, T q.

5. Join is upper bound: X ÀK X b Y and Y ÀK X b Y.

6. Join is smallest upper bound:

X ÀK Z ^ Y ÀK Z ùñ X b Y ÀK Z.

7. Bottom element: δ˚ ÀK X.

8. Bottom element is neutral: X «K δ˚ b X.

9. Idempotent: X «K X b X.

So the class of transitional random variables of the form X “ δpX|W, T q, Y “ δpY |W, T q, Z “ δpZ|W, T q, etc., for some measurable maps X, Y , Z, etc., on the transition probability space pW ˆ T , KpW |T qq together with the relation ÀK, join b and bottom element δ˚ forms a bounded join-semi-lattice modulo almost-sure anti-symmetry «K (and up to the fact that such a class might not be a set). Proof. Reflexivity: See Lemma D.7. Transitivity: See Lemma D.6. Almost-sure anti-symmetry: This holds per definition. The next point is just a straight forward reformulation. Join is upper bound: This follows from reflexivity, Lemma D.7, and Lemma D.4. Join is smallest upper bound: See Lemma D.8. Bottom element: See Lemma D.5. Bottom element is neural: X ÀK δ˚ b X is clear. For the other direction, δ˚ b X ÀK X, note: Kp˚,X1,X2|T q“ δ˚ˆidp˚,X1|X2qb KpX2|T q.

Idempotent: X ÀK X b X is clear. For the other direction, X b X ÀK X, note:

KpX1,X2,X3|T q“ δidˆidpX1,X2|X3qb KpX3|T q.

E. Proofs - Separoid Rules for Transitional Conditional Independence

In this section we want to prove that the class of transitional random variables together with the equivalence relation, –, isomorphism of measurable spaces, the relation ÀK, the product b and the ternary relation KKK of transitional conditional independence, see Definition 3.1, forms an T-δ˚-separoid (or in different symbols: T -˚-separoid), see Definition A.4 in Appendix A, at least when restricted to codomains that are standard

100 E. Proofs - Separoid Rules for Transitional Conditional Independence measurable spaces (or universal measurable spaces, when one replaces measurability with universal measurability everywhere). For this let pW ˆ T , KpW |T qq be a transition probability space and X : W ˆT 99K X and Y : W ˆ T 99K Y and Z : W ˆ T 99K Z and U : W ˆ T 99K U be Markov kernels. We denote by T : W ˆ T Ñ T the canonical projection map and TpT |W, T q :“ δpT |W, T q. We also consider the constant map ˚ : WˆT Ñ ˚ :“ t˚u and δ˚ “ δp˚|W, T q the corresponding Markov kernel. We put:

KpX,Y,Z,U|T q :“pXpX|W, T qb YpY |W, T qb ZpZ|W, T qb UpU|W, T qq ˝ KpW |T q, or similarly if more or other Markov kernels are involved. Recall that we say that X is transitionally independent of Y conditioned on Z w.r.t. K “ KpW |T q, in symbols: X KK Y | Z, K if there exists a Markov kernel QpX|Zq such that:

KpX,Y,Z|T q“ QpX|Zqb KpY,Z|T q, where KpY,Z|T q is the marginal of KpX,Y,Z|T q. Notation E.1. Recall that we write:

1. X ÀK Y if there exists a measurable map ϕ : Y Ñ X such that:

KpX,Y |T q“ δϕpX|Y qb KpY |T q.

2. X «K Y : ðñ X ÀK Y ÀK X. We further define: 3. X – Y if there exists a measurable isomorphism ϕ : Y Ñ X , i.e. a bijective measurable map with a measurable inverse, such that: ϕ˚Y “ X.

According to Remark A.1 we first need to check that ÀK, b, –, δ˚, KKK are all sufficiently compatible with each other. This will be done in the next Lemma.

Lemma E.2 (Compatibility of ÀK, b, –, δ˚, KKK, see A.1). We have the following: 1. pX – X1q^pY – Y1q ùñ pX b Yq–pX1 b Y1q.

1 1 1 Proof. With isomorphisms ϕ : X – X and ψ : Y – Y with ϕ˚X “ X and 1 1 1 ψ˚Y “ Y we get: pϕ ˆ ψq˚pX b Yq“pϕ˚Xqbpψ˚Yq“ X b Y . 2. pX b Yq–pY b Xq.

Proof. Use the isomorphism: X ˆ Y – Y ˆ X with px, yq ÞÑ py, xq.

3. pX b Yqb Z – X bpY b Zq.

101 E. Proofs - Separoid Rules for Transitional Conditional Independence

Proof. Use the isomorphism: id : pX ˆ Yqˆ Z – X ˆpY ˆ Zq.

1 1 1 1 4. pX ÀK Yq^pX – X q^pY – Y q ùñ pX ÀK Y q.

1 1 Proof. Consider X “ ξ˚X and Y “ ζ˚Y with isomorphisms ξ, ζ.

Let ϕ : Y Ñ X such that: KpX,Y |T q“ δϕpX|Y qb KpY |T q. Then:

1 1 1 1 KpX ,Y |T q“pδξpX |Xqb δζ pY |Y qq ˝ KpX,Y |T q 1 1 “pδξpX |Xqb δζ pY |Y qq˝pδϕpX|Y qb KpY |T qq 1 1 “pδξ˝ϕpX |Y qb δζ pY |Y qq ˝ KpY |T q 1 1 1 “ δξ˝ϕ˝ζ´1 pX |Y qb KpY |T q.

5. pX ÀK Yq ùñ pX ÀK pY b Zqq. Proof. This is proven in Lemmata D.3 and D.4.

1 1 1 1 1 1 6. pX KKK Y | Zq^pX – X q^pY – Y q ^pZ – Z q ùñ pX KKK Y | Z q .

1 1 1 Proof. If X “ ϕ˚X and Y “ ψ˚Y and Z “ ξ˚Z and: KpX,Y,Z|T q“ QpX|Zqb KpY,Z|T q. Then we get: KpX1,Y 1,Z1|T q“ QpϕpXq|Z “ ξ´1pZ1qq b KpY 1,Z1|T q.

7. δ˚ ÀK X.

Proof. Kp˚,X|T q“ δ˚ b KpX|T q.

8. δ˚ b X – X. Proof. Use isomorphism: t˚u ˆ X – X .

E.1. Core Separoid Rules for Transitional Conditional Independence

Lemma E.3 (δ˚-Extended Left Redundancy). We have for any Y the implication:

U ÀK Z ùñ U KK Y | Z. K Proof. The assumption implies the existence of a factorization:

KpU, Z|T q“ δϕpU|Zqb KpZ|T q. Lemma D.3 then shows that this extends to:

KpU,Y,Z|T q“ δϕpU|Zqb KpY,Z|T q, which shows the claim.

102 E. Proofs - Separoid Rules for Transitional Conditional Independence

Lemma E.4 (Left Redundancy).

δ˚ KK Y | Z always holds. K Proof. Kp˚,Y,Z|T q“ δ˚ b KpY,Z|T q.

Lemma E.5 (T-Restricted Right Redundancy). Let X be standard and Z be countably generated. Then: X KK δ˚ | Z b T. K Proof. Since X is standard and t˚u ˆ Z is countably generated by Theorem C.6 we get the factorization: KpX, ˚,Z|T q“ KpX|˚,Z,T qb Kp˚,Z|T q. Multiplying both sides with T “ δpT |T q gives:

KpX, ˚,T,Z|T q“ KpX|˚,Z,T qb Kp˚,T,Z|T q.

This shows the claim. Lemma E.6 (Left Decomposition).

X b U KK Y | Z ùñ U KK Y | Z. K K Proof. By assumption we have the factorization:

KpX,U,Y,Z|T q“ QpX, U|Zqb KpY,Z|T q.

Marginalizing out X gives:

KpU,Y,Z|T q“ QpU|Zqb KpY,Z|T q.

This shows the claim. Lemma E.7 (Right Decomposition).

X KK Y b U | Z ùñ X KK U | Z. K K Proof. By assumption we have the factorization:

KpX,U,Y,Z|T q“ QpX|Zqb KpY,U,Z|T q.

Marginalizing out Y gives:

KpX,U,Z|T q“ QpX|Zqb KpU, Z|T q.

This shows the claim.

103 E. Proofs - Separoid Rules for Transitional Conditional Independence

Lemma E.8 (T-Inverted Right Decomposition). X KK Y | Z ùñ X KK T b Y | Z. K K

Proof. By the assumption X KKK Y | Z we have a factorization: KpX,Y,Z|T q“ QpX|Zqb KpY,Z|T q. Multiplying both sides with T “ δpT |T q gives: KpX,T,Y,Z|T q“ QpX|Zqb KpT,Y,Z|T q. This shows the claim. Lemma E.9 (Left Weak Union). Let X be standard and U be countably generated. Then: X b U KK Y | Z ùñ X KK Y | U b Z. K K Proof. By assumption we have: KpX,U,Y,Z|T q“ QpX, U|Zqb KpY,Z|T q, for some Markov kernel QpX, U|Zq. If we marginalize out X we get: KpU,Y,Z|T q“ QpU|Zqb KpY,Z|T q. Because X is standard and U countably generated we have a factorization: QpX, U|Zq“ QpX|U, Zqb QpU|Zq, with the conditional Markov kernel QpX|U, Zq (via Theorem C.6). Putting these equations together we get: KpX,U,Y,Z|T q“ QpX, U|Zqb KpY,Z|T q “ QpX|U, Zqb QpU|Zqb KpY,Z|T q “ QpX|U, Zqb KpU,Y,Z|T q. This shows the claim. Remark E.10. Left Weak Union E.9 was formulated to rely on the assumption that X is a standard and U a countably generated measurable space. The reason was that we relied on the existence of a regular conditional Markov kernel, Theorem C.6. If we instead use the Theorem C.7 with the weaker assumption of universal measurable X and universally countably generated U we would get a (universally measurable) ‚-regular conditional Markov kernel instead. So if we replaced measurability for all maps and Markov kernels everywhere with the weaker notion of universal measurability we could conclude similarly, but with weaker assumptions and for a bigger class of measurable spaces, so would be much more general. The price one has to pay is rather small, besides dealing with slightly more technical definitions, since measurablity always implies universal measurability, and also because every probability measure can uniquely be extended to measure all universally measurable subsets.

104 E. Proofs - Separoid Rules for Transitional Conditional Independence

If one does not want to make any assumptions about the underlying measurable spaces one could resort to the following:

Lemma E.11 (Restricted Left Weak Union).

X b U KK Y | Z ^ X KK δ˚ | U b Z ùñ X KK Y | U b Z. K K K ´ ¯ ´ ¯ Proof. By assumption we have:

KpX,U,Y,Z|T q“ QpX, U|Zqb KpY,Z|T q, KpX,U,Z|T q“ PpX|U, Zqb KpU, Z|T q, for some Markov kernels QpX, U|Zq, PpX|U, Zq. If we marginalize out Y and then X in the first equation we get:

KpX,U,Z|T q“ QpX, U|Zqb KpZ|T q, KpU, Z|T q“ QpU|Zqb KpZ|T q.

This together with the second equation gives:

KpX,U,Z|T q“ PpX|U, Zqb QpU|Zqb KpZ|T q.

Comparing this to the above equation we get:

QpX, U|Zqb KpZ|T q“ PpX|U, Zqb QpU|Zqb KpZ|T q.

By the essential uniqueness (see Lemma C.2) of such factorization we get that for every A P BX and D P BU :

QpX P A, U P D|Zq“ PpX P A|U “ u,Zq QpU P du|Zq KpZ|T q-a.s. żB Then this holds also KpY,Z|T q-a.s. Plugging this back into the first equation we get:

KpX,U,Y,Z|T q“ PpX|U, Zqb QpU|Zqb KpY,Z|T q.

Marginalizing X out gives:

KpU,Y,Z|T q“ QpU|Zqb KpY,Z|T q.

Plugging that back in finally gives:

KpX,U,Y,Z|T q“ PpX|U, Zqb QpU|Zqb KpY,Z|T q “ PpX|U, Zqb KpU,Y,Z|T q.

This shows the claim.

105 E. Proofs - Separoid Rules for Transitional Conditional Independence

Lemma E.12 (Right Weak Union).

X KK Y b U | Z ùñ X KK Y | U b Z. K K Proof. We have the factorization:

KpX,Y,U,Z|T q“ QpX|Zqb KpY,U,Z|T q, with some Markov kernel QpX|Zq. If we view QpX|Zq as a function in pu, zq via:

pu, zq ÞÑ QpX|Z “ zq, by just ignoring the argument u then the claim follows from the same factorization above.

Lemma E.13 (Left Contraction).

pX KK Y | U b Zq^pU KK Y | Zq ùñ X b U KK Y | Z. K K K Proof. By assumption we have the two factorizations:

KpX,Y,U,Z|T q“ QpX|U, Zqb KpY,U,Z|T q, KpY,U,Z|T q“ PpU|Zqb KpY,Z|T q, with some Markov kernels QpX|U, Zq, PpU|Zq. Putting these equations together using QpX|U, Zqb PpU|Zq we get:

KpX,Y,U,Z|T q“pQpX|U, Zqb PpU|Zqq b KpY,Z|T q.

This shows the claim.

Lemma E.14 (Right Contraction).

pX KK Y | U b Zq^pX KK U | Zq ùñ X KK Y b U | Z. K K K Proof. By assumption we have the two factorizations:

KpX,Y,U,Z|T q“ QpX|U, Zqb KpY,U,Z|T q, KpX,U,Z|T q“ PpX|Zqb KpU, Z|T q, with some Markov kernels QpX|U, Zq, PpX|Zq. Marginalizing out Y we get the equalities:

KpX,U,Z|T q“ QpX|U, Zqb KpU, Z|T q, KpX,U,Z|T q“ PpX|Zqb KpU, Z|T q.

106 E. Proofs - Separoid Rules for Transitional Conditional Independence

By the essential uniqueness (see Lemma C.2) of such factorization we get that for every A P BX : QpX P A|U, Zq“ PpX P A|Zq KpU, Z|T q-a.s. The same equation then holds also KpY,U,Z|T q-a.s. (by ignoring argument y). Plugging that back into the first equation gives:

KpX,Y,U,Z|T q“ PpX|Zqb KpY,U,Z|T q.

This shows the claim.

Lemma E.15 (Right Cross Contraction).

pX KK Y | U b Zq^pU KK X | Zq ùñ X KK Y b U | Z. K K K Proof. By assumption we have the two factorizations:

KpX,Y,U,Z|T q“ QpX|U, Zqb KpY,U,Z|T q, (4) KpX,U,Z|T q“ PpU|Zqb KpX,Z|T q, (5) with some Markov kernels QpX|U, Zq, PpU|Zq. We then define the Markov kernel:

RpX, U|Zq :“ QpX|U, Zqb PpU|Zq. (6)

We will now show that its marginal:

RpX|Zq“ QpX|U, Zq˝ PpU|Zq. (7) will satisfy the claim. If we marginalize out Y from equation 4 we get:

KpX,U,Z|T q“ QpX|U, Zqb KpU, Z|T q. (8)

Equating equations 5 and 8 gives:

PpU|Zqb KpX,Z|T q“ KpX,U,Z|T q“ QpX|U, Zqb KpU, Z|T q. (9)

Marginalizing out X in equation 9 on both sides gives:

KpU, Z|T q“ PpU|Zqb KpZ|T q. (10)

If we now plug equation 10 into 8 then we get:

KpX,U,Z|T q“ QpX|U, Zqb PpU|Zqb KpZ|T q (11) 6 “ RpX, U|Zqb KpZ|T q. (12)

107 E. Proofs - Separoid Rules for Transitional Conditional Independence

If we marginalize out U in equation 12 and use definition 7 we arrive at: KpX,Z|T q“ RpX|Zqb KpZ|T q. (13) We now get: 8 QpX|U, Zqb KpU, Z|T q “ KpX,U,Z|T q (14) 5 “ PpU|Zqb KpX,Z|T q (15) 13 “ PpU|Zqb RpX|Zqb KpZ|T q (16) “ RpX|Zqb PpU|Zqb KpZ|T q (17) 10 “ RpX|Zqb KpU, Z|T q. (18) By the essential uniqueness (see Lemma C.2) of such a factorization we get that for every A P BX : QpX P A|U, Zq“ RpX P A|Zq KpU, Z|T q-a.s. (19) The same equation then holds also KpY,U,Z|T q-a.s. (by ignoring the non-occuring argument y). Plugging 19 back into the equation 4 we get: KpX,Y,U,Z|T q“ QpX|U, Zqb KpY,U,Z|T q, (20) “ RpX|Zqb KpY,U,Z|T q. (21) This shows the claim. Lemma E.16 (Flipped Left Cross Contraction). pX KK Y | U b Zq^pY KK U | Zq ùñ Y KK X b U | Z. K K K Proof. By assumption we have the two factorizations: KpX,Y,U,Z|T q“ QpX|U, Zqb KpY,U,Z|T q, KpY,U,Z|T q“ PpY |Zqb KpU, Z|T q, with some Markov kernels QpX|U, Zq, PpY |Zq. Marginalizing out Y in the first equation we get the equality: KpX,U,Z|T q“ QpX|U, Zqb KpU, Z|T q. Plugging all three equations into each other we get: KpX,Y,U,Z|T q“ QpX|U, Zqb KpY,U,Z|T q “ QpX|U, Zqb PpY |Zqb KpU, Z|T q “ PpY |Zqb QpX|U, Zqb KpU, Z|T q “ PpY |Zqb KpX,U,Z|T q. This shows the claim.

108 E. Proofs - Separoid Rules for Transitional Conditional Independence

Corollary E.17. The class of all transitional random variables with standard measur- able spaces as codomain on transition probability space pW ˆ T , KpW |T qq together with the product of Markov kernels b, the equivalence of isomorphisms of measurable spaces –, the relation ÀK, the one-point Markov kernel δ˚ and transitional conditional inde- pendence KKK forms a T-δ˚-separoid (or in simpler symbols, a T -˚-separoid). Furthermore, the class of transitional random variables of the form δpX|W, T q with standard measurable spaces as codomains on pW ˆ T , KpW |T qq form a join-semi-lattice (with b and ÀK) up to almost-sure anti-symmetry and a T -˚-separoid.

Remark E.18. Similarly, if one replaces measurability with universal measurability ev- erywhere and standard measurable spaces with universal measurable spaces, then also the class of transitional random variables with universal measurable spaces as codomains form a T-δ˚-separoid. Furthermore, the ones of the form δpX|W, T q with universal mea- surable spaces as codomains form a join-semi-lattice (with b and ÀK,‚) up to almost-sure anti-symmetry and a T -˚-separoid.

E.2. Derived Separoid Rules for Transitional Conditional Independence Most the following rules follow directly from the general T -˚-separoid rules proven in the last subsection and the general theory developed in Section A. Nonetheless, since we have to track which of the spaces are standard or countably generated we will go through those proofs carefully again.

Lemma E.19 (Extended T-Restricted Right Redundancy). Let X be standard and Z be countably generated. Then:

T ÀK Z ùñ X KK δ˚ | Z. K Proof. Extended T-Restricted Right Redundancy E.19 can be proven using T-Restricted Right Redundancy E.5 ( ùñ X KKK δ˚ | ZbT) together with Extended Left Redundancy E.3 ( ùñ T KKK X | Z) and Right Cross Contraction E.15 ( ùñ X KKK T b δ˚ | Z) and then Right Decomposition E.7 ( ùñ X KKK δ˚ | Z). Lemma E.20 (Restricted Symmetry).

pX KK Y | Zq^pY KK δ˚ | Zq ùñ Y KK X | Z. K K K

Proof. This follows from Flipped Left Cross Contraction E.16 with U “ δ˚. Lemma E.21 (T-Restricted Symmetry). Let Y be standard and Z be countably gener- ated. Then: X KK Y | Z b T ùñ Y KK X | Z b T. K K

109 E. Proofs - Separoid Rules for Transitional Conditional Independence

Proof. Since Y is standard and Z countably generated we get by T-Restricted Right Redundancy E.5: Y KK δ˚ | Z b T. K

Together with X KKK Y | Z b T and Restricted Symmetry E.20 we get:

Y KK X | Z b T. K

Lemma E.22 (Symmetry). Let Y be standard and Z be countably generated and T “ ˚ “ t˚u the one-point space. Then:

X KK Y | Z ùñ Y KK X | Z. K K Proof. This follows similarly to T-Restricted Symmetry E.21 with T “ ˚.

Lemma E.23 (Inverted Left Decomposition).

X KK Y | Z ^pU ÀK X b Zq ùñ X b U KK Y | Z. K K ´ ¯ Proof. Inverted Left Decomposition E.23 can be proven using Extended Left Redun- dancy E.3 ( ùñ U KKK Y | X b Z) together with the assumption (X KKK Y | Z) and Left Contraction E.13 ( ùñ X b U KKK Y | Z).

Lemma E.24 (T-Extended Inverted Right Decomposition).

X KK Y | Z ^pU ÀK T b Y b Zq ùñ X KK T b Y b U | Z. K K ´ ¯ Proof. T-Extended Inverted Right Decomposition E.24 can be proven using T-Inverted Right Decomposition E.8 ( ùñ X KKK T b Y | Z) in combination with Extended Left Redundancy E.3 ( ùñ U KKK X | T b Y b Z) and Flipped Left Cross Contraction E.16 ( ùñ X KKK T b Y b U | Z).

Lemma E.25 (Equivalent Exchange).

1 1 X KK Y | Z ^pZ «K Z q ùñ X KK Y | Z . K K ´ ¯

110 F. Proofs - Applications to Statistical Theory

Proof. We get:

1 Lemma D.4 1 Z ÀK Z ùùùùùùùñ Z ÀK T b Y b Z,

T-Ext. Inv. Right Decomposition E.24 X KK Y | Z ùùùùùùùùùùùùùùùùùùùùùùùñ X KK T b Y b Z1 | Z K K Right Decomposition E.7 ùùùùùùùùùùùùùùùñ X KK Y b Z1 | Z K Right Weak Union E.12 ùùùùùùùùùùùùùùñ X KK Y | Z1 b Z, (a) K 1 Extended Left Redundancy E.3 1 Z ÀK Z ùùùùùùùùùùùùùùùùùùñ Z KK Y | Z , (b) K Left Contraction E.13 (a) ^ (b) ùùùùùùùùùùùùùñ X b Z KK Y | Z1 K Left Decomposition E.6 ùùùùùùùùùùùùùùñ X KK Y | Z1. K

1 1 1 Lemma E.26 (Full Equivalent Exchange). If X «K X and Y «K Y and Z «K Z then we have the equivalence:

X KK Y | Z ðñ X1 KK Y1 | Z1. K K Proof.

1 Lemma D.4 1 X ÀK X ùùùùùùùñ X ÀK X b Z

Inverted Left Decomposition E.23 ùùùùùùùùùùùùùùùùùùùñ X b X1 KK Y | Z K Left Decomposition E.6 ùùùùùùùùùùùùùùñ X1 KK Y | Z, K 1 Lemma D.4 1 Y ÀK Y ùùùùùùùñ Y ÀK T b Y b Z

T-Ext. Inv. Right Decomposition E.24 ùùùùùùùùùùùùùùùùùùùùùùùñ X1 KK T b Y b Y1 | Z K Right Decomposition E.7 ùùùùùùùùùùùùùùùñ X1 KK Y1 | Z, K 1 Equivalent Exchange E.25 1 1 1 Z «K Z ùùùùùùùùùùùùùùùñ X KK Y | Z . K The other direction works similarly.

F. Proofs - Applications to Statistical Theory

Next we will give a proof that the classical Fisher-Neyman factorization criterion (see [Fis22, Ney36, HS49, Bur61]) implies sufficiency reformulated as transitional conditional independence.

111 F. Proofs - Applications to Statistical Theory

Theorem F.1 (Fisher-Neyman). Consider a statistical model, witten as a Markov ker- nel, PpX|Θq. Assume that we can write it in the following form:

PpX P A|Θ “ θq“ hpxq ¨ gpSpxq; θq dµpxq, żA where X takes values in a standard measurable space X , S : X Ñ S is measurable, S countably generated, µ is a measure on X , g, h ě 0 are measurable and h is µ-integrable (e.g. exponential families are of such form). Then S is a sufficient statistic of X for PpX|Θq: X KK Θ | S. PpX|Θq Proof. Consider the function: f :“ hdµ ¨ g and the probability measure QpXq given by: 1 `ş ˘ dQ hpxq QpX P Aq :“ hpxq µpdxq, i.e. pxq“ . hdµ dµ hdµ żA Then we have the joint distributionş QpX,Sq and a regular conditionalş probability dis- tribution QpX|Sq, by Theorem C.6. With this we get: PpS P B|Θ “ θq“ PpX P S´1pBq|Θ “ θq

“ ½BpSpxqq ¨ gpSpxq; θq ¨ hpxq µpdxq ż

“ ½BpSpxqq ¨ fpSpxq; θq QpX P dxq ż

“ ½Bpsq ¨ fps; θq QpS P dsq. ż This means: PpS P ds|Θ “ θq psq“ fps; θq. QpS P dsq With this we get:

PpX P A, S P B|Θ “ θq“ PpX P A X S´1pBq|Θ “ θq ½ “ ½Apxq ¨ BpSpxqq ¨ gpSpxq; θq ¨ hpxq µpdxq

ż ½ “ ½Apxq ¨ BpSpxqq ¨ fpSpxq; θq QpX P dxq

ż ½ “ ½Apxq ¨ Bpsq ¨ fps; θq QpX P dx,S P dsq

ż ż ½ “ ½Apxq ¨ Bpsq ¨ fps; θq QpX P dx|S “ sq QpS P dsq ż ż

“ QpX P A|S “ sq ¨ ½Bpsq ¨ fps; θq QpS P dsq ż

“ QpX P A|S “ sq ¨ ½Bpsq PpS P ds|Θ “ θq ż “pQpX|Sqb PpS|ΘqqpA ˆ B, θq.

112 F. Proofs - Applications to Statistical Theory

This implies: PpX,S|Θq“ QpX|Sqb PpS|Θq, and thus: X KK Θ | S. PpX|Θq

This is one direction of the Fisher-Neyman factorization theorem for sufficiency (see [Fis22, Ney36, HS49]). Our definition of conditional independence generalizes the fac- torization theorem to Markov kernels (per definition) without the necessity of densities and/or reference measures.

The next theorem contains the proof for the propensity score of Section 4.4 and the weak likelihood principle of Section 4.5.

Theorem F.2. Let PpY |Xq be a Markov kernel. For x P X we put:

Epxq :“ PpY |X “ xq P PpYq.

Then E : X Ñ PpYq is measurable, E À X and we have:

Y KK X | E. PpY |Xq

Furthermore, if S : X Ñ S is another measurable map (S À X) with:

Y KK X | S, PpY |Xq then: E À S À X. Proof. For the first claim we define E :“ PpYq and the Markov kernel:

QpY |Eq : E 99K Y, QpY P B|E “ eq :“ epBq, for e P E “ PpYq and B P BY . Evaluating this gives:

pQpY |Eqb PpE|XqqpB ˆ C, xq

“ QpY P B|E “ eq δpE P de|X “ xq żC

“ QpY P B|E “ Epxqq ¨ ½C pEpxqq “ EpxqpBq ¨ δpE P C|X “ xq “ PpY P B|X “ xq ¨ δpE P C|X “ xq “ PpY P B, E P C|X “ xq “ PpY, E|XqpB ˆ C, xq.

113 F. Proofs - Applications to Statistical Theory

Multiplying both sides with δpX|Xq implies: QpY |Eqb PpX, E|Xq“ PpY,X,E|Xq, and thus: Y KK X | E. PpY |Xq This shows the first claim. For the second claim, the conditional independence: Y KK X | S. PpY |Xq implies that there exists a measurable function: PpY |S,✚X✚q : S Ñ PpYq, such that: PpY,S|Xq“ PpY |S,✚X✚qb PpS|Xq. Noting that PpS|Xq“ δpS|Xq and marginalizing S out we get: Epxq“ PpY |X “ xq“ PpY |S “ Spxq,✚X✚q, for every x P X . This shows the second claim. We now turn to Bayesian statistics. Theorem F.3 (Bayesian statistics). Let PpX|Θq be a Markov kernel between standard measurable spaces and PpΘ|Πq be another Markov kernel. Then put: PpX, Θ|Πq :“ PpX|Θqb PpΘ|Πq. Then by Theorem C.6 we have a (regular) conditional Markov kernel: PpΘ|X, Πq, which is unique up to PpX|Πq-null set. We now define the transitional random variable: Zpx, πq :“ PpΘ|X “ x, Π “ πq, which gives us a joint (transition) probability distribution: PpX, Θ,Z|Πq. With the above notations we have the conditional independence: Θ KK X | Z. PpX,Θ|Πq Furthermore, if S is another deterministic measurable function in pX, Πq such that: Θ KK X | S, PpX,Θ|Πq then: Z À S PpX|Πq-a.s.

114 F. Proofs - Applications to Statistical Theory

Proof. For the first statement consider the Markov kernel given by: KpΘ P D|Z “ zq :“ zpDq. Then we get: pKpΘ|Zqb PpZ,X|ΠqqpB ˆ C ˆ A, πq

“ KpΘ P B|Z “ zq PpZ P dz,X P dx|Π “ πq żCˆA “ KpΘ P B|Z “ zq δpZ P dz|X “ x, Π “ πq PpX P dx|Π “ πq żA żC

“ ½C pZpx, πqq ¨ PpΘ P B|X “ x, Π “ πq PpX P dx|Π “ πq żA “ PpΘ P B|X “ x, Π “ πq δpZ P dz|X “ x, Π “ πq PpX P dx|Π “ πq żA żC “ PpΘ P B,Z P C,X P A|Π “ πq. This shows the first claim. The conditional independence in the second claim gives us the factorization: ✟✟ PpΘ,S,X|Πq“ PpΘ|S,✟X, Πqb PpS,X|Πq ✟ “ PpΘ|S,✟X,✟Πqb δpS|X, Πqb PpX|Πq. On the other hand we have: PpΘ,S,X|Πq“ PpΘ|X, Πqb δpS|X, Πqb PpX|Πq. Marginalizing S out in those equations gives: ✟ PpΘ|S “ SpX, Πq,✟X,✟Πqb PpX|Πq“ PpΘ|X, Πqb PpX|Πq. Because conditional Markov kernels are essentially unique by C.2 we get: ✟✟ PpΘ|S “ SpX, Πq,✟X, Πqq “ PpΘ|X, Πq“ ZpX, Πq PpX|Πq-a.s.. This shows: Z À S PpX|Πq-a.s.

Theorem F.4 (A weak likelihood principle for Bayesian statistics). We also have the transitional conditional independence with Lpθq :“ PpX|Θ “ θq: X KK Θ, Π | L. PpX,Θ|Πq Furthermore, if X is countably generated, then any other measurable map S in Θ with: X KK Θ, Π | S, PpX,Θ|Πq satisfies: L À S À Θ PpΘ|Πq-a.s.

115 G. Proofs - Reparameterization of Transitional Random Variables

Proof. We have: PpX, L, Θ, Π|Πq“ QpX|Lqb PpL, Θ, Π|Πq, with QpX P A|L “ ℓq :“ ℓpAq for ℓ P PpX q. For the reverse direction we get a factorization:

PpS, X, Θ|Πq“ KpX|Sqb PpS, Θ|Πq “ KpX|Sqb δpS|Θqb PpΘ|Πq.

Marginalizing S out on both sides gives:

PpX|Θqb PpΘ|Πq“ KpX|S “ SpΘqq b PpΘ|Πq.

Since such factorizations are essentially unique by Lemma C.2 and X is countably gen- erated, we have that for PpΘ|Πq-almost-all pθ, πq we get:

Lpθq“ PpX|Θ “ θq“ KpX|S “ Spθqq.

This shows: L À S À Θ PpΘ|Πq-a.s.

G. Proofs - Reparameterization of Transitional Random Variables

In this section we generalize a few folklore results via now standard techniques that were introduced after [Dar53, Č82].

Lemma G.1. Let R¯ :“ r´8, 8s be endowed with the usual ordering and Borel σ-algebra. Let P be a probability measure on R¯ and F pxq :“ Ppr´8, xsq. Then F : R¯ Ñ r0, 1s is non-decreasing, right-continous with at most countably many discontinuities and F p8q “ 1. So Rptq :“ inf F ´1prt, 1sq is a well-defined map R : r0, 1s Ñ R¯, non-decreasing, left- continous with at most countably many discontinuities and Rp0q “ ´8. Furthermore, for x P R¯ and t Pr0, 1s we have:

t ď F pxq ðñ Rptqď x.

In particular, we have F pRptqq ě t, thus Rptq P F ´1prt, 1sq the minimal element. We also have RpF pxqq ď x, with equality if and only if x P Rpr0, 1sq. Furthermore, F and R are measurable and R˚λ “ P. We also have that R is a reflexive generalized inverse of F , i.e.: F ˝ R ˝ F “ F, R ˝ F ˝ R “ R.

116 G. Proofs - Reparameterization of Transitional Random Variables

Proof. From the properties of P it is clear that F is non-decreasing, right-continuous and F p8q “ 1. ¯ Let DF Ď R be the set of discontinuities of F and x P DF . Then there exists a qpxq P Q such that F´pxq ă qpxq ă F`pxq. If now x1 ă x2 are two such points we get: qpx1q ă F`px1q ď F´px2q ă qpx2q. So the map q : DF Ñ Q is injective. Thus DF is countable. ´1 ´1 Next, we show that Rptq P F prt, 1sq, thus Rptq“ min F prt, 1sq. For this let pxnqnPN Ď F ´1prt, 1sq be a non-increasing sequence converging to Rptq. Then by the right-continuity F pxnq converges to F pRptqq from above. So we have:

F pRptqq “ inf F pxnqě t. nPN It follows that F pRptqq ě t and thus Rptq P F ´1prt, 1sq. This shows the claim. R is clearly non-decreasing, thus has only a countable set of discontiuities DR Ď r0, 1s by the same arguments before, and Rp0q“´8. To see that Rptq is left-continuous let t P r0, 1s and ptnqnPN a non-decreasing sequence converging to t from below. Then by the monotonicity of R we have supnPN Rptnqď Rptq. On the other hand we have:

t “ sup tn ď sup F pRptnqq ď F psup Rptnqq, nPN nPN nPN

´1 implying: supnPN Rptnq P F prt, 1sq and thus supnPN Rptnq ě Rptq, leading to equality, which shows the claim. For any x P R¯ we have the implication:

x ě Rptq ùñ F pxqě F pRptqq ě t.

For any x P R¯ and any t Pr0, 1s we have the implications: t ď F pxq ðñ F pxqPrt, 1s ðñ x P F ´1prt, 1sq ùñ x ě inf F ´1prt, 1sq “ Rptq.

Together this shows for any x P R¯ and t Pr0, 1s the equivalence:

t ď F pxq ðñ Rptqď x.

Since F pxqď F pxq we get RpF pxqq ď x for all x P R¯. If equality holds then x P Rpr0, 1sq. And, if x “ Rptq for some t P r0, 1s then we use the inequalities x ě RpF pxqq and F pRptqq ě t to conclude:

x ě RpF pxqq “ RpF pRptqqq ě Rptq“ x, showing equality, and that: R ˝ F ˝ R “ R. Similarly for t “ F pxq we get:

t ď F pRptqq “ F pRpF pxqqq ď F pxq“ t,

117 G. Proofs - Reparameterization of Transitional Random Variables showing F ˝ R ˝ F “ F. Now consider the uniform distribution λ on r0, 1s and any x P R¯. Then we have:

´1 pR˚λqpr´8, xsq “ λpR pr´8, xsqq “ λpt Pr0, 1s | Rptqď xq “ λpt Pr0, 1s | t ď F pxqq “ λpr0, F pxqsq “ F pxq “ Ppr´8, xsq.

It follows that: R˚λ “ P. Lemma G.2. Let the notation be like in G.1. For u Pr0, 1s and x P R¯ define:

Fupxq :“ Epx; uq :“ Ppr´8, xqq` u ¨ Pptxuq.

Then E : R¯ ˆ r0, 1sÑr0, 1s is measurable, non-decreasing in both arguments with F0p´8q “ 0, F1p8q “ 1, F0 is left-continuous and

Fupx˜qď F1px˜qď F0pxqď Fupxq for any x˜ ă x, u Pr0, 1s. We further have for every u Pp0, 1s:

R ˝ Fu ˝ R “ R, and R ˝ Fu “ idR¯ P-almost-surely for any u Pp0, 1s. Proof. Most of the properties are clear from its definition. Let x ă x˜ then r´8, xs Ď r´8, x˜q and thus F1pxqď F0px˜q. To show R ˝ Fu ˝ R “ R fix a t Pr0, 1s, u Pp0, 1s and let x :“ Rptq. If F1 is continuous in x then Fu “ F1 and the claim R ˝ F1 ˝ R “ R was already shown using the inequalities:

x ě RpF1pxqq “ RpF1pRptqqq ě Rptq“ x.

So let us assume that F1 is discontinuous in x “ Rptq. Then FupxqPpF0pxq, F1pxqs. We have: ¯ RpFupxqq “ mintx˜ P R | F1px˜qě Fupxqu.

If F1px˜q ě Fupxq ą F0pxq then x˜ ě x, otherwise x˜ ă x leads to the contradiction F1px˜qď F0pxq. Since clearly F1pxqě Fupxq we must have:

RpFupxqq “ x, with x “ Rptq, which proves the claim: R ˝ Fu ˝ R “ R for u Pp0, 1s. We now want to show that R ˝ Fu “ idR¯ P-a.s. for u Pp0, 1s. From R ˝ Fu ˝ R “ R we ¯ already see, that R ˝ Fu|Rpr0,1sq “ idRpr0,1sq. We will see below that C :“ RzRpr0, 1sq is measurable and PpCq“ 0, which will prove the claim.

118 G. Proofs - Reparameterization of Transitional Random Variables

¯ In the following we will only need F “ F1. First, by G.1 we know that for any x P R we have RpF pxqq ď x with equality if and only if x P Rpr0, 1sq. So this gives us the equivalence: x P C ðñ x ą RpF pxqq. We now claim that pRpF pxqq, xsĎ C for every x P C: Indeed, If x˜ PpRpF pxqq, xs then:

F pxq“ F pRpF pxqqq ď F px˜qď F pxq and thus F px˜q“ F pxq, from which follows that RpF px˜qq “ RpF pxqq ă x˜ and ergo x˜ P C. It follows that C is the union of such intervals pRpF pxqq, xs with x P C. Furthermore, F pCq is contained in the set of discontinuities DR of R: otherwise there would be an x P C and a t ě F pxq such that RptqPpRpF pxqq, xsĎ C, which is a contradiction. Since DR is countable it must follow that F pCq and thus also RpF pCqq is at most countable. Write RpF pCqq “ txn | n P Nu, which is the set of the possible left end-points of the above intervals. For each fixed n P N let

Cn :“ tx P C | RpF pxqq “ xnu, which is, as a union of intervals pxn, xs, x P Cn, either of the form pxn, x¯ns or pxn, x¯nq with x¯n :“ sup Cn. In both cases we can cover Cn by Cn,m :“pxn, xn,ms with xn,m P Cn either equal to x¯n or converging to it from below for running m. So we can write C as the countable union: C “ Cn,m. n,mPN ď We now have for each x “ xn,m:

PpCn,mq“ Pppxn, xsq “ PppRpF pxqq, xsq “ F pxq´ F pRpF pxqqq “ F pxq´ F pxq“ 0.

This implies:

PpCq“ P Cn,m ď PpCn,mq“ 0, ˜n,mPN ¸ n,mPN ď ÿ showing that PpCq“ 0 and thus:

R ˝ Fu “ idR¯ P-a.s. for u Pp0, 1s. Lemma G.3. Let the notations be like in G.1 and G.2. Let λ be the uniform distribution on r0, 1s and P¯ :“ P b λ the product distribution on R¯ ˆr0, 1s. For every e P r0, 1s define the event: tE ď eu :“ tpx, uq P R¯ ˆr0, 1s | Epx; uqď eu. Then P¯ pE ď eq“ e. In other words, the random variable:

E : R¯ ˆr0, 1s Ñ r0, 1s, px, uq ÞÑ Ppr´8, xqq` u ¨ Pptxuq,

119 G. Proofs - Reparameterization of Transitional Random Variables is uniformly distributed under P¯ “ P b λ. Furthermore, RpEq “ X P¯ -a.s., where X : R¯ ˆr0, 1s Ñ R¯ is the canonical projection onto the first factor: Xpx, uq :“ x, and which has distribution P. Proof. First, since λpt0uq “ 0 we can w.l.o.g. exclude u “ 0 and restrict P¯ to R¯ ˆp0, 1s. We have seen in G.2 that R ˝ Fu ˝ R “ R for u Pp0, 1s, which translates to:

R ˝ E|Rpr0,1sqˆp0,1s “ X|Rpr0,1sqˆp0,1s.

Also with C :“ R¯zRpr0, 1sq we get:

P¯ pC ˆp0, 1sq “ PpCq ¨ λpp0, 1sq “ 0 ¨ 1 “ 0.

So we get the second claim that:

R ˝ E “ X P¯ -a.s.

Now we turn to tE ď eu for e P r0, 1s. We abbreviate U : R¯ ˆr0, 1sÑr0, 1s to be the projection onto the second factor: Upx, uq :“ u, which is uniformly distributed under P¯ , and also ppxq :“ Pptxuq “ F1pxq´ F0pxq. With these notations: E “ F0pXq` U ¨ ppXq. First, we show that P¯ pE “ eq“ 0 for all e Pr0, 1s. For this let x :“ Rpeq. Then by the above (RpEq“ X P¯ -a.s.) we have:

P¯ pE “ eq“ P¯ pE “ e, X “ xq.

We have to distinguish between two cases: ppxq“ 0 and ppxqą 0. Case ppxq“ 0: We have:

P¯ pE “ eq“ P¯ pE “ e, X “ xq ď P¯ pX “ xq “ ppxq “ 0.

Case ppxqą 0: We get:

P¯ pE “ eq“ P¯ pE “ e, X “ xq ¯ “ PpF0pXq` U ¨ ppXq“ e, X “ xq e ´ F pxq “ P¯ U “ 0 , X “ x ppxq ˆ ˙ e ´ F pxq “ λ 0 ¨ ppxq ppxq ˆ" *˙ “ 0.

To prove P¯ pE ď eq“ e for e Pr0, 1s we have several cases: ¯ ¯ Case e P F1pRq: Let x˜ be any element in R with e “ F1px˜q (e.g. x˜ “ Rpeq). Then we

120 G. Proofs - Reparameterization of Transitional Random Variables get: ¯ ¯ PpE ď eq“ PpE ď F1px˜qq “ P¯ pRpEqď x˜q R˝E““X P¯ pX ď x˜q “ Ppr´8, x˜sq ¨ λpp0, 1sq

“ F1px˜q ¨ 1 “ e. ¯ For the cases e R F1pRq we put x :“ Rpeq and e˜ :“ F0pxq. Then by definition, x is minimal with F1pxqě e. We also have e˜ “ F0pxqď e. Otherwise: e ă F0pxq “ supx˜ăx F1px˜q implied that there existed x˜ ă x with e ă F1px˜q ď F0pxq, which is a contradiction to the minimality of x “ Rpeq. Since e˜ ď e we can decompose:

P¯ pE ď eq“ P¯ pE ă e˜q` P¯ pE “ e˜q` P¯ pe˜ ă E ď eq.

We have already seen that the second term P¯ pE “ e˜q“ 0 vanishes. For the first term we have: ¯ ¯ PpE ă e˜q“ PpE ă F0pxqq ¯ “ PpE ă F0pxqq ¯ “ PpE ă sup F1px˜qq x˜ăx ¯ “ sup PpE ď F1px˜qq x˜ăx p˚q “ sup F1px˜q x˜ăx

“ F0pxq “ e.˜ ¯ Equation (*) comes from the previous case for F1px˜q P F1pRq. For the third term P¯ pe˜ ă E ď eq first note that E Ppe,˜ es implies that X “ x P¯ -a.s. by applying R: Indeed, every element t Ppe,˜ esĎpF0pxq, F1pxqs can be written as t “ Fu˜pxq for an u˜ Pp0, 1s and we can use:

Rptq“ RpFu˜pRpeqq “ Rpeq“ x.

121 G. Proofs - Reparameterization of Transitional Random Variables

For ppxqą 0 and the above we get:

P¯ pe˜ ă E ď eq“ P¯ pe˜ ă E ď e, X “ xq ¯ “ Pp0 ă F0pXq` U ¨ ppXq´ F0pxqď e ´ e,X˜ “ xq e ´ e˜ “ P¯ p0 ă U ď ,X “ xq ppxq e ´ e˜ “ λ 0, ¨ Pptxuq ppxq ˆˆ ˙ e ´ e˜ “ ¨ ppxq ppxq “ e ´ e.˜

For the case ppxq“ 0, the first row can be upper bounded by P¯ pX “ xq“ ppxq“ 0 as before, but we also have e˜ ´ e “ 0 in this case, and the equality stays trivially true as well. Putting all together we get:

P¯ pE ď eq“ P¯ pE ă e˜q` P¯ pE “ e˜q` P¯ pe˜ ă E ď eq “ e˜ ` 0 ` e ´ e˜ “ e.

This shows the claim. Theorem G.4. Let Z be any measurable space and X be a standard measurable space with a fixed embedding ι : X ãÑ R¯ “ r´8, 8s onto a Borel subset (which always exists, so w.l.o.g. X “ R¯ endowed with the Borel σ-algebra). Let KpX|Zq : Z 99K X be a Markov kernel. Furthermore, let U :“r0, 1s and KpUq be the uniform distribution/Markov kernel on U. We write: KpU, X|Zq :“ KpUqb KpX|Zq. Also put:

F px; u|zq :“ KpX ă x|Z “ zq` u ¨ KpX “ x|Z “ zq Rpe|zq :“ inf tx˜ P X |F px˜;1|zqě eu .

Let E :“ F pX; U|Zq. We consider X,U,Z,E as the measurable maps:

X : X ˆ U ˆ Z Ñ X , px, u, zq ÞÑ x, U : X ˆ U ˆ Z Ñ U, px, u, zq ÞÑ u, Z : X ˆ U ˆ Z Ñ Z, px, u, zq ÞÑ z, E : X ˆ U ˆ Z Ñ E :“r0, 1s, px, u, zq ÞÑ F px; u|zq.

122 H. Operations on Graphs

Then for all e P E and z P Z we have: KpE ď e|Z “ zq“ e, implying E KKKpU,X|Zq Z. Furthermore, we have: X “ RpE|Zq KpU, X|Zq-a.s.. Proof. After the measurabilities are checked the statement directly follows from G.3 by applying it for every z separately. Corollary G.5. Let X and Z be random variables with values in any standard mea- surable spaces X and Z, resp., and with a joint distribution PpX,Zq. Then there exists a uniformly distributed random variable E on r0, 1s that is P-independent of Z and a measurable function g such that X “ gpE,Zq P-almost-surely. Furthermore, E can be constructed via a deterministic measurable function in X and Z and (uniformly dis- tributed) independent noise U (on r0, 1s). Proof. The regular conditional probability distribution PpX|Zq exists for standard mea- surable spaces (and is unique up to a PpZq-zero-set), and is a Markov kernel. Then apply the result from above for KpX|Zq :“ PpX|Zq to get gpe, zq :“ Rpe|zq and E. Remark G.6. Any Polish space, i.e. any completely metrizable with a countable dense subset (separable), is a standard measurable space in its Borel σ-algebra. These are fundamental theorems in classical descriptive set theory, see [Bog07, Fre15, Kec95]. Examples of Polish and thus standard measurable spaces are r0, 1s, R, Rd, N, any (discrete) finite or countable set, any topological or smooth manifold X , any finite (or even countable) CW-complex Y, etc., (in its usual Borel σ-algebra). So these are all measurably isomorphic to a Borel subset of r0, 1s (or R¯), and measurably isomorphic to r0, 1s (or R¯) itself if non-countable (excluding finite and countable sets).

H. Operations on Graphs

In this section we present how one can create new CDMGs from old ones through operations like hard interventions, extensions by soft intervention nodes, marginalization and acyclification. Definition H.1 (Hard interventional CDMG). Let G “ GpV | dopJqq “ pJ,V,E,Lq be CDMG and W Ď J Y V . Then the hard interventional CDMG intervened on W is GpV zW | dopJ Y W qq :“pJ Y W, V zW, EdopW q, LdopW qq, where:

1. EdopW q :“ tv1 v2 P E | v1 P J Y V, v2 P V zW u,

2. LdopW q :“ tv1 v2 P L | v1, v2 P V zW u. Definition H.2 (Soft interventional CDMG). Let G “ GpV | dopJqq “ pJ,V,E,Lq be CDMG and W Ď J Y V . Then the soft interventional CDMG extended on W is

GpV | dopJ, IW qq :“pJ Y9 IW ,V,EdopIW q, Lq, where:

123 H. Operations on Graphs

1. IW :“ tIw | w P W u is a set of new input nodes, one Iw for each w P W .

2. EdopIW q :“ E Y9 tIw w | w P W zJu. Definition H.3 (Marginalized CDMG, latent projection). Let G “ GpV | dopJqq “ pJ,V,E,Lq be CDMG and W Ď V . Then the latent projection of G onto V zW or the marginal CDMG, where W is marginalized out, is the CDMG GpV zW | dopJqq :“ pJ, V zW, EzW , LzW q, where:

zW 1. v v P E iff there exists n ě 1, w1,...,wn´1 P W and a directed walk in G:

v “ w0 ¨ ¨ ¨ wn “ v.

zW 2. v v P L iff there exists n ě 1, w1,...,wn´1 P W and a walk in G of the form (of a so called trek or arc):

v “ w0 ¨ ¨ ¨ wk´1 wk ¨ ¨ ¨ wn “ v,

where there are arrow heads pointing to v and v and every other node has at most one arrow head pointing to it. Note that these include: v v, v w1 v, v w1 v, v w1 v, etc., but not e.g.: v w1 v.

Remark H.4 (Marginalization preserves ancestral relations and acyclicity). Let G “ GpV | dopJqq be a CDMG and G1 “ GpV zW | dopJqq its marginalization.

1. Then for v1, v2 P G with v1, v2 R W we have the equivalence:

G G1 v1 P Anc pv2q ðñ v1 P Anc pv2q.

2. If the CDMG GpV | dopJqq is acyclic then also GpV zW | dopJqq and a topological order ă of GpV | dopJqq induces a topological order on GpV zW | dopJqq (by just ignoring the nodes from W ).

Lemma H.5 (Marginalization preserves σ-separation, see [FM17, FM18, FM20]). Let G “ GpV | dopJqq be a CDMG. Let A,B,C Ď J Y9 V and W Ď V such that pA Y B Y CqX W “ H. Then we have the equivalence:

σ σ A K B | C ðñ A K B | C. GpV | dopJqq GpV zW | dopJqq

Notation H.6. Let GpV | dopJqq“pJ,V,E,Lq be a CDMG. Let us write: GpV Y9 J| dopHqq for the CDMG: pH,J Y9 V,E,Lq, where we interprete all nodes from J the same as the nodes from V . Then we have by definition:

σ σ A K B | C ðñ A K J Y B | C. GpV | dopJqq GpV Y9 J| dopHqq

124 I. Proofs - Separoid Rules for Sigma-Separation

Definition H.7 (Acyclification of CDMGs, see [FM17,FM18,FM20]). Let G “ GpV | dopJqq be a CDMG. We define the acyclification Gacy “ GacypV | dopJqq “ pJ, V, Eacy, Lacyq of GpV | dopJqq as follows:

1. v w P Gacy iff v R ScGpwq and there exists w˜ P ScGpwq such that v w˜ P G:

Eacy :“ v w | v R ScGpwq^Dw˜ P ScGpwq : v w˜ P G . ( 2. v w P Gacy iff either v P ScGpwq or if there exists v˜ P ScGpvq and w˜ P ScGpwq such that v˜ w˜ P G:

Lacy :“ v w | v P ScGpwq _ Dv˜ P ScGpvq^Dw˜ P ScGpwq :v ˜ w˜ P G .

Theorem H.8 (See` [FM17,FM18˘,FM20` ]). Let G “ GpV | dopJqq be a CDMG, then˘( its acyclification GacypV | dopJqq is a conditional acyclic directed mixed graph (CDAMG) and for all subsets A,B,C Ď J Y V we have the equivalence:

σ σ σ A K B | C ðñ A K B | C ðñ A K J Y B | C, GpV | dopJqq GacypV | dopJqq GacypV Y9 J| dopHqq where the both right hand relations can, due their acyclicity, ignore the the extra condi- G tions vk R Sc pv˘kq in the Definition 5.8 of σ-separation.

I. Proofs - Separoid Rules for Sigma-Separation

In the following let GpV | dopJqq be a CDMG and A, B, C, D Ď J Y V (not necessarily disjoint) subsets of nodes.

Since by Theorem H.8 we have the equivalence:

σ σ σ A K B | C ðñ A K B | C ðñ A K J Y B | C, GpV | dopJqq GacypV | dopJqq GacypV Y9 J| dopHqq we can restrict ourselves to acyclic graphs. It is known that for acylic graphs without acy σ input nodes like G pV Y9 J| dopHqq the relation KGacypV Y9 J| dopHqq is a (symmetric) sep- aroid, see e.g. [Ric03, FM17]. By the general theory in Appendix A Remark A.15 we σ then know that KGacypV Y9 J| dopHqq is a H-H-separoid, see Definition A.4. By Theorem σ A.16 we then know that the relation KGacypV | dopJqq is a J-H-separoid and thus satisfies all the separoid rules from Theorem 5.11. This already completes the proof. Since we are interested in proving the global Markov property for (acyclic) causal Bayesian networks, we will, for completeness sake, in the following give detailed proofs for all separoid rules again, by assuming, w.l.o.g. that G “ GacypV | dopJqq is acyclic. For that recall that for a CDMG G we say that A is σ-separated from B given C in G, in symbols: σ A K B | C, G

125 I. Proofs - Separoid Rules for Sigma-Separation if every walk from a node in A to a node in J Y B (sic!) is σ-blocked by C. For acyclic CDMGs, a walk π is σ-blocked by C if it either contains a non-collider (i.e. either an end node, fork, left/right chain) in C or a collider not in C. σ σ We abbreviate the ternary relations in the following: K :“ K “ K . G GacypV | dopJqq

I.1. Core Separoid Rules for Sigma-Separation Lemma I.1 (H-Extended Left Redundancy).

A Ď C ùñ A K B | C.

Proof. If π is a walk from a node v in A to a node w in J Y B then its first end node is in A, so π is σ-blocked by A.

Lemma I.2 (J-Restricted Right Redundancy).

A K H | C Y J always holds.

Proof. If π is a walk from a node v in A to a node w in J then its last end node is in C Y J, so π is σ-blocked by A.

Lemma I.3 (Left Decomposition).

A Y D K B | C ùñ D K B | C.

Proof. If π is a walk from a node v in D to a node w in J Y B, then π is a walk from A Y D to J Y B, which by assumption is σ-blocked by C.

Lemma I.4 (Right Decomposition).

A K B Y D | C ùñ A K D | C.

Proof. If π is a walk from a node v in A to a node w in J Y D, then π is a walk from A to J Y B Y D, which by assumption is σ-blocked by C.

Lemma I.5 (J-Inverted Right Decomposition).

A K B | C ùñ A K J Y B | C.

Proof. If π is a walk from a node v in A to a node w in J Y J Y B then w P J Y B. If w P J Y B then by assumption π is σ-blocked by C.

Lemma I.6 (Left Weak Union).

A Y D K B | C ùñ A K B | D Y C.

126 I. Proofs - Separoid Rules for Sigma-Separation

Proof. Lets assume the contrary: A K B | D Y C. Then there exists a shortest pD Y Cq- σ-open walk π from a node v in A to a node w in J Y B in G. Then every collider of π is in D Y C and every non-collider of π is not in D Y C. If now π does not contain any node from DzC then every collider of π lies in C. This implies that π is C-σ-open, which contradicts the assumption: A Y D K B | C. So we can assume now that π contains a node in DzC. Then consider the shortest sub-walk π˜ in π from w P J Y B to a node u P DzC. This means that π˜ does not contain any collider in DzC, so they are all in C. So π˜ is C-σ-open walk from A Y D to J Y B. This contradicts the assumption: A Y D K B | C. Lemma I.7 (Right Weak Union).

A K B Y D | C ùñ A K B | D Y C.

Proof. Follow the same steps as in Left Weak Union I.6, but this time get a contradiction with: A K B Y D | C. Then again we can assume that π contains a node in DzC. Then consider the shortest sub-walk π˜ in π from v P A to a node u P DzC. This means that π˜ does not contain any collider in DzC, so they are all in C. So π˜ is C-σ-open walk from A to J Y B Y D. This contradicts the assumption: A K B Y D | C. Lemma I.8 (Left Contraction).

pA K B | D Y Cq^pD K B | Cq ùñ A Y D K B | C.

Proof. Lets assume the contrary: A Y D K B | C. Then there exists a shortest C-σ-open walk π from a node v in A Y D to a node w in J Y B in G. So every collider of π lies in C and every non-collider lies not in C and v is the only node of π that lies in pA Y DqzC (otherwise π could be shortend). Also v cannot lie in DzC as it would contradict the assumption: D K B | C. Thus v P AzC and π is a walk from A to J YB whose colliders all lie in C Ď DYC and all non- collider outside of D Y C. But this contradicts the other assumption: A K B | D Y C. Lemma I.9 (Right Contraction).

pA K B | D Y Cq^pA K D | Cq ùñ A K B Y D | C.

Proof. Lets assume the contrary: A K B Y D | C. Then there exists a shortest C-σ-open walk π from a node v in A to a node w in J YB YD in G. So every collider of π lies in C and every non-collider outside C and w is the only node of π that lies in pJ Y B Y DqzC (otherwise π could be shortend). Also w cannot lie in DzC as it would contradict the assumption: A K D | C. Thus w P pJ Y BqzC and π is a walk from A to J Y B whose colliders all lie in C Ď D Y C and all non-colliders outside of D Y C. But this contradicts the other assumption: A K B | D Y C. Lemma I.10 (Right Cross Contraction).

pA K B | D Y Cq^pD K A | Cq ùñ A K B Y D | C.

127 I. Proofs - Separoid Rules for Sigma-Separation

Proof. Verbatim the same as Right Contraction I.9, only the first contradiction is with: D K A | C.

Lemma I.11 (Flipped Left Cross Contraction).

pA K B | D Y Cq^pB K D | Cq ùñ B K A Y D | C.

Proof. Lets assume the contrary: B K A Y D | C. Then there exists a shortest C-σ-open walk π from a node v in B to a node w in J YAYD in G. So every collider of π lies in C and every non-collider outside C and w is the only node of π that lies in pJ Y A Y DqzC (otherwise π could be shortend). Also w cannot lie in pJ Y DqzC as it would contradict the assumption: B K D | C. Thus w P AzC and the walk π (in reverse direction) is a walk from A to B whose colliders all lie in C Ď D Y C and all non-colliders outside of D Y C. But this contradicts the other assumption: A K B | D Y C.

I.2. Further Separoid Rules for Sigma-Separation Lemma I.12 (Left Composition).

pA K B | Cq^pD K B | Cq ùñ A Y D K B | C.

Proof. Let π be a walk from a node v in A Y D to a node w in J Y B. If v P A then π is σ-blocked by C by assumption: A K B | C. If v P D then π is σ-blocked by C by assumption: D K B | C.

Lemma I.13 (Right Composition).

pA K B | Cq^pA K D | Cq ùñ A K B Y D | C.

Proof. Let π be a walk from a node v in A to a node w in J Y B Y D. If w P J Y B then π is σ-blocked by C by assumption: A K B | C. If w P J Y D then π is σ-blocked by C by assumption: A K D | C.

Lemma I.14 (Left Intersection). Assume that A X D “ H, then:

pA K B | D Y Cq^pD K B | A Y Cq ùñ A Y D K B | C.

Proof. Lets assume the contrary: A Y D K B | C. Then there exists a shortest C-σ-open walk π from a node v in A Y D to a node w in J Y B in G. So every collider of π lies in C and every non-collider outside C and v is the only node of π that lies in pA Y DqzC (otherwise π could be shortend). If v P A then by the disjointness of A and D we have that v R D. Then π is a walk from A to J Y B whose colliders lie in C Ď D Y C and all non-colliders outside of pDzCqY C “ D Y C. This contradicts the assumption: A K B | D Y C. If v P D then similarly we get a contradiction: D K B | A Y C

128 I. Proofs - Separoid Rules for Sigma-Separation

Lemma I.15 (Right Intersection). Assume that B X D “ H, then:

pA K B | D Y Cq^pA K D | B Y Cq ùñ A K B Y D | C.

Proof. Lets assume the contrary: A K B Y D | C. Then there exists a shortest C-σ-open walk π from a node v in A to a node w in J YB YD in G. So every collider of π lies in C and every non-collider outside C and w is the only node of π that lies in pJ Y B Y DqzC (otherwise π could be shortend). If w R B then w P J Y D. In this case π is a walk from A to J Y D where every collider lies in C Ď B Y C and all non-colliders are outside of pBzCqY C “ B Y C. So π is a pB Y Cq-σ-open walk from A to J Y D. This contradicts the assumption: A K D | B Y C. If w R D then w P J Y B. In this case π is a walk from A to J Y B where every collider lies in C Ď D Y C and all non-colliders are outside of D Y C. So π is a pD Y Cq-σ-open walk from A to J Y D. This contradicts the assumption: A K B | D Y C. Since B X D “ H there are no other cases (Bc Y Dc “ J Y V ) and we are done.

I.3. Derived Separoid Rules for Sigma-Separation Lemma I.16 (Restricted Symmetry).

pA K B | Cq^pB K H | Cq ùñ B K A | C.

Proof. Follows from Flipped Left Cross Contraction I.11 with D “ H.

Lemma I.17 (J-Restricted Symmetry).

A K B | C Y J ùñ B K A | C Y J.

Proof. Follows from Restricted Symmetry I.16 and J-Restricted Right Redundancy I.2.

Lemma I.18 (Symmetry). If J “ H then we have:

A K B | C ùñ B K A | C.

Proof. Follows directly from J-Restricted Symmetry I.17.

Lemma I.19 (More Redundancies).

A K B | C ðñ pAzCq KpBzCq | C ðñ A Y C K J Y B Y C | C.

I.4. Symmetrized Sigma-Separation Let GpV | dopJqq“pJ,V,E,Lq be a CDMG and let:

GpV Y9 J| dop‚qq :“ pH,J Y9 V,E,L Y9 L‚q ,

129 I. Proofs - Separoid Rules for Sigma-Separation with: L‚ :“ tj1 j2 | j1, j2 P J, j1 ‰ j2u , the directed mixed graph that arised from GpV | dopJqq by connecting every two nodes from J with bi-directed edges. For any subsets A,B,C Ď J Y V we then have the ternary relation:

σ A K B | C, GpV Y9 J| dop‚qq given by σ-separation in the DMG GpV Y9 J| dop‚qq. On the other hand, we can define the symmetrized σ-separation in the graph GpV | dopJqq as follows: σ_ σ σ A K B | C : ðñ A K B | C _ B K A | C. GpV | dopJqq GpV | dopJqq GpV | dopJqq With these notations we then get: Lemma I.20. Let GpV | dopJqq “ pJ,V,E,Lq be a CDMG and A,B,C Ď J Y V any subsets. We then have the equivalence:

σ_ σ A K B | C ðñ A K B | C. GpV | dopJqq GpV Y9 J| dop‚qq

σ Proof. “ ùñ ”: Assume A KGpV Y9 J| dop‚qq B | C. Then let π “ pv ¨ ¨ ¨ wq be a shortest C-open walk from A to B in GpV Y9 J| dop‚qq. If π does not contain a bi-directed edges j1 j2 from L‚ then π is also a C-open walk from A to B in GpV | dopJqq. Since w P B this implies:

σ σ A K B | C ^ B K A | C. GpV | dopJqq GpV | dopJqq

If π does contain a bi-directed edge j1 j2 from L‚ it would look like:

v ¨ ¨ ¨ j1 j2 ¨ ¨ ¨ w, with the sub-walks π1 “ pv ¨ ¨ ¨ j1q and π2 “ pj2 ¨ ¨ ¨ wq. Since π was assumed to be a shortest C-open walk from A to B in GpV Y9 J| dop‚qq we know that j1, j2 R C and that π1 and π2 don’t have further bi-directed edges from L‚. So they are σ C-open walks in GpV | dopJqq. π1 witnesses: A KGpV | dopJqq B | C, and (the reverse of) π2 σ witnesses: B KGpV | dopJqq A | C. This shows “ ùñ ”.

“ðù”: Now assume: σ σ A K B | C ^ B K A | C. GpV | dopJqq GpV | dopJqq

Then let π1 “ pv1 ¨ ¨ ¨ w1q and π2 “ pv2 ¨ ¨ ¨ w2q, resp., shortest C-open walks in GpV | dopJqq with v1 P A, w1 P J YB, v2 P B and w2 P J YA. If either w1 P B or

130 J. Proofs - Global Markov Property

w2 P A then π1, π2, resp., is a C-open walk in GpV Y9 J| dop‚qq, and we are done. So we can assume: w1,w2 P J. Then we can consider the following walk π in GpV Y9 J| dop‚qq:

v1 ¨ ¨ ¨ w1 w2 ¨ ¨ ¨ v2, given by connecting π1 and (the reverse of) π2 with a bi-directed edge. Since w1 and w2 are non-colliders not in C the walk π is C-open in GpV Y9 J| dop‚qq, which implies: σ A K B | C. GpV Y9 J| dop‚qq This shows “ðù”. σ_ σ Remark I.21. Note that KGpV | dopJqq and KGpV Y9 J| dop‚qq each form a (symmetric) H-H- separoid by Theorems 5.11 and A.17.

J. Proofs - Global Markov Property

The proof of the global Markov property follows similar arguments as used in [LDLL90, Ver93, Ric03, FM17, FM18, RERS17], namely chaining the separoid rules together in an inductive way. The main difference here is that we never rely on the Symmetry property but instead use the left and right versions of the separoid rules separately. Theorem J.1 (Global Markov property for causal Bayesian networks). Consider a causal Bayesian network MpV | dopJqq with CDAG GpV | dopJqq and joint Markov kernel PpXV | dopXJ qq. Then for all A,B,C Ď J Y V (not-necessarily disjoint) we have the implication: σ A K B | C ùñ XA KK XB | XC . GpV | dopJqq PpXV | dopXJ qq If one wants to make the implicit dependence on J more explicit one can equivalently also write: σ A K J Y B | C ùñ XA KK XJ ,XB | XC . GpV | dopJqq PpXV | dopXJ qq Proof. We do induction by #V .

0.) Induction start: V “ H. This means that A,B,C Ď J. The assumption: σ A K J Y B | C, GpV | dopJqq implies that we must have that A Ď C. Otherwise a trivial walk from A Ď J to J Y B would be C-open. Since A,B,C Ď J we have the factorization:

PpXA,XB,XC | dopXJ qq “ δpXw|Xwq b δpXw|Xwqb δpXw|Xwq . wPA wPB wPC â â â

“:QpXA|XC q “PpXB ,XC | dopXJ qq l jh n l jh n

131 J. Proofs - Global Markov Property

Because A Ď C the Markov kernel QpXA|XC q :“ wPA δpXw|Xwq really is a Markov kernel from XC 99K XA. This already shows: Â

XA KK XB|XC . PpXV | dopXJ qq

(IND): Induction assumption: The global Markov property holds for all causal Bayesian networks (with input variables) with #V ă n (and arbitrary J).

σ 1.) Now assume: #V “ n ą 0 and A KGpV | dopJqq J Y B | C. Since G is acyclic we can find a topological order ă for G where the elements of J are ordered first. Let v P V be its last element, which is thus childless. Note that, since ChGpvq “ H, the marginalization GpV ztvu| dopJqq has no bi-directed edges and thus induces again a causal Bayesian network without latent variables with #V ztvu “ n ´ 1 ă n. Furthermore, we have the factorization:

PpXV | dopXJ qq “ PvpXv| dopXPaGpvqqq b PwpXw| dopXPaGpwqqq, wPPredGpvqzJ âă

PpX G | dopXJ qq Predă pvqzJ l jh n G where Predă pvq“ J Y9 V ztvu is the set of predesessors of tvu. This factorization implies that we already have the conditional independence:

Xv KK X G | XD, Predă pvq PpXV | dopXJ qq where we put D :“ PaGpvq.

In the following we will distinguish between 4 cases:

A.) v P AzC,

B.) v P BzC,

C.) v P C,

D.) v R A Y J Y B Y C,

Note that v P V , thus v R J, which shows that the above cover all possible cases. Further note that: σ A K J Y B | C, GpV | dopJqq implies that: A XpJ Y BqĎ C.

132 J. Proofs - Global Markov Property

Otherwise a trivial walk from A to J Y B would be C-open. This shows that AzC, pJ Y BqzC and C are pairwise disjoint.

Case D.): v R A Y J Y B Y C. Then we can marginalize out v and use the equivalence: σ σ A K J Y B | C ðñ A K J Y B | C. GpV | dopJqq GpV ztvu| dopJqq With #V ztvu ă n and induction (IND) we then get:

XA KK XB | XC . PpXV | dopXJ qq This shows the claim in case D.

Case A.): v P AzC. Then we can write: A “ A1 Yp9 A X Cq Y9 tvu, B “ B1 Yp9 B X Cq, with some disjoint A1 Ď AzC and B1 Ď BzC. We then have the implications:

σ Right Decomposition I.4 σ A K J Y B | C ùùùùùùùùùùùùùùñ A K J Y B1 | C GpV | dopJqq GpV | dopJqq Left Decomposition I.3 σ ùùùùùùùùùùùùùñ A1 K J Y B1 | C GpV | dopJqq marginalization, vRA1YJYB1YC σ ùùùùùùùùùùùùùùùùùùùñ A1 K J Y B1 | C GpV ztvu| dopJqq induction (IND) ùùùùùùùùùñ XA1 KK XB1 | XC . (#1) PpXV | dopXJ qq

On the other hand we have with D “ PaGpvq:

σ Right Decomposition I.4, B1ĎB σ A K J Y B | C ùùùùùùùùùùùùùùùùùùñ A K J Y B1 | C GpV | dopJqq GpV | dopJqq Left Weak Union I.6, A“A1 Y9 pAXCq Y9 tvu σ ùùùùùùùùùùùùùùùùùùùùùùùùùñ tvu K J Y B1 | A1 Y9 C. GpV | dopJqq p˚q, see below σ ùùùùùùùùñ D K J Y B1 | A1 Y9 C GpV | dopJqq marginalization, vRDYJYB1YA1YC σ ùùùùùùùùùùùùùùùùùùùùùñ D K J Y B1 | A1 Y9 C GpV ztvu| dopJqq induction (IND) ùùùùùùùùùñ XD KK XB1 | XA1 Y9 C PpXV | dopXJ qq A1 Y9 C ùùùùñ XD KK XB1 | XA1 ,XC . PpXV | dopXJ qq (#2)

133 J. Proofs - Global Markov Property

(*) holds since every pA1 Y9 Cq-open walk w ¨ ¨ ¨ from a w P D “ PaGpvq to J Y B1 extends to an pA1 Y9 Cq-open walk from v to J Y B1 via v w ¨ ¨ ¨ , as w stays a non-collider in the extended walk (not in A1 Y9 C) and v R A1 Y9 C.

As discussed above we also already have the conditional independence:

Xv KK X G | XD. Predă pvq PpXV | dopXJ qq

1 1 G With this and A Y9 B Y9 C Ď Predă pvq we get the implications:

Xv KK X G | XD Predă pvq PpXV | dopXJ qq Right Decomposition E.7 ùùùùùùùùùùùùùùùñ Xv KK XA1 ,XB1 ,XC | XD PpXV | dopXJ qq Right Weak Union E.12 ùùùùùùùùùùùùùùñ Xv KK XB1 | XA1 ,XC ,XD PpXV | dopXJ qq Left Contraction E.13, (#2) ùùùùùùùùùùùùùùùùñ Xv,XD KK XB1 | XA1 ,XC PpXV | dopXJ qq Left Decomposition E.6 ùùùùùùùùùùùùùùñ Xv KK XB1 | XA1 ,XC PpXV | dopXJ qq Left Contraction E.13, (#1) ùùùùùùùùùùùùùùùùñ XA1 ,Xv KK XB1 | XC PpXV | dopXJ qq

XJ -Inverted Right Decomposition E.8 ùùùùùùùùùùùùùùùùùùùùùùñ XA1 ,Xv KK XJ ,XB1 ,XC | XC PpXV | dopXJ qq Right Decompositon E.7, BĎB1 Y9 C ùùùùùùùùùùùùùùùùùùùùùñ XA1 ,Xv KK XB | XC . (#3) PpXV | dopXJ qq By Extended Left Redundancy E.3 we have:

XA1 ,Xv,XC KK XB | XA1 ,Xv,XC. PpXV | dopXJ qq With this we get the implications:

XA1 ,Xv,XC KK XB | XA1 ,Xv,XC PpXV | dopXJ qq Left Contraction E.13, (#3) ùùùùùùùùùùùùùùùùñ XA1 ,Xv,XA1 ,Xv,XC KK XB | XC PpXV | dopXJ qq Left Decomposition E.6, AĎA1 Y9 tvu Y9 C ùùùùùùùùùùùùùùùùùùùùùùùùñ XA KK XB | XC . PpXV | dopXJ qq This shows the claim in case A.

Case B.): v P BzC. Then we can write: A “ A1 Yp9 A X Cq, B “ B1 Yp9 B X Cq Y9 tvu,

134 J. Proofs - Global Markov Property with some disjoint A1 Ď AzC and B1 Ď BzC. We then have the implications:

σ Left Decomposition I.3 σ A K J Y B | C ùùùùùùùùùùùùùñ A1 K J Y B | C GpV | dopJqq GpV | dopJqq Right Decomposition I.4 σ ùùùùùùùùùùùùùùñ A1 K J Y B1 | C GpV | dopJqq marginalization, vRA1YJYB1YC σ ùùùùùùùùùùùùùùùùùùùñ A1 K J Y B1 | C GpV ztvu| dopJqq induction (IND) ùùùùùùùùùñ XA1 KK XB1 | XC . (#1’) PpXV | dopXJ qq

Again with D “ PaGpvq we get:

σ Left Decomposition I.3 σ A K J Y B | C ùùùùùùùùùùùùùñ A1 K J Y B | C GpV | dopJqq GpV | dopJqq Right Decomposition I.4 σ ùùùùùùùùùùùùùùñ A1 K J Y B1 Y tvu | C GpV | dopJqq Right Weak Union I.7 σ ùùùùùùùùùùùùùñ A1 K J Y tvu | B1 Y9 C GpV | dopJqq p‚q, see below σ ùùùùùùùùñ A1 K J Y D | B1 Y9 C GpV | dopJqq induction (IND) ùùùùùùùùùñ XA1 KK XD | XB1 Y9 C PpXV | dopXJ qq B1 Y9 C ùùùùñ XA1 KK XD | XB1 ,XC . (#2’) PpXV | dopXJ qq p‚q holds since every pB1 Y9 Cq-open walk ¨ ¨ ¨ w from A1 to a w P J Y D extends to a pB1 Y9 Cq-open walk from A1 to J Y tvu, either because w P J or via ¨ ¨ ¨ w v if w P D “ PaGpvq. Note again that w stays a non-collider in the extended walk (outside of B1 Y9 C) and v R B1 Y9 C.

As before we will use the following conditional independence:

Xv KK X G | XD. Predă pvq PpXV | dopXJ qq

135 J. Proofs - Global Markov Property

1 1 G With this and A Y J Y B Y C Ď Predă pvq we get the implications:

Xv KK X G | XD Predă pvq PpXV | dopXJ qq Right Decomposition E.7 ùùùùùùùùùùùùùùùñ Xv KK XA1 ,XB1 ,XC | XD PpXV | dopXJ qq Right Weak Union E.12 ùùùùùùùùùùùùùùñ Xv KK XA1 | XB1 ,XC ,XD PpXV | dopXJ qq Flipped Left Cross Contraction E.16, (#2’) ùùùùùùùùùùùùùùùùùùùùùùùùùñ XA1 KK XD,Xv | XB1 ,XC PpXV | dopXJ qq Right Decomposition E.7 ùùùùùùùùùùùùùùùñ XA1 KK Xv | XB1 ,XC PpXV | dopXJ qq Right Contraction E.14, (#1’) ùùùùùùùùùùùùùùùùùñ XA1 KK XB1 ,Xv | XC PpXV | dopXJ qq

XJ -Inverted Right Decomposition E.8 ùùùùùùùùùùùùùùùùùùùùùùñ XA1 KK XJ ,XB1 ,Xv,XC | XC PpXV | dopXJ qq Right Decomposition E.7, BĎB1 Y9 tvu Y9 C ùùùùùùùùùùùùùùùùùùùùùùùùùñ XA1 KK XB | XC . (#3’) PpXV | dopXJ qq

By Extended Left Redundancy E.3 we have:

XA1 ,XC KK XB | XA1 ,XC . PpXV | dopXJ qq

With this we get the implications:

XA1 ,XC KK XB | XA1 ,XC PpXV | dopXJ qq Left Contraction E.13, (#3’) ùùùùùùùùùùùùùùùùùñ XA1 ,XA1 ,XC KK XB | XC . PpXV | dopXJ qq Left Decomposition E.6, AĎA1 Y9 C ùùùùùùùùùùùùùùùùùùùùñ XA KK XB | XC . PpXV | dopXJ qq

This shows the claim in case B.

Case C.): v P C. Then we can write:

A “ A1 Yp9 A X Cq, B “ B1 Yp9 B X Cq, C “ C1 Y9 tvu, with some pairwise disjoint A1 Ď AzC, B1 Ď BzC and C1 Ď C.

136 J. Proofs - Global Markov Property

We then get the implications.

σ Left Decomposition I.3 σ A K J Y B | C ùùùùùùùùùùùùùñ A1 K J Y B | C GpV | dopJqq GpV | dopJqq Right Decomposition I.4 σ ùùùùùùùùùùùùùùñ A1 K J Y B1 | C GpV | dopJqq C“C1 Y9 tvu σ ùùùùùùùñ A1 K J Y B1 | C1 Y9 tvu GpV | dopJqq

We now claim that: σ A1 K J Y B1 | C1 Y9 tvu GpV | dopJqq implies that one of the following statements holds:

σ σ A1 Y9 tvu K J Y B1 | C1 _ A1 K J YpB1 Y9 tvuq | C1. GpV | dopJqq GpV | dopJqq

Assume the contrary:

σ σ A1 Y9 tvu K J Y B1 | C1 ^ A1 K J YpB1 Y9 tvuq | C1. GpV | dopJqq GpV | dopJqq

1 So there exist shortest C -open walks π1 and π2 in GpV | dopJqq:

1 1 π1 : A Y tvu Q u0 ¨ ¨ ¨ uk P J Y B , and: 1 1 π2 : A Q w0 ¨ ¨ ¨ wm P J YpB Y9 tvuq. 1 1 So all colliders of π1 and π2 are in C and all non-colliders outside of C . Since we 1 consider shortest walks and v R C at most an end node of π1 and π2 could be equal to v. Otherwise one could shorten the walk. 1 1 Then note that v R A and v R J Y B , thus: uk ‰ v and w0 ‰ v. 1 If now πi does not contain v as an (end) node, then πi would be pC Y9 tvuq-open, which is a contradiction to the assumption:

σ A1 K J Y B1 | C1 Y9 tvu. GpV | dopJqq

So we can assume that the other end nodes equal v, i.e.: u0 “ v and wm “ v. Furthermore, both π1 and π2 are non-trivial walks, since u0 ‰ uk and w0 ‰ wm. Since v is childless and k, m ě 1 we have that the πi are of the forms:

π1 : v u1 ¨ ¨ ¨ uk, and: π2 : w0 ¨ ¨ ¨ wm´1 v,

137 K. Causal Models

G with u1,wm´1 P D “ Pa pvq. Then the following walk:

1 1 A Q w0 ¨ ¨ ¨ wm´1 v u1 ¨ ¨ ¨ uk P J Y B , is a pC1 Y9 tvuq-open walk from A1 to J Y B1, in contradiction to:

σ A1 K J Y B1 | C1 Y9 tvu. GpV | dopJqq

So the claim:

σ σ A1 Y9 tvu K J Y B1 | C1 _ A1 K J YpB1 Y9 tvuq | C1, GpV | dopJqq GpV | dopJqq must be true. So we reduced case C. to case A. or case B., which then imply:

XA1 ,Xv KK XB1 | XC1 _ XA1 KK XB1 ,Xv | XC1 . PpXV | dopXJ qq PpXV | dopXJ qq

If we apply Left Weak Union E.9 to the left and Right Weak Union E.12 to the right we get: XA1 KK XB1 | XC1 ,Xv, PpXV | dopXJ qq which - as before - implies: XA KK XB | XC . PpXV | dopXJ qq This shows the claim in case C.

Remark J.2. 1. Note that we only needed to use Left Weak Union E.9 for nodes tvuY A1, which were in V . So no assumptions about standard measurable spaces for Xj, j P J, were needed. 2. All results would also hold under the weaker assumptions where all measurable maps are replaced by universally measurable maps, all countably generated measurable spaces by universally countably generated ones and all standard measurable spaces by universal measurable spaces. Analytic versions would also be possible.

K. Causal Models

K.1. Causal Bayesian Networks - More General We deviate here from the Section 6.1 of the main paper a bit and give here a slightly more general definition of causal Bayesian networks that allow, besides input variables, also for latent variables. The reason is that we want to demonstrate how the correspondence of operations on graphs and operations on causal Bayesian networks, like interventions, marginalizations, etc., immediately implies the global Markov property, Theorem 6.3 and Appendix J, for such more general cases.

138 K. Causal Models

Definition K.1 (Causal Bayesian network). A causal Bayesian network (CBN) M con- sists of: 1. finite pairwise disjoint index sets: J, V , U,

corresponding to (non-stochastic) input variables Xj, j P J, (stochastic) observed output variables Xv, v P V , and stochastic unobserved/latent output variables Xu, u P U.

2. a conditional directed acyclic graph (CDAG): G “ GpU Y9 V | dopJqq“pJ, U Y9 V,E, Hq,

i.e. an acyclic CDMG without bi-directed edges: L “ H,

8 3. a measurable space Xv for every v P J Y9 U Y9 V , where Xv is standard if v P U Y9 V ,

4. a Markov kernel, suggestively written as: Pv Xv do XPaGpvq : ´ ˇ ¯ 99K ˇ ` ˘ XPaGpvq Xv, ˇ

pA, xPaGpvqq ÞÑ Pv Xv P A do XPaGpvq “ xPaGpvq , ´ ˇ ¯ ˇ ` ˘ for every v P U Y9 V , where we write for D Ď J Yˇ9 U Y9 V :

XD :“ Xv, XH :“ ˚ “ t˚u, vPD ź XD :“pXvqvPD, XH :“ ˚,

xD :“pxvqvPD, xH :“ ˚.

By abuse of notation, we denote the causal Bayesian network as:

MpU, V | dopJqq “ GpU Y9 V | dopJqq, Pv Xv do XPaGpvq vPU Y9 V ´ ´ ´ ˇ ¯¯ ¯ ˇ ` ˘ Definition K.2. Let M “ MpU, V | dopJqq be a CBN. ˇ 1. We write the marginal CDMG of G “ GpU Y9 V | dopJqq, where the latent nodes U are marginalized out, as GpV | dopJqq “ pJ,V,E,Lq, which now might have bi- directed edges.

2. The CBN M comes with the joint Markov kernel:

PpXU ,XV | dopXJ qq : XJ 99K XU Y9 V ,

given by: ą

PpXU ,XV | dopXJ qq :“ Pv Xv do XPaGpvq , v U V P Y9 ´ ˇ ¯ â ˇ ` ˘ 8We could also work under the weaker assumption of universal measurableˇ spaces if we would replace all mentionings of measurability with the weaker notion of universal measurability.

139 K. Causal Models

where the product b is taken in reverse order of a fixed topological order ă, i.e. parents proceed children variables. Note that by remark 2.8 about associativity and (restricted) commutativity of the product the joint Markov kernels does actually not depend on the topological order.

3. The marginal Markov kernel P pXV | do pXJ qq is called the observational Markov kernel of M.

K.2. Operations on Causal Bayesian Networks Definition K.3 (Hard interventions on CBNs). Let M “ MpU, V | dopJqq be a CBN and W Ď J Y V . Then the hard interventional causal Bayesian network intervened on W is given by: MpU, V zW | dopJ Y W qq :“

GpU Y9 V zW | dopJ Y W qq, Pv Xv do XPaGpvq . vPU Y9 V zW ˆ ´ ´ ˇ ¯¯ ˙ ˇ ` ˘ It then comes with joint Markov kernel: ˇ

ą

P XU ,XV zW do pXJYW q “ Pv Xv do XPaGpvq . ´ ˇ ¯ vPU Y9 V zW ´ ˇ ¯ ˇ â ˇ ` ˘ Definition K.4 (Marginalizingˇ CBNs). Let M “ MpU, V |ˇdopJqq be a CBN and W Ď V . Then the marginal causal Bayesian network, where W is marginalized out, is given by: MpU Y9 W, V zW | dopJqq :“

GpU Y9 V | dopJqq, Pv Xv do XPaGpvq , vPU Y9 V ´ ´ ´ ˇ ¯¯ ¯ ˇ ` ˘ with the only difference that nodes from W areˇ moved from V to U, i.e. are declared latent. It then comes with the (same) joint Markov kernel:

ą

P XU Y9 W ,XV zW do pXJ q “ Pv Xv do XPaGpvq , v U V ´ ˇ ¯ P Y9 ´ ˇ ¯ ˇ â ˇ ` ˘ but with the marginalized CDMGˇ as marginal CDMG: GpV zˇW | dopJqq, and the marginal- ized Markov kernel as marginal Markov kernel: PpXV zW | dopXJ qq. Remark K.5 (Modelling soft interventions). If we wanted to model soft interven- tions onto some variables given by index set W in a CBN MpU, V | dopJqq with CDMG

G “ GpU Y9 V | dopJqq we would introduce soft intervention variables XIw for each w P W and then model the effect of the variable XIw onto Xw together with its “natural” causes G G XPa pwq. This would result in specifying a Markov kernel PpXw| dopXPa pwq,XIw qq for each w P W and leave the Markov kernels PpXv| dopXPaGpvqqq for all other variables the same. This then would constitute a new CBN MpU, V | dopJ, IW qq with CDMG GpU Y9 V | dopJ, IW qq.

140 K. Causal Models

Example K.6 (Modelling hard interventions as soft interventions). If we now wanted to model hard interventions in a CBN MpU, V | dopJqq on variables corresponding to W P V as soft interventions we would define for w P W :

1. XIw :“ Xw Y9 t‹u with a new symbol ‹ that represents “no intervention”,

G G 2. Markov kernels: PpXw P A| dopXPa pwq “ xPa pwq,XIw “ xIw qq :“

G G PpXw P A| dopXPa pwq “ xPa pwqqq if xIw “‹,

δpX P A|X “ x q“ ½ px q if x P X . " w w Iw A Iw Iw w

This then defines a CBN MpU, V | dopJ, IW qq with CDMG GpU Y9 V | dopJ, IW qq.

K.3. Global Markov Property for More General Causal Models Here we now extend the global Markov property from causal Baysian networks with input nodes but without latent variables, see Theorem 6.3 and Appendix J, to causal Bayesian networks with both, input variables and latent variables. We also shortly discuss and indicate how one could even go beyond and use the results and techniques from this paper to prove a global Markov property based on transitional conditional independence for, say, structural causal models with input variables, latent variables, variables introducing selection bias and that also allows for (negative) feedback cycles. Theorem K.7 (Global Markov property for causal Bayesian networks). Consider a causal Bayesian network MpU, V | dopJqq with marginal CADMG: GpV | dopJqq, and observational Markov kernel: PpXV | dopXJ qq. Then for all A,B,C Ď J Y9 V (not- necessarily disjoint) we have the implication:

σ A K B | C ùñ XA KK XB | XC . GpV | dopJqq PpXV | dopXJ qq

Recall that we have - per definition - an implicit dependence on J, XJ , resp., in the second argument on each side. Proof. Because σ-separation is preserved under marginalization we have the equivalence: σ σ A K B | C ðñ A K B | C. GpV | dopJqq GpU Y9 V | dopJqq Note that the CBN MpU, V | dopJqq can mathematically also be considered as the CBN MpH, U Y9 V | dopJqq, where all latent variables are treated like observed ones. Then the global Markov property for MpH, U Y9 V | dopJqq, proven in Appendix J, implies the mentioned transitional conditional independence: σ A K B | C ùñ XA KK XB | XC . GpU Y9 V | dopJqq PpXV | dopXJ qq For this note that the CBN MpH, U Y9 V | dopJqq has the same graph GpU Y9 V | dopJqq as the original CBN MpU, V | dopJqq.

141 K. Causal Models

Remark K.8. The above proof technique would work similarly if we would include an- other set of nodes S and variables XS, representing selection bias. Then we would get a graph that we could denote with GpV |S, dopJqq, either representing S directly or al- lowing for undirected edges. An extension of σ-separation would then be adjusted, see e.g. [FM18, FM20, BFPM21], such that: σ σ A K B | C ðñ A K B | S Y C. GpV |S,dopJqq GpU Y9 V Y9 S| dopJqq Then we could just directly apply the proven global Markov property to the right hand side and we would be done: σ A K B | S Y C ùñ XA KK XB | XS,XC . GpU Y9 V Y9 S| dopJqq PpXV ,XS| dopXJ qq One could even go further and allow for input, latent, selection variables and cycles in a notion of structural causal models (SCM), e.g. see again [FM18,FM20,BFPM21], since the compatibility conditions for such cyclic SCM lead for every interventional setting to an acyclic resolution and we would get: σ σ A K B | C ðñ A K B | S Y C. GpV |S,dopJqq GacypU Y9 V Y9 S| dopJqq Again the proven global Markov property would apply to the right hand side: σ A K B | S Y C ùñ XA KK XJ ,XB | XS,XC , acy G pU Y9 V Y9 S| dopJqq PpXV ,XS | dopXJ qq showing the global Markov property based on transitional conditional independence even for such general cases.

K.4. Causal/Do-Calculus In this section we will prove the 3 main rules of do-calculus, see [Pea09,Pea93a,Pea93b, FM20]. for causal Bayesian networks (CBN) that allow for input variables and latent variables. We will demonstrate how the use of transitional conditional independence circumvents the usual measure-theoretic problems of matching functions/densities on null-sets, etc. Such questions were investigated e.g. also in [GR01]. First recall the following. Remark K.9 (Marginal conditional interventional Markov kernels). Consider a causal Bayesian network:

MpU, V | dopJqq “ G “ GpU Y9 V | dopJqq, Pv Xv do XPaGpvq , vPU Y9 V ´ ´ ´ ˇ ¯¯ ¯ ă a fixed topological order for G. For any disjoint subsetsˇ D`Ď J Y V˘, W Ď V we then ˇ have the hard and soft interventional Markov kernel P XU ,XV zD do pXDYJ ,XIW q “

ą ą ´ ˇ ¯ ą ˇ G ˇ G Pv Xv do XPa pvq b Pv Xv do XPa pvq,XIv , v W vPU Y9 V zpDYW q ´ ˇ ¯ P ´ ˇ ¯ â ˇ ` ˘ â ˇ ` ˘ ˇ ˇ

142 K. Causal Models where we reorder all factors in reverse order of ă, see Section K.2. If, furthermore, A, B Ď J Y V then we can extend, marginalize and condition to get the marginal condi- tional interventional Markov kernels, see Sections 2.2, 2.7:

P XA XB, do pXDYJ ,XIW q , ´ ˇ ¯ ˇ which are unique up to P XB do pXˇDYJ ,XIW q -null set. These are related to each other by the graphical structure´ of theˇ CDMG GpV | ¯dopJ, IW qq as we will see next. ˇ ˇ We now prove the do-calculus rules, see [Pea09], for causal Bayesian networks, which were most of the time proposed and proven only for discrete spaces, but postulated for arbitrary ones. Even when densities were used the measure theoretical difficulties coming from dealing with the null-sets were typically ignored. Here we want to resolve these problems by making use of the our strong version of the global Markov property for causal Bayesian networks, Thereom K.7, in combination with transitional conditional independence, Definition 3.1. It turns out the occuring Markov kernels will automatically aline all the null sets.

Corollary K.10 (Do-calculus). Consider a causal Bayesian network:

MpU, V | dopJqq “ G “ GpU Y9 V | dopJqq, Pv Xv do XPaGpvq , vPU Y9 V ´ ´ ´ ˇ ¯¯ ¯ ˇ ` ˘ Let A,B,C Ď V and D Ď J YV . For simplicity, we assumeˇ that A, B, C, D are pairwise disjoint. Then we have the following rules relating marginal conditional interventional Markov kernels to each other. Note that for a more suggestive notation we reorder the conditioning and do-parts in the Markov kernels arbitrarily.

1. Insertion/deletion of observation: If we have:

σ A K B | C Y9 D, GpV zD| dopDYJqq

then there exists a Markov kernel:

✟ ✟✟ 99K P XA|✟XB,XC , dopXD,✟XJzDq : XC ˆ XD XA

that is a version of:` ˘

P pXA|XB˜ ,XC , dopXDYJ qq , for every subset B˜ Ď B simultaneously. Such a Markov kernel is unique up to P pXC | dopXDYJ qq-null set. In particular and in short we could write:

P pXA|XB,XC , dopXDqq “ P pXA|XC , dopXDqq .

143 K. Causal Models

2. Action/observation exchange: If we have:

σ A K IB | B Y9 C Y9 D, GpV zD| dopDYJ,IBqq

then there exists a Markov kernel:

✚ ✟✟ 99K P XA|✚dopXBq,XC , dopXD,✟XJzDq : XB ˆ XC ˆ XD XA

that is a version` of: ˘

P pXA|XB1 , dopXB2 q,XC , dopXDYJ qq ,

for every disjoint decomposition: B “ B1 Y9 B2, simultaneously. Such a Markov

kernel is unique up to P pXB Y9 C | dopXDYJ ,XIB qq-null set. In particular and in short we could write:

P pXA| dopXBq,XC , dopXDqq “ P pXA|XB,XC , dopXDqq .

3. Insertion/deletion of action: If we have:

σ A K IB | C Y9 D, GpV zD| dopDYJ,IBqq

then there exists a Markov kernel: ✘ ✘✘ ✟✟ 99K P XA|✘dopXBq,XC, dopXD,✟XJzDq : XC ˆ XD XA

that is a version` of: ˘

P pXA| dopXB˜ q,XC , dopXDYJ qq for every subset B˜ Ď B simultaneously. Such a Markov kernel is unique up to

P pXC | dopXDYJ ,XIB qq-null set. In particular and in short we could write:

P pXA| dopXBq,XC , dopXDqq “ P pXA|XC , dopXDqq .

Proof. We make use of the global Markov property (GMP), Theorem K.7.

Point 1.) The assumption:

σ A K B | C Y9 D, GpV zD| dopDYJqq implies the following transitional conditional independence by the global Markov prop- erty K.7: XA KK XB | XC ,XD. PpXV | dopXDYJ qq

144 K. Causal Models

So we get the following factorization:

P pXA,XB,XC ,XD| dopXDYJ qq “ QpXA|XC ,XDqb P pXB,XC,XD| dopXDYJ qq , for some Markov kernel QpXA|XC ,XDq. If we marginalize out the deterministic variables XD we get:

P pXA,XB,XC | dopXDYJ qq “ QpXA|XC ,XDqb P pXB,XC| dopXDYJ qq .

Clearly QpXA|XC ,XDq serves here as a regular version of the conditional Markov kernel:

P pXA|XB,XC , dopXDYJ qq .

If we marginalize out XB1 for any decomposition B “ B1 Y9 B2 in the above factorization we also get:

P pXA,XC ,XB2 | dopXDYJ qq “ QpXA|XC ,XDqb P pXC ,XB2 | dopXDYJ qq , showing that QpXA|XC ,XDq is also a version of:

P pXA|XB2 ,XC, dopXDYJ qq .

In particular, this holds for B2 “ H. This shows point 1.

Point 2.) The assumption: σ A K IB | B Y9 C Y9 D, GpV zD| dopDYJ,IBqq implies the following transitional conditional independence by the global Markov prop- erty K.7:

XA KK XIB | XB,XC ,XD. P pXV | dopXDYJ ,XIB qq As before, we get the following factorization:

P pXA,XB,XC | dopXDYJ ,XIB qq “ QpXA|XB,XC ,XDqb P pXB,XC | dopXDYJ ,XIB qq , for some Markov kernel QpXA|XB,XC ,XDq. This then serves as a version of a regular conditional Markov kernel for:

P pXA|XB,XC , dopXDYJ ,XIB qq , and which is independent of XIB . We can now look at the different input values for any decomposition: B “ B1 Y9 B2. For this we put: X “ x P X and X “‹“p‹q . This implies: IB1 B1 B1 IB2 vPB2

P pXA,XB,XC | dopXDYJ ,XB1 “ xB1 qq “ P X ,X ,X | dopX ,X “ x ,X “ ‹q A B C DYJ IB1 B1 IB2 “ QpX |X ,X ,X qb P X ,X | dopX ,X “ x ,X “ ‹q , ` A B C D B C DYJ IB1˘ B1 IB2 Q P do “ pXA|XB,XC ,XDqb `pXB,XC | pXDYJ ,XB1 “ xB1 qq . ˘

145 K. Causal Models

So we get:

P pXA,XB,XC| dopXDYJ ,XB1 qq “ QpXA|XB,XC ,XDqb P pXB,XC | dopXDYJ ,XB1 qq , where we can further marginalize out the deterministic XB1 to get:

P pXA,XB2 ,XC | dopXDYJ ,XB1 qq “ QpXA|XB,XC ,XDqb P pXB2 ,XC | dopXDYJ ,XB1 qq .

This implies that QpXA|XB,XC ,XDq is a regular version of the conditional Markov kernel:

P pXA|XB2 ,XC , dopXDYJ ,XB1 qq , for every decomposition: B “ B1 Y9 B2, simultaneously, in particular for the two cases B1 “ H and B1 “ B. This shows point 2.

Point 3.) The assumption: σ A K IB | C Y9 D, GpV zD| dopDYJ,IBqq implies the transitional conditional independence by the global Markov property K.7:

XA KK XIB | XC ,XD. P pXV | dopXDYJ ,XIB qq So we have the following factorization:

P pXA,XC | dopXDYJ ,XIB qq “ QpXA|XC ,XDqb P pXC | dopXDYJ ,XIB qq , for some Markov kernel QpXA|XC ,XDq, which serves as a regular version of the condi- tional Markov kernel:

P pXA|XC , dopXDYJ ,XIB qq , and which is independent of XIB . We can now look at the different input values for any decomposition: B “ B1 Y9 B2. For this we put: X “ x P X and X “‹“p‹q . This implies: IB1 B1 B1 IB2 vPB2

P pXA,XC | dopXDYJ ,XB1 “ xB1 qq “ P X ,X | dopX ,X “ x ,X “ ‹q A C DYJ IB1 B1 IB2 “ QpX |X ,X qb P X | dopX ,X “ x ,X “ ‹q , ` A C D C DYJ IB1 ˘B1 IB2 Q P do “ pXA|XC ,XDqb `pXC | pXDYJ ,XB1 “ xB1 qq . ˘ So we get:

P pXA,XC | dopXDYJ ,XB1 qq “ QpXA|XC ,XDqb P pXC | dopXDYJ ,XB1 qq , which shows that QpXA|XC ,XDq is a regular version of the conditional Markov kernel:

P pXA|XC , dopXDYJ ,XB1 qq for every decomposition: B “ B1 Y9 B2, simultaneously, in particular for the two corner cases: B1 “ H and B1 “ B. This shows point 3.

146 K. Causal Models

Remark K.11. 1. Please note in Corollary K.10 how the null sets are handled. By having one Markov kernel that serves as versions for all mentioned conditional interventional Markov kernels at once circumvents the problem of having differ- ent versions and then trying to simultaneously match them by changing them on families of compatible null sets.

2. The 3 do-calculus rules, Corollary K.10, follow directly from the global Markov property, Theorem K.7.

3. Slightly more general rules can be derived from the global Markov property Theorem K.7 without the assumption of disjointness of A, B, C, D and by combining such graphical separation criteria. This will become less suggestive and readable though.

4. In the 3 rules of do-calculus in Corollary K.10 we were not restricted to assume that D contains J or is disjoint from it. This is thanks to the Definition 3.1 of transitional conditional independence, which will automatically provide the cor- rect Markov kernel. This is in contrast to e.g. [FM20], where a weaker version of extended conditional independence was used. The strategy there was to first construct a suitable Markov kernel, case by case, before checking the conditional independence. For this they needed to make the restrictive assumption that J Ď D.

K.5. Backdoor Covariate Adjustment Formula Here we will restate the well-known backdoor covariate adjustment rules for causal Bayesian networks, see [Pea93a,Pea93b,Pea09,FM20], which were usually only stated for discrete spaces and probability distributions. Here we want to show how this rule works out in the general measure theoretic setting and highlight how the difficulties of matching occurring Markov kernels are addressed. We are not aiming at finding new adjustment rules, for extensions we refer to: [BTP14,PP14,SVR10,PTKM15,CB17,FM20]. For the identification of causal effects we refer to: [Pea09,GP95,Tia02,TP02,Tia04,SP06,HV06, HV08, RERS17, FM20].

Corollary K.12 (Conditional backdoor covariate adjustment formula). Consider a causal Bayesian network:

MpU, V | dopJqq “ GpU Y9 V | dopJqq, Pv Xv do XPaG` v . p q vPU Y9 V ´ ´ ´ ˇ ´ ¯¯¯ ¯ ˇ Consider A,B,C,F Ď V and D Ď J YV with J Ď D thatˇ are pairwise disjoint. Assume that the conditional backdoor criterion in the CDGM GpV zD| dopD,IBqq holds:

σ 1. F K IB |pC Y Dq, and: GpV zD| dopD,IBqq

σ 2. A K IB |pB Y F Y C Y Dq. GpV zD| dopD,IBqq

147 L. Comparison to Other Notions of Conditional Independence

Then there exists a Markov kernel:

✚ PpXA|XF ,XC ,✚dopXBq, dopXDqq that simultaneously is a regular version of:

PpXA|XF ,XC , dopXB,XDqq and PpXA|XF ,XC ,XB, dopXDqq; and there exists a Markov kernel:

✘✘ PpXF |XC ,✘do✘pXBq, dopXDqq that simultaneously is a regular version of:

PpXF |XC , dopXB,XDqq and PpXF |XC , dopXDqq.

Furthermore, their composition:

✚ ✘✘ PpXA|XF ,XC ,✚dopXBq, dopXDqq ˝ PpXF |XC ,✘do✘pXBq, dopXDqq, is then a version of: PpXA|XC , dopXB,XDqq. Remark K.13. 1. The proof of Corollary K.12 goes along the same lines as Rule 2 and Rule 3 in Corollary K.10.

2. The meaning of the backdoor covariate adjustment criterion in Corollary K.12 is that when one wants to estimate the (conditional) causal effect PpXA| dopXBqq, for simplicity we use C “ D “ H, and one has a set of measured covariates/features XF of the data that shields off XA from “backdoor” common confounders of XA and XB then one can estimate the causal effect PpXA| dopXBqq by only using ob- servational data from PpXF q and PpXA|XF ,XBq.

L. Comparison to Other Notions of Conditional Independence

In this section we want to look at other notions of conditional independence and compare them to transitional conditional independence. Recall that for transition probability space pW ˆ T , KpW |T qq and transitional random variables X : W ˆ T 99K X and Y : W ˆ T 99K Y and Z : W ˆ T 99K Z with joint Markov kernel:

KpX,Y,Z|T q :“pXpX|W, T qb YpY |W, T qb ZpZ|W, T qq ˝ KpW |T q, we define the transitional conditional independence of X from Y given Z:

X KK Y | Z : ðñ DQpX|Zq : KpX,Y,Z|T q“ QpX|Zqb KpY,Z|T q. KpW |T q

148 L. Comparison to Other Notions of Conditional Independence

L.1. Variation Conditional Independence We follow [CD17a,Daw01b] and their supplementary material [CD17b] to review varia- tion conditional independence and then comment on some possible generalizations. For this let W be a set and X : W Ñ X , Y : W Ñ Y, Z : W Ñ Z, U : W Ñ U be maps.

Notation L.1 (See [CD17a] §2.2). We define:

RpXq :“ XpWq “ tXpwq | w P Wu P 2X , and for z P Z:

RpX|Z “ zq :“ X Z´1pzq “ tXpwq | w P W,Zpwq“ zu P 2X .

We then define the map: ` ˘

RpX|Zq : Z Ñ 2X , z ÞÑ RpX|Z “ zq.

In this sense we then can also make sense of:

RpX,Y |Z, Uq : Z ˆ U Ñ 2X ˆY ,

pz,uq ÞÑ RpX,Y |Z “ z, U “ uq :“ tpXpwq,Y pwqq | w P W,Zpwq“ z, Upwq“ uu.

Definition L.2 (Variation conditional independence). We will say that X is variation conditionally independent of Y given Z if:

@py, zq P RpY,Zq : RpX|Y “ y,Z “ zq“ RpX|Z “ zq.

In symbols we will write then: X KK Y | Z. v Notation L.3. We will write: X Àv Y if there exists a map ϕ : Y Ñ X such that ϕ ˝ Y “ X. Note that we use a slightly simpler, but equivalent relation, than [CD17a] §2.2, Prop. 2.6.

Remark L.4 (See [CD17a] Thm. 2.7., [CD17b]). The ternary relation KKv together with Àv and X _ Y :“pX,Y q is a (symmetric) separoid.

Remark L.5. The relation between variation conditional independence and stochastic conditional independence for random variables seems rather on the structural side, i.e. both follow similar functorial relations. In short, if one wants to go from variation to stochastic conditional independence one could start by replacing 2X with PpX q and maps with measurable maps, etc. Then maps Z Ñ 2X become measurable maps Z Ñ PpX q,

149 L. Comparison to Other Notions of Conditional Independence which are nothing else but Markov kernels, reflecting the approach we went down for tran- sitional conditional independence. So RpX,Y,Zq can be represented as the (constant) map: RpX,Y,Zq : ˚ Ñ 2X ˆYˆZ, ˚ ÞÑ RpX,Y,Zq. where ˚ “ t˚u is the one-point space. We also have the “marginal”:

YˆZ RpY,Zq : ˚ Ñ 2 , ˚ ÞÑ RpY,Zq“ prYˆZ pRpX,Y,Zqq , which is received by ignoring the X-entries. It is then easily seen that we have: X KKv Y | Z iff there exists a map:

QpX|Zq : Z Ñ 2X , such that: RpX,Y,Zq“ QpX|Zqbv RpY,Zq, where we put:

QpX|Zqbv RpY,Zq :“ tpx, y, zqu. py,zqPRpY,Zq xPQpX|Z“zq ď ď Example L.6. We can now apply the above to X,Y,Z with common domain W ˆ T and T : W ˆ T the canonical projection map. Then we get:

X KK Y | Z v,T : ðñ X KK T,Y | Z v X ðñ DQpX|Zq : Z Ñ 2 : RpX,Y,Z|T q“ QpX|Zqbv RpY,Z|T q, where we again now have:

RpX,Y,Z|T “ tq“tpx, y, zq|Dw P W : Xpw, tq“ x, Y pw, tq“ y,Zpw, tq“ zu.

This shows the close formal relationship between variation conditional independence and transitional conditional independence KKK. It then follows from Remark L.4 and the general theory in Appendix A with Theorem A.16 that KKv,T forms a T -˚-separoid. One can thus combine KKK and KKv,T with a logical “and”, while still preserving the T -˚- separoid rules.

It seems, more generally, that one can formulate a (transitional) conditional indepen- dence relation in any monad with products and some extra structure. We leave this for future research.

150 L. Comparison to Other Notions of Conditional Independence

L.2. Transitional Conditional Independence for Random Variables If we wanted to re-define the notion of independent random variables X and Y on a probability space pW, PpW qq we would have a hard time coming up with something else than the classical definition of:

PpX,Y q“ PpXqb PpY q, (22) where PpXq and PpY q are the marginals. This is in contrast to conditionally independent random variables X and Y given a third Z, where many nuances can play a role. For instance, the direct analogon to relation 22 would read like:

PpX,Y |Zq“ PpX|Zqb PpY |Zq PpZq-a.s. (23)

The problem with definition 23 is that the conditional probability distributions, like PpX,Y |Zq, may not exist on general measurable spaces X , Y, Z, in contrast to the marginals PpXq, PpY q in equation 22. This then forces one to restrict oneself to only work with measurable spaces where regular conditional probability distributions exist, like standard measurable spaces. But even if the existence were guaranteed, they would only be unique up to some null sets. One then either ends up with a notion of conditional independence that would depend on the choices made or, as the better alternative, one would work with almost-sure equations like we already indicated in equation 23. If one wanted to work with more general measurable spaces one could demand that

equation 23 only holds for every A P BX and B P BY individually. Furthermore, one then ½ could replace PpX P A, Y P B|Zq with conditional expectations Er½ApXq ¨ BpY q|Zs, etc., which exist on all measurable spaces. One then arrives at the most general and

weak form of conditional independence for random variables:

½ ½ ½ @A P BX @B P BY : Er½ApXq ¨ BpY q|Zs“ Er ApXq|Zs ¨ Er BpY q|Zs PpW q-a.s., (24)

which can, equivalently, but more compactly, also be written as: ½ @A P BX : Er½ApXq|Y,Zs“ Er ApXq|Zs PpW q-a.s. (25)

We will use the following symbols for weak conditional independence:

ω X KK Y | Z. PpW q

Furthermore, if we used definition 24 or 25 on standard measurable spaces, where reg- ular conditional probability distributions like PpX,Y |Zq exist, the equation 23 would automatically be implied. So the equations 24 or 25 seem to be the way to go, as one does not need to bother with existence questions, and when existence is secured the above versions are equivalent anyways. The only downside is that this definition does

151 L. Comparison to Other Notions of Conditional Independence not provide one with a meaningful factorization. Furthermore, the conditional expecta-

tions, like Er½ApXq|Zs, are only defined for each event A separately and thus might not be countably additive in A. So we are not given an object like a conditional distribution that we could use to further work with. In contrast, our definition of transitional conditional independence X KKPpW q Y | Z, when restricted to random variables, would read like:

DPpX|Zq : PpX,Y,Zq“ PpX|Zqb PpY,Zq, (26) where clearly PpX|Zq would be a regular conditional probability distribution of X given Z. So the existence of one of the regular conditional probability distributions PpX|Zq and a proper factorization of the joint distribution are directly built into the definition of transitional conditional independence. This makes this notion also meaningful for general measurable spaces, with the tendency that random variables are declared con- ditional dependent if such a regular conditional probability distribution does not even exist. Certainly, all three definitions 26, 25, 23 are equivalent on standard measurable spaces. Note that transitional conditional independence 26 is asymmetric in nature, which at this level might look like a flaw, but which allows one to generalize the defi- nition of transitional conditional independence to transitional random variables, where dependencies are asymmetric from the start.

L.3. Transitional Conditional Independence for Deterministic Variables Theorem L.7 (Transitional conditional independence for deterministic variables). Let F : T Ñ F and H : T Ñ H be measurable maps with F standard. We now consider them as (deterministic) transitional random variables on the transition probability space pW ˆ T , KpW |T qq. Let Y : W ˆ T 99K Y be another transitional random variable. Then the following statements are equivalent: 1. F KK Y | H. KpW |T q 2. There exists a measurable function ϕ : H Ñ F such that F “ ϕ ˝ H. Proof. “ðù”: This direction follows from Extended Left Redundancy E.3. “ ùñ ”: Since F and H are deterministic and only dependent on T we get that:

KpF,Y,H|T q“ δpF |T qb δpH|T qb KpY |T q.

By the conditional independence we now have a Markov kernel QpF |Hq such that we have the factorization:

KpF,Y,H|T q“ QpF |Hqb KpY,H|T q“ QpF |Hqb δpH|T qb KpY |T q.

Marginalizing out Y , H and taking T “ t we get from these equations:

δF ptq “ δpF |T “ tq“ QpF |Hptqq,

152 L. Comparison to Other Notions of Conditional Independence which is a Dirac measure centered at F ptq. We can now define the mapping:

ϕ : HpT qÑ F, Hptq ÞÑ F ptq, which is well-defined, because h :“ Hpt1q “ Hpt2q implies that QpF |H “ hq is a Dirac measure centered at F pt1q and F pt2q. Since BF separates points (F is standard) we get: F pt1q “ F pt2q. ϕ is measurable. Indeed, its composition with δ : F Ñ PpFq, z ÞÑ δz ˚ equals QpF |Hq, which is measurable. Since BF “ δ BPpFq, see Lemma B.39, also ϕ is measurable. Since F is a standard measurable space, ϕ extends to a measurable mapping ϕ : H Ñ F by Kuratowski’s extension theorem for standard measurable spaces (see [Kec95] 12.2 and Theorem B.54). Finally, note that we have F ptq“ ϕpHptqq for all pw, tq P W ˆ T , which shows the claim.

F T / F ②②< _ Dϕ ②② H ②② δ ②②  ②  H / P F Q p q

Remark L.8. Note that the proof of Theorem L.7 also works for universal measurable spaces F if one replaces measurability with universal measurability everywhere. One then needs to use Kuratowski’s extension theorem for univeral measurable spaces, Theorem B.56.

L.4. Equivalent Formulations of Transitional Conditional Independence Our groundwork of developing the framework of transition probability spaces, transi- tional random variables and transitional conditional independence now allows us to rig- orously compare different notions of extended conditional indpendence in the literature. We will compare to three of them, namely the one from [CD17a, RERS17, FM20]. To relate transitional conditional independence to other notion of conditional inde- pendence it is useful to reformulate transitional conditional independence in other terms. The main result for this will be the next theorem.

Theorem L.9. Let pW ˆ T , KpW |T qq be a transition probability space and X : W ˆ T 99K X and Y : W ˆ T 99K Y and Z : W ˆ T 99K Z transitional random variables. We put:

KpX,Y,Z|T q :“pXpX|W, T qb YpY |W, T qb ZpZ|W, T qq ˝ KpW |T q,

We will write KpX|Z, T q for a fixed version of the Markov kernel appearing in the con- ditional independence X KKKpW |T q δ˚ | Z (only in case it holds). With these notations, the following are equivalent:

153 L. Comparison to Other Notions of Conditional Independence

1. X KK Y | Z, KpW |T q

2. X KK T b Y | Z, KpW |T q

3. X KK δ˚ | Z and KpX,Y,Z|T q“ KpX|Z, T qb KpY,Z|T q. KpW |T q

ω ˚ 4. X ÀK Z and for every t P T we have: Xt KK Yt | Zt (in the weak sense). KpX,Y,Z|T “tq Furthermore, any of those points implies the following:

5. For every probability distribution QpT q P PpT q we have the conditional indepen- dence: X KK T b Y | Z. KpW |T qbQpT q

Proof. 3. ùñ 1. is clear by definition. 1. ùñ 2.: by T-Inverted Right Decomposition E.8. 2. ùñ 4.,5.: By assumption we have the factorization:

KpX,Y,Z,T |T q“ KpX|Zqb KpY,Z,T |T q, for some Markov kernel KpX|Zq. Via marginalization and multiplication this implies the two equations:

KpX,Z,T |T q“ KpX|Zqb KpZ, T |T q, KpX,Y,Z|T qb QpT q “ KpX|Zqb KpY,Z|T qb QpT q,

“:QpX,Y,Z,T q “QpY,Z,T q l jh n l jh n for every QpT q P PpT q. The last equation shows 5. If we take QpT q“ δt we get:

KpX,Y,Z|T “ tq“ KpX|Zqb KpY,Z|T “ tq.

Together with the first of the above equations this shows 4. ˚ 4. ùñ 3.: By X ÀK Z we have a factorization:

KpX,Z|T q“ PpX|Zqb KpZ|T q.

This means that for every t P T and every measurable A Ď X , C Ď Z we have:

½ ½ Et r½ApXtq ¨ C pZtqs “ Et rPpX P A|Ztq ¨ C pZtqs , where the expectation Et is w.r.t. KpX,Y,Z|T “ tq. This shows that PpX P A|Ztq

is a version of Etr½ApXtq|Zts for every t P T , by the defining properties of .

154 L. Comparison to Other Notions of Conditional Independence

ω By the assumption Xt KKKpX,Y,Z|T “tq Yt | Zt we then have for every fixed t P T and mea-

surable A Ď X : ½ Etr½ApXtq|Yt,Zts“ Etr ApXtq|Zts“ PpX P A|Ztq KpX,Y,Z|T “ tq-a.s.

By the defining properties of conditional expectation for Etr½ApXtq|Yt,Zts we then get

that for every measurable A Ď X , B Ď Y, C Ď Z:

½ ½ ½ ½ Et r½ApXtq ¨ BpYtq ¨ C pZtqs “ Et rPpX P A|Ztq ¨ BpYtq ¨ C pZtqs .

Since this holds for every t P T we get:

KpX,Y,Z|T q“ PpX|Zqb KpY,Z|T q, which shows the claim. Corollary L.10. If X is standard and Z countably generated (e.g. also standard) then we have the equivalence:

ω X KK Y | Z b T ðñ @t P T : Xt KK Yt | Zt. KpW |T q KpX,Y,Z|T “tq

Proof. This directly follows from Theorem L.9 4. with pZ, T q in the role of Z and Remark 3.6 to get the first part of 4.

L.5. The Extended Conditional Independence We shortly review the definition of extended conditional independence introduced in [CD17a]. Definition L.11 (Extendend conditional independence, see [CD17a] Def. 3.2). Let W and T be measurable spaces and E “pPtpW qqtPT be a family of probability measures on W. Let X,Y,Z be measurable maps on W and Φ, Θ measurable maps on T such that the joint map pΦ, Θq is injective. For these cases extended conditional independence was defined as: X KKpY, Θq |pZ, Φq E if for all φ P ΦpT q and all real bounded measurable h there exists a function gh,φ such that for all t P Φ´1pφq we have that:

Et rhpXq|Y,Zss “ gh,φpZq PtpW q-a.s., where the conditional expectation Et is w.r.t. PtpW q. We now show that when pY, Θq and pZ, Φq are considered as transitional random variables on transition probability space pW ˆ T , PpW |T qq, where we put PpW |T “ tq :“ PtpW q, then transitional conditional independence implies extended conditional independence.

155 L. Comparison to Other Notions of Conditional Independence

Lemma L.12. We have the implication:

X KK pY, Θq |pZ, Φq ùñ X KKpY, Θq |pZ, Φq. PpW |T q E

Proof. Indeed, by the assumption we get a Markov kernel QpX|Z, Φq such that:

PpX,Y, Θ,Z, Φ|T q“ QpX|Z, Φqb PpY, Θ,Z, Φ|T q “ QpX|Z, Φqb δpΘ|T qb δpΦ|T qb PpY,Z|T q.

Marginalizing out Θ and Φ gives:

PpX,Y,Z|T “ tq“ QpX|Z, Φ “ Φptqq b PpY,Z|T “ tq.

For any φ and function h we then define:

gh,φpzq :“ hpxq QpX P dx|Z “ z, Φ “ φq. żX ´1 Then for each t P Φ pφq and B P BY and C P BZ we get:

hpxq PpX P dx,Y P dy,Z P dz|T “ tq żC żB żX “ hpxq QpX P dx|Z “ z, Φ “ Φptqq PpY P dy,Z P dz|T “ tq żC żB żX “ gh,φpzq PpY P dy,Z P dz|T “ tq. żC żB Since this is the defining equation for the conditional expectation we get the claim:

Et rhpXq|Y,Zs“ gh,φpZq PtpW q-a.s..

Remark L.13. So transitional conditional independence is the stronger notion and im- plies extended conditional independence, but it works for all transitional random vari- ables, not just of the restricted type in Definition L.11. Furthermore and in contrast to extended conditional independence, transitional conditional independence satisfies all the (asymmetric) separoid rules, Theorem 3.11, for all measurable spaces, except Left Weak Union and T -Restricted Right Redundancy, which hold when one can ensure the existence of regular conditional Markov kernels, e.g. on standard measurable spaces, see Theorem 2.24. This makes transitional conditional independence a better fit for the use in graph- ical models, see, for instance, the global Markov property, Theorem 6.3. Note that both notions only have a restricted direct relation to variation conditional independence, see Theorem 3.8 and [CD17a, CD17b]. A formal analogy between variation conditional in- dependence and transitional conditional independence was discussed in Remark L.5 and Example L.6.

156 L. Comparison to Other Notions of Conditional Independence

L.6. Symmetric Extended Conditional Independence If we wanted to arrive at a symmetric version of extended conditional independence that satisfies all (symmetric) separoid rules (at least when restricted so standard or universal measurable spaces) we could just use symmetrized transitional conditional independence:

_ X KK Y | Z : ðñ X KK Y | Z _ Y KK X | Z. KpW |T q KpW |T q KpW |T q

_ Since KKKpW |T q forms a T -˚-separoid it is immediate that KKKpW |T q is a symmetric sep- aroid by the general theory of τ-κ-separoids, see Theorem A.17. We clearly have the implication: _ X KK Y | Z ùñ X KK Y | Z, KpW |T q KpW |T q showing that the asymmetric version is stronger than the symmetrized version, where the latter might have lost some information about the interplay between X, Y, Z and T. Furthermore, because of its symmetry it can not equivalently express classical statistical concepts like ancillarity, sufficiency, adequacy, etc., see Section 4. A symmetric notion of extended conditional independence was intoduced in [RERS17]. Also in [CJ19,Fri20] a symmetric version of conditional independence for categorical probability theory was proposed. It is also worth mentioning that the symmetrized σ-separation in a CDMG GpV | dopJqq agrees with σ-separation in the DMG GpV Y9 J| dop‚qq that arises from GpV | dopJqq by connecting every two nodes from J with bi-directed edges, see Lemma I.20. The global Markov property from Theorem K.7 then immediately implies the following symmetrized version: σ _ A K B | C ùñ XA KK XB | XC . GpV Y9 J| dop‚qq PpXV | dopXJ qq This recovers, generalized and strengthens the corresponding results from [RERS17] and [FM20] Appendix C.

L.7. Extended Conditional Independence for Families of Probability Distributions In this subsection we will introduce a strikingly simple and powerful form of extended conditional independence that works for all measurable spaces and satisfies all the sep- aroid rules. For this consider a transition probability space pW ˆ T , KpW |T qq and transitional random variables X : W ˆ T Ñ X , Y : W ˆ T Ñ Y, Z : W ˆ T Ñ Z. Furthermore, fix a set Q Ď PpT q of probability measures on T , e.g. Q “ PpT q or Q “ tδt | t P T u. Then we can define Q-extended conditional independence as:

ω ω X KK Y | Z : ðñ @QpT q P Q : X KK Y | Z. KpW |T qbQ KpW |T qbQpT q It is easily seen that the usual weak conditional independence KKω satisfies all separoid axioms [Daw01a,CD17a] for arbitrary measurable spaces. Furthermore, if one combines

157 L. Comparison to Other Notions of Conditional Independence several separoids by conjunction ^ then one gets another separoid, see [Daw01a]. So ω KKKpW |T qbQ clearly satisfies all (symmetric) separoid axioms for arbitrary measurable spaces. By Theorem L.9 we have the implications:

ω ω X KK Y | Z ùñ X KK T,Y | Z ùñ X KK Y | Z. KpW |T q KpW |T qbQ KpW |T qbQ

The middle ternary relation in X,Y,Z satisfy the asymmetric separoid rules from Theo- rem 3.11, but without any requirement for standard or countably generated measurable spaces, in contrast to transitional conditional indpendence on the left. The asymmetric separoid rules for the middle relation follow from the right relation and Theorem A.16. It seems that Q-extended conditional independence checks all boxes that one would like to have from a notion of extended conditional independence. It is certainly simpler than most other notions. It just comes with one drawback: it does not provide one with the existence or factorization of certain Markov kernels. When the reverse implication ˚ holds is stated in Theorem L.9, e.g. if δt P Q for all t P T and X ÀK Z, where the latter encodes the existence of a certain Markov kernel, which is thus the main obstruction to arrive at transitional conditional independence. To ellaborate further, a specialized version of this Q-extended conditional indepen- dence was first introduced in [FM20], where it was used to derive the causal do-calculus rules, see [Pea09, FM20], for certain structural causal models. In their proofs they had to construct certain Markov kernels and then check for Q-extended conditional indepen- dence. Since the construction of such Markov kernels became complicated many corner cases could not been proved. The main ingredient of their proof was a global Markov property for Q-extended conditional independence:

σ ω A K J Y B | C ùñ XA KK XJ ,XB | XC . GpV | dopJqq PpXV | dopXJ qqbQ

Here, Q-extended conditional independence was not strong enough to produce the needed Markov kernels. This is in contrast to transitional conditional independence, whose global Markov property, Theorem 6.3 now gives:

σ A K B | C ùñ XA KK XB | XC , GpV | dopJqq PpXV | dopXJ qq which is a stronger conclusion and provides us with the needed Markov kernels for free. This was one of the core motivation for developing transitional conditional independence. For a proof of the do-calculus rules for causal Bayesian networks with non-stochastic input variables and latent variables that also covers such corner cases easily see Corollary K.10, which is based on the global Markov property, Theorem 6.3.

158