
The Pennsylvania State University The Graduate School

AN ALGEBRAIC PERSPECTIVE ON COMPUTING WITH DATA

A Dissertation in Mathematics by William Wright

© 2019 William Wright

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

August 2019

The dissertation of William Wright was reviewed and approved∗ by the following:

Jason Morton Professor of Mathematics Dissertation Advisor, Chair of Committee

Vladimir Itskov Professor of Mathematics

Alexei Novikov Professor of Mathematics, Director of Graduate Studies

Aleksandra Slavkovic Professor of Statistics

∗Signatures are on file in the Graduate School.

Abstract

Historically, algebraic statistics has focused on the application of techniques from computational commutative algebra, combinatorics, and algebraic geometry to problems in statistics. In this dissertation, we emphasize how sheaves and monads are important tools for thinking about modern statistical computing. First, we explore how probabilistic computing necessitates thinking about random variables as tied to their family of extensions and ultimately reformulate this observation in the language of sheaf theory. We then turn our attention to the relationship between topos theory and the relational algebra of databases, showing how Codd’s original operations can be seen as constructions inside Set. Next, we discuss contextuality, the phenomenon whereby the value of a random variable depends on the other random variables observed simultaneously, and demonstrate how sheaves allow us to lift statistical concepts to contextual measurement scenarios. We then discuss a technique for hypothesis testing based on algebraic invariants whose asymptotic convergence properties do not rely on asymptotic normality of any estimator as they are defined as energy functionals on the observed data. Finally, we discuss the Giry monad and how its implementation would aid in analysis of data sets with missing data.

Contents

List of Figures x

Acknowledgments xi

Chapter 1 Introduction 1 1.1 Motivation & Background ...... 1 1.2 Contributions ...... 3 1.3 Summary ...... 5

Chapter 2 Background 7 2.1 Categories ...... 7 2.2 Functors & Categories of Functors ...... 14 2.2.1 Functors ...... 14 2.2.2 Natural Transformations ...... 16 2.2.3 Functor Categories ...... 17 2.2.4 The Yoneda Embedding ...... 17 2.3 Lattices & Heyting Algebras ...... 18 2.3.1 Lattices ...... 18 2.3.2 Heyting Algebras ...... 20 2.4 Monads ...... 21 2.5 Cartesian Closed Categories ...... 23 2.6 Topoi ...... 24 2.7 Presheaves ...... 27 2.7.1 The Category of Presheaves ...... 27 2.7.2 Initial and Terminal Objects ...... 28 2.7.3 Products and Coproducts ...... 28 2.7.4 Equalizers and Coequalizers ...... 29 2.7.5 Pullbacks and Pushouts ...... 30

2.7.6 Exponentials ...... 31 2.7.7 The Classifier ...... 32 2.7.7.1 ...... 32 2.7.7.2 The Subobject Classifier ...... 32 2.7.8 Local and Global Sections ...... 34 2.8 Sheaves ...... 34

Chapter 3 A Sheaf Theoretic Perspective on Higher Order Probabilistic Programming 38 3.1 The Categorical Structure of Measurable Spaces ...... 39 3.1.1 Non-Existence of Exponentials ...... 40 3.1.2 Lack of Subobject Classifier ...... 42 3.2 The Giry Monad ...... 46 3.2.1 The Endofunctor G ...... 46 3.2.2 The Natural Transformation η ...... 46 3.2.3 The Natural Transformation µ ...... 47 3.2.4 The Kleisli Category of the Giry Monad ...... 47 3.2.5 Simple Facts About the Giry Monad ...... 48 3.3 The Cartesian Closed Category of Quasi-Borel Spaces ...... 49 3.3.1 Quasi-Borel Spaces ...... 49 3.3.2 Cartesian Closure of QBS...... 51 3.3.3 The Giry Monad on the Category of Quasi-Borel Spaces . . 51 3.3.4 De Finetti Theorem for Quasi-Borel Spaces ...... 52 3.4 Standard Borel Spaces ...... 53 3.5 Quasi-Borel Sheaves ...... 55 3.5.1 Sample Space Category ...... 55 3.5.2 Quasi-Borel Presheaves ...... 57 3.5.3 Quasi-Borel Sheaves ...... 57 3.5.4 Lifting Measures Lemma ...... 60 3.6 Probability Theory for Quasi-Borel Sheaves ...... 62 3.6.1 Events ...... 62 3.6.2 Global Sections, Local Sections, and Subsheaves ...... 63 3.6.3 Expectation as a Sheaf ...... 64 3.7 Future Work ...... 65 3.7.1 Probabilistic Programming and Simulation of Stochastic Processes ...... 65 3.7.2 Categorical Logic and Probabilistic Reasoning ...... 66 3.7.3 Sample Space Category and the Topos Structure ...... 66 3.7.4 Extension of the Giry Monad ...... 67

Chapter 4 Categorical Logic and Relational Databases 68 4.1 Introduction ...... 68 4.2 Data Tables ...... 69 4.2.1 Attributes ...... 71 4.2.2 Attribute Spaces (Data Types) ...... 71 4.2.3 Missing Data ...... 72 4.2.4 Data Types ...... 73 4.2.5 Column Spaces, Tuples, and Tables ...... 74 4.2.5.1 Column Spaces ...... 74 4.2.5.2 Records ...... 74 4.2.5.3 Tables ...... 74 4.2.6 Primary Keys ...... 76 4.2.7 Versioning ...... 76 4.3 Relational Algebra on Tables ...... 77 4.3.1 Products ...... 77 4.3.2 Projection ...... 78 4.3.3 Union ...... 78 4.3.4 Selection ...... 78 4.3.5 Difference ...... 80 4.4 Some Additional Operations on Tables ...... 81 4.4.1 Addition & Deletion ...... 81 4.4.2 Editing Records ...... 81 4.4.2.1 Rename ...... 82 4.4.2.2 Imputation ...... 82 4.4.3 Merging Overlapping Records ...... 83 4.4.3.1 Table ...... 83 4.4.4 Non-Binary Logics ...... 84 4.5 Random Tables and Random Databases ...... 84 4.5.1 Random Tables ...... 85 4.5.2 Giry Monad Applied to Tables ...... 85 4.5.3 Random Databases ...... 85 4.6 Topological Aspects of Databases ...... 87 4.6.1 Simplicial Complex Associated to a Database ...... 87 4.6.2 Contextuality ...... 87 4.6.3 on a Database ...... 89 4.7 Relationship Between Topological Structure of a Schema and Contextuality ...... 90

Chapter 5 Contextual Statistics 91 5.1 Introduction ...... 91 5.2 The Bell Marginals ...... 93 5.3 Skip-NA and Directed Graphical Models ...... 99 5.4 Motivation from Statistical Privacy ...... 103 5.5 Poset of Joins of a Database ...... 103 5.5.1 Contextual Constraint Satisfaction Problems ...... 104 5.5.2 Poset of Solutions to Contextual Constraint Satisfaction Problems ...... 105 5.6 Topology of a Database Schema ...... 108 5.6.1 Contextual Topology on a Database Schema ...... 109 5.7 Sheaves on Databases ...... 113 5.7.1 Presheaf of Data Types ...... 113 5.7.2 Presheaf of Classical Tables of a Fixed Size ...... 114 5.7.3 Sheaf of Counts on Contextual Tables ...... 114 5.7.4 Presheaf of Classical Probability Measures ...... 117 5.7.5 Sheaf of Outcome Spaces ...... 117 5.7.6 Contextual Random Variables ...... 118 5.7.7 Sheaf of Parameters ...... 119 5.7.8 Sheaf of Contextual Probability Measures ...... 120 5.8 Statistical Models on Contextual Sheaves ...... 121 5.8.1 Contextual Statistical Models ...... 122 5.8.2 Factors ...... 125 5.8.3 Classical Snapshots of a Factor ...... 126 5.9 Subobject Classifier for Contextual Sheaves ...... 127 5.10 Local and Global Sections of a Contextual Sheaf ...... 128 5.11 Fitting Contextual Models ...... 130 5.11.1 Maximum Likelihood Estimation for the Saturated Contextual Model ...... 130 5.11.2 Classical Approximation of a Contextual Distribution . . . . 132 5.12 Contextual Hypothesis Testing ...... 134 5.12.1 Testing if Observed Marginals are Drawn from the Same Distribution ...... 134 5.12.2 Testing if a Collection of Tables can be Explained Classically 135 5.12.3 A Hypothesis Test for Contextuality ...... 136 5.13 Future Work ...... 136 5.13.1 Contextuality Penalization ...... 136 5.13.2 Sampling for Contextual Probability Distributions ...... 137

Chapter 6 Algebraic Hypothesis Testing 139 6.1 Introduction ...... 139 6.2 From model to invariants ...... 140 6.3 Constructing an inner product from invariants ...... 142 6.4 Asymptotic Properties of ⟨ψ| H |ψ⟩ ...... 143 6.5 Quadratic Forms of Multivariate Normal Distributions ...... 144 6.6 Estimation of Parameters for the Asymptotic Distribution . . . . . 146 6.6.1 Using an MLE ...... 146 6.6.2 Using Normalized Count Data ...... 147 6.7 The Independence Model for a 2 × 2 Contingency Table ...... 147 6.8 Behavior of Statistic on the Boundary of the Probability Simplex . 150 6.9 A Test for the Rank of a Contingency Table ...... 150 6.10 Simulation Techniques and Results ...... 151 6.11 Future Work ...... 156 6.11.1 Application to Mixture Models ...... 156 6.11.2 Application To Restricted Boltzmann Machines ...... 157

Chapter 7 A Monadic Approach to Missing and Conflicting Data 159 7.1 Pullbacks, Maximum Entropy Distributions, and Independence Joins 160 7.2 Merging Conflicting Tables ...... 162 7.3 Imputing Missing Data with Giry Tables ...... 165 7.3.1 Imputing by Empirical Probability Measure of a Column . . 166 7.3.2 Lifting Statistics to the Giry Monad ...... 168 7.3.3 A Simple Example of Giry Imputation ...... 169 7.4 Future Work ...... 171 7.4.1 Implementation of Giry Tables ...... 171 7.4.2 The Giry Monad and Contextuality ...... 172 7.4.3 Generalizations of Interval Time Models ...... 174

Appendix A Supplemental Code for Chapter 5 176 A.1 Introduction ...... 176 A.2 Code ...... 176 A.2.1 CSP for All Bell Marginals ...... 176 A.2.2 CSPs Involving Three Bell Marginals ...... 180 A.2.3 CSPs Involving Two Bell Marginals ...... 184

Appendix B Code for Producing Figures in Chapter 7 188 B.1 Introduction ...... 188 B.2 Invariants vs. Chi-Squared 2 × 2 Case ...... 188 B.3 P-values on a Degenerate Distribution in the Binary 4-Cycle Model 192 B.4 Tables of Percentage Deviation from Significance Level ...... 204 B.4.1 Noise Parameter ε = 0.1 ...... 204 B.4.2 Noise Parameter ε = 0.01 ...... 206 B.4.3 Noise Parameter ε = 0.001 ...... 207

Bibliography 210

List of Figures

6.1 A scatterplot showing values of the survival function of the invariant based quadratic form vs. the chi-squared distribution for samples drawn from a uniform distribution...... 152

6.2 A scatterplot showing values of the survival function of the invariant based quadratic form vs. the chi-squared distribution for samples drawn from the distribution (q00, q01, q10, q11) = (0.1, 0.3, 0.2, 0.4). . . 153

6.3 A scatterplot showing p-values computed for the perturbed degenerate distribution on the binary 4-cycle comparing the likelihood ratio test vs. the survival function of the invariants based quadratic form computed via Imhof’s method...... 154

6.4 A scatterplot showing p-values computed for the perturbed degenerate distribution on the binary 4-cycle comparing the chi-squared test vs. the survival function of the invariants based quadratic form computed via Davies’ method...... 155

Acknowledgments

I would like to thank my advisor Jason Morton for all the time he has invested in this project. I would also like to thank the other members of my committee: Vladimir Itskov, Alexei Novikov, and Aleksandra Slavkovic for agreeing to be on my committee and investing their time in this project. I would also like to thank Jared Culbertson and Roman Ilin for their supervision while interning at AFRL. I would also like to thank Kirk Sturtz and Benjamin Robinson for many helpful conversations on the subject of applied category theory. Additionally, I would like to thank Manfred Denker for his early encouragement to explore some of the unconventional ideas in this dissertation. I would also like to thank Becky Halpenny and Allyson Borger for all their help throughout my time at Penn State. I would also like to thank Bojana Radja and Cheryl Huff for their encouragement and support. Lastly, I would like to thank my family and friends for all their support over the years, especially my father, Clifton. This material is based upon work supported by the Air Force Office of Scientific Research under Award No. FA9550-16-1-0300. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the Air Force Office of Scientific Research.

Chapter 1 | Introduction

1.1 Motivation & Background

Traditionally, algebraic statistics has focused on applying techniques from algebraic geometry, commutative algebra, and combinatorics to statistical problems. A major focus of this dissertation is to demonstrate that several new tools, namely sheaf theory and monads, can be useful for thinking about the types of statistical questions arising out of the needs of modern statistical computing. Traditionally, statistical theory has focused on the case where data is tabulated in a single table in a tidy manner. However, many data sets contain multiple tables with overlapping columns, missing entries, and stale records. The application of category theory to probability is not a new idea. The first appearance in the literature appears to be a paper by Michele Giry [59], published in 1980. In this paper, Giry constructs an endofunctor associating a measurable space to its collection of probability distributions, shows that this endofunctor can be given the structure of a monad, and studies the Kleisli category associated to this monad. Following Giry’s work, there appears to be very little subsequent work in the area until 2006 when Doberkat worked out the Eilenberg-Moore algebras for the Giry monad restricted to Polish spaces [34]. Culbertson and Sturtz developed a categorical framework for Bayesian probability in 2013 [27]. Some mathematicians have argued that probability should be rethought foundationally. Mumford has argued that probability should be rethought in a way where the random variable is taken as a primitive concept [103]. Gromov has argued that category theory should give us insights into how such a re-imagining of the foundation could be

achieved [61]. The potential of category theory to give insight into statistics is not new. McCullagh noticed that categories and natural transformations provide a natural way to express the intuitive concept of requiring a statistical model to admit certain natural extensions depending on the domain of inference [97]. In this dissertation, we use category theory as a useful language for organizing and expressing statistical concepts on versioned databases with missing data. In particular, sheaf theory is a natural framework for thinking about local-to-global phenomena. We propose that sheaves provide a natural way of thinking about distributed or contextually dependent statistical questions. Moreover, Tao observes that the concepts that are probabilistically meaningful are those which are invariant under surjective measure preserving maps [136]. Tao also observes that notions like independence are invariant under such maps while constructions such as which elements of the sample space map to which elements of the outcome space do not satisfy this invariance. This suggests a random variable should be identified with its entire sieve of extensions. In this dissertation we realize this construction by viewing a random variable as a construction involving quasi-Borel sheaves on the sieve of extensions of a sample space in chapter three. McCullagh argues that it is not possible to perform inference with a statistical model unless the model can be extended to the domain for which inference is required [97]. This concept of requiring statistical models to admit natural extensions suggests a rejection of the notion of a fixed universe of sets in favor of some notion of variable structure. However, this is precisely what the topos-theoretic perspective is intended to provide. To quote Lawvere, “Every notion of constancy is relative, being derived perceptually or conceptually as a limiting case of variation, and the undisputed value of such notions in clarifying variation is always limited by that origin. This applies in particular to the notion of a constant set, and explains so much of why naive set theory carries over in some form to the theory of variable sets.” [78] A major goal of this dissertation is to demonstrate how the ideas behind topos theory and sheaf theory can clarify the statistical structure of modern complex data sets by providing a language which can naturally handle the variability of these structures.

1.2 Contributions

In chapter three, the main result is extending a construction due to Heunen, Kammar, Staton, and Yang to a presheaf construction on a category of sample spaces (Definition 88) and showing this extension is in fact a sheaf with respect to the atomic Grothendieck topology (Lemma 91). We also realize expectation as a sheaf morphism (Section 3.6.3) and discuss some structural properties of these new objects as they relate to foundational concepts in probability theory (Section 3.6.1, Section 3.6.2). Along the way, we characterize several sub-classes of monic arrows in the category of measurable spaces (Proposition 77) and show that Meas does not admit a subobject classifier (Lemma 76). This provides an alternative proof that Meas is not a topos, a fact already well known due to Aumann [10], who showed that the category of standard Borel spaces is not Cartesian closed. We also prove a simple lemma about lifting probability measures along surjective maps (Section 7.3.2). In chapter four, we discuss a categorical perspective on bag models common in the relational database literature. By emphasizing how these models can be seen as constructions within the underlying category of sets, we create a framework which more easily generalizes to situations, as in SQL, where the underlying logic is not a Heyting algebra. Such situations are impossible in purely topos theoretic models as the internal logic is always a Heyting algebra. We also define a simplicial structure (Section 4.6.3) and graph associated to a database schema and prove a result relating properties of this graph to whether or not agreement on marginal tables is sufficient to ensure that the marginals can arise from a joint distribution on the full outcome space. In particular, we provide sufficient conditions for a joint table on the full column set to exist (Lemma 98, Proposition 101). This result is foundational to the next chapter where we attach an additional topological structure to this simplicial complex and use it to weaken the common assumption in statistics that the family of marginals under consideration arises as projections of a joint distribution on the full column space. In chapter five, our major contributions are the use of an appropriate topological structure to sheaf-theoretically lift standard statistical constructions to families of marginals with some overlapping constraints. More precisely, this includes

the introduction of a poset structure on the collection of constraint satisfaction problems (Section 5.5.2) which allows us to select an appropriate topology based on the shared columns of the tables constituting a database (Section 5.6.1). Using this topology, we see how to express various statistical concepts as sheaves or presheaves with respect to this topology (Section 5.7). This allows us to define the notion of contextual random variables (Section 5.7.6) and to define statistical models in terms of sheaf morphisms (Section 5.8.1). We also introduce the distinction between classical and contextual factors (Definition 127) and the notion of classical snapshot to handle classical approximations to globally irreconcilable marginals. We discuss a pseudo-likelihood approach to extending maximum likelihood estimation based on the realization of contextual random variables as subsets of an equalizer (Section 5.11.1) and provide a test for whether or not marginal distributions can arise from a joint distribution on the full column set (Section 5.12.2). This last result is similar to a result due to Abramsky, Barbosa, and Mansfield based on sheaf cohomology which allows the user to detect contextuality [3]. By combining our results with the construction in chapter six, we can provide a goodness-of-fit measure for contextuality rather than a simple detection of contextuality. In chapter six, our major contributions are constructing an energy statistic based on the invariants of an algebraic statistical model and proving its asymptotic consistency under the null hypothesis. This construction is interesting because its asymptotic properties do not rely on the asymptotic normality of an estimator since it can be computed from empirical frequencies. Thus, this construction provides an alternative technique for computing goodness-of-fit in situations where standard asymptotic theory breaks down such as on boundary points of the probability simplex or near singularities of a statistical model. We demonstrate this improved performance near a singularity of the binary 4-cycle undirected graphical model by benchmarking it against the likelihood ratio and chi-squared test in a simulation. In chapter seven, the main result is a lemma establishing that measurable statistics lift to the Giry monad and the use of the Giry monad to combine conflicting data in a way that does not destroy information about the conflicting records. This construction is potentially useful in statistical decision making situations where we would like to design systems which select more conservative actions in the presence of conflict such as in target recognition in sensor networks. This chapter is more speculative than the remaining chapters and is intended to

explore how implementation of the Giry monad could be beneficial for statistical computing.

1.3 Summary

The contents of the remainder of this dissertation are as follows. Chapter two provides background information collecting many basic definitions from category theory. This chapter is not intended as a complete introduction to the subject but rather provides a list of definitions used elsewhere in the dissertation and can be used as a reference whenever these concepts are used in subsequent chapters. The topics treated include the basic definitions and properties of categories along with the basic definitions and properties of functor categories. We discuss the important Yoneda lemma as it is used several times throughout the dissertation. We also discuss the basics of lattice theory and Heyting algebras along with Cartesian closed categories and topoi. The most important concepts introduced in this chapter are monads and sheaves which are used several times throughout this dissertation. In chapter three, we examine how category theory gives us insight into the semantics of probabilistic programming languages by demonstrating how the need for higher order functions requires us to step outside the standard bounds of measure theory. We first deconstruct the ways in which the category of measurable spaces is an inadequate framework for higher order probabilistic programming. We then discuss restricting attention to an appropriately well-behaved subset of measurable spaces, namely the standard Borel spaces. We review the recent theory of Quasi-Borel spaces developed by Heunen, Kammar, Staton, and Yang and discuss the importance of sample space extensions to probabilistic programming. We then lift their definition to a sheaf-theoretic one in order to naturally incorporate such extensions into their model of higher order probabilistic programming. In chapter four, we examine databases using the lens of topos theory. We construct a simple model of tables within the category of sets that is simple enough to express the standard operations of relational algebra along with some other common operations for table manipulations. This largely sets a notation and definition of tables to be used in subsequent chapters. We emphasize constructions which can also be performed inside the category of standard Borel spaces as subsequent work will focus on adapting random variables to databases with global

inconsistency. Finally, we discuss how a database schema creates an abstract simplicial complex showing the interconnections between tables in the database. This observation is the jumping-off point for chapter five where we analyze statistical techniques in the presence of contextuality. In chapter five, we examine how sheaves on a topology associated to an abstract simplicial complex associated to a database can be used to lift statistical concepts to the realm of databases containing marginal distributions which are globally irreconcilable. We also study the problem of attempting to reconstruct tables from marginal tables and some of the computational issues that arise in these computations. A major theme of this chapter is that using the language of sheaf theory allows us to extend results globally by constructing a classical approximation of the usual concept along the basis. The nuances of adapting statistics to this regime are explored along with some computational issues. In chapter six, we discuss a technique for algebraic hypothesis testing based on algebraic invariants of a statistical model. We derive an asymptotic distribution for an energy functional of a statistic computed from the invariants. The asymptotic theory of this statistic does not rely on asymptotic normality and so provides a method robust to model singularities and boundary points. We discuss potential applications of this technique to mixture models and Restricted Boltzmann machines. We conclude by simulating the statistic and benchmarking it against known techniques. We consider small perturbations of a degenerate binary four-cycle and see the invariant-based statistic outperforms standard techniques for this particular example. In chapter seven, we examine the relationship between the Giry monad and multiple imputation and discuss how implementation of the Giry monad could be useful for statistical computing. We see that implementing the Giry monad allows us to preserve information about conflicting measurements when compared to using a point estimate to resolve conflicts between different tables or computational agents. The main result is a construction which allows us to lift all the common statistics used in practice to Giry monads. We also discuss how its implementation would facilitate the handling of multiple imputation techniques in other parts of the inference pipeline. A simple example involving the k-nearest neighbor technique in machine learning indicates how this implementation leads to different calculations which will agree with original techniques for completely observed data.

Chapter 2 | Background

This chapter provides background information and terminology for this dissertation. We cover the basic language of categories, functors, monads, Cartesian closed categories, topoi, presheaves and sheaves (both on ordinary topological spaces and on Grothendieck topologies). These constructions will be used many times throughout the thesis. References are provided to more detailed treatments of these topics.

2.1 Categories

Categories are the spaces where mathematical objects live. Intuitively, we have a collection of objects and morphisms relating these objects which are composable and have identities. It is tempting to think of these as ’sets’ and ’functions’ respectively, but these are not the only type of category as we will see later. In this section, we review some elementary definitions in category theory. More detailed treatments of these topics can be found in [15,18,93].

Definition 1. A category C consists of a collection of objects, denoted Ob (C) , along with a collection of morphisms for each C,D ∈ Ob (C), denoted Mor (C,D) which satisfy the following conditions:

• For all C,D,E in Ob (C), there is a composition

◦ : Mor (C,D) × Mor (D,E) → Mor (C,E)

which is associative.

• For all C ∈ Ob (C), there is an identity, 1C : C → C such that for any

f : C → D and any g : E → C, we have f = f ◦ 1C and g = 1C ◦ g.

By a commutative diagram, we mean that any two paths with the same source and target yield the same composite. In the diagram below, we represent the axioms for the identity morphism as a commutative diagram. For this diagram, it suffices to check that the left triangle and the right triangle are commutative. From the commutativity of these two triangles, we can deduce the rest. This is a general feature of such diagrams.

[Diagram: A → B via f, B → B via 1B, and B → C via g; the left triangle expresses 1B ◦ f = f and the right triangle expresses g ◦ 1B = g.]

Categories are ubiquitous in mathematics. Here are a few basic examples.

Example 2. The category of sets has a collection of sets as its objects and the morphisms are given by functions.

Example 3. The category of groups has the collection of groups as its objects and the morphisms are given by homomorphisms.

Example 4. There is a category, Meas, of measurable spaces. The objects are measurable spaces and the morphisms are given by measurable mappings.

All the above examples are so called concrete categories which consist of objects which are sets (possibly equipped with additional structure) and morphisms given by functions (which preserve that structure). These are not the only types of categories.

Example 5. Any poset, (P, ≤), can be viewed as a category in the following way. The objects are given by elements of the poset and we define x → y if and only if x ≤ y.

Example 6. A similar construction could be used to view any topological space (X, τ) as a category. The objects are given by open sets and the morphisms are

given by U → V if and only if U ⊂ V . We will use the notation Xτ to denote such a category.
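For readers coming from functional programming, Definition 1 can be made concrete with a small Haskell sketch. This is purely illustrative and not part of the original text; the class name Category and the wrapper Fn are our own choices. The compiler cannot enforce the identity and associativity axioms, which remain promises the instance must keep.

```haskell
import Prelude hiding (id, (.))

-- Objects are types; a morphism from a to b is a value of type `k a b`.
class Category k where
  id  :: k a a                    -- the identity morphism 1_C
  (.) :: k b c -> k a b -> k a c  -- composition, required to be associative

-- Example 2 in this notation: sets and functions, modelled by Haskell functions.
newtype Fn a b = Fn { runFn :: a -> b }

instance Category Fn where
  id          = Fn (\x -> x)
  Fn g . Fn f = Fn (\x -> g (f x))
```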

The main type of category that we will be concerned with in this dissertation is the Cartesian closed category. Before explaining what a Cartesian closed category is, we need a few definitions.

Definition 7. Let C be a category. An object, T , of C is said to be a terminal object if for every object C in C, there is a unique morphism !: C → T .

In Set, a terminal object is given by any singleton set T = {∗}. This also works for Meas. A poset category admits a terminal object if and only if it has a top element. In a category associated to a topological space, Xτ, the terminal object is the underlying set, X. Terminal objects, when they exist, are unique up to unique isomorphism. The dual notion to a terminal object is an initial object.

Definition 8. Given a category C, an initial object is an object 0 ∈ C such that for each C ∈ C, there is a unique morphism ! : 0 → C.

In the category Set or Meas, this object is the empty set (equipped with the empty sigma algebra in Meas). For a poset, this is a bottom element and for a topological space this is again the empty set. Products are the categorical generalization of the Cartesian product of sets. Products in an abstract category are defined by a universal property.

Definition 9. Let C be a category. C admits products if given any two objects

X and Y in C, there exists a third object X × Y and a pair of morphisms pX : X × Y → X, pY : X × Y → Y such that given any other object, Z, and morphisms f : Z → X, g : Z → Y, there exists a unique morphism f × g : Z → X × Y such that the following diagram commutes:

[Diagram: the two triangles Z → X × Y → X and Z → X × Y → Y commute, i.e. pX ◦ (f × g) = f and pY ◦ (f × g) = g.]

In the category of sets, the product is given by the usual Cartesian product of sets. In the category of measurable spaces, you can take the Cartesian product of the underlying sets with the product sigma algebra. In the category associated to

a topological space, Xτ , the product is given by the intersection of open sets. It is also true that products are unique up to unique isomorphism. Coproducts are the dual notion to products.

Definition 10. We say that C admits coproducts if given any two objects X and Y there exists a third object X ∐ Y and a pair of morphisms iX : X → X ∐ Y, iY : Y → X ∐ Y such that given any other object Z and morphisms f : X → Z, g : Y → Z, there exists a unique morphism f ∐ g : X ∐ Y → Z such that the following diagram commutes:

[Diagram: the two triangles X → X ∐ Y → Z and Y → X ∐ Y → Z commute, i.e. (f ∐ g) ◦ iX = f and (f ∐ g) ◦ iY = g.]

In Set, the coproduct is given by the disjoint union of two sets. For the category associated to a topological space, Xτ, the coproduct of two open sets is given by their union. For a lattice the coproduct is given by the join operation. In Meas, X ∐ Y can be constructed as follows. The underlying set is simply the disjoint union of X × {0} and Y × {1}. We can generate a sigma algebra on X ∐ Y by taking the smallest sigma algebra containing sets of the form BX × {0} and BY × {1} where BX is a measurable subset of X and BY is a measurable subset of Y. This observation will be important later in the dissertation when we augment outcome spaces to allow for the possibility of missing data. In this dissertation we only focus on this construction for standard Borel spaces. A more detailed construction is provided in [130].

Definition 11. A morphism i : Ef,g → X in C is an equalizer for a pair of morphisms f, g : X → Y if f ◦ i = g ◦ i and given any morphism h : Z → X such that f ◦ h = g ◦ h, there exists a unique morphism k : Z → Ef,g such that i ◦ k = h, i.e. the following diagram commutes:

[Diagram: Ef,g → X ⇉ Y, with i followed by the parallel pair f, g; any h : Z → X equalizing f and g factors uniquely through i via k : Z → Ef,g.]

In Set, the equalizer of f, g is simply the subset of X defined by Ef,g = {x ∈ X : f (x) = g (x)}. The dual notion to the equalizer is the coequalizer.
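As a concrete, purely illustrative rendering of this description, the equalizer of two maps between finite sets can be computed by filtering. The helper below is a sketch under the assumption that the set X is given as a list; the name equalizer is ours and is not used elsewhere in the dissertation.

```haskell
-- E_{f,g} = { x in X : f(x) = g(x) }, for a finite set X represented as a list.
equalizer :: Eq b => (a -> b) -> (a -> b) -> [a] -> [a]
equalizer f g = filter (\x -> f x == g x)

-- e.g. equalizer (`mod` 2) (`mod` 3) [0..12] == [0,1,6,7,12]
```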

Definition 12. A morphism q : Y → Ef,g in C is a coequalizer for a pair of morphisms f, g : X → Y if q ◦ f = q ◦ g and given any morphism h : Y → Z with h ◦ f = h ◦ g, there exists a unique morphism k : Ef,g → Z such that k ◦ q = h, i.e. the following diagram commutes:

[Diagram: X ⇉ Y → Ef,g, with the parallel pair f, g followed by q; any h : Y → Z coequalizing f and g factors uniquely through q via k : Ef,g → Z.]

In Set, the coequalizer of f, g is the quotient of Y by the minimal equivalence relation ∼ such that f (x) ∼ g (x) for each x ∈ X.

Definition 13. A pullback for morphisms f : X → Z and g : Y → Z is an object W together with a pair of morphisms h : W → X, k : W → Y with f ◦ h = g ◦ k, satisfying the universal property that for any morphisms i : V → X, j : V → Y with f ◦ i = g ◦ j, there exists a unique ℓ : V → W such that h ◦ ℓ = i and k ◦ ℓ = j, i.e. the following diagram commutes:

[Diagram: the square W → X → Z, W → Y → Z commutes, and any V with morphisms i and j making the outer square commute factors uniquely through W via ℓ.]

In the category of sets we can take

W = X ×Z Y = {(x, y) ∈ X × Y | f (x) = g (y)} ,

with h and k the two coordinate projections.
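For finite sets this description can be spelled out directly. The following Haskell fragment is an illustrative sketch (the name pullback is ours), with the two projection morphisms given by fst and snd.

```haskell
-- X x_Z Y = { (x, y) : f(x) = g(y) }, for finite sets represented as lists.
pullback :: Eq z => (x -> z) -> (y -> z) -> [x] -> [y] -> [(x, y)]
pullback f g xs ys = [ (x, y) | x <- xs, y <- ys, f x == g y ]

-- e.g. pullback (`mod` 3) (`mod` 3) [0..2] [3..5] == [(0,3),(1,4),(2,5)]
```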

Definition 14. A pushout for morphisms f : X → Y and g : X → Z is an object W together with a pair of morphisms h : Y → W, k : Z → W with h ◦ f = k ◦ g, satisfying the universal property that for any morphisms i : Y → V and j : Z → V with i ◦ f = j ◦ g, there exists a unique morphism ℓ : W → V such that the following diagram commutes:

[Diagram: the square X → Y → W, X → Z → W commutes, and any V with morphisms i : Y → V and j : Z → V making the outer square commute receives a unique ℓ : W → V.]

In the category of sets the pushout is given by W = (Z ∐ Y) / ∼ where ∼ is the finest equivalence relation such that f (x) ∼ g (x) for every x ∈ X. The constructions we have seen thus far are all examples of more general types of limits or colimits. In order to define these notions, we first need to give a rigorous definition of a diagram.

Definition 15. A diagram of shape J in C is a functor from J to C. The category J is referred to as the index category.

Example 16. Let J be the category 0 ⇒ 1 where only the non-identity morphisms are drawn. A diagram on J just picks out two parallel arrows

f, g : X ⇉ Y.

When discussing diagrams, it is common to leave out explicit mention of the index category J and to simply depict the image under the functor as we have done in all of the diagrams used thus far in this section.

Now that we have defined diagrams we can define the notion of a cone on a diagram.

Definition 17. Let F : J → C be a diagram. A cone to F is an object N in C along with a family ψX : N → F (X) of morphisms indexed by the objects X of J such that for every morphism f : X → Y in J, we have F (f) ◦ ψX = ψY .

A limit is simply a cone that is universal in the sense that any other cone must factor uniquely through it. This is made precise with the following definition:

Definition 18. A limit of the diagram F : J → C is a cone (L, φ) which is universal in the sense that for any other cone (N, ψ) to F there is a unique morphism u : N → L such that φX ◦ u = ψX for all X in J.

[Diagram: the unique u : N → L satisfies φX ◦ u = ψX and φY ◦ u = ψY, and the legs of both cones commute with F (f) : F (X) → F (Y).]

Example 19. Products, equalizers, terminal objects, and pullbacks are all examples of limits.

Dual to the notion of limit is the colimit.

Definition 20. A co-cone of a diagram F : J → C is an object N of C along with a family ψX : F (X) → N of morphisms indexed by the objects X of J such that for every morphism f : X → Y in J, we have ψY ◦ F (f) = ψX .

Similarly to limits being defined as universal cones, we can define colimits as universal co-cones.

Definition 21. A colimit of the diagram F : J → C is a co-cone (L, φ) which is universal in the sense that for any other co-cone (N, ψ) to F there is a unique morphism u : L → N such that u ◦ φX = ψX for all X in J.

[Diagram: the legs of both co-cones commute with F (f) : F (X) → F (Y), and the unique u : L → N satisfies u ◦ φX = ψX and u ◦ φY = ψY.]

Example 22. Coproducts, coequalizers, pushouts, and initial objects are all ex- amples of colimits.

2.2 Functors & Categories of Functors

In this section, we collect basic results and terminology about functors and categories of functors. More detailed treatments can be found in [18,93,94].

2.2.1 Functors

Functors are the mappings between categories. There are two types of functors: covariant functors and contravariant functors.

Definition 23. Let C and D be categories. A covariant functor is a mapping F : C → D which assigns to each object C in C, an object F (C) in D and to each morphism f : C → C0 inside C, a morphism F (f) : F (C) → F (C0) inside D in such a way that F (idC ) = idF (C) and F (f ◦ g) = F (f) ◦ F (g).

Example 24. There is a functor P : Set → Set which maps each set X to its power set P (X), and each set function f : X → Y is sent to the mapping P (f): P (X) → P (Y) that maps each subset S ⊂ X to its image f (S) ⊂ Y.
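Restricted to finite sets, Example 24 can be written out in Haskell using Data.Set. This is a hedged illustration (powerSet and pMap are our names, not constructions from the dissertation), but it shows the functor acting on both objects and morphisms.

```haskell
import qualified Data.Set as Set
import           Data.Set (Set)

-- P on objects: the power set of a finite set.
powerSet :: Ord a => Set a -> Set (Set a)
powerSet = Set.foldr addElem (Set.singleton Set.empty)
  where addElem x subsets = Set.union subsets (Set.map (Set.insert x) subsets)

-- P on a morphism f : X -> Y: the direct-image map S |-> f(S).
pMap :: (Ord a, Ord b) => (a -> b) -> Set (Set a) -> Set (Set b)
pMap f = Set.map (Set.map f)
```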

Example 25. Let Group denote the category of groups. There is a functor U : Group → Set which associates to each group its underlying set and takes each group homomorphism to its underlying set map. This functor is called a forgetful functor because it simply ’forgets’ the additional group structure and homomorphism structure.

Example 26. (Hom-Functor) Let C be a category and C be an object in C. We can define a functor hC := HomC (C, −) : C → Set which takes each object X to the set of all C-morphisms from C to X, C (C, X). Each morphism f : X → Y is sent to HomC (C, f) : HomC (C, X) → HomC (C, Y), defined by post-composition with f, i.e. g ↦ f ◦ g.

The other type of functors are called contravariant functors. These are defined similarly except that they reverse the direction of morphisms.

Definition 27. Let C and D be categories. A contravariant functor F : C → D assigns to each object C in C an object F (C) in D; however, it assigns to each morphism f : C → C0 a morphism F (f) : F (C0) → F (C) in such a way that

F (idC ) = idF (C) and F (f ◦ g) = F (g) ◦ F (f).

Example 28. There is a contravariant functor P′ : Set → Set that associates to each set X its power set P′ (X); however, each function f : X → Y is mapped to its inverse image map P′ (f) : P′ (Y) → P′ (X) which maps each subset S ⊂ Y to its pre-image f⁻¹ (S) ⊂ X.
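The contravariant counterpart of the previous sketch takes preimages. Because preimages are computed pointwise, the (finite) domain X must be carried along explicitly; the helper name preimage below is ours and the fragment is only illustrative.

```haskell
import qualified Data.Set as Set
import           Data.Set (Set)

-- P'(f) : P(Y) -> P(X), sending S to its preimage under f, for a finite domain X.
preimage :: (Ord a, Ord b) => [a] -> (a -> b) -> Set b -> Set a
preimage domainX f s = Set.fromList [ x | x <- domainX, f x `Set.member` s ]

-- e.g. preimage [0..9] (`mod` 3) (Set.fromList [0]) == Set.fromList [0,3,6,9]
```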

Example 29. There is also a contravariant version of the Hom functor. Let D be an object in C. Then we can define a contravariant functor, Hom (−, D) : C → Set, which assigns to each object X in C the set Hom (X, D) of morphisms from X to D and to each morphism f : X → Y assigns a set function Hom (f, D) : Hom (Y, D) → Hom (X, D) defined by pre-composition with f, i.e. g ↦ g ◦ f.

We now collect a sequence of basic definitions for certain properties that are important for functors. For a more detailed exposition and examples, refer to MacLane.

Definition 30. A functor F : C → D is faithful if for all objects X and Y in C the induced map F : C (X,Y ) → D (X,Y ) is injective.

Definition 31. A functor F : C → D is full if for all objects X and Y in C the induced map F : C (X,Y ) → D (X,Y ) is surjective.

Definition 32. A functor F : C → D is fully faithful if for all objects X and Y in C the induced map F : C (X,Y ) → D (X,Y ) is bijective.

Definition 33. A functor F : C → D is essentially surjective if each object D in D is isomorphic to F (C) for some C in C.

Definition 34. A functor F : C → D is called an embedding if it is fully faithful and injective on objects.

Definition 35. A functor F : C → D is called an equivalence of categories if it is fully faithful and essentially surjective.

Definition 36. A functor F : C → D is called an isomorphism if there exists a functor G : D → C such that G ◦ F = idC and F ◦ G = idD.

In category theory, it is rare to talk about isomorphisms between categories and more common to talk about equivalences of categories.

15 Definition 37. A functor is said to preserve a property of a morphism f if F (f) satisfies the property whenever f does.

Definition 38. A functor is said to reflect a property of a morphism if f satisfies the property whenever F (f) does.

Here are a few useful facts about functors:

• Faithful functors reflect monics and epics.

• Fully faithful functors reflect isomorphisms.

• Equivalences of categories preserve monics and epics.

• Every functor preserves isomorphisms.

2.2.2 Natural Transformations

Natural transformations are mappings between functors.

Definition 39. Let F,G : C → D be contravariant functors. A natural transforma- tion, η : F ⇒ G associates to each object A in C a morphism ηA : F (A) → G (A), called a component of the natural transformation, such that for any morphism f : A → B in C, the diagram below commutes:

[Naturality square: ηB : F (B) → G (B) along the top, F (f) and G (f) down the sides, and ηA : F (A) → G (A) along the bottom,]

i.e. ηA ◦ F (f) = G (f) ◦ ηB. Note that an analogous definition holds for covariant functors, mutatis mutandis. We only consider natural transformations between contravariant functors in this dissertation.

Example 40. Let f : A → B be a morphism in a category C. We can construct a natural transformation between the covariant Hom-functors φ : Hom (B, −) ⇒

Hom (A, −) whose components are defined as φC : Hom (B,C) → Hom (A, C)

where g ↦ g ◦ f. The commutativity of the natural transformation square

[Square: Hom (B, h) : Hom (B,C) → Hom (B,D) along the top, φC and φD down the sides, and Hom (A, h) : Hom (A,C) → Hom (A,D) along the bottom]

follows from the associativity of function composition in Set.
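In Haskell this natural transformation is simply a polymorphic function, and the naturality square is exactly the associativity of composition. The sketch below is only illustrative and the name phi is ours.

```haskell
-- Given f : A -> B, the component at C of the natural transformation
-- Hom(B,-) => Hom(A,-) is precomposition with f.
phi :: (a -> b) -> ((b -> c) -> (a -> c))
phi f g = g . f

-- Naturality: for any h : C -> D and g : B -> C,
--   h . (phi f g) == phi f (h . g)
-- both sides equal h . g . f, by associativity of (.).
```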

Another way of stating the definition of equivalence of categories is that two categories C and D are equivalent if there exist functors F : C → D and G : D → C such that F ◦ G is naturally isomorphic to the identity functor on D and G ◦ F is naturally isomorphic to the identity functor on C.

2.2.3 Functor Categories

Definition 41. Given two categories C and D, the functor category D^C is defined to be the category whose objects are contravariant functors F : C → D and whose morphisms are given by natural transformations between functors.

Functor categories can also be defined for covariant functors, but we will not discuss any examples of these in this dissertation. In fact, a special type of contravariant functor, called a presheaf, will be used many times in this dissertation. This special type of contravariant functor is the subject of Section 2.7.

2.2.4 The Yoneda Embedding

The Yoneda lemma is a generalization of Cayley’s theorem in group theory which allows you to embed any category into a category of functors defined on that category. There are two forms of the Yoneda lemma: the covariant version and the contravariant version. In this dissertation, we only use the contravariant form of the lemma and so only it is covered here. We state the Yoneda lemma here without proof. A formal proof can be found in [93]. The contravariant form of the Yoneda lemma concerns the contravariant form of the Hom functor, Hom (−, A), which is often denoted by hA. The contravariant form of the lemma states that for any contravariant functor G : C^op → Set, there

is a natural isomorphism

[hA : G] ≅ G (A)

where [hA : G] denotes the set of natural transformations between hA and G. When the functor used in the Yoneda lemma is another Hom functor, the contravariant Yoneda lemma states

[hA : hB] ≅ Hom (A, B).

This means h− gives rise to a covariant functor from C to the category of contravariant functors into Set, i.e. h− : C → Set^(C^op). Thus, the Yoneda lemma tells us that any locally small category can be embedded in the category of contravariant functors into Set via h−. This is called the Yoneda embedding of the category. Another way of expressing this is to say any locally small category can be represented by presheaves in a full and faithful manner, i.e.

[hA : P] ≅ P (A)

for any presheaf P. A contravariant functor into Set is said to be representable if it is naturally isomorphic to hA for some object A. When working out how topos theoretic constructions arise, we will commonly restrict to deducing how these constructions should work on representable functors and use these insights to surmise the general situation. This trick is very common in category theory and will be used when we discuss exponentials and subobject classifiers for presheaf topoi in later sections of this chapter.

2.3 Lattices & Heyting Algebras

2.3.1 Lattices

Here we briefly introduce the basic definitions for lattices. A more detailed reference is [29]. The main type of lattices we will focus on in this dissertation are Heyting algebras. These will be defined in the next section. A lattice consists of a poset in which every two elements have a unique supremum and a unique infimum.

Example 42. The natural numbers can be given a poset structure by divisibility,

i.e. a ≤ b if and only if a divides b. In this case, the supremum is the least common multiple and the infimum is the greatest common divisor.
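A tiny worked rendering of Example 42 (purely illustrative, restricted to positive integers so that division is well defined):

```haskell
-- a <= b in the divisibility order means a divides b (positive integers only).
divides :: Integer -> Integer -> Bool
divides a b = b `mod` a == 0

-- Supremum and infimum in this order:
join, meet :: Integer -> Integer -> Integer
join = lcm   -- least common multiple
meet = gcd   -- greatest common divisor

-- e.g. join 4 6 == 12, meet 4 6 == 2, and 4 `divides` 12 == True
```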

Lattices can be given a purely algebraic definition as well. The poset-based definition and the algebraic definition are equivalent. We provide the algebraic axioms below.

Definition 43. A lattice (L, ∨, ∧) is a set L along with two binary operations ∨ and ∧ on L which satisfy the following properties for all a, b, and c in L:

• a ∨ b = b ∨ a and a ∧ b = b ∧ a

• a ∨ (b ∨ c) = (a ∨ b) ∨ c and a ∧ (b ∧ c) = (a ∧ b) ∧ c

• a ∨ (a ∧ b) = a and a ∧ (a ∨ b) = a.

The first rules are the commutative laws, the second are the associative laws, and the last are known as the absorption laws. There are two more laws, which are consequences of this definition, which are also important for lattices. The following identities holds for every a in L and are known as the idempotent laws:

• a ∨ a = a and a ∧ a = a.

From an algebraic lattice as defined above, we can endow it with a poset structure by defining x ≤ y if and only if x ∧ y = x.

Definition 44. A lattice is said to be bounded if there exist elements ⊤ and ⊥ such that ⊥ is an identity element for the join operation ∨ and ⊤ is an identity element for the meet operation ∧, i.e.

• a ∨ ⊥ = a and a ∧ ⊤ = a.

Definition 45. A lattice is said to be distributive if the following properties hold for all a, b, and c in L :

• a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c)

• a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c).

Example 46. Let X be a set. The collection of all subsets of X, P (X), is a bounded distributive lattice where the meet ∧ is given by set intersection and the join ∨ is given by set union. The bottom element is the empty set while the top element is the set X itself.

Example 47. The integers can be given the structure of a distributive lattice where the meet is given by minima and the join is given by maxima. Notice that this lattice is not bounded because there is neither a smallest integer nor a largest integer.

2.3.2 Heyting Algebras

A Heyting algebra is a bounded, distributive lattice with a weaker form of complementation called pseudo-complementation which we will define below. More details on the properties of Heyting algebras can be found in [60,94]. Heyting algebras are important in topos theory because the collection of subobjects on any object in a topos has the structure of a Heyting algebra. In a Heyting algebra, we can define a pseudo-complement of any a in H, denoted ¬a, where ¬a is the largest element such that a ∧ ¬a = ⊥. Another way of defining Heyting algebras is via a binary operation, called the implication, →, which satisfies the requirements in the definition below.

Definition 48. Let H be a bounded lattice. We say that H is a Heyting algebra if and only if there exists a binary operation, →, called the implication such that the following identities hold for all a, b, and c in H:

• a → a = ⊤

• a ∧ (a → b) = a ∧ b

• b ∧ (a → b) = b

• a → (b ∧ c) = (a → b) ∧ (a → c)

With this definition, we can provide an alternative definition of the pseudo-complement: ¬a := (a → ⊥). Heyting algebras play an important role in topos theory because the lattice of subobjects of an object in a topos has the structure of a Heyting algebra.

Example 49. Every Boolean algebra is a Heyting algebra with a → b given by ¬a ∨ b.

Example 50. Let {0, 1/2, 1} be given as a totally ordered set with ≤ defined in the usual way. This can be given the structure of a Heyting algebra which is not a Boolean algebra by defining the meet, join, implication, and pseudo-complementation by the rules depicted in the tables below:

a ∧ b:
a\b |  0   1/2   1
 0  |  0    0    0
1/2 |  0   1/2  1/2
 1  |  0   1/2   1

a ∨ b:
a\b |  0   1/2   1
 0  |  0   1/2   1
1/2 | 1/2  1/2   1
 1  |  1    1    1

a → b:
a\b |  0   1/2   1
 0  |  1    1    1
1/2 |  0    1    1
 1  |  0   1/2   1

¬a:
  a |  0   1/2   1
 ¬a |  1    0    0

Note that the above construction is not a Boolean algebra because it does not satisfy double negation, i.e.

¬¬(1/2) = ¬0 = 1 ≠ 1/2.
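The tables above can be checked mechanically. The following Haskell sketch is our own encoding, not part of the dissertation; it implements the three-element Heyting algebra and exhibits the failure of double negation.

```haskell
data H = Zero | Half | One deriving (Eq, Ord, Show)

meetH, joinH, impH :: H -> H -> H
meetH = min                       -- a /\ b
joinH = max                       -- a \/ b
impH a b
  | a <= b    = One               -- a -> b is the top element when a <= b
  | otherwise = b                 -- otherwise the largest c with (min a c) <= b is b

negH :: H -> H
negH a = impH a Zero              -- pseudo-complement: not a = (a -> 0)

-- negH (negH Half) == One, which differs from Half, so double negation fails
-- and the algebra is not Boolean.
```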

2.4 Monads

In this section, we briefly review monads. Our exposition here will be rather terse and will simply recall various standard definitions and results. A more thorough exposition can be found in chapter 6 of [93] or chapter 14 of [15].

Definition 51. Let C be a category. A monad, also called a triple, on C consists of

an endofunctor T : C → C together with two natural transformations η : 1C ⇒ T and µ : T² ⇒ T referred to as the unit and multiplication, respectively. The triple (T, η, µ) is required to satisfy the coherence conditions µ ◦ Tµ = µ ◦ µT and µ ◦ Tη = µ ◦ ηT = 1T. The first equation is an equality of natural transformations T³ ⇒ T and the latter equation is an equality of natural transformations T ⇒ T. These can be visualized by the following commutative diagrams:

[Diagrams: the associativity square T³ → T² → T, with Tµ and µT on the two parallel sides and µ on the remaining two; and the unit triangles T → T² ← T via Tη and ηT, with µ : T² → T making both composites equal to the identity on T.]

The first condition is analogous to the associativity condition for monoids if µ is thought of as a categorification of the monoid's binary operation, while the latter condition is analogous to the existence of an identity element for the binary operation on the monoid. This is why η is referred to as the unit and µ is referred to as the multiplication for the monad.

Example 52. We can define another monad P on Set by defining P (X) to be the power set of X. For a morphism f : A → B, we can define P (f) to be the function defined by taking direct images under f. The unit natural transformation is defined on components as the map ηX : X → P (X) by x ↦ {x}. The multiplication natural transformation µ : P² ⇒ P is defined on components by µX : P² (X) → P (X), taking a set of sets to its union.
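Restricted to finite sets, Example 52 can be sketched in Haskell. Set cannot be a literal Haskell Monad instance because of its Ord constraint, so the unit and multiplication are written out directly; the names unitP, multP, and bindP below are ours, with bindP anticipating the discussion that follows.

```haskell
import qualified Data.Set as Set
import           Data.Set (Set)

unitP :: a -> Set a
unitP = Set.singleton                  -- eta_X : X -> P(X),  x |-> {x}

multP :: Ord a => Set (Set a) -> Set a
multP = Set.unions . Set.toList        -- mu_X : P(P(X)) -> P(X), union of a set of sets

-- The corresponding bind operation.
bindP :: Ord b => Set a -> (a -> Set b) -> Set b
bindP s f = multP (Set.fromList (map f (Set.toList s)))
```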

A particular monad, constructed by Michele Giry in [59], will be used frequently in this dissertation. It associates to a measurable space the collection of probability distributions on that measurable space and equips it with a sigma algebra. We will introduce this monad in greater depth in chapter 3. In the computer science literature (e.g. [100], [65]) monads are often defined slightly differently. The multiplication natural transformation is replaced with a natural transformation called bind, denoted >>=. The bind is defined for all pairs of objects X, Y in C as a function

(>>=) : C (X, T (Y)) → C (T (X), T (Y)).

The notation (t >>= f) is used for (>>=) (f) (t). The monad axioms then become:

• (t >>= η) = t,

• (η (x) >>= f) = f (x),

• (t >>= (λx. f (x) >>= g)) = ((t >>= f) >>= g).

The intuition behind this reformulation is to think of T (X) as an object of compu- tations returning X. In this case, η is the computation that returns immediately.

t >>= f sequences computations by first running t and calling f with the result of t, similar to a UNIX pipeline. The equivalence of these two notions can be found in [96].
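In Haskell, where monads are given by return and (>>=), the translation between the two presentations referenced above is one line in each direction. The sketch below only assumes the standard Control.Monad library.

```haskell
import Control.Monad (join)

-- mu from bind: flatten a computation of computations.
joinFromBind :: Monad m => m (m a) -> m a
joinFromBind mma = mma >>= id

-- bind from mu: apply f inside, then flatten.
bindFromJoin :: Monad m => m a -> (a -> m b) -> m b
bindFromJoin ma f = join (fmap f ma)

-- e.g. joinFromBind [[1,2],[3]] == [1,2,3]
--      bindFromJoin (Just 2) (\x -> Just (x + 1)) == Just 3
```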

2.5 Cartesian Closed Categories

Definition 53. A category C is said to be Cartesian closed if and only if it has a terminal object, admits products, and admits exponentials.

Terminal objects and products were covered in an earlier section. We briefly recall the definition of exponentials here. More information on Cartesian closed categories can be found in [12,15,60,93].

Definition 54. Let Z and Y be objects of the category C and suppose C admits binary products. An object Z^Y together with a morphism eval : Z^Y × Y → Z is said to be an exponential object if for any object X and morphism g : X × Y → Z, there is a unique morphism λg : X → Z^Y (called the transpose of g) such that the diagram below commutes:

[Diagram: λg × 1Y : X × Y → Z^Y × Y followed by eval : Z^Y × Y → Z equals g : X × Y → Z.]

The assignment of a unique λg to each g establishes an isomorphism Hom (X × Y, Z) ≅ Hom (X, Z^Y). In other words, the functor (−)^Y : C → C defined on objects by C ↦ C^Y and on morphisms by (f : C → D) ↦ (f^Y : C^Y → D^Y) is right adjoint to the product functor − × Y.

Example 55. The category Set whose objects are sets and whose morphisms are functions is an example of a Cartesian closed category. The exponential object is defined as Y^X = Hom (X, Y), the set of functions from X to Y.

Example 56. A Boolean algebra can be given the structure of a Cartesian closed category. The objects in the category correspond to the elements of its underlying set. Products are given by conjunctions, exponentials are given by implications,

and evaluation corresponds to modus ponens, i.e.

(A ⇒ B) ∧ A ≤ B.

Cartesian closed categories are important in computer science. In a Cartesian closed category, a morphism f : X × Y → Z can be represented as a morphism λf : X → Z^Y. Computer scientists refer to this as currying. As such, simply-typed lambda calculus can be interpreted in any Cartesian closed category. The formal relationship between these is given by the Curry-Howard-Lambek correspondence which establishes an isomorphism between intuitionistic logic, simply-typed lambda calculus, and Cartesian closed categories [28,69,82–84]. In functional programming languages, eval is often written as apply and λg is often written as curry (g).
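As a small illustration (not from the dissertation), the currying isomorphism and the evaluation morphism look as follows in Haskell, where the exponential object Z^Y is the function type y -> z. The Prelude functions curry and uncurry witness the same bijection; the names lambda and eval below are our own.

```haskell
-- The transpose lambda g : X -> Z^Y of g : X x Y -> Z.
lambda :: ((x, y) -> z) -> (x -> (y -> z))
lambda g = \x y -> g (x, y)

-- The evaluation morphism eval : Z^Y x Y -> Z.
eval :: (y -> z, y) -> z
eval (h, y) = h y

-- The defining triangle: eval (lambda g x, y) == g (x, y) for all x and y.
```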

2.6 Topoi

Topoi are a special class of Cartesian closed categories which have been proposed as a general setting for mathematics [85]. Topoi were initially conceived by Grothendieck as a type of category which behaves like the category of sheaves on a topological space [9]. A later definition, due to Lawvere, generalized Grothendieck’s topoi in order to make suitable connections with logic [94]. These topoi are referred to as elementary topoi. When we discuss topoi in this dissertation, we will mean the elementary topoi due to Lawvere. The exposition in this section will be terse and effectively serves to collect definitions and standardize notation for future sections. Elementary introductions to the subject can be found in [60,85]. More advanced expositions are given in [94,98]. Standard references for the field are the books [14, 77, 78]. In this dissertation, we use sheaf topoi to model the semantics of higher order probabilistic programming languages, extending a previous construction by Heunen, Kammar, Staton, and Yang [65] and demonstrating that this construction is a sheaf with respect to the atomic Grothendieck topology on an appropriately constructed sample space category. We also use sheaf topoi as a background for extending statistical constructions to contextual measurement scenarios in which a family of marginal distributions cannot be assumed to originate from a joint distribution on their full column space. A topos is simply a Cartesian closed category with a subobject classifier. A

subobject classifier of a category C is an object Ω in C such that subobjects of any object X are in one-to-one correspondence with the morphisms from X into Ω. Before we can characterize Ω by a universal property, we need to briefly review the definition of subobjects. Subobjects are the topos theoretic analog of subsets. If A ⊂ B, there is an inclusion map ι : A ↪ B which is injective and hence monic. Conversely, any monic morphism in the category of sets determines a subset via its image. Hence, the domain of a monic map is isomorphic to a subset of the codomain of this map. However, since there are many sets with the same cardinality, we have to think of subobjects as equivalence classes of monic morphisms where f : A → B and g : C → B are equivalent if f factors through g and vice versa.

Definition 57. A subobject of an object B in a category C is an equivalence class of monic morphisms with codomain B under the equivalence relation f ∼ g if f and g factor through one another. We denote the equivalence class of f by [f].

These equivalence classes can be given a poset structure. Let f : A → B and g : C → B be monic morphisms. We say [f] ≤ [g] if there is a morphism h : A → C such that f = g ◦ h. Note this forces h to be monic. This construction categorically represents subsets as each subset determines and is determined by a unique subobject. Now that we have discussed subobjects, we can discuss the subobject classifier. Intuitively, subobjects in the category of sets correspond to subsets of a set.

Any subset S ⊂ A has an indicator function χ_S : A → {0, 1} defined by χ_S(x) = 1 if x ∈ S and χ_S(x) = 0 if x ∉ S.

This can be given a purely categorical definition by associating subsets with collections of monic morphisms into the set. Thus a subobject of A is an equivalence class of monic morphisms with codomain A where two monics m1 and m2 are considered equivalent if and only if they factor through one another. In Set, each subset S ⊂ X determines an equivalence class from its inclusion morphism [ι : S,→ X] and any monic m : E  X determines a subobject which is equivalent to [ι : m (E) ,→ X]. As such, we will often simply refer to subsets of X as subobjects by an abuse of terminology.

In the category of sets, the truth object, Ω, is simply the two element set 2 := {0, 1}. If we let 1 := {∗} denote a terminal object in Set, then there is a truth morphism: true : 1 → {0, 1} defined by true (∗) = 1 which picks out the value 1 as corresponding to true in Boolean logic. Given a subobject [m : S ↣ A], the characteristic function, χ_m, of m(S) satisfies the universal property that the diagram below is a pullback square:

    S --m--> A
    |        |
    !        χ_m
    v        v
    1 -true-> 2

where ! : S → 1 is the unique map to the singleton set 1. Note that if n : E ↣ A is another monic arrow, the condition that true(!(E)) = χ_m(n(E)) requires that n(E) ⊂ m(S). Thus the universal property above merely distinguishes m(S) as the largest possible element in the poset of subobjects. This observation motivates the definition of the subobject classifier for an arbitrary category.
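The following Haskell fragment is a small, informal illustration of the discussion above for finite sets: a subobject of a type is represented by its characteristic function (a predicate), and the classified subset is recovered by pulling back the value True. The names Subobject, chi, and classifiedBy are illustrative and not part of the formal development.

    -- A subobject of a (finite) set as a predicate, and the pullback of True.
    type Subobject a = a -> Bool

    chi :: Eq a => [a] -> Subobject a
    chi s x = x `elem` s

    classifiedBy :: [a] -> Subobject a -> [a]
    classifiedBy univ p = filter p univ

    main :: IO ()
    main = print (classifiedBy [1 .. 10 :: Int] (chi [2, 3, 5, 7]))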

Definition 58. Let C be a category with a terminal object 1. A subobject classifier for C is an object Ω together with a monic morphism ⊤ : 1 → Ω such that given any monic morphism m : C ↣ D, there exists a unique morphism χ_m : D → Ω which makes the diagram below a pullback square:

    C --m--> D
    |        |
    !        χ_m
    v        v
    1 --⊤--> Ω

We can now officially define a topos.

Definition 59. A category T is said to be a topos if and only if it is a Cartesian closed category which also has a subobject classifier.

Example 60. The prototypical example of a topos is Set. However, the categories of presheaves and sheaves are also commonly occurring examples of topoi which we will discuss in the next section. The only topoi discussed in this dissertation are the types mentioned in this example.

There are many equivalent characterizations of topoi. Note that any topos

must admit all finite limits and colimits. Historically, the major successes of topos theory came from algebraic geometry and logic. In algebraic geometry, topoi were invented by Grothendieck as an attempt to construct a cohomology theory with variable coefficients [9]. Grothendieck’s original definition was generalized by Lawvere in his search for an axiomatization of the category of sets [?]. The definition presented above is what Lawvere would have called an elementary topos. Topoi were also essential to Paul Cohen’s forcing technique used to construct new models of Zermelo-Fraenkel set theory, for which he was awarded the Fields medal in 1966. In this dissertation, we will mainly discuss the topos of sets and topoi that arise as presheaf or sheaf topoi on either a topological space or a Grothendieck topology on a category.

2.7 Presheaves

Presheaves and sheaves are an important family of contravariant functors which we will use at many points in this dissertation. We recall some basic definitions and properties below. More detailed treatment of these topics, including proofs, can be found in [94].

2.7.1 The Category of Presheaves

Definition 61. A presheaf on a category C is a contravariant functor into the category Set. Presheaves form a category whose morphisms are given by natural transformations. If C is a category, we use the notation Cˆ to denote the category of presheaves.

Cˆ has the structure of a category. The objects in this category are contravariant functors. Morphisms are given by natural transformations η : F ⇒ G, i.e. given a morphism f : C → D in C, the following diagram commutes:

    F(D) --F(f)--> F(C)
    |              |
    η_D            η_C
    v              v
    G(D) --G(f)--> G(C)

The identity natural transformation ι : F ⇒ F is obtained by taking its components to be ι_X = id_{F(X)} in Set, which produces the following commutative diagram:

    F(D) --F(f)--> F(C)
    |              |
    ι_D            ι_C
    v              v
    F(D) --F(f)--> F(C)

Composition is given by vertically pasting naturality squares, i.e. commutative diagrams of the following form:

    F(D) --F(f)--> F(C)
    |              |
    η_D            η_C
    v              v
    G(D) --G(f)--> G(C)
    |              |
    µ_D            µ_C
    v              v
    H(D) --H(f)--> H(C)

2.7.2 Initial and Terminal Objects

The initial object in Cˆ is the constant functor 0 in Cˆ which maps each object to the empty-set and every morphism to the identity morphism. The terminal object, 1, is the constant functor that maps each object to the one element set 1 and each morphism to the identity map.

2.7.3 Products and Coproducts

Given two presheaves F and G, the product presheaf is defined on an object C in C by (F × G)(C) := F (C) × G (C) where the right-hand side is a product in Set. From each morphism f : B → C, we obtain a function

(F × G)(f) : F(C) × G(C) → F(B) × G(B)

such that the following diagram commutes:

    F(C) <--ρ_1-- F(C) × G(C) --ρ_2--> G(C)
    |                  |                  |
    F(f)          F(f) × G(f)           G(f)
    v                  v                  v
    F(B) <--ρ_1-- F(B) × G(B) --ρ_2--> G(B)

Given two presheaves F and G, the coproduct presheaf is defined on an object C in C by (F ∐ G)(C) := F(C) ∐ G(C), where the right-hand side is a coproduct (disjoint union) in Set. From each morphism f : B → C, we obtain a function

    (F ∐ G)(f) : F(C) ∐ G(C) → F(B) ∐ G(B)

such that the following diagram commutes:

    F(C) --ι_1--> F(C) ∐ G(C) <--ι_2-- G(C)
    |                  |                  |
    F(f)          F(f) ∐ G(f)           G(f)
    v                  v                  v
    F(B) --ι_1--> F(B) ∐ G(B) <--ι_2-- G(B)

2.7.4 Equalizers and Coequalizers

Given two natural transformations η, µ : F ⇒ G, an equalizer ι : E ⇒ F is a natural transformation such that η ◦ ι = µ ◦ ι, i.e. for each C in C, the components

of the natural transformation compose in Set, i.e. ηC ◦ ιC = µC ◦ ιC . Moreover, given any other natural transformation ω : H ⇒ F satisfying η ◦ ω = µ ◦ ω, there is a unique natural transformation κ : H → E such that for any object C in C, the

following diagram commutes:

    E(C) --ι_C--> F(C) ⇉(η_C, µ_C) G(C)

with κ_C : H(C) → E(C) the unique map satisfying ι_C ◦ κ_C = ω_C.

Given two natural transformations η, µ : F ⇒ G, a coequalizer ϑ : G ⇒ E is a natural transformation such that ϑ ◦ η = ϑ ◦ µ, i.e. for each C in C, the components compose in Set as ϑ_C ◦ η_C = ϑ_C ◦ µ_C. Moreover, given any other natural transformation ω : G ⇒ H satisfying ω ◦ η = ω ◦ µ, there is a unique natural transformation κ : E ⇒ H such that for any object C in C, the following diagram commutes:

    F(C) ⇉(η_C, µ_C) G(C) --ϑ_C--> E(C)

with κ_C : E(C) → H(C) the unique map satisfying κ_C ◦ ϑ_C = ω_C.

2.7.5 Pullbacks and Pushouts

Let X, Y, Z be presheaves in Cˆ. Suppose λ : X ⇒ Z and ι : Y ⇒ Z are natural transformations. We say that the natural transformations η : P ⇒ X and κ : P ⇒ Y form a pullback for λ and ι if and only if for every object C in C, the commuting square

    P(C) --κ_C--> Y(C)
    |             |
    η_C           ι_C
    v             v
    X(C) --λ_C--> Z(C)

is a pullback square in Set, i.e. (X ×_Z Y)(C) ≅ X(C) ×_{Z(C)} Y(C).

Notice that a morphism f : C → D in C induces a commuting cube: one face is the pullback square at C, the opposite face is the pullback square at D, and the connecting edges are the restriction maps P(f), X(f), Y(f), and Z(f).

Similarly, if X, Y, Z are presheaves in Cˆ and λ : Z ⇒ X and ι : Z ⇒ Y are natural transformations, we say that the natural transformations η : X ⇒ Q and κ : Y ⇒ Q form a pushout if and only if for every object C in C,

    Z(C) --ι_C--> Y(C)
    |             |
    λ_C           κ_C
    v             v
    X(C) --η_C--> Q(C)

is a pushout in Set, i.e. (X ∐_Z Y)(C) ≅ X(C) ∐_{Z(C)} Y(C). Again, pushouts always exist because they exist in Set.

2.7.6 Exponentials

Let F and G be presheaves on C. If an exponential G^F exists, we have a natural bijection

    Hom_{Cˆ}(E × F, G) ≅ Hom_{Cˆ}(E, G^F)

for every presheaf E in Cˆ. In particular, we may take E to be a representable functor, i.e. E = Hom_C(−, C) = h_C. By the Yoneda lemma this would mean that

    G^F(C) ≅ Hom_{Cˆ}(h_C, G^F) ≅ Hom_{Cˆ}(h_C × F, G).

Rather than assuming that the desired bijection exists, we can use this observation to define

    G^F(C) := Hom_{Cˆ}(h_C × F, G),

i.e. G^F(C) is the set of all natural transformations from Hom_C(−, C) × F into G; this assignment is contravariant in C and hence defines a presheaf. The evaluation mapping, eval : G^F × F → G, is a natural transformation defined on components by

    eval_C(η, y) = η_C(1_C, y) ∈ G(C)

where C is an object in C, η : Hom_C(−, C) × F ⇒ G, and y ∈ F(C). Verification that these satisfy the universal properties can be found in [94].

2.7.7 The Subobject Classifier

2.7.7.1 Subobjects

Definition. Let F, G : C^op → Set be functors. We say F is a subfunctor of G if F(C) ⊂ G(C) for all objects C in C and F(f) is a restriction of G(f) for all morphisms in C. There is then a natural transformation ι whose components are given by inclusion mappings, and this transformation is monic in Cˆ. Thus each such F determines a subobject. Conversely, all subobjects are given by subfunctors: if η : F ⇒ G is monic in the functor category, then each component η_C : F(C) → G(C) is injective.

As subobjects are equivalence classes of monics, ηC is equivalent to ιC where ιC is given by the inclusion of the image of each component.

2.7.7.2 The Subobject Classifier

Definition 62. (Sieves) If C is an object in a category C, a sieve is a subfunctor of Hom (−,C). Sieves can be thought of as the categorical analog of lower sets on a poset. The subfunctor criteria means that if f : B → C belongs to the sieve and g : A → B is any morphism, then f ◦ g also belongs to the sieve.

If Cˆ admits a subobject classifier, Ω, it must, in particular classify the subobjects

of the representable presheaf h_C := Hom_C(−, C). Thus,

    Sub_{Cˆ}(Hom_C(−, C)) ≅ Hom_{Cˆ}(Hom_C(−, C), Ω) = [Hom_C(−, C) : Ω].

By the Yoneda lemma, we would have a natural isomorphism [Hom_C(−, C) : Ω] ≅ Ω(C). Thus, the subobject classifier, if it exists, must be given by

Ω(C) = SubCˆ (HomC (−,C)) .

In other words, the subobject classifier is the collection of sieves on C.

Definition 63. If C is an object in a category C, a sieve on C is a subfunctor of Hom (−,C). Sieves can be thought of as the categorical analog of lower sets on a poset. The subfunctor criteria means that if f : B → C belongs to the sieve and g : A → B is any morphism, then f ◦ g also belongs to the sieve.

Example 64. If we regard a poset P as a category, the sieves on an object p in P are exactly the down-sets below p: sets of elements S such that every s ∈ S satisfies s ≤ p, and s ∈ S together with s′ ≤ s implies s′ ∈ S. A small computational illustration is given below.
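For a finite poset this description can be made completely concrete. The following Haskell sketch enumerates all sieves on an element as the downward-closed subsets of its lower set; the names Poset and sievesOn and the divisibility example are illustrative only and not part of the formal development.

    import Data.List (subsequences)

    -- A finite poset given by its carrier and order relation.
    type Poset a = ([a], a -> a -> Bool)

    -- All sieves on p: downward-closed subsets of { x | x <= p }.
    sievesOn :: Eq a => Poset a -> a -> [[a]]
    sievesOn (carrier, leq) p =
      [ s | s <- subsequences below, downwardClosed s ]
      where
        below = [ x | x <- carrier, x `leq` p ]
        downwardClosed s = and [ y `elem` s | x <- s, y <- carrier, y `leq` x ]

    -- Example: the divisibility poset on {1,2,3,6}; sieves on 6 are its down-sets.
    divides :: Int -> Int -> Bool
    divides a b = b `mod` a == 0

    main :: IO ()
    main = print (sievesOn ([1, 2, 3, 6], divides) 6)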

Definition 65. (Pullback of Sieves) Any morphism f : C → D induces a map f* from the sieves on D to the sieves on C given by S ↦ {g | cod(g) = C and f ◦ g ∈ S}. The map f* is known as the pullback.

The restriction mappings in the presheaf are given by sieve pullbacks, i.e. a morphism f : C → D induces a morphism Ω(f) : Ω(D) → Ω(C) defined by taking a sieve on D to its pullback on C. The largest possible sieve on an object is known as its principal sieve.

Definition 66. For any object C ∈ C, we can define the principal sieve, ↓ C, to be the largest possible sieve on C, i.e. ↓ C := {f | cod (f) = C}. Notice this is equivalent to saying ↓ C is the sieve containing the identity map id : C → C.

Note that using principal sieves, we can rewrite the pullback sieve more simply as f*(S) = S ∩ ↓C.

The collection of sieves on an object in a category has the structure of a Heyting algebra, which was defined in an earlier section. In order to finish defining the subobject classifier for presheaf categories, we need to also define the truth morphism. The truth morphism ⊤ : 1 ⇒ Ω is just the natural transformation whose component at each object picks out the maximal sieve in Ω(C).

2.7.8 Local and Global Sections

Definition 67. A global section on a presheaf F is a natural transformation from the terminal presheaf 1 into F.

A particular class of global sections are important in any topos. These are the global sections of the subobject classifier which are known as the truth values in the topos.

Definition 68. A local section on a presheaf F is a natural transformation from a subobject of the terminal object into F.

2.8 Sheaves

Sheaves are a mathematical tool for handling local to global phenomenon on a topological space. The standard example of a sheaf is the sheaf of continuous real-valued functions on a topological space, X.

Definition 69. A presheaf on a topological space (X, τ) is a function that assigns to each open set U ∈ τ a set F(U), called the set of sections on U. This assignment is done in such a way that if U ⊂ V there is a restriction mapping res^V_U : F(V) → F(U). The restriction mappings are required to satisfy the criterion that if U ⊂ V ⊂ W then res^V_U ◦ res^W_V = res^W_U.

We defined presheaves on an arbitrary category in the previous section. The definition above is a special instance of the previous definition: we construct a category from the topological space whose objects are the open sets, with a morphism U → V if and only if U ⊂ V. Note this is the same definition as the category associated to a topological space X_τ. With this definition, a presheaf is simply a contravariant functor from this category into Set. In our example of continuous functions on a topological space, the sections over each open set U are given by the set of continuous functions f : U → R. The restriction maps are obtained by restricting the definition of the function to a smaller domain. The requirement res^V_U ◦ res^W_V = res^W_U tells us that restricting the domain of our function from W to V and then restricting the domain of the function from V to U is the same thing as simply restricting the domain of the function from W to U. Sheaves are presheaves which satisfy additional gluability criteria.

Definition 70. A presheaf F is called a sheaf if it satisfies two additional criteria:

• (Locality) If {U_i}_{i∈I} is an open covering of a set U, and f, g ∈ F(U) are such that res^U_{U_i}(f) = res^U_{U_i}(g) for each i in the open cover, then f = g.

• (Gluing) If {U_i}_{i∈I} is an open covering of a set U, and if for each i we have a section f_i ∈ F(U_i) such that for every pair U_i and U_j of the covering res^{U_i}_{U_i∩U_j}(f_i) = res^{U_j}_{U_i∩U_j}(f_j), then there is a section f ∈ F(U) such that res^U_{U_i}(f) = f_i for every i ∈ I. (A small computational illustration of gluing follows below.)
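As an informal illustration of the gluing axiom, the following Haskell sketch glues finitely supported local sections (finite maps standing in for sections over open sets) whenever they agree pairwise on overlaps; Section and glue are illustrative names and the example is not part of the formal development.

    import qualified Data.Map as M

    -- Local sections over finite "opens", represented as finite maps.
    type Section k v = M.Map k v

    -- glue checks pairwise compatibility on overlaps and, when it holds,
    -- returns the unique section on the union of the cover.
    glue :: (Ord k, Eq v) => [Section k v] -> Maybe (Section k v)
    glue secs
      | compatible = Just (M.unions secs)
      | otherwise  = Nothing
      where
        compatible =
          and [ and (M.elems (M.intersectionWith (==) s t))
              | (i, s) <- zip [0 :: Int ..] secs
              , (j, t) <- zip [0 :: Int ..] secs
              , i < j ]

    main :: IO ()
    main = do
      let s1  = M.fromList [(1 :: Int, 'a'), (2, 'b'), (3, 'c')]
          s2  = M.fromList [(3, 'c'), (4, 'd')]
          bad = M.fromList [(3, 'x')]
      print (glue [s1, s2])   -- sections agree on the overlap {3}: glued section
      print (glue [s1, bad])  -- sections disagree on the overlap: Nothing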

These two axioms can be characterized category theoretically by stating that the following diagram is an equalizer:

    F(U) → ∏_i F(U_i) ⇉ ∏_{i,j} F(U_i ∩ U_j)

where the first map is the product of the maps res^U_{U_i} and the pair of parallel morphisms are the products of the two different ways of restricting, res^{U_i}_{U_i∩U_j} and res^{U_j}_{U_i∩U_j}. A sheaf can be defined by only specifying its values on the open sets of a basis and verifying the sheaf axioms relative to the basis [94]. We will use this fact frequently in chapter 5 of this dissertation when discussing sheaves on the simplicial complex associated to a database. When defining presheaves, we observed that we could construct a category from the collection of open sets on the underlying topological space and see the presheaf as a contravariant functor into the category Set. As such, we can generalize the definition of presheaf to hold in any category.

Definition 71. Let C be a category. A presheaf F on C is a contravariant functor from C into Set. The collection of presheaves itself forms a category which we will denote by Cˆ .

The definition of a sheaf required a notion of gluing. As such, to extend the definition of sheaf to an arbitrary category, we need some type of category theoretic notion of an open cover. This can be accomplished through the use of sieves, which we defined earlier when discussing presheaf categories.

Definition 72. A Grothendieck topology, J, on a category C is a function which associates to each object C in C a collection of sieves J (C) known as the collection of covering sieves of C. This assignment is required to satisfy the following criteria:

• The maximal sieve tC = {f | cod (f) = C} is in J (C).

• (stability axiom) If S ∈ J (C), then the pullback h∗ (S) ∈ J (D) for any morphism h : D → C.

• (transitivity axiom) If S ∈ J (C) and R is any sieve on C such that h∗ (R) ∈ J (D) for all h : D → C in S, then R ∈ J (C).

Definition 73. A category C equipped with a Grothendieck topology J is referred to as a site.

Definition 74. A sieve S on an object C is said to be a covering sieve if S ∈ J (C) .

The covering sieves on a Grothendieck topology are the analogs of open covers of a set. As such, we can develop sheaves on Grothendieck topologies by using topological notions such as matching families. The interested reader can follow the presentation in [94]. We give a precise definition here for completeness.

Definition 75. A presheaf F on a site (C, J) is a sheaf if and only if for every covering sieve S in J(C), the inclusion S ↪ h_C induces an isomorphism Hom(S, F) ≅ Hom(h_C, F).

In this dissertation we focus only on the atomic Grothendieck topology where the covering sieves are taken to be all inhabited sieves. This topic will be covered in more depth in chapter 3. In the remaining chapters, all other sheaves discussed are constructed on normal topological spaces. The atomic sieves play an important role in topos theory because their sheaves have a particularly simple characterization. Informally, the lemma below states that every morphism is a cover in the atomic topology.

Lemma. (Mac Lane & Moerdijk [94]) A presheaf F is a sheaf for the atomic topology on a category C if and only if for any morphism f : D → C and any y ∈ F(D), if F(g)(y) = F(h)(y) for all diagrams

    E ⇉(g, h) D --f--> C

with f ◦ g = f ◦ h, then y = F(f)(x) for a unique x ∈ F(C).

In chapter 3 we will interpret this lemma probabilistically when constructing a sheaf of random variables.

Chapter 3 | A Sheaf Theoretic Perspective on Higher Order Probabilistic Programming

Probabilistic programming languages are programming languages designed to describe probabilistic models and to perform inference with these models. In this chapter we explore the relationship between sheaf theory and probabilistic programming with higher order functions. The main result in this chapter is extending a construction due to Heunen, Kammar, Staton, and Yang to a presheaf construction on a category of sample spaces (Definition 88) and showing this extension is in fact a sheaf with respect to the atomic Grothendieck topology (Lemma 91). We also realize expectation as a sheaf morphism (Section 3.6.3) and discuss some structural properties of these new objects as they relate to foundational concepts in probability theory (Section 3.6.1, Section 3.6.2). Along the way, we characterize several sub-classes of monic arrows in the category of measurable spaces (Proposition 77) and show that Meas does not admit a subobject classifier (Lemma 76). The fact that Meas is not a topos was already well known due to a theorem of Aumann [10], who showed that the category of standard Borel spaces is not Cartesian closed; by proving that Meas does not admit a subobject classifier, we provide an alternative proof that Meas is not a topos. We also prove a simple lemma about lifting probability measures along surjective maps (Section 7.3.2). Composability is the heart of category theory, and structured programming,

38 with its emphasis on the composability of blocks of code, is essential to building large scale programs. In this sense, it would seem that there should be some connection between category theory and software engineering. Moreover, functional programming goes beyond composability of functions and data types and makes concurrency composable. Eugenio Moggi discovered that computational effects could be mapped using monads from category theory [100]. This observation made functional languages like Haskell more usable and gave computer scientists a more stereoscopic view of traditional programming. It is our belief that category theory is a natural setting for programming semantics and the purpose of this chapter is to develop a framework for thinking about higher order probabilistic programming based on sheaves of families of random variables of some fixed type. Sheaf theory has been previously applied to probability theory by Jackson who showed that the Radon-Nikodym theorem could be interpreted as a sheaf morphism between measurable locales [71]. Gauthier gives a purely algebraic characterization of stochastic calculus via sheaves on a symmetric monoidal infinity category and establishes connections between stochastic differential equations and deformation theory [54]. A similar construction to the one developed in this chapter appears in a conference paper by Simpson [123]; however, Simpson uses a slightly different definition of his underlying sample space category and prefers to work with equivalence classes of random variables. Similar tools to those discussed in this chapter are used in later chapters to reason about statistical inference for non-flat distributed databases in a way that recasts statistical theory as a contextual theory. In this chapter we will first study the category of measurable spaces and see that Meas is an inadequate category for modeling probabilistic programming. We also discuss a recent construction due to Heunen, Kammar, Staton, and Yang which replaces Meas with a Cartesian closed category which they call the category of quasi-Borel spaces [65]. We use this framework to extend their construction to a presheaf which allows us to work with the full structure of a topos.

3.1 The Categorical Structure of Measurable Spaces

Lawvere and Giry have previously analyzed probability from the perspective of category theory by considering probabilistic concepts as constructions inside the

category of measurable spaces or the subcategory corresponding to the category of standard Borel spaces. In this section, we explore the categorical structure of the category of measurable spaces, noting that it fails to be a Cartesian closed category due to a result by Aumann [10]. For the purposes of discussing probabilistic programming, it is essential that we develop a framework compatible with extensions of sample spaces. This is suggestive of Lawvere’s observation that topos theory is a natural framework for thinking about variable sets due to its connection with sheaf theory. To start with, we can consider a category whose objects are measurable spaces (Ω, F) where Ω is a set and F is a sigma algebra on Ω. Morphisms in this category will be given by measurable maps between measurable spaces. In the sequel, we will denote this category by Meas. For shorthand, we will refer to objects in Meas by their underlying set with the sigma algebra being suppressed for notational convenience. As Meas is a concrete category, it shares many structures with the category of sets; however, the restriction of equipping these sets with a sigma algebra leads to several key differences. Although the category of sets is the most elementary example of a topos, i.e. a Cartesian closed category with a subobject classifier, we will see that Meas is not a topos as it admits neither a subobject classifier nor exponentials.

3.1.1 Non-Existence of Exponentials

Given two measurable spaces X and Y , we would like for the collection of measurable maps between them, Mor (X,Y ), to be an object in Meas. Clearly the collection of all measurable maps between the two measurable spaces is a set. In order to verify the universal property for Mor (X,Y ), we can introduce the canonical evaluation map on the product Mor (X,Y ) × X as follows:

    ev : Mor(X, Y) × X → Y,   (f, x) ↦ f(x)

and endow Mor(X, Y) with the smallest sigma algebra such that the evaluation map is measurable. This forces each set of the form ev^{-1}(B) = {(f, x) : f(x) ∈ B} for B measurable in Y to be measurable in Mor(X, Y) × X. Notice that the

sections of this will be measurable sets in Mor(X, Y) and X respectively. Fixing some x ∈ X, this forces sets of the form {f ∈ Mor(X, Y) : f(x) ∈ B} to be measurable in Mor(X, Y), which is equivalent to forcing the canonical evaluation maps ev_x : Y^X → Y defined by f ↦ f(x) to be measurable. Notice this construction appears to be universal, i.e. given any g : Z × X → Y, there exists a unique ĝ : Z → Mor(X, Y) which makes the following diagram commute:

    Z × X --ĝ × id_X--> Y^X × X --ev--> Y

(the composite equal to g : Z × X → Y)

where

ĝ : Z → Mor(X, Y),  z ↦ g(z, ·)

such that

g(z, ·) : X → Y,  x ↦ g(z, x).

At first glance, it might seem like Meas admits exponentials. However, the sigma algebra induced by ev must be the product sigma algebra of a sigma algebra on Y^X and a sigma algebra on X, since products in Meas are defined as being equipped with product sigma algebras. Unfortunately, even with X = Y = R equipped with the standard Borel sigma algebra, this is not true due to the following result.

Theorem. (Aumann, 1961) For any sigma algebra Σ on R^R, the evaluation mapping is never measurable with respect to the product sigma algebra Σ × B_R on R^R × R [10].

This no-go result indicates that neither Meas nor its subcategory of standard Borel spaces can be a Cartesian closed category. The infinite nature of R is essential to Aumann’s argument. In particular, restricting to only finite sets avoids all of the problems we have discussed in this subsection.

3.1.2 Lack of Subobject Classifier

In this section we prove that Meas does not admit a subobject classifier. We also show that for measurable spaces, regular monics, strong monics, and extremal monics all coincide with measurable embeddings. The fact that restricting to these subclasses fails to resolve the problems with the subobject classifier construction is interesting in that it demonstrates how the category of measurable spaces behaves quite differently from the category of topological spaces in spite of the similarity in their definitions. The category of topological spaces does admit a subobject classifier for strong monics [145].

Lemma 76. Meas does not admit a subobject classifier.

Proof. In order for Meas to have a sub-object classifier, we would need an object Ω along with a morphism > : 1 → Ω such that for every monic arrow m : E  X, there exists a measurable function χm : X → Ω which makes the diagram below a pullback:

    E --m--> X
    |        |
    !        χ_m
    v        v
    1 --⊤--> Ω

In the category of measurable spaces, let Ω = {0, 1} denote the measurable space defined on any set with two elements equipped with the sigma algebra of all subsets and let 1 = {∗} be a terminal object. The map ⊤ : 1 → Ω is given by ⊤(∗) = 1 and the map ! : E → 1 is defined by !(e) = ∗. χ_m is then just a characteristic function: χ_m(x) = 1 if x ∈ im(m) and χ_m(x) = 0 otherwise. This construction is perfectly fine in the category of sets; however, for measurable spaces we need to ensure the mapping χ_m is measurable. In general, the image of a measurable set is not measurable and so we can not guarantee that χ_m is even a measurable mapping unless im(m) = m(E) is a measurable set in X. As a simple example to illustrate this point, consider a map from any non-trivial measurable space into the same space equipped with the trivial sigma algebra

B∅ = {∅,X} . This gives an injective map of sets which is measurable and as such

is a monomorphism in Meas. Note that this construction shows that Meas is unbalanced, as it provides an example of a morphism which is monic and epic but not an isomorphism. At this point, the only alternative is to try equipping Ω with the trivial sigma algebra B = {∅, Ω}. Note that χ_m is always measurable with respect to the trivial sigma algebra since χ_m^{-1}(∅) = ∅ and χ_m^{-1}(Ω) = X, and so this construction resolves the first obstruction we found to constructing the pullback diagram for subobject classifiers in Meas. In order for the diagram to be a pullback, we would need that for any other monic arrow n : F ↣ X such that ⊤(!(f)) = χ_m(n(f)) for all f ∈ F, there exists a unique g : F → E making the following diagram commute

    F
     \ g
      v
      E --m--> X
      |        |
      !        χ_m
      v        v
      1 --⊤--> Ω

with n = m ◦ g the composite F → X.

The commutativity of the diagram forces us to define g(f) := m^{-1}(n(f)). In order for this construction to be measurable, we would need n(B_F) to be measurable in X whenever B_F is measurable in F. To see that this can not be the case, let X = R with its Borel sigma algebra and take n to be the inclusion mapping corresponding to an analytic set which is not Borel. As an explicit example of such a set, consider the set of all irrational numbers whose continued fraction expansion

    x = a_0 + 1/(a_1 + 1/(a_2 + ...))

is such that there exists an infinite sequence 0 < i_1 < i_2 < ··· where each a_{i_k} divides a_{i_{k+1}}. A result due to Lusin shows this is a set which is Lebesgue measurable, but not Borel measurable [80]. Thus, Meas does not admit a subobject classifier.

In the topos theory literature, it is common to explore variations on the definition of a subobject classifier by restricting to certain restricted classes of monics. For instance, the category of topological spaces admits a subobject classifier for strong

monics [145]. A strong monic is simply a monic m : E ↣ X such that for every epic e : Y ↠ Z and morphisms f : Z → X, g : Y → E such that m ◦ g = f ◦ e, there exists a morphism d : Z → E such that the following diagram commutes:

    Y --e--> Z
    |        |
    g        f
    v        v
    E --m--> X

with a diagonal d : Z → E satisfying d ◦ e = g and m ◦ d = f.

We will now show that the strong monics in Meas behave like subspace embeddings which in turn satisfy the universal property for the pullback diagram above. In doing so, we show several subclasses of monic arrows all coincide in the category of measurable spaces. As the construction due to Lusin is indeed an embedding, this shows that Meas does not admit a subobject classifier for strong monics and thus provides a categorical construction distinguishing it structurally from the category of topological spaces.

Proposition 77. In Meas, regular monomorphisms, strong monomorphisms, and extremal monomorphisms all coincide with measurable embeddings.

Proof. Another class of monic arrows of interest in category theory are regular monics. A monic m : E ↣ X is regular if it is the equalizer of some pair of arrows. First observe that every regular monic is strong. Suppose e : Y ↠ Z is epic and α : Y → E, β : Z → X are morphisms such that β ◦ e = m ◦ α. Since m is regular, it is the equalizer of some pair of parallel arrows k_1, k_2 : X → C. This situation can be visualized via the diagram below:

    Y --e--> Z
    |        |
    α        β
    v        v
    E --m--> X ⇉(k_1, k_2) C

with the dotted diagonal d : Z → E.

The dotted morphism d is guaranteed to exist by the universal property of equalizers and m ◦ d = β. By assumption β ◦ e = m ◦ α, and so m ◦ (d ◦ e) = m ◦ α which implies d ◦ e = α since m is monic. This establishes the commutativity of the diagram above. Another class of monic arrows of interest are extremal monics. We say that a monic m : E ↣ X is extremal if whenever m factors through an epic morphism, i.e.

m = g ◦ e, e is actually an isomorphism. Every strong monic is easily seen to be extremal. If m : E ↣ X is a strong monic, and e : E ↠ Y, g : Y → X are such that m = g ◦ e, the fact that m is strong means there is a morphism d : Y → E such that the diagram below commutes:

    E --e--> Y
    |        |
    id_E     g
    v        v
    E --m--> X

with the diagonal d : Y → E satisfying d ◦ e = id_E and m ◦ d = g.

Now, e = e ◦ id_E = e ◦ (d ◦ e) = (e ◦ d) ◦ e and e = id_Y ◦ e. Thus, (e ◦ d) ◦ e = id_Y ◦ e and so e ◦ d = id_Y, i.e. e is an isomorphism. Given an injective measurable map m : E ↣ X, we can endow m(E) with the subspace sigma algebra F_m = m(E) ∩ F_X. We call a monic map m : E ↣ X a measurable embedding if E ≅ m(E) where the latter is equipped with the sigma algebra F_m. We now notice that every extremal monomorphism must be a measurable embedding. Let m : E ↣ X be an extremal monic. Then any factorization of m = g ◦ e with e an epimorphism implies that e is an isomorphism. Any monic arrow m can be decomposed as:

    E --e--> m(E) --i--> X    (m = i ◦ e)

where e is the epimorphism onto the image subspace and i is the inclusion. Since m is assumed to be extremal, we must have that e is actually an isomorphism, i.e. m is a measurable embedding. To finish this proof, we need to argue that every measurable embedding is a regular monomorphism. Given a measurable embedding m : E ↣ X, we can form parallel arrows 1, χ_m : X ⇉ 2, where 1(x) = 1 for each x ∈ X and χ_m is defined as above. The equalizer of these parallel arrows is

    E_{1,χ_m} = {x ∈ X | χ_m(x) = 1(x)} = m(E) ≅ E.

Thus, every measurable embedding is the equalizer of these two arrows and hence regular.

The above proposition shows that several subclasses of monics all coincide with measurable embeddings. An analogous characterization of these monics is well known for topological spaces [145].

3.2 The Giry Monad

In this section we briefly discuss the Giry monad, highlighting some of the properties we will use later in this dissertation. A more detailed exposition along with proofs of the coherence conditions can be found in [59]. The Giry monad is a structure on Meas, the category of measurable spaces. A monad on a category C consists of a triple (G, η, µ) where G is an endofunctor from C into itself, η is a natural transformation I_C ⇒ G, and µ is a natural transformation µ : G² ⇒ G satisfying the coherence conditions in definition 51 in chapter 2.

3.2.1 The Endofunctor G

We can construct a functor G : Meas → Meas defined as follows:

• On objects: G(X) is defined to be the collection of probability measures on X, equipped with the smallest sigma algebra such that the evaluation map ev : G(X) × 2^X → [0, 1] defined by ev(p, χ_B) = ∫_X χ_B dp = p(B) is measurable, where χ_B ranges through all subobjects of X (i.e. characteristic functions of measurable sets) and I is the interval [0, 1] equipped with the Borel sigma algebra.

• On morphisms: given a measurable map f : X → Y , G (f) : G (X) → G (Y )

is defined as follows: given a probability distribution p_X on X, G(f) : p_X ↦ p_X(f^{-1}(·)), which defines a probability measure on Y.

Remark 78. Note that for any X, G (X) also has the structure of a convex space, i.e. if p, q ∈ G (X), then αp + (1 − α) q ∈ G (X) for any α ∈ [0, 1].

3.2.2 The Natural Transformation η

η is supposed to be a natural transformation from the identity functor, I ⇒ G, i.e. for each object X ∈ Meas, we need to construct a morphism I(X) → G(X). To do

this we can map each x ∈ X to the Dirac measure δ_x, defined by δ_x(B) = 1 if x ∈ B and δ_x(B) = 0 otherwise. We can verify this is indeed a natural transformation by checking the commutativity of the following diagram:

    I(X) --η_X--> G(X)
    |             |
    f             G(f)
    v             v
    I(Y) --η_Y--> G(Y)

The commutativity of the above diagram asserts η_Y(f(x)) = G(f)(η_X(x)), i.e. δ_{f(x)}(B) = δ_x(f^{-1}(B)) for every measurable B.

3.2.3 The Natural Transformation µ

The endofunctor G associates to a measurable space X the collection of all probability measures on X, viewed as a measurable space itself. This means we can apply G to G(X) and obtain a measurable space G²(X). The natural transformation µ will be a natural transformation µ : G² ⇒ G. Let X be an object in Meas. µ_X needs to take p′ ∈ G²(X) and associate to it a probability measure p ∈ G(X). The evaluation maps ev_B : G(X) → [0, 1] ⊂ R are real-valued measurable functions and therefore we can integrate with respect to them. Thus, we can define µ_X(p′)(B) := ∫_{G(X)} ev_B(p) dp′. This definition is sigma-additive by the monotone convergence theorem and µ_X(p′)(X) = 1, so µ_X(p′) is indeed a probability measure on X.

3.2.4 The Kleisli Category of the Giry Monad

Giry showed that the triple (G, η, µ) forms a monad on the category of measurable spaces by verifying the coherence conditions. Every monad gives rise to a corresponding Kleisli category. The Kleisli category of a monad (G, η, µ) has the same objects as its underlying category, in this case Meas. A Kleisli morphism between X and Y is a map f_K : X → G(Y). Given two Kleisli morphisms f_K : X → G(Y) and g_K : Y → G(Z), their composite is defined to be g_K ∘_K f_K := µ_Z ◦ G(g_K) ◦ f_K. Kleisli arrows can be seen as statistical models when the domain is interpreted as a parameter space. A Kleisli arrow also gives rise to a Markov kernel in the following manner: given a Kleisli arrow f_K : X → G(Y), we can construct a Markov kernel through the evaluation mapping ev_{f_K}, defined on a point x ∈ X and a measurable set B ⊂ Y by ev_{f_K}(x, B) = f_K(x)(B).
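To make the monad structure and Kleisli composition concrete, the following Haskell sketch implements the finitely supported, discrete analogue of the Giry monad: pure plays the role of η (the Dirac measure), bind combines the pushforward with µ, and kleisli is composition of finite Markov kernels. The names Dist, kleisli, coin, and step are illustrative only; the actual Giry monad works with arbitrary probability measures rather than weighted lists.

    -- A minimal, discrete stand-in for the Giry monad: finitely supported
    -- distributions as weighted lists.
    newtype Dist a = Dist { runDist :: [(a, Double)] }

    -- G on morphisms: the pushforward of a distribution along f.
    instance Functor Dist where
      fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

    instance Applicative Dist where
      pure x = Dist [(x, 1)]          -- η: the Dirac measure at x
      Dist fs <*> Dist xs = Dist [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

    instance Monad Dist where
      -- bind = µ after G(k): push forward along k, then flatten by weights.
      Dist xs >>= k = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x) ]

    -- Kleisli arrows a -> Dist b are finite Markov kernels; composition is
    -- the (discrete) Chapman–Kolmogorov formula.
    kleisli :: (b -> Dist c) -> (a -> Dist b) -> (a -> Dist c)
    kleisli g f x = f x >>= g

    coin :: Double -> Dist Bool      -- a biased coin as a kernel Double -> Dist Bool
    coin p = Dist [(True, p), (False, 1 - p)]

    step :: Bool -> Dist Bool        -- a second kernel, depending on the first draw
    step b = coin (if b then 0.9 else 0.1)

    main :: IO ()
    main = print (runDist (kleisli step coin 0.5))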

3.2.5 Simple Facts About the Giry Monad

Example 79. Let [k] = {1, 2, . . . , k}, viewed as a measurable space whose sigma algebra is just the collection of all subsets of [k]. Then G([k]) is the collection of all probability distributions on [k]. This can be identified with the probability simplex Δ^{k−1} = {(p_1, . . . , p_k) ∈ R^k : Σ_{i=1}^k p_i = 1, p_i > 0 ∀i ∈ [k]}. The sigma algebra on G([k]) is generated by the pullbacks of sets of the form (a, b) ∩ [0, 1] ⊂ I under the evaluation maps ev_i : Δ^{k−1} → [0, 1] given by ev_i(p_1, . . . , p_k) = p_i. Hence ev_i^{-1}((a, b) ∩ I) = {(p_1, . . . , p_k) : p_i ∈ (a, b) ∩ I}. It follows that sets of the form

    ∩_{i=1}^k ev_i^{-1}((a_i, b_i) ∩ I) = [(a_1, b_1) × · · · × (a_k, b_k)] ∩ Δ^{k−1}

are Giry measurable subsets of Δ^{k−1}. This means that the Giry sigma algebra on Δ^{k−1} induced by the evaluation maps is equivalent to the standard Borel sigma algebra on Δ^{k−1}. In particular, algebraic statistical models, i.e. models defined by polynomial equations in Δ^{k−1}, will be measurable sets as these sets are closed in the Euclidean topology.

Lemma 80. Let X × Y be a product of measurable spaces and let πX : X × Y → X be the projection onto the first coordinate. The induced map G (πX ) : G (X × Y ) → G (X) corresponds to marginalizing over Y .

Proof. By definition G(π_X) : G(X × Y) → G(X) is defined as µ ↦ µ(π_X^{-1}(−)). Let B be a measurable subset of X. Then µ(π_X^{-1}(B)) = µ(B × Y).
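In the discrete sketch above (reusing the illustrative Dist type defined there), G(π_X) is simply the pushforward along the first projection:

    -- Marginalization over Y as the pushforward along fst.
    marginalX :: Dist (x, y) -> Dist x
    marginalX = fmap fst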

Example 81. (Univariate Normal Distributions) The collection of single variable normal distributions can be seen as the mapping n : R × R_{>0} → G(R) defined by (µ, σ²) ↦ p_{µ,σ²} where

    p_{µ,σ²}(B) = ∫_B (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx.

To see this mapping is indeed measurable, note that the Borel sigma algebra on I is generated by sets of the form (a, b) ∩ I. The sigma algebra structure on G (R) was generated by the evaluation maps evB : G (R) → I defined by evB (p) =

p(B). The preimage of such a set is just a set of the form ev_B^{-1}((a, b) ∩ I) = {p ∈ G(R) : p(B) ∈ (a, b) ∩ I}. Since the sigma algebra on G(R) is generated by sets of this form, we can observe that

    n^{-1}(ev_B^{-1}((a, b) ∩ I)) = {(µ, σ²) : ∫_B (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} dx ∈ (a, b) ∩ I},

which is a Borel measurable set in R × R_{>0}.

Example 82. (Singular Model) [141] Let P = [0, 1] × R. Let m : P → G(R) be defined by sending (a, b) to the probability measure with density

    (1/√(2π)) ((1 − a) e^{−x²/2} + a e^{−(x−b)²/2}).

This mapping is not injective: if ab = 0, then m(a, 0) = m(0, b) is the measure with density (1/√(2π)) e^{−x²/2}, so distinct parameters lead to the same probability distribution. Thus, this mixture model would not correspond to a subobject of G(R).

3.3 The Cartesian Closed Category of Quasi-Borel Spaces

A Cartesian closed category is a type of category with the same expressive power as a typed λ-calculus. As such this gives a category theoretic framework for expressing computations. Standard Borel spaces are certain well-behaved spaces for which many probabilistic constructions are guaranteed to exist. Unfortunately, as a full subcategory of Meas, these still do not admit a closed structure due to the theorem of Aumann referenced previously. For this reason, Heunen, Kammar, Staton, and Yang invented the category of quasi-Borel spaces [65]. The category of quasi-Borel spaces has the structure of a quasi-topos [66]. In this section, we review quasi-Borel spaces. In later sections, we will see how these can be regarded as certain types of sheaves on an appropriately defined sample space category.

3.3.1 Quasi-Borel Spaces

Fix an uncountable standard Borel space Ω.

49 Definition. A quasi-Borel space, (X,MX ), is a set X together with a subset

MX ⊂ [Ω → X] such that

1. If α ∈ M_X and f : Ω → Ω is measurable, then α ◦ f ∈ M_X.

2. If α : Ω → X is constant, then α ∈ M_X.

3. If Ω = ∐_{i∈N} S_i with each S_i ∈ B_Ω and the S_i pairwise disjoint, and {α_i}_{i∈N} is a sequence in M_X, then ∐_{i∈N} α_i ∈ M_X, where ∐_i α_i sends ω ∈ S_i to α_i(ω).

Given a quasi-Borel space as defined above, these can be identified with subobjects in the topos of sets, i.e. M_X ↣ X^Ω. In Set, we can consider the co-slice category, Ω ↓ Set, whose objects are maps α : Ω → X and whose arrows are given by composition, i.e. α ∈ X^Ω is sent to f ◦ α. To give quasi-Borel spaces the structure of a category, we can consider morphisms in the co-slice category which induce a set mapping between M_X and M_Y. In other words, a morphism f : (X, M_X) → (Y, M_Y) is a set map f : X → Y which induces a map f ◦ _ : M_X → M_Y. By abuse of notation, we will often simply denote a quasi-Borel space (X, M_X) by its collection of functions, M_X. The collection of quasi-Borel spaces thus forms a category which we denote by QBS.
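Very roughly, a quasi-Borel space can be thought of as a carrier together with a distinguished collection of admissible random elements Ω → X. The Haskell sketch below records only this shape; the closure conditions of the definition are not enforced by the type, and QSpace and isMorphismOn are illustrative names, not part of the formal development.

    -- A representational sketch only, with Double standing in for Ω.
    type Omega = Double

    newtype QSpace x = QSpace { admissible :: (Omega -> x) -> Bool }

    -- A candidate morphism f : X -> Y should carry admissible random elements
    -- to admissible ones; here this is checked only on a list of test elements.
    isMorphismOn :: [Omega -> x] -> QSpace x -> QSpace y -> (x -> y) -> Bool
    isMorphismOn tests mx my f =
      and [ admissible my (f . alpha) | alpha <- tests, admissible mx alpha ]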

Given a quasi-Borel space M_X, we can construct a sigma algebra F_{M_X} on its space of outcomes by

    F_{M_X} = {B ⊂ X | f^{-1}(B) ∈ F_Ω for all f ∈ M_X}.

We call members of F_{M_X} events.

Lemma 83. We have a bijection F_{M_X} ≅ QBS(M_X, M_2).

Proof. Take χ in QBS(M_X, M_2). If χ is a QBS morphism, then given any α ∈ M_X, χ ◦ α is in M_2. Then (χ ◦ α)^{-1}(1) = α^{-1}(χ^{-1}(1)) must be a Borel measurable subset of Ω for every α ∈ M_X. Hence, χ^{-1}(1) ∈ F_{M_X}. Conversely, suppose E ∈ F_{M_X}. Define χ_E ∈ QBS(M_X, M_2) by χ_E(x) = 1 if x ∈ E and χ_E(x) = 0 if x ∉ E. Let α ∈ M_X. Then (χ_E ◦ α)^{-1}(1) = α^{-1}(χ_E^{-1}(1)) = α^{-1}(E) is Borel measurable in Ω since E ∈ F_{M_X}. Similarly (χ_E ◦ α)^{-1}(0) = α^{-1}(E^C) = (α^{-1}(E))^C is Borel measurable.

The above proposition allows us to define an evaluation mapping on probability measures.

Definition 84. (Evaluation Mapping for Probability Measures) Let M_2^{M_X} =

QBS(M_X, M_2) and let G(M_X) denote the image of M_X under the Giry endofunctor. Let M_I be the quasi-Borel space obtained from the standard Borel space [0, 1]. The evaluation mapping, ev : M_2^{M_X} × G(M_X) → M_I, is defined to be ev(χ, µ) = ∫_Ω χ(ω) dµ.

3.3.2 Cartesian Closure of QBS.

The terminal object in QBS is just the singleton set whose quasi-Borel structure is

given by the unique map into it. If (X,MX ) and (Y,MY ) are quasi-Borel spaces,

there is a product (X × Y,MX×Y ) where the set is given by the usual product of

sets and the product structure MX×Y is defined by

M_{X×Y} := {α : Ω → X × Y | π_X ◦ α ∈ M_X, π_Y ◦ α ∈ M_Y}.

Exponentials are the only somewhat complicated construction. If (X, M_X) and (Y, M_Y) are quasi-Borel spaces, then so is (Y^X, M_{Y^X}), where Y^X := QBS(X, Y) and M_{Y^X} := {α : Ω → Y^X | uncurry(α) ∈ QBS(Ω × X, Y)}.

3.3.3 The Giry Monad on the Category of Quasi-Borel Spaces

Let M_X be a quasi-Borel space. Any α ∈ M_X and µ ∈ G(Ω) determine a probability

measure on the measurable space (X, FMX ) via the pushforward α∗µ onto (X, FMX ).

As (α, µ) and (β, ν) may pushforward to the same probability measure on (X, FMX )

we can define an equivalence relation on pairs where (α, µ) ∼ (β, ν) if α∗µ = β∗ν

(as probability-measures). As a set this is a quotient set of MX × G (Ω) where G (Ω) is the image of Ω under the Giry endofunctor in Meas (regarded as a set). We now

need to give PX = (MX × G (Ω)) / ∼ a quasi-Borel structure. Define

MPX := {β :Ω → PX | ∃α ∈ MX .∃g ∈ Meas (Ω, G (Ω)) .∀ω ∈ Ω.β (ω) = [α, g (ω)]}

where [α, µ] is the image of (α, µ) under the quotient map π : MX × G (Ω) → PX . Heunen, Kammar, Staton, and Yang prove that this construction yields a strong monad on QBS. The following lemma is an important observation about the Giry endofunctor applied to the sample space Ω.

51 ∼ Lemma 85. MPΩ = Meas (Ω, G (Ω))

Proof. Suppose f ∈ Meas(Ω, G (Ω)). Then f (ω) = [1Ω, f (ω)] so f ∈ MPΩ .

Now, suppose β ∈ M_{PΩ}. Then there exist α ∈ M_Ω = Meas(Ω, Ω) and g ∈ Meas(Ω, G(Ω)) such that β(ω) = [α, g(ω)], i.e. β determines a map g ◦ α : Ω → G(Ω) in Meas(Ω, G(Ω)).

3.3.4 De Finetti Theorem for Quasi-Borel Spaces

DeFinetti’s representation theorem is a foundational theorem in Bayesian statistics. For completeness, its statement is provided below.

Theorem. (DeFinetti’s Representation Theorem) Let (Ω, F, µ) be a probability space, and let (X, B) be a Borel space. For each n, let X_n : Ω → X be measurable. The sequence {X_n}_{n=1}^∞ is exchangeable if and only if there is a random probability measure P on (X, B) such that, conditional on P = ρ, the {X_n}_{n=1}^∞ are IID with distribution ρ. Furthermore, if the sequence is exchangeable, then the distribution of P is unique, and P_n(B) converges to P(B) almost surely for each B ∈ B [119].

Heunen, Kammar, Staton, and Yang formulate a version of the DeFinetti Theorem for Quasi-Borel Spaces. Before giving the statement of this theorem, we must explain exchangeability in the language of Quasi-Borel spaces.

Definition. A probability measure (α, µ) on ∏_{i∈N} X_i is said to be exchangeable if for all permutations π : N → N, [α, µ] = [α_π, µ], where α_π(ω)_i := α(ω)_{π(i)} for all i ∈ N.

Theorem. (DeFinetti’s Theorem for Quasi-Borel Spaces) If (α, µ) is an exchangeable probability measure on ∏_{i=1}^∞ X_i, then there exists a probability measure (β, ν) in G(G(X)) such that for all n ≥ 1, the measure ([β, ν] >>= iid_n) on G(∏_{i=1}^n X) equals G((−)_{1···n})(α, µ) when considered as a measure on the product measurable space ∏_{i=1}^n X with the product sigma algebra, where (−)_{1···n} : ∏_{i=1}^∞ X_i → ∏_{i=1}^n X_i is defined by (x_1, . . . , x_n, x_{n+1}, . . . ) ↦ (x_1, . . . , x_n) [65].

Quasi-Borel spaces provide a category that can be used as the denotational semantics of a probabilistic programming language. However, their construction does not naturally handle extensions of sample spaces. As we discussed at the beginning of this chapter, in the course of executing a program which involves

52 sampling from probability distributions, sample spaces are constructed in memory as needed. Suppose we draw a collection of n samples from a standard normal distribution and another n samples from some binomial distribution and coupling the results into a data frame. The sample space for the joint distribution collected in the data frame is the independence join of the probability spaces used to generate the samples. This suggests that a notion of extensibility is a natural requirement to embed in whatever type of categorical framework we use for modeling probabilistic programming. The notion of extensibility is reminiscent of Lawvere’s idea of topos theory being a framework for working with variable sets. We use this as motivation for constructing a sheaf topos on a particular category of probability spaces which will serve as a framework for discussing probabilistic concepts. Before we can achieve this, we need to briefly recall some important properties of standard Borel spaces. Standard Borel spaces are the prototypical examples of well-behaved measurable spaces and these will be used in the construction of our sheaf topos.

3.4 Standard Borel Spaces

Data on a computer is ultimately represented as a bit-string. The types of measurable spaces which should be sufficiently well-behaved to be modeled on computers should, in some sense, be well-approximated by a bit-string. Recall that a measurable map f : (X, F_X) → (Y, F_Y) is said to be exactly measurable if f^{-1}(F_Y) = F_X. A natural condition on a measurable space could be that there exists an exactly measurable map from (X, F_X) into S = {0, 1}^N with the sigma algebra generated by the cylinders of finite bit-strings. Note this is equivalent to the Borel sigma-algebra induced by the metric

    d({s_n}_{n∈N}, {s′_n}_{n∈N}) := Σ_{n∈N} 2^{−n} |s_n − s′_n|.
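For intuition, the following Haskell helper evaluates a truncation of this metric on the first k coordinates of two bit strings; dApprox is an illustrative name and not part of the formal development.

    -- Truncated evaluation of d(s, s') = Σ_n 2^(-n) |s_n - s'_n|, summing the
    -- first k coordinates only.
    dApprox :: Int -> [Int] -> [Int] -> Double
    dApprox k s t =
      sum [ 2 ** negate (fromIntegral n) * fromIntegral (abs (a - b))
          | (n, a, b) <- zip3 [1 .. k] s t ]

    main :: IO ()
    main = print (dApprox 8 (cycle [0, 1]) (repeat 0))  -- 0101... versus 0000...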

A theorem, due to Mackey, identifies such measurable spaces as exactly the countably generated measurable spaces.

Theorem. (Mackey, 1957) A measurable space is countably generated if and only if there exists an exactly measurable mapping into S [95].

53 Standard Borel spaces are an important class of well-behaved measurable spaces first introduced in [95]. Many important results in probability and statistics are not true for probability measures on arbitrary measurable spaces but do hold for standard Borel spaces. In this dissertation, we will focus on probability measures on standard Borel spaces. In fact, any time we say measurable space in this dissertation it should be assumed that we mean standard Borel space. For a more detailed treatment of the facts collected here, we refer the reader to the survey article [111] or the textbook [130].

Definition 86. Let (X, F) be a measurable space. (X, F) is said to be standard Borel if there exists a metric on X which makes it a complete separable metric space in such a way that F is the Borel σ-algebra corresponding to that metric. In other words, F is the Borel sigma algebra associated to a Polish space.

Theorem. For a measurable space (X, F), the following are equivalent [79]:

• (X, F) is a retract of (R, B_R), i.e. there exist measurable maps f : X → R and g : R → X such that g ◦ f = id_X.

• (X, F) is measurably isomorphic to either (R, B_R), (Z, P(Z)), or ([k], P([k])).

• X has a complete metric with a countable dense subset and F is the Borel sigma algebra generated by this metric.

Standard Borel spaces enjoy a number of useful properties which are not true of general measurable spaces:

• They are countably generated [95].

• Any bijective measurable mapping between standard Borel spaces is necessar- ily an isomorphism, i.e. the inverse must also be measurable [127].

• Products and disjoint unions of standard Borel spaces are standard Borel [130].

• A function f : X → Y between two standard Borel spaces is measurable if and only if its graph is also standard Borel [130].

Moreover, a number of important results in probability hold for standard Borel spaces which are known to not hold for arbitrary measurable spaces. Some notable examples include the following:

54 • the Kolmogorov extension property [107,109]

• the existence of conditional distributions [33,40,42]

• the Dynkin extension property [50]

• DeFinetti’s representation theorem [31,37,67]

The properties that standard Borel spaces have which do not hold in a general measurable space seem to strongly suggest that the subcategory of standard Borel spaces is an appropriate category to use as a base category for our sheaf theoretic approach to probabilistic programming.

3.5 Quasi-Borel Sheaves

3.5.1 Sample Space Category

Tao observed that probability theorists often choose notation which de-emphasizes the role of a sample space preferring instead to treat it more as a black box. He emphasizes that random variables should not be thought of as anchored to any one particular sample space and should be identified with their space of extensions. He proposes that the concepts studied in probability theory are precisely those that are invariant under surjective measure preserving mappings [136]. This idea is very suggestive of some sort of category of extensions of a probability space. We make this idea mathematically precise in this section by modifying the definition of quasi-Borel spaces introduced in [65] to be compatible with Tao’s idea of extensibility. In the course of executing a program which involves sampling from probability distributions, sample spaces are constructed in memory as needed. Imagine drawing a collection of n samples from a standard normal distribution and another n samples from a uniform distribution on [0, 1] and coupling the results into a data frame. The sample space for the joint distribution collected in the data frame is the product of the two sample spaces and these spaces are joined as independent when we couple them into the data frame. As such, it is unnatural to think of our program as arising from one fixed sample space declared initially. Online probabilistic systems must have the ability to dynamically generate sample spaces. For this reason,

extensibility is a natural criterion for thinking about probabilistic programming and will be a major motivation behind the introduction of sheaves later in this chapter.

Example 87. One way to implement sampling from a normal distribution would be to sample uniformly from some approximation of [0, 1] and apply the inverse cumulative distribution function of the standard Gaussian, α, to the resulting samples. Using the identity map id : [0, 1] → [0, 1] allows us to sample uniformly from [0, 1]. Consider a quasi-Borel structure on R containing both α and id. A naive coupling of these along their original sample space yields a map α ∨ id : [0, 1] → R × [0, 1] which would not represent an independence join of the two random variables. Instead, we should consider I² = [0, 1] × [0, 1] equipped with the product measure and sample the components. The original α and id can be recovered from projecting onto the first and second coordinates, respectively. As the component projections are surjective, this construction is an example of an extension of sample space.
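A minimal Haskell sketch of this recipe follows. Since the Gaussian quantile function α has no closed form, the exponential quantile function stands in for it here; the point being illustrated is only that the two outputs are drawn from the product space I² = [0, 1] × [0, 1] and therefore form an independence join. The names invCdfExp and samplePair are illustrative.

    import System.Random (randomRIO)

    -- Inverse-CDF sampling with the exponential quantile standing in for α.
    invCdfExp :: Double -> Double -> Double
    invCdfExp rate u = negate (log (1 - u)) / rate

    -- Sample from the product space [0,1] x [0,1] and push the first
    -- coordinate forward along the inverse CDF; the second stays uniform.
    samplePair :: IO (Double, Double)
    samplePair = do
      u1 <- randomRIO (0, 1)
      u2 <- randomRIO (0, 1)
      pure (invCdfExp 2.0 u1, u2)

    main :: IO ()
    main = mapM_ (\_ -> samplePair >>= print) [1 .. 3 :: Int]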

We will define a category S whose objects are standard Borel spaces (S, B_S, µ) equipped with a probability measure µ, and whose morphisms are given by surjective measure preserving mappings. This category has a terminal object and initial object, but not many other nice properties; however, we will see later that it admits a semi-pullback structure which is essential to developing sheaf theory on this category of extensions. This definition is similar to one used by Simpson in an unpublished conference paper; however, Simpson does not require his morphisms to be surjective [123]. In this section, we will see how we can embed the notion of quasi-Borel spaces as a certain family of representable presheaves on this category. For the purposes of modeling computation, we will choose to think of there being some sample space representation, S_0, that is the initial source of randomness. Conceptually, we can think of this as something like the 32-bit space {0, 1}^{32} and suppose all extensions of sample spaces will be spaces with strictly larger cardinality. This is a subcategory of the full category, but adequate enough for our purposes.

In this subcategory, S0 would become the terminal object. We will not dwell on this point further and instead work with the full sample space category unless mentioned explicitly.

56 3.5.2 Quasi-Borel Presheaves

Definition 88. A quasi-Borel presheaf, Q, is a representable presheaf defined in the following manner:

• If α ∈ Q(S) and f : S′ → S is a morphism in S, then α ◦ f ∈ Q(S′) (see the sketch following this list).

• For any S in S, Q(S) contains all constant mappings.

• If S = ∐_{i=1}^∞ B_i where each B_i ∈ B_S and B_i ∩ B_j = ∅ when i ≠ j, and {α_i}_{i=1}^∞ is a sequence of maps in Q(S), then ⊕_{i=1}^∞ α_i, defined by s ↦ α_i(s) for s ∈ B_i, is also in Q(S).
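In miniature, the first closure condition is just precomposition with the sample-space morphism, as in the following illustrative Haskell fragment (restrict is not a name used in the dissertation):

    -- A random element over S pulls back along f : S' -> S by precomposition.
    restrict :: (s' -> s) -> (s -> x) -> (s' -> x)
    restrict f alpha = alpha . f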

The first observation about this structure is that the presheaves so defined are actually sheaves with respect to the atomic Grothendieck topology. In order to prove this, we need to recall a bit of terminology.

3.5.3 Quasi-Borel Sheaves

In order to discuss sheaves on a sample space category, we need to equip it with a Grothendieck topology. A category equipped with a Grothendieck topology is referred to as a site. Recall that a Grothendieck topology on a category C is a function J which assigns to each object C in C a collection of sieves on C satisfying the following properties:

1. The maximal sieve tC = {f | cod (f) = C} is in J (C).

2. (stability axiom) If S ∈ J (C), then the pullback h∗ (S) ∈ J (D) for any arrow h : D → C.

3. (transitivity axiom) If S ∈ J (C) and R is any sieve on C such that h∗ (R) ∈ J (D) for all h : D → C in S, then R ∈ J (C).

A popular choice of Grothendieck topology is the so-called atomic topology.

Definition 89. A site (C,Jat) is called an atomic site if the covering sieves of Jat are given by the inhabited sieves. Jat is referred to as the atomic topology.

In order for the stability axiom to hold on an atomic topology, we need it to be possible for every cospan to be completed to a commuting square, i.e. given a cospan

         X
         |
         v
    Y -> Z

there exists an object W in C along with morphisms W → X, W → Y which makes the following diagram commute:

    W -> X
    |    |
    v    v
    Y -> Z

Notice the similarity in this condition to the definition of pullbacks. The difference between this condition and the more general pullback condition is that there is no universal property requirement on the object W and the morphisms W → X and W → Y. As such these are sometimes referred to as semi-pullbacks in the literature. This property is so important in topos theory that it is given a name.

Definition 90. A category C is said to satisfy the Ore condition if every cospan can be completed to a commuting square.

Verifying the Ore condition for our sample space category follows from a construction due to Edalat.

Theorem. (Edalat, 1998) The category of standard Borel spaces equipped with probability measures admits semi-pullbacks. [43]

Note that this is not true for arbitrary measurable spaces [52]. The basic idea behind Edalat’s construction is to integrate over the fibers of product measures of the regular conditional probabilities. A theorem due to Johnstone relates this condition to the internal logic of the topos Sˆ.

Theorem. (Johnstone, 1979) Let C be a category. Then Cˆ is a De Morgan topos if and only if C satisfies the Ore condition [76].

This means that the topos of presheaves Sˆ is a De Morgan topos, i.e. its internal logic is a Heyting algebra satisfying the De Morgan laws ¬(a ∧ b) = ¬a ∨ ¬b and ¬(a ∨ b) = ¬a ∧ ¬b. However, it should be noted that the second law holds in every Heyting algebra [18]. Although not required by the Ore condition, the Edalat construction is known to obey a universal property. Simpson attempted to characterize independence and conditional independence in purely category-theoretic terms [121]. Simpson’s work allows the Edalat construction to be seen as universal with respect to independence products. With this background out of the way, we can now prove that every quasi-Borel presheaf is in fact a sheaf with respect to the atomic Grothendieck topology. This result is similar to a result due to Simpson for equivalence classes of random variables stated in [123] without proof.

Lemma 91. Any quasi-Borel presheaf, Q, is a sheaf with respect to the atomic Grothendieck topology.

Proof. A presheaf $R$ on an atomic site $(\mathbf{S}, J_{at})$ is a sheaf if and only if for any morphism $q : S' \to S$ and any $\alpha \in R(S')$, if $R(r_1)(\alpha) = R(r_2)(\alpha)$ for all diagrams
$$S'' \overset{r_1}{\underset{r_2}{\rightrightarrows}} S' \xrightarrow{\;q\;} S$$
with $q \circ r_1 = q \circ r_2$, then there is a unique $\gamma \in R(S)$ such that $\alpha = R(q)(\gamma)$ [94]. To check that this condition holds, notice that it yields the following commutative diagram in Set:
$$S'' \overset{r_1}{\underset{r_2}{\rightrightarrows}} S' \xrightarrow{\;q\;} S \xrightarrow{\;\gamma\;} X, \qquad \alpha : S' \to X.$$
We would like to define $\gamma(\omega)$ to be $\alpha(q^{-1}(\omega))$. However, this is only well-defined if $\alpha$ is constant on the fibers $q^{-1}(\omega)$. By way of contradiction, suppose there exist $\omega'_1$ and $\omega'_2$ in $S'$ with $\alpha(\omega'_1) \neq \alpha(\omega'_2)$ and $q(\omega'_1) = q(\omega'_2)$. Lifting again, we get that there are $\omega''_1$ and $\omega''_2$ with $r_1(\omega''_1) = \omega'_1$ and $r_2(\omega''_2) = \omega'_2$. By assumption, $R(r_1)(\alpha) = R(r_2)(\alpha)$ and hence $\alpha \circ r_1 = \alpha \circ r_2$. However, $\alpha \circ r_1(\omega''_1) = \alpha(\omega'_1) \neq \alpha(\omega'_2) = \alpha \circ r_2(\omega''_2)$. Hence, it must be the case that the map $\alpha$ is constant on fibers. Thus, any quasi-Borel presheaf is a sheaf.

Notice the similarity of this definition to the definition of sheaves in terms of matching families. Informally, the above lemma can be interpreted as stating that every morphism is a cover with respect to the atomic topology. If we interpret this in the language of the presheaf of random variables, the condition that $P(g)(y) = P(h)(y)$ for all diagrams
$$E \overset{g}{\underset{h}{\rightrightarrows}} D \xrightarrow{\;f\;} C$$
with $f \circ g = f \circ h$ is basically saying that the random variable $y$ is not exploiting any of the additional structure of the sample space extension afforded to it as an extension of the space $C$. In essence, this appears to be a statement about the random variable being constant on fibers, which is suggestive of regular conditional probabilities. The trouble with this approach is that regular conditional probabilities are only defined almost surely. As such, we could adjust by passing to equivalence classes of random variables. This may initially seem appealing for the reasons stated when we first defined the sample space category. However, such a construction makes the theory of stochastic processes difficult; it would make properties like the almost sure continuity of the paths of a Brownian motion ill-defined. Nevertheless, by defining these structures in terms of the maps themselves, we do not have to deal with the nuances of the underlying probability measures. In terms of probabilistic programming, any probability measure that we implement on the sample space can be pushed forward onto our quasi-Borel space of outcomes via the collection of maps we have already defined from the sample space into another data type.

3.5.4 Lifting Measures Lemma

In order to discuss probability theory, we need to understand lifts of measures in the sample space category S. In other words, given a measure μ on S, we want to ensure there is a way of extending the measure to any space S′ where q : S′ → S is a surjective measurable map. The fact that arbitrary measures have lifts to the extended sample spaces is the subject of the next lemma.

Lemma 92. Given a measure $\mu : 1 \to G(S)$, $\mu$ lifts to a measure in $G(S')$.

Proof. Given some $B' \in \mathcal{F}'$, the collection $\uparrow B' := \{B \in \mathcal{F} \mid q^{-1}(B) \supset B'\}$ is non-empty, as $q^{-1}(S) = S' \supset B'$. Similarly, we may define a collection $\downarrow B' := \{B \in \mathcal{F} \mid q^{-1}(B) \subset B'\}$. This allows us to construct outer and inner approximations to any measure on $S'$ which is a lift of $\mu$. We define the inner approximation by $\mu_*(B') := \sup\{\mu(B) \mid B \in \downarrow B'\}$ and the outer approximation by $\mu^*(B') := \inf\{\mu(B) \mid B \in \uparrow B'\}$. Any lift $\tilde{\mu}$ of $\mu$ must then satisfy the following inequality:
$$\mu_*(B') \leq \tilde{\mu}(B') \leq \mu^*(B').$$

By way of contradiction, suppose that no such $\tilde{\mu}$ existed. Then there would be a sequence of disjoint sets $\{B'_i\}_{i\in\mathbb{N}}$ belonging to $\mathcal{F}'$ for which
$$\sum_{i=1}^{\infty} \mu_*(B'_i) > \mu^*(B')$$
where $B' = \cup_{i\in\mathbb{N}} B'_i$. For each $B'_i$, there exists a $B_i \in \mathcal{F}$ with $B_i \in \downarrow B'_i$ satisfying
$$\mu_*(B'_i) < \mu(B_i) + \frac{\epsilon}{2^i}.$$

Moreover, the collection $\{B_i\}_{i\in\mathbb{N}}$ consists of pairwise disjoint sets. Hence $\sum_{i=1}^{\infty}\mu(B_i) = \mu(B)$ where $B = \cup_{i\in\mathbb{N}} B_i$. Now, as $B \subset C$ for any $C \in \uparrow B'$, we see that
$$\sum_{i=1}^{\infty}\mu_*(B'_i) < \epsilon + \sum_{i=1}^{\infty}\mu(B_i) = \epsilon + \mu(B) \leq \epsilon + \mu^*(B').$$
Letting $\epsilon \to 0$ establishes
$$\sum_{i=1}^{\infty}\mu_*(B'_i) \leq \mu^*(B')$$
and so a lift $\tilde{\mu}$ of $\mu$ must indeed exist.

Remark. The construction $\mu_*$ is actually a probability measure on the lift. Note that if $B'_1$ and $B'_2$ are disjoint then so are $\downarrow B'_1$ and $\downarrow B'_2$, and thus $\mu_*$ will be countably additive. Moreover, the full space will have measure 1 because $q^{-1}(S) = S'$, so $\mu_*(S') = 1$. What this lemma says in terms of programming languages is that if we implement a probability measure on a sample space like $\{0,1\}^{32}$ and later construct an extension $\{0,1\}^{64}$, then as long as we construct a surjective map connecting these, e.g. $q : \{0,1\}^{64} \to \{0,1\}^{32}$ defined by $q(b_1, \ldots, b_{32}, b_{33}, \ldots, b_{64}) = (b_1, \ldots, b_{32})$, we can guarantee that we can construct a measure on the larger space which pushes forward to the original sample space.
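To make this remark concrete, the following Python sketch (standard library only; the sample spaces are shrunk to 2- and 3-bit strings so the dictionaries stay small, and the measure mu is an arbitrary illustrative choice) lifts a measure along the projection q by splitting the mass of each point uniformly over its fiber, and then checks that the pushforward along q recovers the original measure.

from itertools import product

# Base sample space S = {0,1}^2 with an arbitrary probability measure mu.
S = list(product([0, 1], repeat=2))
mu = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Extension S' = {0,1}^3 with the surjective map q(b1, b2, b3) = (b1, b2).
S_prime = list(product([0, 1], repeat=3))
q = lambda b: b[:2]

# One possible lift: split the mass of each point of S uniformly over its fiber q^{-1}(s).
fiber = {s: [b for b in S_prime if q(b) == s] for s in S}
mu_tilde = {b: mu[q(b)] / len(fiber[q(b)]) for b in S_prime}

# Pushforward of the lift along q; it agrees with the original measure mu.
pushforward = {s: sum(mu_tilde[b] for b in S_prime if q(b) == s) for s in S}
assert all(abs(pushforward[s] - mu[s]) < 1e-12 for s in S)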

3.6 Probability Theory for Quasi-Borel Sheaves

3.6.1 Events

We have previously seen that there is a bijection between $\mathcal{F}_{M_X}$ and $\mathbf{QBS}(X, 2)$ for any quasi-Borel space $(X, M_X)$. In order to understand sub-sigma-algebras as constructions within QBS, we should first understand a sigma algebra diagrammatically in terms of the quasi-Borel space $(2, M_2)$ where $M_2 = \mathbf{Meas}(S, 2)$. Sigma algebras are required to be closed under complementation. This can be understood in terms of characteristic functions by observing that $\chi_A + \chi_{A^C} = 1$. In other words, for every $A : 1 \to M_2$ there exists $A^C : 1 \to M_2$ such that the diagram below commutes:
$$\begin{array}{ccc} 1 & \xrightarrow{\;A\;} & M_2 \\ {\scriptstyle A^C}\downarrow & & \downarrow{\scriptstyle \neg} \\ M_2 & \xrightarrow{\;1_{M_2}\;} & M_2. \end{array}$$
The other defining property of a sigma algebra is that it is closed under countable unions. Equivalently, we could require the sigma algebra to be closed under countable intersections. We choose the latter approach because it is easier to express in the language of characteristic functions. In terms of characteristic functions, closure under countable intersections means that for any mapping $\{A_i\}_{i\in\mathbb{N}} : 1 \to \prod_{i\in\mathbb{N}} M_2$, there is a mapping $A : 1 \to M_2$ such that the diagram below commutes:
$$\begin{array}{ccc} 1 & \xrightarrow{\;\prod_{i\in\mathbb{N}} A_i\;} & \prod_{i\in\mathbb{N}} M_2 \\ & {\scriptstyle A}\searrow & \downarrow{\scriptstyle \bigcap} \\ & & M_2. \end{array}$$

Recall that in Meas there is a one-to-one correspondence between events and characteristic functions. In other words, for every object $(S, \mathcal{B}_S)$ in Meas, we have a canonical isomorphism $\mathcal{B}_S \cong \mathbf{Meas}(S, 2)$. Now if $q : S' \to S$ is an extension of $S$, then an event $B \in \mathcal{B}_S$ can be identified with the event $q^{-1}(B) \in \mathcal{B}_{S'}$. The corresponding characteristic function leads to a commutative diagram:
$$\begin{array}{ccc} S' & & \\ {\scriptstyle q}\downarrow & \searrow{\scriptstyle \chi_{q^{-1}(B)}} & \\ S & \xrightarrow{\;\chi_B\;} & 2. \end{array}$$

Thus, to understand events we can construct the following presheaf.

Definition 93. (Event Presheaf) Let S be a sample space category. The presheaf

of events can be identified with the Yoneda embedding h2 := Meas (−, 2) where 2 = {0, 1} is given the sigma-algebra of all its subsets.

3.6.2 Global Sections, Local Sections, and Subsheaves

Heunen, Kammar, Staton, and Yang do not incorporate measures into their underlying sample space. As such, they define a probability measure to be a pair (α, μ) where α ∈ M_X for a quasi-Borel space (X, M_X) and μ is a probability measure on the underlying standard Borel space. Because we have incorporated probability measures into our definition of the underlying sample space category, we can identify random variables with points (or global sections) of our sheaf of random variables of some fixed type. Global sections of a quasi-Borel sheaf can be identified with points in the outcome space because the component function of the terminal object must select out a map from a one-element set into the outcome space of the quasi-Borel sheaf. Any map from a singleton set into another set determines a unique point, namely the image of the singleton. Thus, global sections of a quasi-Borel sheaf are simply the points in the outcome space. Local sections of a quasi-Borel sheaf are more interesting. Because these are defined as morphisms from a subsheaf of the terminal sheaf into a quasi-Borel sheaf, this allows for the possibility of mapping the terminal sample space to the empty set. This added flexibility allows for the possibility of randomness. A random variable, along with its collection of extensions, can be identified as a local section of a quasi-Borel sheaf. Effectively, the local section picks out the random variable α, along with its q-extensions α ◦ q for each surjective measure-preserving map q.

The product construction for quasi-Borel spaces represents a joint distribution on outcomes. This construction lifts naturally to quasi-Borel sheaves: since X × Y is just another set, we can construct a quasi-Borel sheaf of maps into X × Y in the same manner as before. However, the category of presheaves has its own product, namely the component-wise product of presheaves. These two notions are isomorphic by the universal property of products in Set. When discussing independence, most authors begin with a discussion of independence of sub-sigma-algebras. Let $\mathcal{B}_X$ and $\mathcal{G}_X$ be two sigma algebras on the set $X$. We say $\mathcal{G}_X$ is a sub-sigma-algebra of $\mathcal{B}_X$ if $\mathcal{G}_X \subset \mathcal{B}_X$. Another way of stating this is that the inclusion mapping $i : (X, \mathcal{B}_X) \hookrightarrow (X, \mathcal{G}_X)$ is measurable. Recall that two sigma fields $\mathcal{G}_1$ and $\mathcal{G}_2$ are said to be independent with respect to a probability measure $p$ if for all $G_1 \in \mathcal{G}_1$ and $G_2 \in \mathcal{G}_2$, $p(G_1 \cap G_2) = p(G_1)\,p(G_2)$. From independence of sigma fields, we can define the independence of random variables as follows: we say two random variables $(f : \Omega \to X, p)$ and $(g : \Omega \to Y, p)$ are independent if the sigma-algebras $f^{-1}(\mathcal{B}_X)$ and $g^{-1}(\mathcal{B}_Y)$ are independent. From this definition, it is possible to prove that two random variables are independent if and only if their joint distribution is the product of their marginal distributions [112]. For quasi-Borel sheaves, independent random variables can be identified with a subsheaf $R_X \perp R_Y \subset R_X \times R_Y$, where $(\alpha_X, \alpha_Y) \in R_X \perp R_Y$ if and only if $\alpha_X$ and $\alpha_Y$ are independent.

3.6.3 Expectation as a Sheaf Morphism

Let $X = \mathbb{R}^k$ for some $k \in \mathbb{N}$. With a quasi-Borel structure, the sigma algebra on the outcome space is defined so that all maps inside the quasi-Borel space are measurable with respect to the constructed sigma algebra. As such, it is sensible to construct an expectation operator on quasi-Borel sheaves. The codomain of this operation will need to be an extension of the real numbers:
$$\tilde{\mathbb{R}} := \mathbb{R} \amalg \{\infty, -\infty, \text{undefined}\}.$$
Let $\tilde{\mathbb{R}}$ also denote the constant presheaf on $\tilde{\mathbb{R}}$. Expectation is then a morphism $\mathbb{E} : R_X \Rightarrow \tilde{\mathbb{R}}$ which we can define on components as:
$$\mathbb{E}_S[\alpha] := \int_S \alpha(s)\, d\mu = \int_X x\, d\alpha_*\mu.$$

Note that if $q : S' \to S$ in S, then, since $q$ is measure-preserving,
$$\mathbb{E}_{S'}[\alpha \circ q] = \int_{S'} (\alpha \circ q)(s')\, d\mu_{S'} = \int_X x\, d(\alpha \circ q)_*\mu_{S'} = \mathbb{E}_S[\alpha].$$

Thus, $\mathbb{E}_{-}$ is a morphism of presheaves. If $f : R_X \to R_X$ is a morphism of quasi-Borel sheaves, we can define expectation similarly on components:
$$\mathbb{E}_S[f \circ \alpha] := \int_S f(\alpha(s))\, d\mu = \int_X f(x)\, d\alpha_*\mu.$$

Example 94. Let $S = [6]$ where $[6] = \{1, 2, 3, 4, 5, 6\}$ and let $S' = [6] \times [6]$. Equip both $S$ and $S'$ with their uniform probability measures. Define $q : S' \to S$ by $q(x_1, x_2) = x_1$. Note that $q$ is surjective because it is a projection operator, and it is measure-preserving because $\mu_{S'}(q^{-1}(i)) = \mu_S(i)$ for each $i \in [6]$. Thus, $q$ is a legitimate extension of the sample space. Let $\alpha : [6] \to \mathbb{R}$ be the obvious embedding. Then
$$\mathbb{E}_S[\alpha] = \sum_{i\in[6]} i\,\mu_S(i) = \frac{21}{6}.$$
On the other hand,
$$\mathbb{E}_{S'}[\alpha \circ q] = \sum_{(i,j)\in[6]^2} \alpha(q(i,j))\,\mu_{S'}(i,j) = \frac{1}{36}(6 \cdot 21) = \frac{21}{6}.$$
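The computation in Example 94 is easy to verify numerically. The short Python sketch below (standard library only; it assumes nothing beyond the example itself) computes both expectations exactly and confirms that they agree.

from itertools import product
from fractions import Fraction

S = range(1, 7)                    # [6] with the uniform measure
S_prime = list(product(S, S))      # [6] x [6] with the uniform measure
q = lambda pair: pair[0]           # surjective, measure-preserving projection
alpha = lambda s: s                # the obvious embedding [6] -> R

E_S = sum(Fraction(alpha(s), 6) for s in S)
E_S_prime = sum(Fraction(alpha(q(w)), 36) for w in S_prime)

assert E_S == E_S_prime == Fraction(21, 6)   # both equal 7/2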

3.7 Future Work

3.7.1 Probabilistic Programming and Simulation of Stochastic Processes

The demands of probabilistic programming and the demands of simulating stochastic processes seem to impose contradictory requirements. Probabilistic programming relies heavily on conditioning, and since conditional expectation is only defined almost everywhere, it seems natural to replace random variables with equivalence classes of random variables where two random variables are identified if they agree almost surely. Unfortunately, descending to equivalence classes makes many desirable properties of stochastic processes untrue (e.g. almost-sure continuity of paths of a Brownian motion). How can the seemingly conflicting demands of these two applications be balanced?

3.7.2 Categorical Logic and Probabilistic Reasoning

Topos theory provides a wealth of tools for analyzing the internal logic of the constructed topos. Does the internal logic of this construction reflect the logic of plausible reasoning as formulated in [74]? More broadly, what is the relationship between the internal logic of this sheaf topos and statistical inference? The approach outlined in this chapter is reminiscent of the non-commutative viewpoint. For instance, manifold theory from this perspective has been developed in [106]. Perhaps the categorical perspective could help with generalizing probabilistic reasoning to non-commutative situations such as those arising in free probability [99] or quantum Bayesianism [21].

3.7.3 Sample Space Category and the Topos Structure

Although there are counterexamples which prohibit the larger category of measurable spaces from satisfying the Ore condition, it is possible to extend the definition of the atomic topology to arbitrary categories by defining the atomic Grothendieck topology to be the smallest Grothendieck topology containing the inhabited sieves. For this definition of the atomic topology on an arbitrary category, it is still the case that the sheaf category Sh(C, J_at) is an atomic Grothendieck topos, i.e. the subobject lattice of every object is a complete atomic Boolean algebra [22]. This observation could perhaps be a stepping stone for moving the ideas presented in this chapter to more general classes of measurable spaces. More broadly speaking, how does restricting or enhancing the types of spaces we allow as sample spaces affect the subsequent sheaf topos? Do different choices for the underlying base category result in equivalent sheaf topoi?

3.7.4 Extension of the Giry Monad

Can the Giry monad be extended to the presheaf topos Sˆ or the sheaf topos Sh (S)? In chapter 6 of this dissertation we will make the argument that implementing the Giry monad is rather useful for statistical computing with missing or conflicting data. As such, if probabilistic programming can be given sheaf theoretic semantics, it would be helpful to generalize the Giry monad to arbitrary presheaves or sheaves. Another way of framing this is whether or not the collection of probability measures (along with the unit and multiplication operations) can be given a purely category theoretic definition. For the reader interested in pursuing this direction, we suggest the recent paper [132]. Another possible approach in this direction is provided in the conference paper [123]. This construction appears to rely on the co-Yoneda lemma realizing every presheaf as a colimit of representable presheaves.

Chapter 4 | Categorical Logic and Relational Databases

4.1 Introduction

In this chapter we will discuss a simple categorical formalism for defining databases in a mathematically rigorous way. Databases are ubiquitous in modern computing. We demonstrate how the language of topos theory can be used to understand the structure of databases. The traditional mathematical perspective on databases was developed by Edgar F. Codd at IBM and is known as relational algebra [23]. Earlier work has described many operations common in relational databases through the language of category theory. Topos theory is a natural setting for discussing variable sets and multi-valued logics. Rosebrugh and Wood use this viewpoint to discuss a dynamic view of databases [114]. In particular, database updates are modeled by indexing a collection of database objects by a topos, and non-Boolean logics are explored for databases through the lens of sheaf theory. Later work in this direction emphasized the role of sketches to formalize schemas, representing the data itself as a model of the sketch [49, 75]. Baclawski, Simovici, and White define databases as constructions within an arbitrary topos. In particular, they work out how selections, squeeze (elimination of duplicates), projections, joins, and Boolean operations can be performed as constructions inside an arbitrary topos [13]. It has also been shown how database models based on simplices can be used to support type-theoretic operations along

with database queries [51, 128]. More recent work has focused on representing concrete data like integers and strings [120]. The underlying model for SQL is based on multisets [53] while the relational algebra due to Codd is based on relations [23]. In this chapter, we create a multiset model similar to the model underlying SQL and show that this model is sufficient to express all the operations due to Codd as constructions within the category of sets. Such a formulation is interesting because simple extensions of the underlying model allow us to express constructions outside the traditional relational model, such as outer joins. By focusing on concrete constructions within the category of sets, we can easily generalize to non-binary logics. Purely topos-theoretic models are unable to capture the logic of SQL, for instance, because the internal logic of any topos must be a Heyting algebra and the three-valued logic of SQL is not a Heyting algebra. Our model has the advantage that it is easily adaptable to more general logics by substituting the two-element set, along with the operations of AND, OR, and NOT, with their corresponding counterparts for a logic with more values. We also demonstrate how extensions allow us to account for null values and missing data, extending the discussion in [101]. Near the end of this chapter, we define a simplicial structure (Section 4.6.3) and a graph associated to a database schema and prove a result relating properties of this graph to whether or not agreement on marginal tables is sufficient to ensure that the marginals can arise from a joint distribution on the full outcome space. In particular, we provide sufficient conditions for a joint table on the full column set to exist (Lemma 98, Proposition 101). This result is foundational to the next chapter, where we attach an additional topological structure to this simplicial complex and use it to weaken the common assumption in statistics that the family of marginals under consideration arises as projections of a joint distribution on the full column space.

4.2 Data Tables

In this section, we discuss databases as constructions in the topos Set, since we will study random variables whose outcome space is a table space. This construction will be important in the next chapter as we discuss the extension of statistical concepts to databases. We will initially focus on the case of a database containing a single table and on random variables whose outcome space is a table. In the next chapter, we will see how sheaf theory allows us to extend these ideas from single tables to databases whose tables contain overlapping columns. We construct an appropriate category of databases and show that all operations in the relational algebra exist for this category. Such a formulation allows us to develop mathematical models for thinking about databases consisting of multiple tables or more general databases in distributed systems. In future chapters, we will use this construction to aid in statistical modeling of databases. The relational model of a table in a database conceptualizes data as a two-dimensional frame called a relation. For example, a database of students may consist of columns containing the student's name, ID number, email, and phone number. Each row in the table would correspond to the records of an individual student. Our goal will be to build a model of such tables inside Set. In this section, we use the word relation as it is used by the database community. As will become apparent, this is not the same thing as a relation in mathematics. We will formalize this notion using several equivalent representations of multisets inside the category Set. The implementation of SQL involves the use of a three-valued logic, adding a third value UNKNOWN to the standard TRUE and FALSE [53]. Topos theory is a framework that allows more general logic than Boolean logic. As such, the categorical framework in this chapter can help with the exploration of more general database logics. These more general logics have been explored previously in [13, 114]. However, the logic underlying SQL is not a Heyting algebra and thus cannot be represented as the internal logic of any topos. By relying on concrete constructions within Set, we are able to construct a framework that easily generalizes to other logics. Moreover, in SQL there are three types of relations: stored relations, views, and temporary tables. Stored relations are the actual tables saved in the database management system. Views are relations which are constructed by computation; these are not stored after they are used. Temporary tables are constructed by the SQL language processor when it executes queries. After the query is executed and the appropriate modifications are made, these tables are removed from memory [53]. In order to represent the space of possible tables formed from a database schema, we need to discuss the possible operations that can be performed with a collection

of tables. Ultimately, this will lead us to a category representing the possible structures. In this section, we will see how operations can be built up from a collection of simpler primitive operations. We can think of the intermediate diagrams as temporary tables.

4.2.1 Attributes

Tables are collections of data points containing multiple attributes. We first establish an attribute space for databases. An attribute is simply a descriptive label used to describe entries in the corresponding column of a table. When visualizing a table, it is common for the attributes to be pictured in the top row. Thus, we can identify an attribute with a one-element set containing that label, e.g. {Student_ID}. Given a finite set of characters, C (think of the collection of all Unicode characters, for instance), we can form the set of strings on C, $S_C$, by defining
$$S_C := \coprod_{n\in\mathbb{N}} \left(\prod_{i\in[n]} C\right).$$

In practice, the size of this set is actually finite due to memory limitations. Nevertheless, we can think of a particular attribute name as a point $1 \to S_C$. For example, a table containing students could have the following collection of attributes

A = {student_name, student_ID, email, phone} ⊂ SC .

Thus, attribute labels for a table can be identified with finite sub-objects of $S_C$; that is, the attribute labels of a table can be seen as a monomorphism $E \rightarrowtail S_C$ where $|E| < \infty$.

4.2.2 Attribute Spaces (Data Types)

Each attribute tabulated in a table takes values in some type of space which we will call the column space of the table. To each attribute $a \in A$, we associate a set, $X_a$, which denotes the collection of possible values of the attribute $a$. For instance, if $a$ is an attribute representing the number of red cards in a player's 5-card poker hand, then the attribute space could be taken to be the set of all non-negative integers less than or equal to five. However, in most languages, we would simply declare an integer type for this particular situation. More broadly, we can take the attribute space to be the outcome set of any random variable of interest.

4.2.3 Missing Data

We next account for missing data. Many real-world data sets contain records with missing values for certain columns. To model this we can adjoin a singleton set {NA} to the output space of our random variable, i.e. we can consider random variables having outcome space $X \amalg \{NA\}$ (considered as a coproduct in the category of measurable spaces). In SQL and Codd's relational model, NULL is used in place of NA. We choose NA because data frames in R represent missing records in this way. As such, we would like to be able to account for models involving missing records. One of the simplest models for missing data is data that is missing completely at random (MCAR), as introduced in [58]. Any statistical model can naturally lift to a model where records are missing with some fixed probability (1 − α). In the previous chapter, we introduced an endofunctor G on the category of measurable spaces Meas. We can use this endofunctor to extend a statistical model with MCAR data.

Lemma 95. $G(X \amalg Y) = \mathrm{Conv}(G(X), G(Y))$ where
$$\mathrm{Conv}(G(X), G(Y)) := \{\alpha p + (1 - \alpha) q \mid p \in G(X),\, q \in G(Y),\, \alpha \in [0, 1]\}.$$

Proof. The inclusion $\supset$ is obvious. Let $p \in G(X \amalg Y)$ and set $\alpha = p(\{0\} \times X)$. If $\alpha = 0$, we have the desired decomposition. As such, assume $\alpha \neq 0$. Then we can define a probability measure $p_X$ on $X$ by $p_X(B) = \frac{1}{\alpha}\, p(\{0\} \times B)$. Analogously, $p(\{0\} \times X) = \alpha$ implies $p(\{1\} \times Y) = 1 - \alpha$, and as long as $\alpha \neq 1$ we can define a probability measure $q_Y$ on $Y$ by $q_Y(B) = \frac{1}{1-\alpha}\, p(\{1\} \times B)$. This gives us the desired decomposition of $p$.

Corollary. $G(X \amalg \{NA\}) = \mathrm{Conv}(G(X), G(\{NA\}))$.

Another simple consequence of this observation is that statistical models can be extended to the coproduct $X \amalg \{NA\}$ by taking a mixture model. This construction may not be appropriate for all circumstances, such as in the presence of censored data.

Remark 96. Any statistical model $m : P \to G(X)$ extends to a model $\tilde{m} : (P, \alpha) \to G(X \amalg \{NA\})$ via convex hulls (i.e. mixture models), i.e. $\tilde{m}(p, \alpha) = (1 - \alpha)\, p + \alpha\, \delta_{\{NA\}}$.

In the future, we will use the shorthand $\tilde{X}$ to refer to $X \amalg \{NA\}$. Also, observe that there is a canonical inclusion $i : X \hookrightarrow \tilde{X}$. These observations will be important when we introduce measurable presheaves defined on contextual categories later in this dissertation. Recall that the space $\tilde{X}$ can be given a coproduct sigma algebra structure from any sigma algebra on $X$.
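The following Python sketch illustrates Remark 96 for a finite outcome space: a distribution p on X is extended to X ∐ {NA} as the mixture (1 − α)p + αδ_NA. The function name extend_mcar and the Bernoulli example are illustrative choices, not part of the formal development.

def extend_mcar(p, alpha, na="NA"):
    # Extend a distribution p (dict: outcome -> probability) on X to X + {NA}
    # by mixing with the point mass at NA, i.e. (1 - alpha) * p + alpha * delta_NA.
    extended = {x: (1 - alpha) * prob for x, prob in p.items()}
    extended[na] = alpha
    return extended

# A Bernoulli(0.7) model whose records are missing completely at random 10% of the time.
p = {0: 0.3, 1: 0.7}
p_tilde = extend_mcar(p, alpha=0.1)
assert abs(sum(p_tilde.values()) - 1.0) < 1e-12   # {0: 0.27, 1: 0.63, 'NA': 0.1}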

4.2.4 Data Types

When designing programming languages to work with a database management system, we often need some finite collection of primitive data types from which to build our language. In the student database example, student_name and email can be regarded as strings while student_ID and phone can be regarded as integers. A string is simply a finite sequence of characters. If C is a set containing all valid characters, the space of strings, S, can be defined as $S := \coprod_{n\in\mathbb{N}} C^n$. Thus, a string, s, is simply a point $s : 1 \to S$. Other common data types include Booleans, bit strings, floats, dates, and times. Booleans are just elements of the set {TRUE, FALSE}. In many languages, such as SQL, the collection of Booleans is augmented to include UNKNOWN so that logical comparison operators can be appropriately defined for records containing NULL values. Bit strings are just elements of $\coprod_{n\in\mathbb{N}} \{0, 1\}^n$. Floats are the computer approximations of real numbers. We ignore implementation details of how these are represented in computer memory and allow our databases to have the real numbers as a data type. Dates and times can be seen as strings with specified formats, and so the collection of dates or times can be seen as sub-objects of the string space S. Most languages for databases require you to set a maximum length for data types like strings and bit strings when declaring a table. For instance, in SQL, to create a column containing strings the declaration requires you to specify an integer n representing the maximum length of an entry. Thus, in SQL, CHAR(n) corresponds to $\coprod_{i=0}^{n} C^i$. As such, we see that inside the topos of sets we can represent all the types of data commonly found in relational database management systems like SQL.

4.2.5 Column Spaces, Tuples, and Tables

4.2.5.1 Column Spaces

For each attribute $a \in A$, we have an associated attribute space $X_a$. As such, the column space of a collection of attributes is simply the product of attribute spaces taken across all $a \in A$, i.e.
$$X^A := \prod_{a\in A} X_a.$$

4.2.5.2 Records

Individual rows of the database are referred to as tuples. A tuple, or record, is simply a point in a column space, i.e. $r : 1 \to X^A$. Continuing with the student example, our student table may contain a record such as

(Juan Batista, 24601, [email protected], 8675309) .

4.2.5.3 Tables

Tables are simply collections of multiple records, i.e. mappings out of a finite set, $F$, into a column space:
$$t : F \to X^A$$
represents a particular table. Note that any table determines a point in the product space indexed by $F$,
$$t' : 1 \to \prod_{f\in F} X^A.$$
Let $n = |F|$ from above and define $(X^A)^n := \prod_{i=1}^{n} X^A$. With this notation, we can think of tables equivalently as points in a finite-dimensional product of attribute spaces, $t : 1 \to (X^A)^n$. Note that there is a natural bijection $\mathrm{Hom}(F, X^A) \cong \mathrm{Hom}\left(1, \prod_{f\in F} X^A\right)$. Many implementations, such as pandas, allow you to summarize a table as a count of distinct values in the column space, in other words, as a set mapping
$$\tilde{t} : X^A \to \mathbb{N}.$$
Note that these three perspectives are all ways of formalizing the notion of a multiset as a construction involving set theory. We refer to $t : [n] \to X^A$ as the memory allocation representation of the table, to $t : 1 \to (X^A)^n$ as the point representation of the table, and, lastly, to $\tilde{t} : X^A \to \mathbb{N}$ as the count of values representation of the table. Note that a single count of values representation can correspond to many different memory allocation or point representations of the same table. Intuitively, our definition of a table should not really depend on the particular details of the set of memory addresses indexing our data. As such, we will now discuss how to construct an equivalence relation on both the memory allocation representation and the point representation so that these three representations are all isomorphic, but not canonically isomorphic.

Let $t_m : M \to X$ be the memory allocation representation of some table. We can construct its count of values representation $t_v$ by defining $t_v(x) = |t_m^{-1}(x)|$. Note that precomposition by any permutation $\sigma : M \to M$, or more generally any isomorphism $\phi : M \to M'$, does not affect the count of values representation. As such, we can define the memory allocation representation of $t_m$ to be the equivalence class of maps into $X$, where two maps $t_m : M \to X$ and $t'_m : M' \to X$ are said to be equivalent if there exists an isomorphism $\phi : M \to M'$ such that $t_m = t'_m \circ \phi$. Following the same line of reasoning, we will consider two point representations of a table, $t : 1 \to \prod_I X$ and $t' : 1 \to \prod_J X$, to be equivalent if there exists an isomorphism $\phi : I \to J$ such that the induced map $\pi^\phi : \prod_I X \to \prod_J X$ satisfies $\pi^\phi \circ t = t'$. At this point all representations of a table are in bijective correspondence, and so we can now unambiguously move between representations as is convenient. In the rest of this section we will focus primarily on the first definition, as it naturally mimics the idea of thinking of a database as a collection of values indexed at various memory addresses. From this point of view, it is easier to make connections with what is occurring in computer memory as we manipulate a database.
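The three representations are easy to see in pandas (assumed version 1.1 or later for DataFrame.value_counts; the small table and its column names are illustrative). A DataFrame indexed by row position plays the role of the memory allocation representation, value_counts gives the count of values representation, and permuting the rows changes the former but not the latter.

import pandas as pd

# Memory allocation representation: a map [n] -> X^A with n = 4.
t = pd.DataFrame({"colour": ["red", "blue", "red", "red"],
                  "size":   ["S",   "M",    "S",   "L"]})

# Count of values representation: a map X^A -> N.
counts = t.value_counts()

# Precomposing with a permutation of the index does not change the counts.
t_permuted = t.sample(frac=1, random_state=0).reset_index(drop=True)
assert counts.sort_index().equals(t_permuted.value_counts().sort_index())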

4.2.6 Primary Keys

A collection of attributes is said to be a key for a table if we do not allow two rows in the table instance to have the same values for all the attributes in the key. In our student example, we could say student_ID is a key if no two students are able to have the same ID number. In practice, many database management systems simply create a primary key for the users because many applications could easily have redundant rows. As our hope is to explore statistical properties of multiple distributed databases, we expect to have many entries with identical features. We can model this behavior by adding an additional column to a table representing a key or index. A table may contain multiple rows with the same data. For instance, if we are tabulating IID copies of a binary random variable, we need to allow for multiple zeroes and ones in our database. This can be remedied by requiring that the rows be labeled by an index (also known as a primary key). For example, we can take $I = \mathbb{N}$. We can think of a row in a database with a primary key as an element $\langle i, x\rangle : 1 \to I \times X$.

A table is typically thought of as a finite relation $T \subset I \times X$ such that $(i, x_1) \in T$ and $(i, x_2) \in T$ implies $x_1 = x_2$. This ensures each primary key is matched with only one data point. To emphasize the finite nature of the relation, we will think of a table as a mapping $t : [n] \to I \times X$. In the future, we will include any primary keys, if they exist, in the existing column space.

4.2.7 Versioning

In many situations, we want to time-stamp or version records in a table. Date-times are a common choice of time-stamp employed in both R and the pandas package for Python. We could also use a counter representing a version number, as is common with software releases. What is necessary of the time-stamps or versions is that the collection of these objects forms a poset. A more complex method of versioning is the TrueTime model introduced by Corbett et al. at Google [24]. In this framework, events are time-stamped by intervals $[t_s, t_e]$ and we are guaranteed that the true UTC time is a point in the interval. The collection of such intervals can be given a poset structure as follows: $[t_{s_1}, t_{e_1}] < [t_{s_2}, t_{e_2}]$ if and only if $t_{e_1} < t_{s_2}$ as real numbers. We can log time-stamps in a database by having a column whose values are tuples representing a start and end time for the given interval. In versioned databases that also employ an index, it is common to use the product $I \times V$ as a primary key for the entries.

4.3 Relational Algebra on Tables

In the previous section, we saw how tables could be represented inside the category of sets. As tables are represented by morphisms from a finite set into some set representing the possible values of each of the columns of the table, we know, because the category of sets has all finite limits and colimits, that many standard categorical constructions such as products, coproducts, pushouts, and pullbacks exist for tables. In this section, we show how to represent the five primitive operations (selection, projection, products, unions, and differences) of Codd's relational algebra as constructions inside the topos of sets. Our goal in this section is to show that the construction of tables inside Set has the same expressive power as Codd's original construction. An alternative way of obtaining this result would be to use the abstract formulation in [13] applied to the specific topos Set, which would reduce the problem to expressing Codd's original operations in terms of the primitive operations used by the authors. We choose to simply recover Codd's constructions in a manner specific to the topos Set so as to avoid introducing another collection of primitive operations.

4.3.1 Products

Given two tables $t_1 : N \to A$ and $t_2 : M \to B$, we can form a product table
$$t_1 \times t_2 : N \times M \to A \times B.$$
This construction takes all possible combinations of records in $t_1$ and $t_2$. However, this operation alone is not useful for many of the operations we want to do with real databases because we typically want to eliminate redundant features, such as when two columns overlap. Nevertheless, as Codd showed, more complex constructions on tables can be built out of these simple primitive operations [23]. We will discuss some of these operations in later sections after showing that we can construct all five primitive operations.

4.3.2 Projection

Projection involves creating a new table on a subset of the columns in our table. Recall that each table $t : M \to X^A$ has an associated column space which can be decomposed as $X^A = \prod_{a\in A} X_a$. If $S \subset A$ is a subset of the attribute set of our table, there is a corresponding projection operator $\pi^A_S : X^A \to X^S$. The composition $\pi^A_S \circ t$ is referred to as the projection of the table onto the attribute space $S$.

4.3.3 Union

Intuitively, unions correspond to concatenating two tables defined on a common column space. Categorically, unions can be seen as a coproduct construction in the category of sets. Given two tables $t_1 : N \to X$ and $t_2 : M \to X$, the union is given by the copairing of $t_1$ and $t_2$, i.e. $t_1 \oplus t_2 : N \amalg M \to X$.

4.3.4 Selection

Selection is the most complicated of the five primitive operations. Intuitively, selection allows us to determine the collection of rows in a table satisfying certain properties, such as selecting all users in a table whose age is greater than 30. These operations naturally identify selections with certain sub-objects of our original table. In this subsection, we explore this idea in depth.

The principle of comprehension in set theory allows us to form sets consisting of all elements which satisfy a certain property $\varphi(x)$. A naive approach to set theory easily leads to contradictions such as Russell's paradox. In the Zermelo-Fraenkel approach to set theory, this is resolved by restricted comprehension, i.e. requiring the proposition $\varphi(x)$ to apply only to members of a particular set $A$. In this section, we discuss a comprehension principle for databases. In the language of database theory, this is referred to as selection.

When querying a database, we often want to return all results meeting some specified criteria. In particular, this requires a way of thinking about propositional formulas involving the logical operators AND, OR, and NOT. Let $A_i$ and $A_j$ be attributes. If $A_i$ or $A_j$ is a categorical attribute, let $\theta \in \{=, \neq\}$; otherwise, let $\theta \in \{<, \leq, =, \neq, \geq, >\}$. Then $A_i\,\theta\,A_j$ or $A_i\,\theta\,x$ with $x \in \mathbb{R}$ determines a mapping $\theta : D \to 2$. Given such a mapping and a particular database $d : [n] \to D$, we can consider the set $d^{-1}(\theta^{-1}(1))$. Since this is a subset of $[n]$, there is a natural inclusion $d^{-1}(\theta^{-1}(1)) \rightarrowtail [n]$. Thus we can form a new database, $d_\theta : d^{-1}(\theta^{-1}(1)) \to D$, representing the selection of those entries for which the proposition is true.

More complex queries can be formed with the logical operators AND, NOT, and OR. Before discussing these constructions, we review the topos-theoretic perspective on the logical operators of conjunction, disjunction, and negation. A more detailed discussion from the perspective of topos theory can be found in [60]. Recall that negation is the arrow $\neg : 2 \to 2$ such that the diagram below is a pullback:
$$\begin{array}{ccc} 1 & \xrightarrow{\;\bot\;} & 2 \\ {\scriptstyle !}\downarrow & & \downarrow{\scriptstyle \neg} \\ 1 & \xrightarrow{\;\top\;} & 2. \end{array}$$
Similarly, conjunction is the arrow $\cap : 2 \times 2 \to 2$ such that the diagram below is a pullback:
$$\begin{array}{ccc} 1 & \xrightarrow{\;\top\times\top\;} & 2 \times 2 \\ {\scriptstyle !}\downarrow & & \downarrow{\scriptstyle \cap} \\ 1 & \xrightarrow{\;\top\;} & 2. \end{array}$$
Finally, disjunction has a more complex categorical description. From simple truth tables we know $\cup : 2 \times 2 \to 2$ should be the characteristic map corresponding to the sub-object $E = \{(1,1), (1,0), (0,1)\}$. First notice that this sub-object can be decomposed as $E_1 \cup E_2$ where $E_1 = \{(1,1), (1,0)\}$ and $E_2 = \{(1,1), (0,1)\}$. This is important because $E_1$ is identified with the monic mapping $\langle \top, 1\rangle : 2 \to 2 \times 2$ and $E_2$ can be identified with the monic mapping $\langle 1, \top\rangle : 2 \to 2 \times 2$. We can then form the map $f$ induced from the coproduct of these two mappings,
$$\begin{array}{ccccc} 2 & \longrightarrow & 2 + 2 & \longleftarrow & 2 \\ & {\scriptstyle \langle\top,1\rangle}\searrow & \downarrow{\scriptstyle f} & \swarrow{\scriptstyle \langle 1,\top\rangle} & \\ & & 2 \times 2, & & \end{array}$$
and observe that $\mathrm{im}(f) = E$. Thus, by the canonical decomposition of a set mapping into a surjection followed by an injection, we can identify $E$ up to unique isomorphism.

In order to form more complex selection queries based on multiple binary operations $\theta : D \to 2$ and $\theta' : D \to 2$, we can combine binary operations using negation, conjunction, and disjunction. For instance, to represent the selection corresponding to $\theta \wedge \theta'$ we could consider the composite $D \times D \xrightarrow{\theta\times\theta'} 2 \times 2 \xrightarrow{\cap} 2$. Hence, the selection can be represented by the database
$$[n] \xrightarrow{\;d\times d\;} D \times D \xrightarrow{\;\theta\times\theta'\;} 2 \times 2 \xrightarrow{\;\cap\;} 2.$$

By forming larger numbers of products, we can represent more complex selection queries. These are all guaranteed to work because Set admits finite limits and colimits. Gathering the most recent version of the entries in a table can be seen as another type of selection operator. Given a table $d : [n] \to A = I \times V \times 2 \times \prod_{i\in I} X_i$, we can define a binary relation $\theta : (I \times V) \times (I \times V) \to 2$ by
$$\theta((i, v), (i', v')) = \begin{cases} 0 & \text{if } i = i' \text{ and } v \leq v' \\ 1 & \text{otherwise.} \end{cases}$$
Clearly $\theta$ lifts to a map on $A \times A$. Hence, by using the selection criteria above with this binary operation, we obtain a new table containing only the most recent entries.
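In pandas, this "keep only the most recent entry" selection is a one-liner; the column names key, version, and value below are illustrative assumptions and not part of the model.

import pandas as pd

t = pd.DataFrame({"key":     [1, 1, 2, 2, 2],
                  "version": [1, 2, 1, 2, 3],
                  "value":   ["a", "b", "c", "d", "e"]})

# For each key, keep the row with the maximal version number.
latest = t.loc[t.groupby("key")["version"].idxmax()]   # keeps rows "b" and "e"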

4.3.5 Difference

Informally, the difference of two tables $t$ and $t'$ is the set of records that belong to one table but not to the other. If $t : M \to X$ is any table and $t' : N \to X$ is another table on the same attribute space, then the difference $t \setminus t'$ can be identified by first considering the sub-object $E \rightarrowtail M$ defined by $E = \{m \in M \mid t(m) \notin t'(N)\}$. Thus, the difference can be formed by taking the selection operator corresponding to the characteristic function of this sub-object. At this point, we have expressed all five primitive operations as constructions within the category of sets. As such, we know that we can express more complex operations performed on tables, such as equijoins, by chaining several of these primitive operations.
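Each of the five primitive operations has a direct analogue in pandas. The sketch below (illustrative column names; pandas 1.2 or later assumed for how="cross") is meant only to mirror the set-theoretic constructions above, not to be a complete relational engine.

import pandas as pd

t1 = pd.DataFrame({"A": [0, 1, 2], "B": ["x", "y", "x"]})
t2 = pd.DataFrame({"A": [1, 3],    "B": ["y", "z"]})

product    = t1.merge(t2, how="cross")                 # t1 x t2 : N x M -> A x B
projection = t1[["B"]]                                  # composition with the projection pi_S
union      = pd.concat([t1, t2], ignore_index=True)     # copairing N + M -> X
selection  = t1[t1["A"] > 0]                            # rows where the predicate holds
# Difference: rows of t1 whose values do not occur in t2.
difference = t1.merge(t2, how="left", indicator=True)
difference = difference[difference["_merge"] == "left_only"].drop(columns="_merge")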

4.4 Some Additional Operations on Tables

In order to construct a category of tables, we need to define morphisms between tables. Before giving a general definition, we explore several common constructions with individual tables and see how to express these notions via category theory.

4.4.1 Addition & Deletion

Let $t : N \to X$ be a table and let $t' : 1 \to X$ be a record. We can add $t'$ to $t$ by taking the union of $t$ and $t'$ as discussed in the last section. Equivalently, if $\tilde{t} : N \amalg 1 \to X$ is the resulting table, we say that $t$ can be obtained from $\tilde{t}$ by deleting a record. These equivalent situations can be represented by the following commutative diagram:
$$\begin{array}{ccccc} N & \longrightarrow & N \amalg 1 & \longleftarrow & 1 \\ & {\scriptstyle t}\searrow & \downarrow{\scriptstyle \tilde{t}} & \swarrow{\scriptstyle t'} & \\ & & X. & & \end{array}$$

4.4.2 Editing Records

Entries in a database can also be modified. When a row is changed, we want to preserve its primary key, update its version, i.e. replace $t$ with $t'$ where $t \leq t'$, and change $x$ to $x'$. To modify the $j$-th row to have time-stamp $t'$ and value $x'$, we can introduce a modification map defined as follows:
$$\mathrm{mod}_j^{(t', x')}(i, t, x) = \begin{cases} (i, t, x) & i \neq j \\ (i, t', x') & i = j. \end{cases}$$
This results in the following commutative diagram:
$$\begin{array}{ccc} [n] & \xrightarrow{\;\mathrm{id}\;} & [n] \\ {\scriptstyle t_1}\downarrow & & \downarrow{\scriptstyle t_2} \\ I \times V \times \mathbb{R} & \xrightarrow{\;\mathrm{mod}_j^{(t', x')}\;} & I \times V \times \mathbb{R}. \end{array}$$

4.4.2.1 Rename

Let $A$ and $B$ be two attribute sets. An isomorphism $\rho : A \to B$ induces a commutative diagram
$$\begin{array}{ccc} M & \xrightarrow{\;t\;} & X^A \\ & {\scriptstyle t'}\searrow & \downarrow{\scriptstyle \pi_\rho} \\ & & X^B \end{array}$$
where the vertical map is defined by $\pi_\rho(x_a) = x_{\rho(a)}$. Such a diagram can be interpreted as a renaming of the columns of the original table.

Example 97. Consider the following two tables:

I | X        J | X
a | 3        0 | 3
b | 5        1 | 5
c | 7        2 | 7

We can view the second table as a re-indexing of the first table where the re-index map r : {a, b, c} → {0, 1, 2} is defined by r (a) = 0, r (b) = 1, and r (c) = 2. This re-indexing can be seen as a pair of morphisms (r, 1) where r is a mapping between the attribute spaces of the corresponding tables and 1 is the identity mapping between column spaces.

4.4.2.2 Imputation

Another special case of editing records is imputing missing data. When preparing data for model fitting, an analyst must make a decision about what to do with records with missing entries. If there are very few records containing missing entries, the analyst may simply choose to drop these records, dismissing them as measurement errors. Other common techniques involve replacing the missing values with the mean, median, or some other fixed value. This can be seen as a map which takes NA to the chosen value $x_0$ and is the identity mapping on every other outcome. Other imputation schemes involve attempting to predict the missing values based on some statistical model for the missing entries [115, 116]. As such, these determine a collection of modifications replacing the missing values at the individual indices with their predicted values.

4.4.3 Merging Overlapping Records

In the next chapter, we will need to discuss merging data frames which agree on their overlapping columns. In this section, we discuss how to view this as a construction on table categories. A particular record can be viewed as a point in a column space: $p : 1 \to X$. Imagine we have two records $p_1 : 1 \to X \times Y$ and $p_2 : 1 \to Y \times Z$. When we say that these two records agree on their overlapping columns, what we mean is that the projections onto their overlapping column spaces agree. As such, we can understand joining two records as an instance of the pullback operation. Saying two records agree on their overlapping columns is equivalent to saying the diagram below commutes:
$$\begin{array}{ccc} 1 & \xrightarrow{\;p_1\;} & X \times Y \\ {\scriptstyle p_2}\downarrow & & \downarrow{\scriptstyle \pi_Y} \\ Y \times Z & \xrightarrow{\;\pi_Y\;} & Y. \end{array}$$
The universal property of pullbacks implies that there exists a unique $p : 1 \to (X \times Y) \times_Y (Y \times Z) \cong X \times Y \times Z$ such that $\pi_{X\times Y} \circ p = p_1$ and $\pi_{Y\times Z} \circ p = p_2$ in the pullback square below:
$$\begin{array}{ccc} X \times Y \times Z & \xrightarrow{\;\pi_{X\times Y}\;} & X \times Y \\ {\scriptstyle \pi_{Y\times Z}}\downarrow & & \downarrow{\scriptstyle \pi_Y} \\ Y \times Z & \xrightarrow{\;\pi_Y\;} & Y. \end{array}$$

By using successive record-wise joins, we can join data frames together. Imagine two synchronized computational agents tabulating partial observations from a random experiment. In this situation, the time-stamp could be used as a key for our merge operation. Assuming perfect synchronization of the clocks of the different agents, we can again invoke the universal property of pullbacks to merge records.
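The pullback-style gluing of records that agree on their overlapping columns corresponds to an inner join on the shared column. The pandas sketch below uses illustrative column names X, Y, and Z.

import pandas as pd

p1 = pd.DataFrame({"X": [1, 2], "Y": ["a", "b"]})    # records in X x Y
p2 = pd.DataFrame({"Y": ["a", "b"], "Z": [10, 20]})  # records in Y x Z

# Records agreeing on the overlap Y are glued into points of X x Y x Z.
joined = p1.merge(p2, on="Y")                         # columns X, Y, Z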

4.4.3.1 Table Morphisms

Given these possible ways to modify a database, we can now create a subcategory of tables T inside Set. The objects in this category are tables, and a morphism $f : t \to t'$ between tables $t : N \to X^A$ and $t' : M \to Y^B$ is given by a pair of maps $(\sigma, f)$, where $\sigma : N \to M$ and $f : X^A \to Y^B$ make the square below commute:
$$\begin{array}{ccc} N & \xrightarrow{\;t\;} & X^A \\ {\scriptstyle \sigma}\downarrow & & \downarrow{\scriptstyle f} \\ M & \xrightarrow{\;t'\;} & Y^B. \end{array}$$

4.4.4 Non-Binary Logics

The implementation of SQL uses a three-valued logic to handle missing data [53]. Truth tables for this logic are displayed below. We use the shorthand U for UNKNOWN in SQL.

a ∧ b                a ∨ b                ¬a
a\b | 0  U  1        a\b | 0  U  1        a | ¬a
 0  | 0  0  0         0  | 0  U  1        0 |  1
 U  | 0  U  U         U  | U  U  1        U |  U
 1  | 0  U  1         1  | 1  1  1        1 |  0

By inspection of the tables above, the negation of U fails to obey U ∧ ¬U = 0 and thus the three-valued logic implemented in SQL is not a Heyting algebra. This suggests that database theory based on category theory should step outside the scope of topos theory to more general areas of categorical logic. In order to replicate the functionality of SQL, we need only replace the two element set used previously with the three element set {0, U, 1} and the corresponding conjunction, disjunction, and negation operations from the tables displayed previously.
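One convenient way to implement this three-valued logic is to encode 0, U, 1 as the numbers 0, 1/2, 1: conjunction is then the minimum, disjunction the maximum, and negation is x ↦ 1 − x, which reproduces the truth tables above. This Kleene-style encoding is our own illustrative choice, not something prescribed by the SQL standard.

FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0

def and3(a, b):   # conjunction: minimum of the truth values
    return min(a, b)

def or3(a, b):    # disjunction: maximum of the truth values
    return max(a, b)

def not3(a):      # negation: 1 - a, so that not3(UNKNOWN) == UNKNOWN
    return 1.0 - a

# The law u AND (NOT u) == FALSE fails, so this logic is not a Heyting algebra.
assert and3(UNKNOWN, not3(UNKNOWN)) == UNKNOWN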

4.5 Random Tables and Random Databases

In order to perform statistical analysis on tables, we need to be able to view them as outcome spaces of some random variable. In this section, we discuss augmenting the column spaces with the structure of a measurable space; this connects the present chapter to the earlier chapters discussing sheaves of random variables. We choose to use the point representation of tables because this representation is the most similar to the notation used by statisticians.

4.5.1 Random Tables

If each attribute space, $X_a$, is given the structure of a measurable space by equipping it with a sigma algebra, then the joint outcome space $X^A = \prod_{a\in A} X_a$ can be endowed with a sigma algebra by taking the product sigma algebra. Again taking product sigma algebras, we can endow the table space $\prod_{n\in\mathbb{N}} X^A$ with a sigma algebra structure. Given some probability space $(\Omega, \mathcal{F}, \rho)$, a measurable mapping $R : \Omega \to \prod_{n\in\mathbb{N}} X^A$ is said to be a random table.

4.5.2 Giry Monad Applied to Tables

Given a table space with a sigma algebra structure, we can apply the Giry monad to the table space to recover the collection of all probability measures on the table space. If all $X_a$ are standard Borel spaces, the product space is also standard Borel, and thus so is its image under the Giry monad. We refer to a map $1 \to \prod_{n\in\mathbb{N}} G(X^A)$ as a Giry table or Giry data frame. In a later chapter, we will discuss techniques for imputing missing data and merging conflicting data which rely on these Giry tables.

4.5.3 Random Databases

A database, $D$, is simply a finite collection of tables $D = \{t_1, \ldots, t_k\}$. The attribute space, $A_D$, of the database is simply the union of the attribute spaces of the tables. A random database can be obtained by considering a random variable whose outcome space is $\prod_{n\in\mathbb{N}} X^{A_D}$. By projecting onto the various outcome spaces of the tables, we obtain the database representation of the random sample. In the next chapter, we will discuss properties of reconstructing global samples. In particular, we will see that requiring any pair of tables to agree on their overlapping columns is insufficient to ensure that the tables can be joined together. Arbitrary probability distributions on the outcome spaces of the tables do not necessarily need to arise from a global probability distribution, even if the overlapping marginals are compatible. The problem of reconstructing probability measures from their marginal distributions has been discussed in [62, 131]. Wang has also provided conditions for compatibility of marginals for undirected graphical models [140]. In this section, we will provide sufficient conditions for when a collection of tables which agree on their shared column spaces can be seen as projections of a global table. The general problem of determining whether or not a collection of tables arises as the projection of a table on the joint column space is known to be NP-complete [68]. As an example, consider a table with three attributes A, B, and C. The outcome space of each attribute is {0, 1}. The following collection of probability distributions agree on their overlapping marginals but fail to arise as marginal distributions of a joint distribution on the outcome space $X_A \times X_B \times X_C$:

           P(A = 0)   P(A = 1)
P(B = 0)     1/2         0
P(B = 1)      0         1/2

           P(B = 0)   P(B = 1)
P(C = 0)     1/2         0
P(C = 1)      0         1/2

           P(A = 0)   P(A = 1)
P(C = 0)      0         1/2
P(C = 1)     1/2         0

Although these distributions cannot be seen as the marginal distributions of some probability distribution on the full outcome space, this type of structure is possible in many data collection scenarios. Morton showed how this phenomenon can arise from missing data or from databases employing versioning techniques like snapshot isolation [101]. In the next chapter, we discuss the representation of random variables of this form and discuss how to extend statistical theory into this setting.
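That no joint distribution exists can also be checked mechanically by posing the marginal constraints as a linear feasibility problem over the eight probabilities p(a, b, c). The sketch below assumes scipy is available and uses scipy.optimize.linprog with a zero objective; the solver reports that the equality constraints are infeasible.

import numpy as np
from itertools import product
from scipy.optimize import linprog

outcomes = list(product([0, 1], repeat=3))   # joint outcomes (a, b, c)
pairs = {("A", "B"): (0, 1), ("B", "C"): (1, 2), ("A", "C"): (0, 2)}
# Pairwise marginals from the tables above: A = B and B = C, but A != C.
marg = {("A", "B"): {(0, 0): 0.5, (1, 1): 0.5},
        ("B", "C"): {(0, 0): 0.5, (1, 1): 0.5},
        ("A", "C"): {(0, 1): 0.5, (1, 0): 0.5}}

A_eq, b_eq = [], []
for pair, (i, j) in pairs.items():
    for u, v in product([0, 1], repeat=2):
        A_eq.append([1.0 if (o[i], o[j]) == (u, v) else 0.0 for o in outcomes])
        b_eq.append(marg[pair].get((u, v), 0.0))
A_eq.append([1.0] * len(outcomes)); b_eq.append(1.0)   # probabilities sum to one

res = linprog(c=np.zeros(len(outcomes)), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, 1)] * len(outcomes))
print(res.success)   # False: no non-negative solution exists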

4.6 Topological Aspects of Databases

4.6.1 Simplicial Complex Associated to a Database

A database is simply a collection of tables. Each table has an attribute space. The attribute space associated to a database can be formed by taking the union of the attribute spaces of the tables constituting the database. The column space of each table determines an abstract simplicial complex by associating a vertex to each column in the table. The maximal face associated to the table is given by the set of column attributes. The abstract simplicial complex condition requires that all subsets of the maximal face be included in the simplicial structure. Intuitively, this corresponds to the collection of tables that can be formed by dropping a subset of columns from the original table. The simplicial complexes associated to each table can be glued together along their overlapping column set. This determines an abstract simplicial complex for the entire database.

4.6.2 Contextuality

Contextuality is a phenomenon first observed in quantum physics whereby the outcome one observes in a measurement depends upon the other measurements taking place. Mathematically, this corresponds to the fact that in quantum mechanics observables are represented by operators on a Hilbert space and two operators are simultaneously observable if and only if they commute. We believe contextuality can arise in databases whenever the simplicial complex associated to the database is non-contractible. First, we can start with a simpler observation.

Lemma 98. If two tables agree on their overlapping counts, then there exists a join of the tables.

Proof. Construct a total order on the values of the overlapping columns of the two tables. Sort each table according to this total order. We can now work inductively. If there is only one entry, the tables agreeing on their overlapping counts means that each entry has the same value on the overlapping columns. As such, there is only one choice to be made when joining the records. For tables with multiple entries, we can just merge the sorted rows by matching the indices in the enumeration implied by the total order.
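The proof is constructive, and the construction is easy to carry out in code. The Python sketch below represents a table as a list of row dictionaries (an illustrative choice), sorts both tables on the shared columns, and pairs rows in order; as in the lemma, it assumes the tables agree on their overlapping counts.

def join_on_overlap(t1, t2, overlap):
    # Join two tables (lists of dict rows) that agree on the counts of the shared
    # columns `overlap`, by sorting on the overlap and pairing rows in order.
    key = lambda row: tuple(row[c] for c in overlap)
    rows1, rows2 = sorted(t1, key=key), sorted(t2, key=key)
    assert [key(r) for r in rows1] == [key(r) for r in rows2]  # agreement on overlapping counts
    return [{**r1, **r2} for r1, r2 in zip(rows1, rows2)]

t1 = [{"A": 0, "B": 0}, {"A": 1, "B": 0}, {"A": 2, "B": 0}, {"A": 3, "B": 0}]
t2 = [{"B": 0, "C": "a"}, {"B": 0, "C": "b"}, {"B": 0, "C": "c"}, {"B": 0, "C": "d"}]
joined = join_on_overlap(t1, t2, overlap=["B"])   # one of several possible joins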

Remark. In general, there may be multiple ways to join two tables together, as the next example shows.

Example 99. Consider the following tables:

A | B        B | C
0 | 0        0 | a
1 | 0        0 | b
2 | 0        0 | c
3 | 0        0 | d

In this situation, there are 4! ways to join the two tables along B. Repeatedly applying binary joins allows us to construct a global table which projects onto each marginal distribution. This procedure can go wrong if we add a table that has non-empty intersection with multiple tables in the group in such a way that the simplicial complex associated to the tables is not contractible.

Definition 100. Let $t_1, \ldots, t_k$ be a collection of tables and let $C_1, \ldots, C_k$ denote their respective column sets. The schema graph associated to this database is the graph whose nodes $n_1, \ldots, n_k$ are given by the respective tables $t_1, \ldots, t_k$, and whose edges are given by the pairs $(i, j)$ such that $C_i \cap C_j \neq \emptyset$.

With this definition, we can establish sufficient conditions for the constraint satisfaction problem to admit a solution.

Proposition 101. Let $t_1, \ldots, t_k$ be a collection of tables. If the schema graph associated to $t_1, \ldots, t_k$ is connected and acyclic, then the constraint satisfaction problem has a non-empty solution set.

Proof. We proceed by induction on $k$. The base case $k = 1$ is trivial, and the case $k = 2$ is established by the previous lemma. By relabeling the nodes if necessary, we may assume without loss of generality that node $k$ is a boundary node, i.e. a node with exactly one incident edge. Such a node must exist because if every node had two or more incident edges, the graph would contain a cycle. By the previous lemma, there is a join between this node and its neighbor. By definition, this join projects onto the two marginal tables and so will obey the condition of agreeing overlaps with any other tables whose edges are incident on either of the original nodes. By considering the graph obtained by contracting these two nodes, we reduce the number of nodes by one and may thus apply the inductive hypothesis.
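The hypothesis of Proposition 101 is straightforward to check in code. The sketch below assumes the networkx library: it builds the schema graph from the column sets and tests whether it is connected and acyclic, i.e. a tree.

import networkx as nx
from itertools import combinations

column_sets = {"t1": {"A", "B"}, "t2": {"B", "C"}, "t3": {"C", "D"}}

# Schema graph: one node per table, an edge whenever two column sets intersect.
G = nx.Graph()
G.add_nodes_from(column_sets)
for u, v in combinations(column_sets, 2):
    if column_sets[u] & column_sets[v]:
        G.add_edge(u, v)

print(nx.is_tree(G))   # True: this chain-shaped schema satisfies Proposition 101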

88 Example 102. Suppose we have three tables whose column sets are {A, B}, {B,C}, and {C,D}, respectively. Then these tables admit a join as long as the first two tables agree on their projection onto B and the last two tables agree on the projection onto C. To see this, we can first join {A, B} and {B,C}. Again, there are potentially multiple solutions to the join problem. Based on the result of the first operation, we can join the resulting table to {C,D} which is again possible because these tables agree on their overlapping counts of the values of C.

Example 103. Consider three tables whose column sets are {A, B}, {B,C}, and {A, C}, respectively. The following collection of tables admits no join:

A | B        B | C        A | C
0 | 0        0 | 0        0 | 1
1 | 1        1 | 1        1 | 0

Note that the simplicial complex associated to these tables is the hollow triangle on the vertices A, B, and C: all three edges {A, B}, {B, C}, and {A, C} are present, but the 2-face {A, B, C} is not.

4.6.3 Topology on a Database

An abstract simplicial complex has a natural poset structure given by subset inclusion. There are two canonical topologies associated to a poset: the topology generated by taking upper sets as open sets and the topology generated by taking lower sets as open sets [139]. Thus, given a database schema, we can construct an abstract simplicial complex representing the overlap between tables in the database. We can construct a topology on this abstract simplicial complex by taking the lower sets to be the open sets. If $(P, \leq)$ is a poset, then a set $U \subset P$ is said to be a lower set if for all $x \in U$, $y \leq x$ implies $y \in U$. This endows our abstract simplicial complex with the structure of an Alexandroff topology (arbitrary intersections of open sets remain open). In the next chapter, we will see how to think about statistical inference on such databases.

4.7 Relationship Between Topological Structure of a Schema and Contextuality

We have seen that contextual schemas which contain cycles can produce collections of tables which agree on all marginal overlaps but fail to admit a global glueing of the tables. Future research could investigate further how topological properties of the database schema affect the potential for contextuality. We present a few conjectures along these lines in this subsection.

Conjecture. Let T_1, T_2, . . . , T_k be a collection of tables whose associated database schema (simplicial complex) is contractible. If the tables agree on their overlapping column sets, then there is a global join of the tables T_1, . . . , T_k.

Note that Proposition 101 is a weaker form of this conjecture. As an example of a database whose simplicial complex is contractible yet whose schema graph is cyclic, consider four binary random variables A, B, C, and D, and a database consisting of tables whose column sets are {A, B, C}, {B, C, D}, and {A, C, D}. The simplicial complex is the boundary of a tetrahedron with one face removed, while the schema graph is simply the cyclic graph on 3 nodes. Another natural question is whether or not any schema which is not contractible will necessarily admit a collection of tables which fail to admit a global glueing. The above conjecture claims that holes (or higher-dimensional analogs thereof) are necessary for tables to fail to admit a glueing. A natural question is then whether or not this condition is also sufficient. In chapter five, we discuss how the space of contextual probability distributions can be realized as a certain linear subspace and also how the collection of classical probability distributions is a linear subspace of this space. One approach to investigating this conjecture could be to analyze the linear algebra of these subspaces and investigate whether or not the constraint equations resulting from closing a hole create an inconsistency in these sets of linear equations.

Chapter 5 | Contextual Statistics

5.1 Introduction

In chapter three, we saw that the collection of quasi-Borel presheaves on a sample space category has the structure of a sheaf with respect to the atomic Grothendieck topology on its underlying sample space category. In this chapter, we will see how replacing objects with presheaves and measurable mappings with natural transformations allows us to extend statistical concepts to a distributed measurement scenario in a natural way. Contextuality is a mathematical property in quantum mechanics arising from Bell's theorem. It is the phenomenon by which the value of a measurement depends on the other measurements being performed simultaneously. Contextuality has been formalized in the language of sheaf theory in [1–5] and has also been shown to arise in cognitive science [104]. Morton showed that contextuality can also arise in many data collection scenarios such as those involving missing data or versioning in a distributed database employing snapshot isolation [101]. Traditional statistical methodology assumes that data is already arranged in a flat and tidy manner [143]. As such, the ways that preprocessing affects the statistical properties of distributions are not typically discussed in most textbooks. The typical workaround when such assumptions fail is first cleaning the data by matching disparate sources and combining them into a single data frame. Choices are made about how to impute values or whether or not to throw away missing records. However, these methods introduce additional implicit assumptions which may not always be warranted, as the models themselves assume missing data is impossible and that the pipeline

transforming the data does not affect subsequent analysis. In fact, many common techniques such as filling missing records by their column mean induce clear biases in the estimation of higher order moments of our data. Modern data sets have a more complex structure than assumed in a traditional statistics course. Traditional statistical methods rely on analyzing what data scientists would call a flat and tidy data set. In this chapter, we will explore how relaxing these assumptions affects statistical techniques. In particular, we will develop statistical theory in a manner that is contextual by creating sheaves on the Alexandroff topology constructed on a database schema as defined in the previous chapter (Section 4.6.3). A non-flat measurement scenario consists of a collection of tables tabulating the outcomes of some collection of random variables of interest. By a context, we mean a collection of simultaneously observable random variables whose outcomes are collected in a single table. In other words, contexts can be identified with the particular collection of attributes tabulated in a single table. We often informally refer to a table as a context in this chapter. The major contributions present in this chapter are the use of an appropriate topological structure to sheaf-theoretically lift standard statistical constructions to families of marginals with some overlapping constraints. More precisely, this includes the introduction of a poset structure on the collection of constraint satisfaction problems (Section 5.5.2) which allows us to select an appropriate topology based on the shared columns of the tables constituting a database (Section 5.6.1). Using this topology, we see how to express various statistical concepts as sheaves or presheaves with respect to this topology (Section 5.7). This allows us to define the notion of contextual random variables (Section 5.7.6) and to define statistical models in terms of sheaf morphisms (Section 5.8.1). We also introduce the distinction between classical and contextual factors (Definition 127) and the notion of classical snapshot to handle classical approximations to globally irreconcilable marginals. We discuss a pseudo-likelihood approach to extending maximum likelihood estimation based on the realization of contextual random variables as subsets of an equalizer (Section 5.11.1) and provide a test for whether or not marginal distributions can arise from a joint distribution on the full column set (Section 5.12.2). This last result is similar to a result due to Abramsky, Barbosa, and Mansfield based on sheaf cohomology which allows the user to detect contextuality [3]. By combining our results with the construction in chapter seven, we can provide a goodness-of-fit

measure for contextuality rather than a simple detection of contextuality.

5.2 The Bell Marginals

The problem of reconstructing specific statistical models from marginal tables has been analyzed previously [6, 63, 134]. The example in this section can be seen as a specific instance of recovering a multinomial model on the full joint distribution from a particular family of marginals. Formulated as a database problem, the problem of whether or not there exists a table projecting onto a given family of marginals is known to be NP-complete [68]. In this section, we review the Bell marginal tables introduced in [101]. We prove that the introduction of a naive transition noise from a joint distribution on the full attribute space onto the product of the marginals is sufficient to match any family of marginal distributions. The problem with this construction is that it is overparametrized, and there are in general many noise transitions which explain a collection of incompatible marginals. We use this as motivation for the introduction of sheaf theory in later sections, which allows us to construct a model of the globally inconsistent marginals as a subspace of an equalizer constructed from the degree of overlapping compatibility. We begin by considering the contextual inference problem for data collected in contingency tables. As a starting point, we recall the Bell marginals example from [101].

Example 104. (Bell Marginals) Consider the following collection of four contingency tables:

t_AB        A = 0    A = 1          t_A′B       A′ = 0    A′ = 1
B = 0       4        0              B = 0       3         1
B = 1       0        4              B = 1       1         3

t_AB′       A = 0    A = 1          t_A′B′      A′ = 0    A′ = 1
B′ = 0      3        1              B′ = 0      1         3
B′ = 1      1        3              B′ = 1      3         1

Given such a collection of tables, we can attempt to construct a table which has the above four contingency tables as marginal tables. In [101], it is shown that such

a construction is impossible. If we attempt to merge A′B, AB′, and A′B′, there are 14 possible tables of counts that marginalize to these tables. However, if we merge AB with A′B, there is only one possible solution to the resulting constraint satisfaction problem. We will explain this constraint satisfaction problem in greater depth in Section 5.5.1. Given a classical random variable, we could never encounter the above situation if these tables were drawn (with clean records) from a random variable on the joint outcome space of A, B, A′, and B′; however, as shown in [101], these tables could result from versioning or missing data. As a random variable is not determined by its marginal distributions, there could in general be many possible probability distributions on the full outcome space with the same marginal distributions as those collected in a particular collection of tables. Nevertheless, the presence of noise in instrumentation in a network of computational agents, or bias in the agents themselves (such as may occur in a network of sensors due to the physical degradation of some of the components of the sensor), could potentially result in this type of situation.

Example 105. Suppose we have four computational agents observing outcomes in the same manner as the Bell distributions above, i.e. one agent observes random variables A and B, another observes A0 and B, etc. If each agent flips a bit while logging their observations with some fixed probability p, then the tables of marginal counts will likely fail to glue together due to the presence of the random noise in the system.

Thus, one way of understanding contextuality is by introducing noise into the system. This motivates the following definition.

Definition 106. (Noisy Random Variable) Let X = ∏_{i=1}^{n} X_i be the joint outcome space of the random variables in R. Let X_{C_i} = ∏_{i ∈ C_i} X̃_i, where X̃_i = X_i ⊔ {NA}. In order to allow for the possibility of global inconsistency among the agents, we equip each context with a 'noise', which is simply a transition probability between X and X_{C_i}, i.e. a Kleisli arrow n_{i,K} : X → X_{C_i}. Thus, we will think of a noisy random variable as a random variable whose outcome space is the product across a set of measurement contexts.

Lemma 107. (Existence of Noisy Random Variables) Given a random variable (f : S → X, p) and a family of Kleisli arrows n_{i,K} : X → X_{C_i}, there exists a noisy random variable whose marginal distribution on each context C_i is the one induced by composing p with n_{i,K}.

Proof. Starting from a random variable (f : S → X, p) and Kleisli arrows n_{i,K} : X → X_{C_i}, we need to construct an extension of the sample space S whose outcome space is ∏_{i=1}^{m} X_{C_i} (or perhaps an extension of this). From our original random variable, we can construct a Kleisli arrow p : 1 →_K X. The family of Kleisli arrows gives rise to a Kleisli arrow ∏_{i=1}^{m} n_{i,K} : X → ∏_{i=1}^{m} X_{C_i}. Thus, Kleisli composition induces a probability distribution on the space ∏_{i=1}^{m} X_{C_i} which descends to marginal distributions on the various contexts via the projection maps. We denote these composite arrows {p_{C_i}}_{i=1}^{m}. Hence, the existence of a contextual random variable is reduced to the question of constructing a sample space for a random variable based on a push-forward distribution. However, this is a straightforward, albeit abstract, construction. Given the original distribution p on S, we can construct S × ∏_{i=1}^{m} X_{C_i} with the distribution p × ∏_{i=1}^{m} p_{C_i}. There is an obvious projection map q : S × ∏_{i=1}^{m} X_{C_i} → ∏_{i=1}^{m} X_{C_i}. Then the random variable (q : S × ∏_{i=1}^{m} X_{C_i} → ∏_{i=1}^{m} X_{C_i}, p × ∏_{i=1}^{m} p_{C_i}) has the desired properties.

Lemma 108. Given any collection of contexts of discrete random variables, there exists a noisy random variable with the prescribed marginal distributions.

Proof. We establish this lemma by constructing a recursive algorithm for finding such a representation. Define k_1 = min{ k ∈ [n] : ∑_{i=1}^{k} p_i ≥ q_1 } and define α_1 such that α_1 p_{k_1} + ∑_{i=1}^{k_1 − 1} p_i = q_1. Note that (1 − α_1) p_{k_1} + ∑_{i=k_1+1}^{n} p_i = ∑_{i=2}^{m} q_i. Recursively, define k_ℓ = min{ k ∈ [n] : ∑_{i=k_{ℓ−1}+1}^{k} p_i ≥ q_ℓ } and α_ℓ such that α_ℓ p_{k_ℓ} + ∑_{i=k_{ℓ−1}+1}^{k_ℓ − 1} p_i = q_ℓ. From these we can construct our Kleisli morphism as the right stochastic matrix whose entries have the following form:

T_{ij} =  0,             j < k_{i−1}
          1 − α_{i−1},   j = k_{i−1}
          1,              k_{i−1} < j < k_i
          α_i,            j = k_i
          0,              j > k_i

From this form, we see this construction is indeed row stochastic: every row i ∉ {k_1, . . . , k_m} contains a single 1, and each row i = k_ℓ contains an entry of the form α_ℓ followed by (1 − α_ℓ), with all other entries being 0. Each α_ℓ ∈ [0, 1] because, by construction, p_{k_ℓ} + ∑_{i=k_{ℓ−1}+1}^{k_ℓ − 1} p_i ≥ q_ℓ. Hence p_{k_ℓ} ≥ q_ℓ − ∑_{i=k_{ℓ−1}+1}^{k_ℓ − 1} p_i. Recall that α_ℓ is chosen such that the latter inequality becomes an equality, and so α_ℓ ≤ 1. By the definition of k_ℓ, we know ∑_{i=k_{ℓ−1}+1}^{k_ℓ − 1} p_i < q_ℓ or, equivalently, q_ℓ − ∑_{i=k_{ℓ−1}+1}^{k_ℓ − 1} p_i > 0. Thus, we also must have α_ℓ ≥ 0. This establishes the row stochasticity of the matrix T.
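The construction in the proof can be carried out mechanically. The following is a minimal computational sketch (an illustration, not the author's implementation) under the assumption that p has strictly positive entries; the thresholds k_ℓ and weights α_ℓ appear implicitly as the points at which the running mass of p is split between consecutive entries of q.

# Greedy construction of a row-stochastic T with p T = q by splitting the mass
# of the true distribution p sequentially across the bins of the marginal q.
import numpy as np

def transition(p, q):
    n, m = len(p), len(q)
    T = np.zeros((n, m))
    i, avail = 0, p[0]                      # avail = unassigned mass of p_i
    for j in range(m):
        need = q[j]                         # mass still required by q_j
        while need > 1e-12:
            take = min(avail, need)
            T[i, j] += take / p[i]          # fraction of row i routed to column j
            avail -= take
            need -= take
            if avail <= 1e-12 and i + 1 < n:
                i, avail = i + 1, p[i + 1]
    return T

p = np.array([0.375, 0.25, 0.125, 0.25])    # illustrative true distribution
q = np.array([0.5, 0.25, 0.25])             # illustrative observed marginal
T = transition(p, q)
assert np.allclose(p @ T, q) and np.allclose(T.sum(axis=1), 1.0)
print(T)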

From the previous proof, we see that the Kleisli transition probabilities are sufficient to match any observed contextual distribution. However, such a construction is highly non-unique. Note that any permutation of the indices of the true distribution or the contextual distribution would result in a different transition probability under the construction in the previous lemma. This means that any model of contextuality as a noisy random variable will be non-identifiable. Such models are called singular, and their asymptotic theory is considerably more complex due to the failure of asymptotic normality of the posterior distribution. More information on the asymptotic theory of singular models can be found in [141, 142]. In order to extend statistical methods to such situations, we need methods that are robust to any of these above possibilities. This motivates our later construction of an Alexandroff topology on a collection of overlapping tables (Section 5.6.1). The next example outlines how contextuality can potentially arise in experiments analyzing human behavior.

Example 109. (Consumer Behavior) Imagine we are testing several options for displaying products in a store. Each display has the capacity for two items. We label our different products A, B, A′, and B′ and try to log counts of purchases of the various combinations of the available options, e.g. the number of customers who bought item A but not item B, etc. The data is tabulated as a collection of counts tabulating the number of customers who bought the various combinations of products. In this scenario, we can interpret the value 0 to correspond with a consumer not purchasing an item and a value of 1 to indicate that a consumer purchased the item. Thus each context can be represented as a 2 × 2 table where x_00 denotes the number of customers who purchased neither item, x_01 and x_10 denote the number of customers who purchased only one of the items, and x_11 denotes the number of customers who purchased both items. If we test four different displays where any two displays have exactly one product in common, we could potentially arrive at the situation discussed at the beginning of this chapter

involving the Bell marginals. We may be interested in predicting how a pair of items would sell before doing the experiment, in order to determine whether or not it would be worth the cost of trying out a different product layout.

In the Bell marginal tables displayed above, we can consider the tables T_AB, T_A′B, and T_AB′ to be tables resulting from previous observations of customer behavior. Merging these three tables yields the collection of purchasing patterns across all four products that are consistent with the observed marginal distributions from the three previous experiments. In the specific situation of the Bell marginals listed above, there are 4 possible joins of T_AB, T_A′B, and T_AB′ consistent with the corresponding marginal distributions. Code for reproducing this observation can be found in Appendix A.2.2. Projecting these tables onto the joint outcome space A′ × B′ yields four possible marginal tables:

            A′ = 0   A′ = 1                   A′ = 0   A′ = 1
B′ = 0      4        0           B′ = 0       3        1
B′ = 1      0        4           B′ = 1       1        3

            A′ = 0   A′ = 1                   A′ = 0   A′ = 1
B′ = 0      3        1           B′ = 0       2        2
B′ = 1      1        3           B′ = 1       2        2

Note that the second and third tables are the same because there are two possible joins of T_AB, T_A′B, and T_AB′ which marginalize onto A′B′ in the same manner. In order to make predictions for this set of outcomes, we need some way of choosing one of the above tables, the ability to reason over the full collection of the above tables, or a technique for averaging the tables in an appropriate way. The latter idea is the most straightforward. When applied to the tables above, the average table is one of the two identical tables. However, as we discuss later in this chapter (Section 5.11.2), this type of analysis can lead to seriously flawed predictions. For methods to choose a particular table, we could apply the maximum-entropy principle, which would mean choosing the last (uniform) table above. However, it is possible to construct situations where the entropy-maximizing table is not unique, and as such this principle can't necessarily be applied to all problems. One

such example is given by the collection of joint tables on A, B, A′, and B′ which marginalize to the tables T_AB and T_A′B′ in the Bell marginals introduced above. As we will see later in this chapter (Section 5.5.1), the collection of all such tables results in a constraint satisfaction problem. In this particular case, there are 14 different tables which solve the resulting constraint satisfaction problem and 6 of these solutions are entropy maximizing. The code for producing this example appears in Appendix A.2.3. Entropy maximization according to marginals will also tend to eliminate any correlation between the random variables, which could be undesirable in many predictive modeling situations. In the specific scenario of predicting customer purchasing habits, we may want to calculate the expected profit for each of these situations and decide if the potential distribution of returns is worth the cost of further experimentation. In general, the profit (or some other reward function) will depend on which of the four distributions we choose. Moreover, simply averaging the profit will destroy information about the spread of values. In terms of making a decision of whether or not additional testing is warranted, the spread is relevant to the analyst who is trying to understand the risk-to-reward characteristics of such a decision. In the situation above, if we were looking at the expected profit for each possible table, we could also report the sample standard deviation of the profit across all tables. A major theme of the next chapter involves using the Giry monad discussed in chapter 3 to develop means of imputing data that better preserve the spread of the observed data. Suppose instead we are interested in some statistical property of the full distribution on A × B × A′ × B′. If we had collected the marginal tables displayed at the beginning of this section, we would find that there is no table on the full outcome space (without missing values) which marginalizes to the Bell marginals. In order to do any statistical inquiries in this situation, we have a few options on how to proceed:

• We can fit models which only depend on the observed marginal distributions.

• Extend our model to incorporate the possibility of unobserved values and attempt to reason over the space of possible joins including NAs. Given the number of possibilities of how to join the data with missing entries, we may need to incorporate some form of bootstrapping to keep our computations feasible.

• Attempt to come up with a representation for the joint distribution that is independent of these arbitrary choices.

To conclude this discussion: when analyzing contextual databases, we most often are not able to join data frames uniquely. Much of the time, we see either many possible ways to join our data frames or no globally consistent way of joining them. In chapter four, we found sufficient conditions for ensuring that a collection of tables admits a consistent join provided their overlapping counts agree (Proposition 101). We next explore some of the limitations of the skip-NA method as it relates to model fitting for the specific case of directed graphical models, or Bayesian networks.

5.3 Skip-NA and Directed Graphical Models

The problem of compatibility amongst conditional distributions has been previously explored in [7, 8]. More recent work on this question has focused on the compatibility question for conditional distributions from the point of view of algebraic geometry [35, 102, 124, 125]. In this section, we examine a naive approach to fitting graphical models to contextual tables. Graphical models are statistical models for which a graph is used to express conditional dependence relationships between different subsets of random variables. In this section we discuss issues related to fitting graphical models to contextual measurement scenarios. Suppose we have a directed acyclic graph on the observables whose nodes are labeled by random variables belonging to some measurement scenario. The Bayesian network associated to the directed acyclic graph is given by

p(x_1, . . . , x_n) = ∏_{i=1}^{n} p(x_i | pa_i)

where pai denotes the collection of parents of Xi. Note that a graphical model can

be naively fit to a collection of contexts as long as the table which covers Xi | pai belongs to the generating family of observables assuming additionally that the collection of contexts agree on any overlapping subset of columns. As we will see in this section, the naive technique of fitting the model from the relevant conditional

table can result in pathological behavior in the presence of contextuality. In practice, such issues could arise when fitting a model to a large database which contains missing records by querying for the relevant tables using a skip-NA framework on the subset of columns of interest to the query. For more details about how missing data can produce contextuality in marginal tables, we refer the reader to the discussion in [101].

Example 110. Consider the following directed acyclic graph on the Bell contexts:

[DAG on the Bell observables, with edges A → B, B → A′, and A → B′, matching the factorization below.]

If we interpret the above directed acyclic graph as representing a Bayesian network, then the corresponding factorization of the joint distribution on X_A × X_B × X_{A′} × X_{B′} is given by

p(x_A, x_B, x_{A′}, x_{B′}) = p(A) p(B | A) p(A′ | B) p(B′ | A).

Each individual probability distribution in the factorization is naively estimable from the Bell contexts since all of these tables are uniquely computable from the observed tables. A is covered by t_AB and t_AB′, and the marginalized tables produce the same marginal probability on A, so p(A) is well-defined as far as the original tables are concerned. Similarly, t_AB, t_A′B, and t_AB′ are all tables of the original contexts, and so p(B | A), p(A′ | B), and p(B′ | A) are all computable from their respective observed tables. Thus, we can fit the above graphical model to the contextual distribution even though there is no global table on {A, B, A′, B′} which marginalizes to the observed Bell tables. Even though the model is naively estimable from the contextual family, it is indeed a probability distribution on the joint outcome space X_A × X_B × X_{A′} × X_{B′}. As such, we know that this model will not marginalize to the Bell tables because

the Bell tables admit no globally consistent join. In particular, marginalizing the distribution naively estimated from conditional tables computed from the Bell marginals gives us the correct frequencies corresponding to t_AB but the incorrect frequencies for t_A′B′.
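The computation described in this example can be reproduced directly from the Bell counts. The sketch below is a hedged illustration (array conventions and variable names are mine, not the appendix code): each conditional is estimated from its Bell table, the product distribution is formed, and its AB and A′B′ marginals are compared with the observed tables.

# Naively fit p(A) p(B|A) p(A'|B) p(B'|A) from the Bell marginal counts and
# compare the induced marginals with t_AB and t_{A'B'} (counts from Section 5.2).
import numpy as np

t_AB   = np.array([[4, 0], [0, 4]]) / 8.0   # rows A, columns B
t_ApB  = np.array([[3, 1], [1, 3]]) / 8.0   # rows A', columns B
t_ABp  = np.array([[3, 1], [1, 3]]) / 8.0   # rows A, columns B'
t_ApBp = np.array([[1, 3], [3, 1]]) / 8.0   # rows A', columns B'

p_A    = t_AB.sum(axis=1)                          # p(A)
p_B_A  = t_AB / t_AB.sum(axis=1, keepdims=True)    # p(B  | A)
p_Ap_B = t_ApB / t_ApB.sum(axis=0, keepdims=True)  # p(A' | B)
p_Bp_A = t_ABp / t_ABp.sum(axis=1, keepdims=True)  # p(B' | A)

# joint[a, b, a', b'] = p(a) p(b|a) p(a'|b) p(b'|a)
joint = np.einsum("a,ab,cb,ad->abcd", p_A, p_B_A, p_Ap_B, p_Bp_A)

print(joint.sum(axis=(2, 3)))   # marginal on (A, B): reproduces t_AB
print(joint.sum(axis=(0, 1)))   # marginal on (A', B'): [[5/16, 3/16], [3/16, 5/16]]
print(t_ApBp)                   # observed t_{A'B'}:    [[1/8,  3/8 ], [3/8,  1/8 ]]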

In the Bell tables, t_AB and t_A′B together admit a unique solution to their resulting constraint satisfaction problem. The same holds true for the tables t_AB and t_AB′. As such, we can also fit graphical models to the Bell tables containing conditional tables involving {A, B, A′} or {A, B, B′}. We will soon study the mathematical structure of the collection of tables constructible from a collection of overlapping tables.

Example 111. The probability model corresponding to the following directed acyclic graph is also naively estimable from the Bell contexts:

[DAG on the Bell observables, with edges B′ → A, B′ → A′, A → B, and A′ → B, matching the factorization below],

which corresponds to the decomposition of the joint distribution given by

p(A, B, A′, B′) = p(A | B′) p(B | A, A′) p(A′ | B′) p(B′).

Note that the term p(B | A, A′) is unambiguous because t_AB and t_A′B admit a unique extension table. For the same reasons as in the previous example, this joint distribution does not marginalize to the Bell contexts.

Remark 112. A collection of contextual tables contains the tables from which sufficient statistics for a graphical model on a directed acyclic graph can be naively estimated if and only if each edge is a contextual table or a consistent projection thereof. This procedure can fit arbitrarily 'bad' models in the presence of contextuality. We present an example where using the marginal tables as sufficient statistics for model selection produces a model with marginal distributions which have infinite Kullback-Leibler divergence from one of the observed marginal distributions.

Example 113. Consider the following normalized marginal distributions on AB, BC, and AC, respectively:

s_{ij} = (1/2) δ_{ij},   q_{ij} = (1/2) δ_{ij},   t_{ij} = (1/2) (1 − δ_{ij}).

Note these marginal tables provide sufficient statistics for the graphical model in which both B and C depend on A (edges A–B and A–C), which corresponds to the probabilistic model:

p(A, B, C) = p(A) p(B | A) p(C | A).

Let p_{ijk} denote the probability that A = i, B = j, and C = k. Then

p_{ijk} = (1/2) δ_{ij} (1 − δ_{ik}).

The projection of the above probability onto the outcome space BC is given by

r_{jk} = ∑_i p_{ijk} = (1/2) (1 − δ_{jk}).

However, the observed marginal distribution of BC is given by

q_{jk} = (1/2) δ_{jk}.

The Kullback-Leibler divergence between r and q is given by

D_KL(r ∥ q) = ∑_{j,k} r_{jk} log(r_{jk} / q_{jk}) = ∞.
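A quick numeric check of this calculation (a sketch; the indexing conventions are assumptions of the snippet, not notation from the text):

# Example 113: the fitted model's BC marginal has infinite KL divergence from q.
import numpy as np

delta = np.eye(2)
s = 0.5 * delta            # observed AB marginal
q = 0.5 * delta            # observed BC marginal
t = 0.5 * (1 - delta)      # observed AC marginal

# fitted model: p_ijk = p(A=i) p(B=j|A=i) p(C=k|A=i) = 0.5 * delta_ij * (1 - delta_ik)
p = 0.5 * np.einsum("ij,ik->ijk", delta, 1 - delta)

r = p.sum(axis=0)          # model's BC marginal = 0.5 * (1 - delta)
with np.errstate(divide="ignore", invalid="ignore"):
    kl = np.where(r > 0, r * np.log(r / q), 0.0).sum()
print(r)                   # [[0, 0.5], [0.5, 0]]
print(kl)                  # inf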

In the above calculations, we attempted to fit a model to a collection of tables whose schema graph is cyclic. This observation suggests the topological structure of our database has something to do with the type of behavior observed in the above examples. We will return to this topological question later in the chapter (Section 5.6), but first we will discuss the structure of the types of constraint satisfaction problems which arise from contextual measurement scenarios.

5.4 Motivation from Statistical Privacy

Statistical privacy concerns how data from a statistical database can be released publicly in a way that respects the privacy of the individuals whose data is being released. Simply omitting personally identifiable information is not enough in many situations because other public records could potentially be merged with the released data to identify the individuals in the publicly released data. As such, research has focused on how to construct noise functions which preserve the statistical properties of the original data while making it impossible for an observer using the perturbed data to determine whether or not a particular individual's information is present in the database [11, 41, 87, 126]. Statistical privacy provides another motivation for the study of contextuality we present in this chapter. In databases with a large number of columns, it may be computationally infeasible to manipulate a full joint distribution on the full column set. Instead, analysts will marginalize onto subsets of the full column set, introduce noise, and work with these marginal distributions instead. In general, the introduction of noise to these marginal tables can create a family of marginals which do not arise as the marginals of any joint distribution. In such situations, statistical techniques which assume the collection of marginals arise from a joint distribution are invalid. In this chapter, we develop an approach based on sheaf theory to lay the foundations for adapting statistical methods to families of marginal tables which are not necessarily assumed to arise as marginals of some joint distribution on the full column set.

5.5 Poset of Joins of a Database

When joining two tables together, we form a new table from the two pre-existing tables according to some specified rules for combining them. In the previous chapter, we discussed a few types of join operations and how they could be interpreted in the language of category theory. For a collection of tables which agree on their overlapping columns, we can reformulate the problem of finding a consistent join of the tables as a constraint satisfaction problem, which is known to be NP-complete [68].

5.5.1 Contextual Constraint Satisfaction Problems

Suppose we want to join the tables t_AB and t_A′B from the Bell family. A table which extends these tables is any table which marginalizes to the original two tables. Let π_AB : A × B × A′ → A × B and π_A′B : A × B × A′ → A′ × B denote the canonical projection operators. We are looking for a table t′ : [8] → A × B × A′ for which π_AB ∘ t′ = t_AB and π_A′B ∘ t′ = t_A′B. From the previous chapter, we know that each table is an equivalence class and each equivalence class has a representation as a count of values (Section 4.2.5.3). Let V be the function that accepts a table as an argument and returns its value-count representation. The earlier requirement is then that V(π_AB ∘ t′) = V(t_AB) and V(π_A′B ∘ t′) = V(t_A′B). By matching entries in these tables, we generate a linear constraint satisfaction problem. In this particular example, let n_{ijk} with i, j, k ∈ {0, 1} denote the number of times A = i, B = j, and A′ = k; then our contextual constraint satisfaction problem is given by the set of linear constraint equations

n_000 + n_001 = 4        n_000 + n_100 = 3
n_010 + n_011 = 0        n_001 + n_101 = 1
n_100 + n_101 = 0        n_010 + n_110 = 1
n_110 + n_111 = 4        n_011 + n_111 = 3

together with ∑_{(i,j,k) ∈ {0,1}³} n_{ijk} = 8. This particular constraint satisfaction problem has only a single solution,

n_000 = 3,  n_001 = 1,  n_110 = 1,  n_111 = 3,

with all other entries equal to zero.
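This uniqueness claim is easy to verify by brute force. The following sketch (an illustration, not the appendix implementation) splits each cell count of t_AB between A′ = 0 and A′ = 1 and keeps only the splits consistent with t_A′B:

# Brute-force the contextual constraint satisfaction problem for t_AB and t_{A'B}.
# n[(i, j, k)] tallies rows with A=i, B=j, A'=k; counts are the Bell values above.
from itertools import product

t_AB  = {(0, 0): 4, (0, 1): 0, (1, 0): 0, (1, 1): 4}    # keyed by (A, B)
t_ApB = {(0, 0): 3, (0, 1): 1, (1, 0): 1, (1, 1): 3}    # keyed by (A', B)

solutions = []
# For each (A, B) cell, split its count between A' = 0 and A' = 1.
for splits in product(*(range(t_AB[cell] + 1) for cell in sorted(t_AB))):
    n = {}
    for (i, j), m in zip(sorted(t_AB), splits):
        n[(i, j, 0)], n[(i, j, 1)] = m, t_AB[(i, j)] - m
    if all(n[(0, j, k)] + n[(1, j, k)] == t_ApB[(k, j)] for k, j in t_ApB):
        solutions.append(n)

print(len(solutions))          # 1 -- the unique solution described above
print(solutions[0])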

Proposition 114. (Worst case analysis for number of solutions to a contextual constraint satisfaction problem)

Consider two tables t_1 and t_2 that overlap on a column B. Suppose t_1 has n rows and contains another column A_1 whose values are pairwise distinct. Suppose also that t_2 has n rows and contains an additional column A_2 whose values are likewise pairwise distinct. Moreover, suppose t_1 and t_2 are both constant along the overlapping column B, with the same constant value in each table. In this case, the number of solutions to the contextual constraint satisfaction problem is n!.

Proof. Apply the counting principle. For the first record in t_1, we have n choices of records in t_2. Inductively, the k-th row of t_1 has n − k + 1 possible choices in t_2 for the join, giving n! joins in total.

The above proposition shows that the number of solutions to a constraint satisfaction problem depends on the number of ways of matching overlapping entries and as such can grow combinatorially based on the extent to which the overlapping column fails to function as a primary key for a join. As such, a worst-case analysis can always be performed by assuming the overlapping entries are constant. Note that we could leverage the observation above to construct an algorithm for producing a random join of two tables which agree on the value counts of their overlapping columns: iterate through the rows of one table and, for each, select a random unused record of the other table which agrees on the overlapping columns. The fact that the input tables are required to agree on their overlapping columns ensures that we won't run out of choices at any step in the procedure. A sketch of this procedure appears below.
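A minimal sketch of that random-join procedure, assuming tables stored as lists of dicts (the representation, values, and helper name are illustrative):

# Walk the rows of one table and match each to a random unused row of the other
# table that agrees on the shared columns.
import random

def random_join(t1, t2, shared):
    unused = list(range(len(t2)))
    joined = []
    for row in t1:
        candidates = [i for i in unused
                      if all(t2[i][c] == row[c] for c in shared)]
        pick = random.choice(candidates)   # never empty if value counts agree
        unused.remove(pick)
        joined.append({**row, **t2[pick]})
    return joined

t1 = [{"A": a, "B": a % 2} for a in range(4)]
t2 = [{"B": b % 2, "C": 10 + b} for b in range(4)]
print(random_join(t1, t2, shared=["B"]))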

5.5.2 Poset of Solutions to Contextual Constraint Satisfaction Problems

In database theory, lattices are sometimes used to represent the possible joins and restrictions of a collection of tables constituting a database [89]. In this subsection, we define a poset structure on the collection of solutions to the various constraint satisfaction problems arising from these tables. The poset structure discussed in this subsection serves as motivation for the topology constructed on the simplicial complex associated to the database in the next section. Specifically, we generalize

the observations of the previous section to generate a poset from a collection of tables which consists of all tables that can be constructed from the original collection of tables via marginalization or as a solution to the type of constraint satisfaction problem discussed previously. We can construct a poset on the space of joins where the order relation is based on whether or not a table extends another table.

Fix a finite collection of tables {t_i : [n_i] → X_{C_i}}_{i∈I} and let C_i denote the collection of column names for table t_i. For each σ_i ⊂ C_i, there is a projection operator p_{C_i → σ_i} : X_{C_i} → X_{σ_i}, so we can define a marginal table t_{σ_i} : [n_i] → X_{σ_i} by p_{C_i → σ_i} ∘ t_i. As we saw in the previous section, a family of contextual tables can have multiple joins. Given a collection of tables {t_j : [n_j] → X_{C_j}}_{j∈J} with J ⊂ I, we say that a table t : [n] → X_C is an extension of {t_j}_{j∈J} if the following conditions hold:

• nj = n for all j ∈ J

• C = ∪j∈J Cj

• tj = pC→Cj ◦ t for all j ∈ J.

From the above definitions, we can create a poset whose elements are the collection of tables {t_i : [n_i] → X_{C_i}}_{i∈I} along with the collection of projections and extensions of these tables. The order structure is defined by t ≤ t′ if and only if t′ is an extension of t. We can equip such a collection of tables with an Alexandroff topology generated by lower sets in a manner similar to what we will do for databases in the next section. However, we will not use this topology in any meaningful way in this chapter and so postpone this discussion until the next section.

Example 115. (Bell Context) Let t_AB, t_AB′, t_A′B, and t_A′B′ be the Bell marginals discussed at the beginning of this section. Since these tables all agree on their marginal overlaps, we have tables t_A, t_B, t_A′, and t_B′ obtained by marginalizing onto the variables in the subscripts. The lower set generated by all contexts can be visualized via the Hasse diagram displayed below:

[Hasse diagram: t_AB, t_A′B, t_AB′, t_A′B′ in the top row and t_A, t_B, t_A′, t_B′ in the bottom row, with an arrow from each single-column table to every two-column table containing its column.]

To generate the full topology, we also need to consider the possible extensions of the tables. Unfortunately, the full topology is too difficult to visualize graphically. In the next section, we will consider the smallest possible database which can produce contextuality. For this reason, we postpone any additional visual aids until the next section. Instead, we will present a table counting the number of extensions of each pair and triple of tables in the Bell family. Code for generating these tables can be found in Appendix A.2.3.

Tables                     Number of Extensions
t_AB, t_A′B                1
t_AB, t_AB′                1
t_AB, t_A′B′               14
t_A′B, t_AB′               46
t_A′B, t_A′B′              4
t_AB′, t_A′B′              4

We can similarly count the number of extensions of three tables (Appendix A.2.2):

Tables                     Number of Extensions
t_AB, t_A′B, t_AB′         4
t_AB, t_AB′, t_A′B′        4
t_AB, t_A′B, t_A′B′        4
t_A′B, t_AB′, t_A′B′       14

As a concrete example, the two tables presented below are two possible solutions to the constraint satisfaction problem involving the tables t_AB, t_A′B, and t_AB′ from the Bell marginals. Code for reproducing these results can be found in Appendix A.2.2.

0 0 0 0 0 0 0 0 t1 A = 0, B = 0 A = 0, B = 1 A = 1, B = 0 A = 1, B = 1 A = 0, B = 0 3 0 0 1 A = 0, B = 1 0 0 0 0 A = 1, B = 0 0 0 0 0 A = 1, B = 1 1 0 0 3

107 0 0 0 0 0 0 0 0 t2 A = 0, B = 0 A = 0, B = 1 A = 1, B = 0 A = 1, B = 1 A = 0, B = 0 3 0 0 1 A = 0, B = 1 0 0 0 0 A = 1, B = 0 0 0 0 0 A = 1, B = 1 0 1 1 2.

We also know that there is no table which has all four contexts as its marginals [101]. An implementation of this verification can be found in Appendix A.2.1. Using the table of counts of solutions presented above, we see that the number of solutions to a contextual constraint satisfaction problem can be larger than the number of observations in the table. For instance, the constraint satisfaction problem corresponding to joining t_A′B, t_AB′, and t_A′B′ has fourteen solutions even though the original tables contain only eight observations. As such, we see that enumerating all possible joins can become problematic for large data sets whenever there are many combinations of joins for the overlapping columns, such as would be the case with repeated observations of a categorical feature. For this reason, in practice it will be necessary to use bootstrapping techniques for many statistical methods on contextual databases.

5.6 Topology of a Database Schema

Previously, we saw that for a prescribed set of marginals, the contextuality can always be accounted for by introducing a transition noise from the joint distribution on the full column space onto the collection of marginals constituting a particular measurement scenario and considering the prescribed set of marginals as projections of some higher dimensional random variable. Unfortunately, this representation of the higher dimensional random variable was highly non-unique and as a model was highly non-identifiable due to overparametrization of the transition noise. In light of these considerations, we will propose a definition for contextual statistical models involving morphisms of sheaves between a sheaf of parameters and a sheaf of contextual probability distributions on a topology associated to a database schema (Section 5.8.1). We first expound upon the construction of a database topology discussed at the end of chapter four.

Record linkage refers to the process of finding records in a database or collection of databases that refer to the same entity [39, 105]. The mathematical foundations of the subject were first established by Fellegi and Sunter [48]. In this section, we will discuss constructing a simplicial complex associated to a database. In this discussion, we assume a particularly simple linkage model, i.e. that all links are constructed by overlapping columns. The topological construction presented in this section could be adapted to more general deterministic record linkage scenarios by using more general glueing conditions to connect the tables than are discussed in this section.

5.6.1 Contextual Topology on a Database Schema

Definition 116. Let D be a database consisting of tables t_1, . . . , t_k. Let C_i denote the column set of t_i. We define the abstract simplicial complex associated to the database schema, ∆(D), to be the simplicial complex ∆(D) = ⋃_{i∈[k]} (↓ C_i), where ↓ C_i = {S | S ⊂ C_i}. The sets {↓ C_i}_{i∈[k]} generate a topology which we call the contextual topology associated to the database schema. In other words, the contextual topology τ_D on ∆(D) is the topology whose open sets are the sub-simplicial complexes of ∆(D).

Note that this topology is finite and as such it is an Alexandroff topology, i.e. arbitrary intersections of open sets are open. The topology defined above is common in poset theory [139]. In the next section we will discuss various sheaves and presheaves on ∆(D). Throughout the next section, it will be important to have a running example of a particular database schema and its associated topology to discuss. We will fix the simplest possible contextual database schema and use it as an example topology for all of the sheaves discussed in the next section.

Example 117. Consider a database consisting of three tables: tAB, tAC , and tBC .

Each of the labels A, B, and C corresponds to a binary data type. t_AB counts the number of times A = i and B = j where (i, j) ∈ {0, 1}². t_AC and t_BC tabulate similar counts corresponding to their respective data types. The simplicial complex associated to this database schema can be visualized as the undirected cyclic graph (3-cycle) on the vertices A, B, and C. The poset structure of this simplex can be visualized with the following Hasse diagram:

[Hasse diagram: A, B, C in the bottom row and AB, AC, BC in the top row, with an arrow from each vertex to each edge containing it.]

In the diagram above an arrow X → Y indicates that X ≤ Y. We will use the notation P = {A, B, C, AB, AC, BC} to refer to the underlying poset associated to the abstract simplicial complex. For this particular database, there are a total of eighteen open sets in the contextual topology. These are given by the lower sets of the poset, i.e. the sub-simplicial complexes. In this particular case the contextual topology consists of the following eighteen sets:

∅                          {A}                        {B}
{C}                        {A, B}                     {B, C}
{A, C}                     {A, B, C}                  {AB, A, B}
{BC, B, C}                 {AC, A, C}                 {AB, A, B, C}
{BC, A, B, C}              {AC, A, B, C}              {AB, BC, A, B, C}
{AB, AC, A, B, C}          {BC, AC, A, B, C}          {AB, AC, BC, A, B, C}

Given p ∈ P, we can define ↓ p := {s ∈ P | s ≤ p}. Then the contextual topology associated to this schema can be visualized via the following Hasse diagram (ordered by inclusion, largest open set at the top):

↓AB ∪ ↓AC ∪ ↓BC

↓AB ∪ ↓AC        ↓AB ∪ ↓BC        ↓AC ∪ ↓BC

↓AB ∪ {C}        ↓AC ∪ {B}        ↓BC ∪ {A}

↓AB              ↓AC              ↓BC              {A, B, C}

{A, B}           {A, C}           {B, C}

{A}              {B}              {C}

∅

The arrows in the above diagram correspond to subset inclusion. The collection of open sets of this topology also has the structure of a poset, so it is important to keep in mind the distinction between the underlying poset and the topology constructed on it. In the diagram above, the second row of elements corresponds to the open sets obtained by excluding one of AB, AC, and BC from the poset, e.g. ↓AB ∪ ↓AC is the subset {AB, AC, A, B, C}. Note also that the elements in the third row in the previous diagram correspond to sets excluding two of AB, AC, BC. For example, ↓AB ∪ {C} is the subset {AB, A, B, C}. Recall that a basis for a topology is a collection of open sets which cover the full space X and satisfy the additional requirement that if B1 and B2 are basis elements, then for every x ∈ B1 ∩ B2, there is a basis element B3 with x ∈ B3 and for which B3 ⊂ B1 ∩ B2. A sheaf is determined by its definition on a basis because there is an equivalence of categories between Sh(X) and Sh(B). A proof of this fact can be found in [94]. As such, we can simplify our definition of sheaves by constructing sheaves on a

basis. The contextual database topology discussed in this example has the following basis: B := {↓AB, ↓BC, ↓AC, {A}, {B}, {C}}.

In the next section, we refer to this particular contextual database topology as the 3-cycle database topology for ease of reference.

Remark. Note that {A, B} ≠ ↓ AB. This is important to keep in mind when discussing the glue-ability condition for sheaves, as we will do frequently in the next section. The open subsets in this topology of lower sets can be understood via the one-to-one correspondence between lower sets and sub-simplicial complexes. As such, it can be beneficial to visualize these via their geometric realizations. The top element, P, corresponds to the full undirected 3-cycle on the vertices A, B, and C.

Open sets in the second row from the top correspond to the sub-simplicial complexes obtained by removing one edge from the above picture; e.g. ↓AB ∪ ↓BC corresponds to the path consisting of the edges AB and BC. Open sets in the next row down correspond to an original context of our database together with the remaining isolated vertex; e.g. ↓AB ∪ {C} corresponds to the edge AB together with the isolated vertex C. Open sets in the next row down correspond to the original contexts themselves, e.g. ↓AB corresponds to the single edge AB, except for the set {A, B, C}, which corresponds to the sub-simplicial complex with no edges, i.e. the three isolated vertices A, B, and C. The open sets in the next level down are those obtained from the original contexts by removing an edge, e.g. {A, B} corresponds to the two isolated vertices A and B. Finally, the empty set corresponds to the empty graph.

5.7 Sheaves on Databases

Various statistical concepts can be seen as constructions involving different sheaves on the contextual database topology defined in the previous section. In this section, we will discuss a number of sheaves of interest for statistical analysis on contextual databases. All example sheaves from this section will be constructed on the 3-cycle contextual database topology described at the end of the previous section.

5.7.1 Presheaf of Data Types

When defining databases in chapter four, every attribute had an associated data type which was determined by a set of values that the particular attribute could take. Given a contextual database topology, we can define a presheaf that associates to each open set U, the product of all data types of attributes for the tables belonging to U. The restriction maps of this presheaf are given by projections or identity maps where appropriate.

Example 118. The presheaf of data types on the contextual database topology associated to the 3-cycle with binary data types can be visualized by the following

commutative diagram:

[Diagram mirroring the Hasse diagram of open sets above: every open set containing all three of A, B, and C is assigned {0, 1}³; the open sets containing exactly two of them (↓AB, ↓AC, ↓BC and {A, B}, {A, C}, {B, C}) are assigned {0, 1}²; the singletons are assigned {0, 1}; and ∅ is assigned the one-point set {∗}. The restriction maps are the evident projections and identities.]

5.7.2 Presheaf of Classical Tables of a Fixed Size

A more interesting presheaf is the one that associates to each open set the collection of all tables of size n on the product of the outcome spaces of the attributes present in the open set. The restriction mappings here are given by projections of tables as discussed in chapter four. Note that these open sets are in one-to-one correspondence with the possible constraint satisfaction problems arising from our original contexts (Section 5.5.1). This observation will be used later in the chapter when we discuss the collection of classical approximations of a family of contextual tables (Section 5.11.2).

5.7.3 Sheaf of Counts on Contextual Tables

Each column in a table has an associated data type which specifies the possible values that an observation can take. We can construct a sheaf on the basis by associating to each basis element ↓ C the space of count vectors indexed by the product of the data types of the columns of C. Since the basis elements only involve the column space associated to a particular table or a subset thereof, there is no ambiguity in this prescription. We know that this definition on the basis will allow us to construct the sheaf on the remaining open sets using the equalizer condition for sheaves.

Example 119. We will compute the sheaf of counts for the 3-cycle database topology. We can first define this construction on the original contexts as

N(↓ AB) ≅ N(↓ AC) ≅ N(↓ BC) ≅ ℕ⁴ and define N on the single-column tables as

N({A}) ≅ N({B}) ≅ N({C}) ≅ ℕ².

Any sheaf must map the empty set to the terminal object. In this case, we have N(∅) = ℕ⁰. In order for this construction to define a sheaf, we must also specify the restriction mappings. We define res^U_∅ = ∗ since there is only one map to a singleton set. Thus, we need only define the restriction mappings from the tables to their overlapping columns. In order to define these mappings, we introduce coordinates on N(↓ AB) = ℕ⁴ of the form

(x^{AB}_{00}, x^{AB}_{01}, x^{AB}_{10}, x^{AB}_{11}).

We label the coordinates on N(↓ AC) and N(↓ BC) in an analogous manner. We also label the coordinates on N({A}) = ℕ² as (x^A_0, x^A_1) and use similar definitions for the coordinates on N({B}) and N({C}). We define res^{↓AB}_{\{A\}} : ℕ⁴ → ℕ² by

(x^{AB}_{00}, x^{AB}_{01}, x^{AB}_{10}, x^{AB}_{11}) ↦ (x^{AB}_{00} + x^{AB}_{01}, x^{AB}_{10} + x^{AB}_{11}) and make similar definitions for the other restriction mappings. The equalizer condition in the definition of a sheaf allows us to extend this definition on the basis to the full 3-cycle database topology. For {A, B}, this means that N({A, B}) is given by the equalizer of

N({A}) × N({B}) ⇒ N(∅).

Because N(∅) is just the singleton set, the equalizer condition implies

N({A, B}) = N({A}) × N({B}) = ℕ² × ℕ².

If we equip this space with coordinates (x^A_0, x^A_1, x^B_0, x^B_1), the restriction maps correspond to the canonical projections onto the components of the product. Similar constructions can be used to define N({A, C}) and N({B, C}). A similar argument can be used to define the sheaf of counts on ↓AB ∪ ↓AC, ↓AB ∪ ↓BC, and ↓AC ∪ ↓BC. We will only explicitly construct N(↓AB ∪ ↓BC) because the other constructions work similarly. The equalizer condition requires that N(↓AB ∪ ↓BC) be the equalizer of

N (↓ AB) × N (↓ BC) ⇒ N ({B}) .

Hence, we can define N (↓ AB∪ ↓ BC) as:

{ (x, y) ∈ N(↓AB) × N(↓BC) | res^{↓AB}_{\{B\}}(x) = res^{↓BC}_{\{B\}}(y) }.

This constructs N(↓AB ∪ ↓BC) as a subset of ℕ⁴ × ℕ⁴. If we give the product space the canonical coordinates inherited from N(↓AB) and N(↓BC), i.e. (x^{AB}_{ij}, x^{BC}_{kℓ}), then the equalizer condition means we only consider the collection of coordinates satisfying the following system of linear equations on the counts:

x^{AB}_{00} + x^{AB}_{10} = x^{BC}_{00} + x^{BC}_{01}
x^{AB}_{01} + x^{AB}_{11} = x^{BC}_{10} + x^{BC}_{11}.

These conditions mean that the marginal counts of B agree between N(↓AB) and N(↓BC). Finally, we can extend our definition of N to the full space ∆(D) by defining N(∆(D)) to be the equalizer of

N (↓ AB) × N (↓ BC) × N (↓ AC) ⇒ N ({A}) × N ({B}) × N ({C}) .

The equalizer condition means we can define N(∆(D)) to be the subspace of N(↓AB) × N(↓BC) × N(↓AC), which we can equip with coordinates (x^{AB}_{ij}, x^{BC}_{kℓ}, x^{AC}_{mn})

which are required to satisfy the restriction equations:

x^{AB}_{00} + x^{AB}_{01} = x^{AC}_{00} + x^{AC}_{01}
x^{AB}_{10} + x^{AB}_{11} = x^{AC}_{10} + x^{AC}_{11}
x^{AB}_{00} + x^{AB}_{10} = x^{BC}_{00} + x^{BC}_{01}
x^{AB}_{01} + x^{AB}_{11} = x^{BC}_{10} + x^{BC}_{11}
x^{AC}_{00} + x^{AC}_{10} = x^{BC}_{00} + x^{BC}_{10}
x^{AC}_{01} + x^{AC}_{11} = x^{BC}_{01} + x^{BC}_{11}.
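The gluing condition above is just a finite list of linear equations on the marginal counts, so membership in N(∆(D)) is easy to check. The sketch below (an illustration; array conventions are assumptions) verifies that the three count tables of Example 103, which agree on their pairwise overlaps yet admit no joint table, nonetheless define a section of N over the whole complex:

# Check the equalizer condition for the sheaf of counts N on the 3-cycle topology.
import numpy as np

def glues(x_AB, x_BC, x_AC):
    """Check the six restriction equations of Example 119 (rows = first letter
    of the context, columns = second letter)."""
    return (np.array_equal(x_AB.sum(axis=1), x_AC.sum(axis=1)) and   # A margins
            np.array_equal(x_AB.sum(axis=0), x_BC.sum(axis=1)) and   # B margins
            np.array_equal(x_AC.sum(axis=0), x_BC.sum(axis=0)))      # C margins

# The tables of Example 103: pairwise consistent counts that admit no join.
x_AB = np.array([[1, 0], [0, 1]])
x_BC = np.array([[1, 0], [0, 1]])
x_AC = np.array([[0, 1], [1, 0]])
print(glues(x_AB, x_BC, x_AC))   # True: a section of N over the whole complex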

In subsequent sections, we will mainly define sheaves by discussing their definition on the basis elements, making exceptions when the gluing condition reveals something important, as is the case with contextual probability measures.

5.7.4 Presheaf of Classical Probability Measures

If we presume the data types of each column in our database have the structure of a standard Borel space, we can apply the Giry monad (Section 3.2) to obtain the set of all probability measures on each of the possible products of data types. We can then construct a presheaf of classical probability measures on the contextual topology associated to a database by sending each open set to the collection of all probability distributions on the product space of all data types of columns appearing in the open set. When we define contextual probability measures in Section 5.7.8, we will see that this construction is not a sheaf because it is not what we would obtain by applying the equalizer condition to the basis elements.

5.7.5 Sheaf of Outcome Spaces

In chapter four, each column in a table had an attribute space or data type. As such each point in the poset has an associated data type or product of data types. Let P be the underlying poset from which the contextual topology is constructed. We can construct a sheaf, X , by defining for each p ∈ P , X (↓ p) to be the space of outcomes associated to p. Note that the points in the poset correspond to either the original contexts or tables obtained by projecting onto a subspace of the column space of the original contexts. As such, the equalizer condition allows us to extend

the definition of X to the full contextual topology. This is a generalization of the construction of the sheaf of counts in Section 5.7.3. We will further suppose that these outcome spaces are given the structure of a standard Borel space so that we can use them to define contextual random variables. An example of such a construction can be seen in the sheaf of table counts discussed above. As the outcome space is discrete, we can take the sigma algebra generated by all subsets. We will use the notation X to denote a generic outcome space sheaf.

5.7.6 Contextual Random Variables

We can now define contextual random variables as certain sheaf morphisms. Let S be a standard Borel space equipped with a probability measure ρ. Let X^C = ∏_i X_{C_i} be the product space of the outcomes of all contexts. Define E to be the equalizer of

∏_i X(↓ C_i) ⇒ ∏_{i,j} X(↓ C_i ∩ ↓ C_j).

A contextual random variable is a measurable map α from a probability space (S, B, ρ) into X^C satisfying P[α ∈ E] = 1. In other words, a contextual random variable is a random variable on ∏_i X(C_i) whose image lies in the subspace X(⊤) almost surely, where ⊤ corresponds to the full poset underlying the contextual database topology. Put another way, the family of contextual random variables is the sub-family of random variables constructed in this way with compatible marginal tables determined by the simplicial complex of our database. Unlike our noise model of contextuality, this gives a characterization of contextual random variables which is not subject to as many free parameters. Moreover, by an appropriate reduction of parameters, this insight will allow us to construct an identifiable parametrization of the saturated contextual model, which would have been impossible due to the number of degrees of freedom in the noise model (Definition 125). Finally, the noise model allowed for random variables which do not actually agree on their overlaps, while the sheaf-theoretic definition presented in this section enforces agreement on the overlapping columns.

5.7.7 Sheaf of Parameters

Classically, a statistical model is defined as a function mapping a set of parameters into the collection of probability distributions on some space. A model is said to be identifiable if and only if this mapping is injective. In order to adapt models to contextual sheaves, we first need to discuss sheaves of parameters. A sheaf of parameters associates to each element of the basis in a contextual topology a set of parameters along with restriction mappings relating the parameters on the contexts to their sub-contexts.

Example 120. Consider the contextual database topology corresponding to the 3-cycle. Since each column is presumed to have a binary data type, we can construct a parameter sheaf, P, for the saturated model by associating to ↓AB, ↓BC, and ↓AC the parameter space Δ³ = { (p_{ij}) | p_{ij} ≥ 0, ∑_{i,j} p_{ij} = 1 }. To the sets {A}, {B}, and {C}, we associate the set Δ¹ = { (p_0, p_1) | p_i ≥ 0, p_0 + p_1 = 1 }. The restriction mappings r^U_V each correspond to marginalizing over the set U \ V, e.g. r^{AB}_{B} : Δ³ → Δ¹ is given by r(p_{ij}) = p_{+j} := ∑_i p_{ij}. This parameter presheaf leads to the commutative diagram below:

[Diagram mirroring the Hasse diagram of open sets above: P(⊤) at the top; Δ³ ×_{Δ¹} Δ³ for each two-edge union ↓XY ∪ ↓YZ; Δ³ × Δ¹ for each set of the form ↓XY ∪ {Z}; Δ³ for ↓AB, ↓AC, ↓BC and Δ¹ × Δ¹ × Δ¹ for {A, B, C}; Δ¹ × Δ¹ for the two-element sets; Δ¹ for the singletons; and Δ⁰ for ∅. The arrows are the marginalization and projection maps.]

The sets in the above diagram which were not mentioned previously can all be obtained by applying the equalizer condition to members of the basis. For instance, P({A} ∪ {B}) is the equalizer of

P({A}) × P({B}) ⇒ Δ⁰, and hence P({A, B}) ≅ P({A}) × P({B}) = Δ¹ × Δ¹.

A similar calculation shows

P({A, C}) ≅ P({B, C}) ≅ Δ¹ × Δ¹.

By similar reasoning, P (↓ AB∪ ↓ BC) must be the equalizer of

P (↓ AB) × P (↓ BC) ⇒ P ({B}) .

In this case, we see

P(↓AB ∪ ↓BC) ≅ P(↓AB) ×_{P({B})} P(↓BC), where the right-hand side is the pullback in Set of the cospan

P (↓ AB) → P ({B}) ← P (↓ BC) .

Recall that the pullback is defined explicitly in the following manner:

P(↓AB ∪ ↓BC) = { (p, q) ∈ Δ³ × Δ³ | m^{↓AB}_{\{B\}}(p) = m^{↓BC}_{\{B\}}(q) }.

Finally, applying the equalizer condition to the top element constructs P(⊤) as a subset of Δ³ × Δ³ × Δ³, but the sheaf condition constrains the original nine parameters down to six parameters because we lose a degree of freedom for each of the three marginalization mappings in the equalizer condition.

5.7.8 Sheaf of Contextual Probability Measures

Each column in the database schema has an associated data type. This determines some outcome space which we presume has the structure of a standard Borel space.

We can then construct a probability sheaf by associating to each element of the basis the collection of all probability distributions on its corresponding outcome space. In general, we will denote such a sheaf by G in honor of Michèle Giry. The restriction mappings are the mappings induced by application of the Giry endofunctor to the projection mappings in the outcome space sheaf. More concretely, these correspond to the marginalization mappings (Lemma 80).

Example 121. The sheaf G defined on the contextual topology discussed in the previous example behaves exactly the same as the parameter sheaf because the collection of all probability distributions on a finite space X with k outcomes can be identified with Δ^{k−1}. Previously, we showed the induced sigma algebra on Δ^{k−1} is the same as the Borel sigma algebra on Δ^{k−1} (Example 79). If I is an index set enumerating {↓AB, ↓BC, ↓AC}, then we have G(U_i) ≅ Δ³ and G(U_i ∩ U_j) ≅ Δ¹. The morphisms are given by application of the Giry endofunctor to the projection mappings and as such correspond to marginalization mappings. Let ⊤ = ↓AB ∪ ↓BC ∪ ↓AC. The sheaf condition requires that

G(⊤) = { p ∈ ∏_{i∈I} G(U_i) | m^{U_i}_{U_i ∩ U_j}(p_i) = m^{U_j}_{U_i ∩ U_j}(p_j) for all (i, j) ∈ I × I }.

This realizes p as an element of the product $\prod_{i \in I} \Delta^3$ with the further condition that the overlapping marginals agree. Effectively, the equalizer condition requires us to join the probability measures into a larger-dimensional space in which the marginal distributions on overlapping column sets between contexts are forced to agree.

5.8 Statistical Models on Contextual Sheaves

In this section, we lift the definition of statistical models to contextual database topologies. We then distinguish between classical and contextual factors and use this distinction to define classical snapshots: collections of joins representing the tables which are guaranteed to be projections of some joint table on their common outcome space.

5.8.1 Contextual Statistical Models

Classically, statistical models are defined as maps from a space of parameters into the collection of probability distributions on some measurable space of outcomes. In the previous section, we discussed how to construct sheaves of parameters and sheaves of contextual probability measures. As such, we can lift the definition of statistical models to sheaves on a contextual database topology.

Definition 122. (Contextual Statistical Model) A contextual statistical model is a sheaf morphism from a sheaf of parameters to a sheaf of contextual probability measures.

In light of the previous definition, we can express a couple of examples of contextual statistical models. We start by discussing the adaptation of the independence model to the Bell marginals introduced earlier in this chapter (Section 5.2).

Example 123. (Contextual Independence Model on Bell Contexts) Recall that a basis, B, for the topology generated by the Bell contexts is given by
$$B = \{\downarrow AB,\ \downarrow A'B,\ \downarrow AB',\ \downarrow A'B',\ \{A\},\ \{B\},\ \{A'\},\ \{B'\}\}.$$

Let U denote an arbitrary element of {↓AB, ↓A′B, ↓AB′, ↓A′B′} and let V denote an arbitrary element of {{A}, {B}, {A′}, {B′}}. The sheaf condition implies that it suffices to define our statistical model on the elements of the basis. In this situation, the parameter sheaf is defined to be $P(U) = [0,1]^2$ and $P(V) = [0,1]$. The sheaf of probability measures is defined by $G(U) = \Delta^3$ and $G(V) = \Delta^1$. To define a statistical model, μ, we need to construct a natural transformation between P and G. We can define the components $\mu_U : [0,1]^2 \to \Delta^3$ by
$$\mu_U(x, y) = (xy,\ x(1-y),\ (1-x)y,\ (1-x)(1-y))$$
and $\mu_V : [0,1] \to \Delta^1$ by $\mu_V(z) = (z, 1-z)$.
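A small numerical sketch of this model map (base R, hypothetical names): μ_U sends a pair of Bernoulli parameters to the product distribution on a two-column context, μ_V handles the single-column opens, and naturality amounts to the fact that marginalizing the product distribution recovers μ_V of the restricted parameter.

```r
# A minimal sketch (base R, hypothetical names) of the contextual
# independence model on a Bell context.
mu_U <- function(x, y) c(x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y))
mu_V <- function(z) c(z, 1 - z)

p <- mu_U(0.7, 0.4)                          # distribution on a context, e.g. {A, B}
marg_first <- c(p[1] + p[2], p[3] + p[4])    # marginal of the first variable
all.equal(marg_first, mu_V(0.7))             # TRUE: the naturality square commutes
```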

Remark 124. The above construction only asserts marginal independence within the various contexts. As we cannot guarantee that there exists a joint distribution on $\Delta^{15}$ from which the contextual tables are drawn, it is not sensible to talk about the mutual independence of these random variables in this particular measurement scheme. Moreover, even if the marginal distributions are globally compatible, there are in general many possible choices for distributions with prescribed marginals. We will return to this observation when we discuss classical approximations of a contextual factor.

Definition 125. (Saturated Contextual Model) If all tables involved have data types with finite outcome spaces, the saturated contextual model is the model obtained by using the corresponding probability simplices as both the parameter sheaf and the sheaf of contextual probability measures.

Example 126. (Contextual No Three-Way Interaction Model) This example illustrates some of the nuance involved in extending statistical models to contextual databases. We will see that constructing the parameter sheaf in different ways leads to different possibilities for statistical models. Suppose A, B, and C are discrete random variables whose outcome spaces have cardinalities $k_A$, $k_B$, and $k_C$, respectively. The no three-way interaction model is the graphical model associated to the undirected graph:

[The 3-cycle graph on vertices A, B, and C, with edges AB, BC, and AC.]

The parameters of the model are given by $\theta^A_i$, $\theta^B_j$, $\theta^C_k$, $\theta^{AB}_{ij}$, $\theta^{BC}_{jk}$, and $\theta^{AC}_{ik}$, leading to a total of $k_A + k_B + k_C + k_A k_B + k_B k_C + k_A k_C$ parameters. We can construct local models based on the contexts ↓AB, ↓AC, and ↓BC given by $\mathbb{R}^{k_A} \times \mathbb{R}^{k_B} \times \mathbb{R}^{k_A k_B}$, $\mathbb{R}^{k_A} \times \mathbb{R}^{k_C} \times \mathbb{R}^{k_A k_C}$, and $\mathbb{R}^{k_B} \times \mathbb{R}^{k_C} \times \mathbb{R}^{k_B k_C}$, respectively. We begin by discussing what we call the contextual approximation of the classical no three-way interaction model. In this situation, we can use projection mappings to enforce global consistency among the parameters; e.g. the restriction mapping connecting P(↓AB) and P({A}) is given by the projection mapping

$\pi_A : \mathbb{R}^{k_A} \times \mathbb{R}^{k_B} \times \mathbb{R}^{k_A k_B} \to \mathbb{R}^{k_A}$ onto the A component. The equalizer condition forces $P(\top) = \mathbb{R}^{k_A} \times \mathbb{R}^{k_B} \times \mathbb{R}^{k_C} \times \mathbb{R}^{k_A k_B} \times \mathbb{R}^{k_B k_C} \times \mathbb{R}^{k_A k_C}$. As such, the contextual model constructed by using projection mappings in the parameter sheaf is equivalent to the classical no 3-way interaction model. We have seen previously that there are contextual distributions for which no classical model exists. An alternative possibility for constructing a contextual analog of the no three-way interaction model is to use marginalization of parameters rather than projections. In this case, we use the same parameter spaces as above but require instead that the mapping connecting $\mathbb{R}^{k_A} \times \mathbb{R}^{k_B} \times \mathbb{R}^{k_A k_B}$ to $\mathbb{R}^{k_A}$ is defined by $\pi^{AB}_A(\theta^A_i, \theta^B_j, \theta^{AB}_{ij}) = \theta^A_i \sum_j \theta^B_j \theta^{AB}_{ij}$. Let $\tau^A_i, \tau^C_k, \tau^{AC}_{ik}$ be coordinates for $\mathbb{R}^{k_A} \times \mathbb{R}^{k_C} \times \mathbb{R}^{k_A k_C}$ and let $\pi^{AC}_A(\tau^A_i, \tau^C_k, \tau^{AC}_{ik}) = \tau^A_i \sum_k \tau^C_k \tau^{AC}_{ik}$. Then the equalizer condition requires that P(↓AB ∪ ↓AC) be the pullback associated with $\pi^{AB}_A$ and $\pi^{AC}_A$:

[Pullback square: $P(\downarrow AB) \times_{P(\{A\})} P(\downarrow AC)$ with projections $\pi_1$ and $\pi_2$ to $P(\downarrow AB)$ and $P(\downarrow AC)$, which map to $P(\{A\})$ via $\pi^{AB}_A$ and $\pi^{AC}_A$.]

Notice that this choice results in a much larger parameter space of dimension
$$(k_A + k_B + k_A k_B)(k_B + k_C + k_B k_C)(k_A + k_C + k_A k_C) - k_A - k_B - k_C.$$
For the case where A, B, and C are all binary, this construction leads to an overparametrized (non-identifiable) model containing the saturated contextual model. In general, this model is typically overparametrized, as the number of parameters to fit is $O(k_A^2 k_B^2 k_C^2)$ compared with the $k_A + k_B + k_C + k_A k_B + k_B k_C + k_A k_C$ parameters of the previous construction. Because the local models on the original contexts overparametrize the original model, it is not possible to construct an identifiable parametrization of the no 3-way interaction model for a contextual probability distribution. Note that removing the parameters $\theta^A_i$, $\theta^B_j$, and $\theta^C_k$ in the first parametrization would recover the saturated model.

From these two examples we can see the nuance involved in constructing models for contextual measurement scenarios. Choices made in parametrization can have drastic effects on the identifiability of a model. Of course, this problem still occurs in classical measurement scenarios, but the additional requirement of constructing restriction mappings on the parameter sheaf can create additional complexities if done naively.

5.8.2 Factors

Thus far we have only discussed statistical modeling on the full column space. However, in many applications, we are interested in only a subset of the column space. This motivates the introduction of factors. The idea behind factors is that there is a joint random variable we are interested in which may or may not belong to one of the contexts. In this section we outline these possibilities and introduce a language for discussing contextual statistical inference. When performing statistical analysis on a database, we are oftentimes interested in a subset of the column space, which we call a factor. Formally, a factor, f, is simply a subset of the column set of the entire database, $C_D := \bigcup_{i \in [k]} C_i$ [101]. In the language of graphical models, these are so named because of their connection to factor graphs, which are graphs representing a factorization of the joint distribution of a family of random variables. In our situation, factors naturally correspond to sub-complexes of the simplicial complex associated to our database.

Definition 127. (Classical and Contextual Factors) A given factor identifies a sub-simplicial complex of the database schema by taking the intersection of the factor with the abstract simplicial complex, i.e. $f \cap \Delta(D)$. We say that a factor is classical if the geometric realization of this sub-simplicial complex is contractible, and that the factor is contextual otherwise.

Example 128. The factor {A, B, A′} is an example of a classical factor on the Bell context.

Example 129. The factor {A, B, A′, B′} is an example of a contextual factor on the Bell context.

The inclusion mapping $i : \Delta(f) \hookrightarrow \Delta(D)$ is continuous with respect to the Alexandroff topology generated by the lower sets. As such it determines two functors, $i_* : \mathrm{Sh}(\Delta f) \to \mathrm{Sh}(\Delta D)$, called the direct image, and $i^* : \mathrm{Sh}(\Delta D) \to \mathrm{Sh}(\Delta f)$, called the inverse image, where $i^*$ is left adjoint to $i_*$ and $i^*$ is left exact. In the language of topos theory, the inclusion mapping determines a geometric morphism between the two sheaf topoi. Models on factors can be defined analogously to our original definition on tables.

Definition 130. A contextual statistical model on a factor is a natural transformation between a parameter sheaf on $\Delta f$ and the sheaf of contextual probability measures on $\Delta f$.

5.8.3 Classical Snapshots of a Factor

Let $f \subset C_D$ be a factor. Recall that an open cover of f, $\bigcup_{i=1}^m U_i$, is a collection of open subsets (i.e. lower sets) such that $f \subset \bigcup_{i=1}^m U_i$. We can mimic this definition to define the notion of contextual covers.

Definition 131. Let $D = \{t_1, \ldots, t_k\}$ be a database and let $C_i$ denote the column set of $t_i$. Define $C_D = \bigcup_{i=1}^k C_i$ and, for $\sigma \subset [k]$, define $C_\sigma = \bigcup_{i \in \sigma} C_i$. We say that $C_\sigma$ is a contextual cover of f if and only if $f \subset C_\sigma$.

Example 132. Consider the 3-cycle database topology introduced in Example 117.

Let f = {A, B} be a factor. In this example, the column sets of the three tables are {A, B}, {B, C}, and {A, C}, and the contextual covers of f are then {{A, B}}, {{A, B}, {B, C}}, {{A, B}, {A, C}}, {{B, C}, {A, C}}, and the full measurement scenario {{A, B}, {B, C}, {A, C}}.
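For a schema this small, the contextual covers can simply be enumerated, as in the sketch below (base R, hypothetical names), which checks every nonempty subset of contexts against the covering condition $f \subset C_\sigma$.

```r
# A minimal sketch (base R, hypothetical names): enumerate the contextual
# covers of a factor by brute force over subsets of contexts.
contexts <- list(AB = c("A", "B"), BC = c("B", "C"), AC = c("A", "C"))

contextual_covers <- function(f, contexts) {
  k <- length(contexts)
  covers <- list()
  for (bits in 1:(2^k - 1)) {                      # every nonempty subset of contexts
    sigma <- which(bitwAnd(bits, 2^(0:(k - 1))) > 0)
    cols  <- unique(unlist(contexts[sigma]))       # C_sigma, the union of column sets
    if (all(f %in% cols)) covers[[length(covers) + 1]] <- names(contexts)[sigma]
  }
  covers
}

contextual_covers(c("A", "B"), contexts)
# returns {AB}, {AB,BC}, {AB,AC}, {BC,AC}, and {AB,BC,AC}, as listed in Example 132
```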

In the above example, f = {A, B} is a classical factor. As such, we can simply perform statistical analysis on the table corresponding to {A, B} using standard techniques. We can also consider the contextual covers of a specific contextual factor.

Example 133. Let f = {A, B, C} be a factor on the 3-cycle database topol- ogy. The contextual covers of f are then {{A, B} , {B,C}}, {{A, B} , {A, C}}, {{B,C} , {A, C}}, and {{A, B} , {B,C} , {A, C}}.

If we are interested in performing statistical analysis on the factor in the last example, we could potentially have a problem with the contextual cover {{A, B}, {B, C}, {A, C}}, because there are collections of tables on these contexts which are not marginals of a joint distribution on the full space {A, B, C} (Example 103). However, any of the remaining contextual covers admits a join provided that the tables agree on their overlapping columns (Lemma 98). This observation motivates the idea of maximal classical contextual covers, formalized in the next definition.

Definition 134. We say that a contextual cover, $C_\sigma = \bigcup_{i \in \sigma} C_i$, is a maximal classical contextual cover of the contextual factor f if (1) $f \subset C_\sigma$, i.e. $C_\sigma$ is a contextual cover of f, (2) the geometric realization of $\Delta(C_\sigma)$ is contractible, and (3) for any $C_j$ with $j \notin \sigma$, the geometric realization of $\Delta(C_\sigma \cup C_j)$ is not contractible. In other words, adding any other context to $C_\sigma$ results in the possibility that no join of the contextual tables exists.

Example 135. Let C be the Bell contexts. The maximal classical contextual covers of the global factor f = {A, B, A′, B′} are given by U₁ = {{A, B}, {A, B′}, {A′, B}}, U₂ = {{A, B}, {A, B′}, {A′, B′}}, U₃ = {{A, B}, {A′, B}, {A′, B′}}, and U₄ = {{A, B′}, {A′, B}, {A′, B′}}, because adding the missing context to any of these covers results in closing the 4-cycle.

Definition 136. If f is a contextual factor, we call the set of maximal classical covers the classical snapshot of f.

Statistical inference on contextual factors involves piecing together the statistical properties of the maximal classical factors.

5.9 Subobject Classifier for Contextual Sheaves

In the category of sets, the subobject classifier is simply the two-element set. This creates a one-to-one correspondence between subobjects and maps into the two-element set via characteristic functions. For categories of presheaves and sheaves, the subobject classifier is a more complex object. We first recall the general definition of the subobject classifier in an arbitrary topos and then specialize the discussion to the specific case of a contextual topology on a database.

Definition 137. The subobject classifier $\Omega \in \widehat{\Delta(C_D)}$ is the presheaf that associates to each object $U \in \Delta(C_D)$ the set of sieves on U, and to each arrow $i : U \to V$ the pullback sieve, i.e. the arrow $\Omega(i) : \Omega(V) \to \Omega(U)$ defined by
$$S \mapsto i^*(S) := \{\, j : W \to U \mid i \circ j \in S \,\}.$$

In a topos, the subobject classifier comes equipped with a truth arrow, representing the top value of the Heyting algebra of subobjects of the subobject classifier. For sets this is simply the arrow $\top : 1 \to \{0, 1\}$ defined by $\top(*) = 1$, which picks out the element representing true. We now recall the definition of the truth arrow for a presheaf topos.

Definition 138. $\top : 1 \to \Omega$ is defined to be the natural transformation whose components $\top_U : 1 \to \Omega(U)$ select the maximal sieve on U, i.e. the sieve containing the identity.

In order to make this discussion a bit more specialized to the contextual topologies discussed above, we can work out the sieves on the basis of a contextual topology. Note that sieves on a contextual topology correspond to open subsets.

Example 139. Consider the 3-cycle database topology discussed throughout this chapter. A sieve on the top element is simply a subfunctor. As the collection of open subsets is a poset, the sieves are given by downward-closed subsets of the poset. In the case of a contextual database topology, these just correspond to the open subsets. Thus, $\Omega(\top)$ is the collection of all open sub-simplicial complexes (Example 117).

Remark. In the topos theory literature, the local sections of the subobject classifier are referred to as truth values because these determine the internal logic of the topos.

5.10 Local and Global Sections of a Contextual Sheaf

Recall that the terminal presheaf in the category of presheaves is the presheaf that associates to each open set a set with one element, i.e. 1(U) = {∗} for every U. Every restriction mapping is given by the identity mapping. We use the notation 1 for the terminal presheaf because it is a terminal object in the topos of presheaves $\widehat{\Delta(C_D)}$.

Definition 140. A global section of a presheaf $F \in \widehat{\Delta(C_D)}$ is a natural transformation from the terminal presheaf 1 into the presheaf F.

A particular global section already appeared in the definition of the truth arrow: it picks out the maximal element in the Heyting algebra of subobjects of the subobject classifier Ω.

Example 141. Let $T_n$ be the presheaf of classical tables with n records. A global section of $T_n$ is simply a global table on the full outcome space, along with all marginal tables corresponding to the columns associated with each open set. For example, let $t_{ABC} : [3] \to \{0,1\}^3$ be defined by $t_{ABC}(1) = (0, 0, 1)$, $t_{ABC}(2) = (0, 1, 0)$, $t_{ABC}(3) = (1, 0, 0)$. Then $t_{ABC}$ determines a global section of the sheaf of tables on the contextual topology associated to the complete graph on {A, B, C} discussed earlier. Note that all other components of the global section are determined by the presheaf condition once the top element is defined.
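A quick illustration in base R (hypothetical names): the global table $t_{ABC}$ is a data frame of records, and the components of the corresponding global section at smaller opens are obtained by projecting onto the relevant columns.

```r
# A minimal sketch (base R, hypothetical names): a global table with three
# records and the marginal tables obtained by projecting onto column subsets.
t_ABC <- data.frame(A = c(0, 0, 1), B = c(0, 1, 0), C = c(1, 0, 0))  # records 1..3
t_AB  <- t_ABC[, c("A", "B")]      # component of the global section at ↓AB
t_A   <- t_ABC[, "A", drop = FALSE]
table(t_AB)                        # the induced 2 x 2 contingency table on {A, B}
```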

A very important family of global sections come from the global sections of the subobject classifier Ω. In a general topos, these are known as the truth values of the topos and have a natural structure as a Heyting algebra (Definition 48). This is what in topos theory is referred to as the internal logic of the topos. In general, this internal logic is multivalued and intuitionistic.

Definition 142. A local section of $X \in \widehat{\Delta(C_D)}$ is a natural transformation from a subobject E of the terminal presheaf 1 into the presheaf X.

Note that any subobject E of 1 must associate to each set U either ∅ or {∗}. Moreover, the presheaf condition requires the existence of restriction maps. Thus if E (U) = {∗}, then E (V ) = {∗} for all V ⊂ U.

Example 143. The table tAB in the Bell context determines a local section of the sheaf of tables on the Bell context. We can define

E (↓ AB) = E ({A}) = E ({B}) = E (∅) = {∗} and E (U) = ∅ for all other U in the Bell contextual topology. Then the natural transformation taking E (↓ AB) to the table tAB defines a local section on E since all other mappings are determined by projecting onto subspaces of the attribute space.

Note that the family of Bell tables discussed previously provides an example of a collection of local sections which do not glue together into any global section.

5.11 Fitting Contextual Models

A collection of tables can be thought of as a collection of local sections of the classical table sheaf discussed previously in Section 5.7.2. We have also previously considered the problem of reconstructing tables using constraint satisfaction problems (Section 5.5.2); this corresponds to attempting to glue the local sections of the classical sheaf of tables. As such, that particular sheaf can help us understand the degree of compatibility of the tables, depending on how close they are to admitting a global section. In this section, we discuss the problem of fitting contextual statistical models to data, and in later sections we discuss how knowledge of the glueability of local sections can be used to determine the collection of classical probabilities compatible with the local data in the tables. The sheaf condition for the parameters of a statistical model ensures that overlapping models induced by restriction mappings will be compatible. However, there are, in general, many possibilities for choosing restriction mappings for a contextual model.

5.11.1 Maximum Likelihood Estimation for the Saturated Contextual Model

Let D be a database consisting of tables t1, . . . , tk with column spaces C1,...,Ck, respectively. On each Ci, we can follow a pseudo-likelihood approach by defining a likelihood function analogously to the classical case, i.e.

$$L(x_C \mid \theta_C) = \prod_{i=1}^{k} L(x_{C_i} \mid \theta_{C_i}).$$

The goal of maximum likelihood estimation is then to maximize L subject to the constraint equations generated by the restriction mappings in the statistical model, i.e. for all i, j,
$$\mathrm{res}^{i}_{i,j}\, P(C_i) = \mathrm{res}^{j}_{i,j}\, P(C_j).$$

Example 144. Consider a database schema on three columns A, B, and C. Suppose further that the data type of each column is binary. If we attempt to fit the saturated contextual model to this measurement scenario, our likelihood function is simply the product of multinomial distributions on the contexts, i.e.
$$L(x_C \mid \theta_C) = \left( \frac{n_{AB}!}{\prod (x_{AB}!)} \prod p_{AB}^{x_{AB}} \right) \left( \frac{n_{BC}!}{\prod (x_{BC}!)} \prod p_{BC}^{x_{BC}} \right) \left( \frac{n_{AC}!}{\prod (x_{AC}!)} \prod p_{AC}^{x_{AC}} \right).$$

The log-likelihood function is then the sum of the multinomial log-likelihoods on the individual contexts, i.e.
$$\ell_C(p_{AB}, p_{BC}, p_{AC}) = \ell(p_{AB}) + \ell(p_{BC}) + \ell(p_{AC}),$$
where
$$\ell(p^{AB}) := \log(n_{AB}!) - \sum_{i=1}^{m_{AB}} \log\left(x_i^{AB}!\right) + \sum_{i=1}^{m_{AB}} x_i^{AB} \log p_i^{AB}$$
and $\ell(p_{BC})$ and $\ell(p_{AC})$ are defined similarly. Thus, contextual maximum likelihood estimation can be phrased as optimizing $\ell_C(p_{AB}, p_{BC}, p_{AC})$ subject to the constraints
$$\begin{cases} p^{AB}_{00} + p^{AB}_{01} + p^{AB}_{10} + p^{AB}_{11} = 1 \\ p^{BC}_{00} + p^{BC}_{01} + p^{BC}_{10} + p^{BC}_{11} = 1 \\ p^{AC}_{00} + p^{AC}_{01} + p^{AC}_{10} + p^{AC}_{11} = 1 \\ p^{AB}_{00} + p^{AB}_{01} - p^{AC}_{00} - p^{AC}_{01} = 0 \\ p^{AB}_{00} + p^{AB}_{10} - p^{BC}_{00} - p^{BC}_{01} = 0 \\ p^{BC}_{00} + p^{BC}_{10} - p^{AC}_{00} - p^{AC}_{10} = 0. \end{cases}$$

This optimization problem can be reduced to the following equation:
$$\frac{x_{00}}{p_{00}} + \frac{x_{11}}{1 - q_A - q_B + p_{00}} = \frac{x_{01}}{q_A - p_{00}} + \frac{x_{10}}{q_B - p_{00}},$$
which is equivalent to a degree three equation in $p_{00}$. As such, it has a closed-form solution; however, the closed form is significantly longer than the simple form of the MLE for the multinomial model, and would not fit on the page or be as easily interpretable as the analogous result for the classical multinomial model.
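A numerical sketch of this optimization (base R, hypothetical counts and names): rather than handling the six constraints explicitly, the sketch absorbs them by parametrizing each context's table with the shared one-variable marginals $q_A, q_B, q_C$ and one free cell $p_{00}$ per context, then maximizes the pseudo-likelihood with optim. The multinomial coefficients are dropped since they do not depend on the parameters; this is only a sketch, not the closed-form solution referred to above.

```r
# A minimal sketch (base R, hypothetical names) of contextual maximum
# likelihood for the saturated model on the binary 3-cycle.
cell_probs <- function(p00, qX, qY) c(p00, qX - p00, qY - p00, 1 - qX - qY + p00)

neg_loglik <- function(theta, xAB, xBC, xAC) {
  qA <- theta[1]; qB <- theta[2]; qC <- theta[3]
  pAB <- cell_probs(theta[4], qA, qB)
  pBC <- cell_probs(theta[5], qB, qC)
  pAC <- cell_probs(theta[6], qA, qC)
  if (any(c(pAB, pBC, pAC) <= 0)) return(1e10)   # outside the feasible region
  -sum(xAB * log(pAB)) - sum(xBC * log(pBC)) - sum(xAC * log(pAC))
}

# observed cell counts (x00, x01, x10, x11) for each context
xAB <- c(30, 20, 15, 35); xBC <- c(25, 20, 20, 35); xAC <- c(28, 22, 17, 33)
fit <- optim(c(0.5, 0.5, 0.5, 0.25, 0.25, 0.25), neg_loglik,
             xAB = xAB, xBC = xBC, xAC = xAC)
fit$par   # (qA, qB, qC, p00_AB, p00_BC, p00_AC) at the constrained optimum
```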

5.11.2 Classical Approximation of a Contextual Distribution

In some situations, we may hold a strong belief that the observed outcomes are drawn from a classical random variable whose outcome space coincides with the column space of our database. In this situation, we may want to consider a collection of classical approximations to our contextual distribution. In this section, we formalize this notion. As one such example, consider an algorithm enumerating counts of types of targets in some geographic region. The case of a single target class with no noise has been studied in [16]. Assuming that the region under consideration is well-defined, the true count of targets of each type over a specified time period is a well-defined quantity and should be modeled by a classical probability distribution. Any collection of tables on a collection of contexts which agree on the total number of samples can be thought of as a collection of local sections of the sheaf of tables. The problem of reconstructing a global table from its local sections has already been discussed by Abramsky using the language of sheaf cohomology [2]. Here, we address the question of performing statistical inference on a collection of partial tables. Given a factor, f, we can define C(f) to be the set of maximal classical contextual covers of f (Definition 134). On such a classical contextual cover of the factor, we can use the classical definition of a statistical model. We assume here that our data has a structure suitable for averaging. In the next chapter, we see how to relax this assumption by embedding data into a Giry data frame, which allows us to take convex combinations of arbitrary types of data by embedding them as probability measures. Let $m_f$ denote the number of maximal classical covers of the factor f, and let $e_f : [m_f] \to C(f)$ be an enumeration of these covers. For each maximal classical cover, the resulting contextual constraint satisfaction problem is guaranteed to have a solution; let $k_i$ denote the number of solutions for the cover $e_f(i)$. When performing statistical inference, we need a method of reasoning over the space of classical approximations to our contextual database. Point estimation on contextual tables can easily lead to flawed analysis. Consider three binary random variables A, B, and C tabulated in a database whose simplicial complex is given by the cyclic graph:

[The 3-cycle graph on vertices A, B, and C.]

Suppose the following marginal distributions are stored in our database:

        A = 0   A = 1
B = 0    1/4     1/4
B = 1    1/4     1/4

        B = 0   B = 1
C = 0    1/4     1/4
C = 1    1/4     1/4

        A = 0   A = 1
C = 0    1/4     1/4
C = 1    1/4     1/4

If we consider the resulting contextual constraint satisfaction problem, we arrive at a system of thirteen linear equations:
$$\begin{cases} p_{ij+} = \tfrac{1}{4} & \text{for all } i, j, \\ p_{i+k} = \tfrac{1}{4} & \text{for all } i, k, \\ p_{+jk} = \tfrac{1}{4} & \text{for all } j, k, \\ p_{+++} = 1. \end{cases}$$

The solution set of this linear system is the following line in parametric form:
$$p_{001} = p_{010} = p_{100} = p_{111} = \tfrac{1}{4} - t, \qquad p_{000} = p_{011} = p_{101} = p_{110} = t,$$
where $0 \leq t \leq \tfrac{1}{4}$. Averaging (integrating over the disintegration onto the line spanned by these equations in the probability simplex) would result in the distribution corresponding to the midpoint $t = \tfrac{1}{8}$, and could erroneously lead us to conclude that the outcomes were mutually independent, since the average over all possible joins of the marginally independent distributions is the mutually independent join. As such, classical approximations are more appropriate for analyzing the range of possible explanations of a collection of consistent marginal distributions than for predicting factors which are not covered by a single context.
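The constraint system in this example is small enough to solve numerically. The sketch below (base R, hypothetical helper names) assembles the 13 × 8 coefficient matrix described above, recovers a particular solution with the pseudoinverse, and reads the one-parameter family of joint distributions off the null space.

```r
# A minimal sketch (base R, hypothetical names): recover the family of joint
# distributions p_ijk compatible with the three uniform pairwise marginals.
cells <- expand.grid(A = 0:1, B = 0:1, C = 0:1)      # the 8 joint outcomes
indicator_rows <- function(vars) {
  grid <- expand.grid(rep(list(0:1), length(vars)))
  t(apply(grid, 1, function(v)
    as.numeric(apply(cells[, vars, drop = FALSE], 1, function(x) all(x == v)))))
}
M <- rbind(indicator_rows(c("A", "B")),              # rows summing to p_{ij+}
           indicator_rows(c("B", "C")),              # rows summing to p_{+jk}
           indicator_rows(c("A", "C")),              # rows summing to p_{i+k}
           rep(1, 8))                                # normalization p_{+++}
b <- c(rep(1/4, 12), 1)

s   <- svd(M)
pos <- s$d > 1e-10
p_uniform <- s$v[, pos] %*% ((t(s$u[, pos]) %*% b) / s$d[pos])  # minimum-norm solution
null_dir  <- s$v[, !pos, drop = FALSE]               # one-dimensional null space
round(t(p_uniform), 3)   # all entries 1/8: the (misleading) "average" join
# the full solution set is p_uniform + t * null_dir over the feasible interval of t
```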

5.12 Contextual Hypothesis Testing

When performing a hypothesis test, we compute some statistic from the observed data and then either reject or fail to reject the null hypothesis, according to whether the value of the statistic lies within a range determined by the chosen significance level and the asymptotic distribution of the statistic. In this section, we discuss how to perform hypothesis tests on contextual databases. Classically, a statistic is defined to be a measurable function of the data, and this same definition works for contextual tables. Note that a statistic can be seen as a mapping from $\prod_{i=1}^{k} T_{C_i}$ into some space of values V (typically $V = \mathbb{R}$). In this section, we discuss several methods for hypothesis testing on contextual databases.

5.12.1 Testing if Observed Marginals are Drawn from the Same Distribution

Given a collection of tables we may be interested in whether or not the marginal distributions of the given tables can be assumed to be drawn from the same distribution. This would allow us to test if the tables can be presumed to be independent draws from some contextual probability distribution. As an example of a situation where an analyst could employ this technique, imagine creating a product landing page that suggests two add-on products to users on an e-commerce website. We may have a basket of potential products to offer a user based on some recommender system and we have constructed an experiment that chooses different pairs of products to suggest to the user. We could form a collection of tables tabulating the counts of which combinations of products were purchased on each landing page. Some of these count tables could overlap as products may appear on

multiple landing pages. As such, we may be interested in a hypothesis such as "the fraction of users who prefer a particular product does not depend on the product it is paired with." In order to test this hypothesis, we can form a collection of tables by taking all tables with counts of the particular product and projecting onto the marginal counts for that particular product. The classical way of approaching this test would be to use a chi-squared test on these marginal tables. We can first compute the expected count by summing across all outcomes and dividing by the total number of observations. Next we can compute the chi-squared statistic by iterating through the columns again:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.$$

If m denotes the number of marginals, we can use a chi-squared test with m − 1 degrees of freedom.
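A sketch of this computation in base R (hypothetical counts), assuming for simplicity that each marginal table rests on the same number of observations, so that the expected count is just the mean of the observed marginal counts.

```r
# A minimal sketch (base R, hypothetical counts): chi-squared test that the
# m marginal counts of a single product, collected from the different tables
# in which it appears, are draws from a common distribution.
observed <- c(46, 52, 61)                              # marginal counts, m = 3 tables
expected <- rep(mean(observed), length(observed))      # pooled expected count
chi_sq   <- sum((observed - expected)^2 / expected)
p_value  <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)
c(statistic = chi_sq, p.value = p_value)
```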

5.12.2 Testing if a Collection of Tables can be Explained Classically

A more classical approach to the problem of testing whether or not a collection of tables could be the marginal tables of some global table would involve finding a maximum likelihood estimate for a classical probability subject to the constraints imposed by the marginals. As we have seen previously, marginal distributions do not in general determine a joint distribution on the full probability space. As such, the statistical model naively determined from the marginalization mapping is non-identifiable, and standard asymptotic theory breaks down. The technique of algebraic hypothesis testing developed in chapter seven relies only on the asymptotic convergence properties of the empirical distribution function, rather than of a collection of estimators, as it is based on implicit equations for a statistical model rather than parametric equations. As such, this technique may be preferable in situations like these, especially if the analyst is not equipped with sufficient computational tools to resolve model singularities. As we have seen previously, the space of contextual probabilities is a subspace of the product space $\prod_{i=1}^{k} \Delta^{C_i}$. If $p_i \in \Delta^{C_i}$ is the marginal distribution of some probability distribution on $\Delta^C$, then there exists $p \in \Delta^C$ such that $m_{C_i}(p) = p_i$. Thus, if a contextual probability distribution arises from a classical distribution, it must belong to the image of the linear transformation $\prod_{i=1}^{k} m_{C_i}$. The image of $\Delta^C$ under this transformation, representing the product of the marginalization mappings, identifies the collection of contextual distributions which can arise from a true probability distribution on the outcome space $X_C$. Any $p \in \Delta^C$ can be identified with a vector in $\mathbb{R}^{|C|}$. The image of the full space under the linear transformation $T : \mathbb{R}^{|C|} \to \prod_{i=1}^{k} \mathbb{R}^{|C_i|}$ defines a linear subspace of $\prod_{i=1}^{k} \mathbb{R}^{|C_i|}$. Let $P$ denote the projection operator associated to this linear subspace. If $p_o = \prod p_{C_i}$ represents the observed normalized contextual tables, we can test whether these are the marginal distributions of a classical probability distribution by observing that $p_o - P p_o$ is zero if and only if they come from a classical distribution. Breaking this into linear equations gives us a collection of invariants for the contextual distributions which arise from a classical distribution. With this observation, we can then apply the technique of algebraic hypothesis testing developed in chapter seven in order to obtain a test of significance for whether or not samples are drawn from a contextual distribution.
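A sketch of this check for the binary 3-cycle (base R, hypothetical names; MASS::ginv supplies the pseudoinverse): stack the observed pairwise tables into a single vector, project onto the image of the marginalization map T, and inspect the residual $p_o - P p_o$.

```r
# A minimal sketch (base R with MASS, hypothetical names): residual of the
# observed stacked marginals after projecting onto the image of the
# marginalization map T for three binary variables on the 3-cycle.
library(MASS)
cells <- expand.grid(A = 0:1, B = 0:1, C = 0:1)
marg_block <- function(vars) {
  grid <- expand.grid(rep(list(0:1), length(vars)))
  t(apply(grid, 1, function(v)
    as.numeric(apply(cells[, vars, drop = FALSE], 1, function(x) all(x == v)))))
}
Tmat <- rbind(marg_block(c("A", "B")), marg_block(c("B", "C")), marg_block(c("A", "C")))

p_obs <- c(.30, .20, .10, .40,    # observed normalized table on {A, B}
           .25, .15, .35, .25,    # on {B, C}
           .20, .30, .40, .10)    # on {A, C}
Pr  <- Tmat %*% ginv(Tmat)        # orthogonal projector onto the image of T
res <- p_obs - Pr %*% p_obs
max(abs(res))                     # ~0 exactly when p_obs lies in the image of T
# The entries of the residual, written out as linear forms in p_obs, are the
# invariants fed into the algebraic hypothesis test described above.
```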

5.12.3 A Hypothesis Test for Contextuality

The space of contextual probability distributions can be identified with the intersection of $\prod_{i=1}^{k} \Delta^{C_i}$ with the linear subspace determined by forcing the marginal distributions to agree on overlapping columns. Note that the space of classical marginals is a subspace of this space. By constructing a basis for the subspace of classical probabilities and extending it to a basis of the subspace of contextual distributions, we can obtain invariants for the collection of contextual distributions which are not classical. As such, we can also use the algebraic hypothesis testing technique developed in chapter seven to create a contextual goodness-of-fit statistic.

5.13 Future Work

5.13.1 Contextuality Penalization

Contextuality, in the case of databases, is the phenomenon by which a collection of observed marginal distributions fails to admit any gluing into a joint table. In certain types of analysis we may have a strong belief that our observations actually arise as marginal distributions of some joint distribution on the full column space. In such situations, we may want to add a penalization term to the log-likelihood function in order to reduce the amount of contextuality present in the model. We can start with a contextual L1 penalty. LASSO, or least absolute shrinkage and selection operator, is a statistical technique that performs variable selection and regularization, and is often used to enhance the predictive power and interpretability of the model it produces. In machine learning, this helps combat over-fitting by penalizing complexity. Mathematically, LASSO puts an L1 penalty on the model coefficients. If a collection of tables arises as the result of a probability distribution on the global outcome space $X_C$, then there exists some probability vector $p_C \in \Delta^C$ such that $m_{C_i}(p_C) = p_{C_i}$ for each $C_i$. This determines the collection of classical probabilities as a subspace of the space of contextual probability distributions inside $\prod \Delta^{C_i}$. Using the projection operation, we can construct an energy function in the same manner discussed in chapter seven to penalize contextuality. Future work in this direction could focus on implementing these penalization terms and exploring how they affect statistical analysis of contextual databases.

5.13.2 Sampling for Contextual Probability Distributions

In some situations, such as multiple A/B testing, we may want to summarize our data with a contextual distribution. In this type of setting, we can simulate draws for the marginals independently, because we would not expect exact agreement on the marginal overlaps. This can be done as long as we can draw samples from the local distribution representing each context of interest. If we want to produce a database of samples, we can simply draw the desired number of samples according to the marginal distribution of each context. Note this method will typically not produce perfect agreement on overlapping columns in the database, but for large sample sizes the overlaps should be fairly close to agreement. For many types of contextuality, such as that arising in quantum systems, it is necessary to produce samples from a contextual distribution which satisfy the stronger requirement that the distributions agree perfectly on their overlapping sets. One possible approach to implementing this sampling technique is sketched in the

paragraphs below. First, initialize a database with tables corresponding to the various contexts in our contextual distribution, and also initialize an empty dictionary which will be used to keep track of the previously observed outcomes. Also, initialize an empty list which will contain the set of contexts that have been previously observed. Next, select a random context from the set of all contexts in our contextual distribution and append it to the list of observed contexts. Draw from the local probability distribution associated to this context and add the corresponding columns to the dictionary of observed outcomes. Initialize a tree with the selected context as its root and add children corresponding to the other contexts which have observables that overlap with the selected context. To draw from our contextual distribution, we can use a breadth-first search where a random child node is drawn at each iteration and the resulting child context is then appended to the list of observed contexts. We then draw from the local distribution of that child context, conditioned on the values already recorded in the dictionary of observed outcomes. We terminate the algorithm whenever the list of observed contexts equals the list of contexts of our sampling contextual distribution. Note this procedure is guaranteed to terminate as long as the database schema is connected. By first determining the number of connected components, we could easily adapt this procedure to sample from contextual distributions on non-connected schemas.
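The procedure above can be realized as in the sketch below (base R, hypothetical names and data layout): each context is stored as a data frame of outcomes with a prob column, contexts sharing a variable are treated as adjacent, and each local draw is conditioned on the variables already observed. This is only a sketch for a connected schema; it does not handle contexts whose conditional distributions become degenerate.

```r
# A minimal sketch (base R, hypothetical names) of the breadth-first
# contextual sampler described above.
draw_contextual_sample <- function(contexts) {
  observed <- list()                          # dictionary of observed outcomes
  visited  <- character(0)                    # contexts already drawn
  queue    <- sample(names(contexts), 1)      # start from a random context
  while (length(queue) > 0) {
    name <- queue[1]; queue <- queue[-1]
    if (name %in% visited) next
    tbl  <- contexts[[name]]
    vars <- setdiff(names(tbl), "prob")
    keep <- rep(TRUE, nrow(tbl))              # condition on variables seen so far
    for (v in intersect(vars, names(observed)))
      keep <- keep & (tbl[[v]] == observed[[v]])
    tbl  <- tbl[keep, , drop = FALSE]
    row  <- tbl[sample(nrow(tbl), 1, prob = tbl$prob), , drop = FALSE]
    for (v in vars) observed[[v]] <- row[[v]]
    visited <- c(visited, name)
    for (other in setdiff(names(contexts), visited)) {   # enqueue overlapping contexts
      if (length(intersect(vars, names(contexts[[other]]))) > 0)
        queue <- c(queue, other)
    }
  }
  observed
}

ctxs <- list(
  AB = data.frame(A = c(0, 0, 1, 1), B = c(0, 1, 0, 1), prob = c(.3, .2, .1, .4)),
  BC = data.frame(B = c(0, 0, 1, 1), C = c(0, 1, 0, 1), prob = c(.25, .15, .35, .25)),
  AC = data.frame(A = c(0, 0, 1, 1), C = c(0, 1, 0, 1), prob = c(.2, .3, .4, .1)))
draw_contextual_sample(ctxs)   # e.g. list(A = 0, B = 1, C = 0)
```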

Chapter 6 | Algebraic Hypothesis Testing

6.1 Introduction

In algebraic statistics, a common problem is to compute the ideal (or invariants) for an algebraic statistical model [36,108]. The invariants of a model are obtained by eliminating all parameters and finding polynomial equations whose zero locus defines the model inside the probability simplex. We will construct statistics based on knowledge of these invariants and establish some asymptotic properties of these statistics. These new statistics are alternatives to the likelihood ratio (G2) statistic and Pearson’s chi-squared tests in estimating goodness of fit. Thus, they can serve as an ingredient in model selection (e.g. by adding a penalty, or otherwise comparing fits). We show that this new test performs differently than likelihood ratio and Pearson’s chi-squared, and is in some cases superior. The proposed test has the advantage of being defined in certain situations (such as on the boundary of models) where the other tests fail, and may perform better near singularities due to the fact that it can be directly evaluated on our data without relying on the asymptotic normality of an estimator in order to ensure asymptotic consistency of the test. We show how to construct our test given a model ideal as input, both theoretically and computationally using R and algebraic geometry software. We give several examples of specific models for which our test outperforms alternatives. We also give some theoretical results showing regimes in which this can be expected to occur. The major contributions in this chapter are constructing an energy statistic

based on the invariants of an algebraic statistical model and proving its asymptotic consistency under the null hypothesis. This construction is interesting because its asymptotic properties do not rely on the asymptotic normality of an estimator, since it can be computed from empirical frequencies. Thus, this construction provides an alternative technique for computing goodness-of-fit in situations where standard asymptotic theory breaks down, such as on boundary points of the probability simplex or near singularities of a statistical model. We demonstrate this improved performance near a singularity of the binary 4-cycle undirected graphical model by benchmarking it against the likelihood ratio and chi-squared tests in simulations. The table below reports the average percentage deviation from a perfect procedure at each stated significance level, for samples drawn from the following distribution, which does not lie on the binary 4-cycle undirected graphical model:

q0000 = 0.07453062, q0001 = 0.08634001, q0010 = 0.041189657,

q0011 = 0.0640731, q0100 = 0.008735056, q0101 = 0.1015492,

q0110 = 0.001641928, q0111 = 0.1536179, q1000 = 0.0690093,

q1001 = 0.05180355, q1010 = 0.02624617, q1011 = 0.01191531,

q1100 = 0.06465253, q1101 = 0.04332119, q1110 = 0.122514,

q1111 = 0.07886045.

n = 100
α                    0.01   0.05   0.10   0.15   0.20   0.25   0.30
χ²                    4.7   12.8   18.7   23.0   24.4   26.3   26.2
G²                   28.4   46.0   52.6   55.0   55.4   55.9   53.5
Ours - Davies        -1.0   -4.7   -5.4   -2.3    4.5   11.9   17.2
Ours - Farebrother   -1.0   -4.9   -6.7   -6.0   -1.4    4.2    8.7
Ours - Imhof         -1.0   -4.7   -5.4   -2.3    4.5   11.9   17.2
Ours - Liu           -1.0   -4.7   -5.7   -3.0    3.5   11.9   17.9

6.2 From model to invariants

In this section, we offer a brief review of what an algebraic statistical model and its ideal are, and give the examples we will analyze later. An algebraic statistical model is a statistical model which can be realized as a semi-algebraic set. Given an algebraic statistical model, a set of invariants for the model is a collection of polynomial equations which vanish on the model, i.e. the value of each such polynomial is the same (namely zero) for all points in the probability simplex which belong to the model. The idea of using invariants for statistical inference is not new; however, most of the literature in this area is concentrated on the specific case of phylogenetic invariants [45, 46, 133]. The challenges associated with choosing invariants for statistical inference include developing a principled way to select invariants, as there are infinitely many polynomials belonging to any ideal, and the computational complexity of finding the invariants themselves. Invariants appear implicitly in common statistics. For example, the odds ratio statistic is defined as
$$O = \frac{n_{00}/n_{01}}{n_{10}/n_{11}} = \frac{n_{00}n_{11}}{n_{01}n_{10}},$$
which is equal to 1 precisely on the independence model [44]. A related statistic, the χ² statistic, also bears a relationship to the invariants of the independence model. Recall that the χ² statistic is defined to be

$$\chi^2 := \sum_{i,j} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}$$

where $E_{ij} = \frac{n_{i+} n_{+j}}{n_{++}}$, $n_{++} = \sum_{i,j} n_{ij}$, $n_{i+} = \sum_j n_{ij}$, and $n_{+j} = \sum_i n_{ij}$. By simple algebraic manipulation, we can rewrite this statistic for a 2 × 2 contingency table as:

$$\chi^2 = \frac{(n_{11} + n_{12} + n_{21} + n_{22})\,(n_{12}n_{21} - n_{11}n_{22})^2}{(n_{11} + n_{12})(n_{11} + n_{21})(n_{12} + n_{22})(n_{21} + n_{22})}.$$

In this form it is clear that the χ² statistic is 0 if and only if the point lies on the independence variety, provided that no row or column sum is zero. Formally, this statistic is undefined if any of the row or column sums are zero; however, most software packages will perturb the counts by a small amount to ensure the measured distribution lies in the interior of the probability simplex, so as to ensure convergence of the algorithms used to compute the p-values. Our approach is to create an inner product on probability distributions from the invariants: a matrix H analogous to a Hamiltonian. Viewing the empirical probability distribution function as a state, this state has zero energy if and only if all the invariants in a particular degree are satisfied. Then we compute the distribution of this energy statistic under the null hypothesis to obtain our goodness-of-fit statistic.

6.3 Constructing an inner product from invariants

Given a list of invariants, we may form a vector by evaluating each invariant at our observed data. The most natural form of statistic to consider would be a linear combination of the invariants. However, one desirable property of a statistical test for model membership is that the statistic vanishes if and only if our data actually satisfy the model, and a linear combination of invariants can vanish outside of the model. The easiest way to satisfy this criterion is to notice that, given any collection of real polynomials $g_1, \ldots, g_r$, a point P lies on the algebraic variety $\{g_1 = 0\} \cap \cdots \cap \{g_r = 0\}$ cut out by $I = \langle g_1, \ldots, g_r \rangle$ if and only if $g_1^2 + \cdots + g_r^2 = 0$ at the point P. This observation can be generalized. Let $|\psi\rangle$ be the vector whose i-th entry is $g_i$ evaluated at our normalized data. Then we can consider a statistic of the form $\langle\psi| H |\psi\rangle$ where H is positive definite. As H is positive definite, $\langle\psi| H |\psi\rangle = 0$ if and only if $|\psi\rangle = 0$. In the sequel, we only consider statistics of this form.

Remark. For the purpose of constructing statistics, we may as well assume H is symmetric. Indeed, observe

$$\langle\psi| H |\psi\rangle = \sum_{i,j=1}^{n} h_{ij}\, g_i(x)\, g_j(x) = \sum_{i=1}^{n} h_{ii}\, g_i(x)^2 + \sum_{i < j} (h_{ij} + h_{ji})\, g_i(x)\, g_j(x).$$

From the latter decomposition of $\langle\psi| H |\psi\rangle$ we see that replacing $h_{ij}$ with $\frac{h_{ij} + h_{ji}}{2}$ does not affect the statistic.
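As a concrete sketch (base R, hypothetical names): the statistic is assembled by evaluating each invariant at the normalized data to form $|\psi\rangle$ and contracting with a positive definite H. For the 2 × 2 independence model there is a single invariant, the determinant, and with H the identity the statistic is just its square.

```r
# A minimal sketch (base R, hypothetical names) of the energy statistic
# <psi| H |psi> built from model invariants evaluated at normalized data.
energy_statistic <- function(p, invariants, H = diag(length(invariants))) {
  psi <- vapply(invariants, function(g) g(p), numeric(1))   # the vector |psi>
  as.numeric(t(psi) %*% H %*% psi)                          # <psi| H |psi>
}

g1 <- function(p) p[1] * p[4] - p[2] * p[3]   # determinant invariant, p = (p11, p12, p21, p22)

counts <- c(30, 20, 15, 35)
p_hat  <- counts / sum(counts)
energy_statistic(p_hat, list(g1))   # vanishes exactly when the data lie on the model
```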

6.4 Asymptotic Properties of ⟨ψ| H |ψ⟩

Let $X_i$ be IID random vectors sampled from a distribution p, i.e. $P(X_i = e_j) = p_j$ and $\sum_{i=1}^{k} p_i = 1$. By the central limit theorem, $\sqrt{n}\,(\overline{X}_n - p) \Rightarrow N(0, \Sigma)$, where $\Sigma_{ii} = p_i(1 - p_i)$ and $\Sigma_{ij} = -p_i p_j$. Suppose $\{g_1, \ldots, g_r\}$ are the invariants of a statistical model. We want to design a hypothesis test for membership in this submodel of the saturated model. In other words, we would like to construct a statistic which vanishes only on the collection of distributions belonging to the given submodel. Consider a polynomial function

$$f(x_1, \ldots, x_k) = \langle\psi| H |\psi\rangle = \sum_{i,j=1}^{r} h_{ij}\, g_i(x_1, \ldots, x_k)\, g_j(x_1, \ldots, x_k)$$
with H positive definite and symmetric. As observed previously, $f(x) = 0$ if and only if $g_i(x) = 0$ for all $1 \leq i \leq r$. Moreover,

$$\nabla f(x)_k = \sum_{i,j=1}^{r} h_{ij} \left( \frac{\partial g_i}{\partial x_k} \cdot g_j + g_i \cdot \frac{\partial g_j}{\partial x_k} \right).$$

So, under the null-hypothesis, ∇f (p) = 0.

Thus, in order to apply the delta-method we need to go to the second derivative. We compute the Hessian of f:

$$\nabla^2 f(x)_{k\ell} = \sum_{i,j=1}^{r} h_{ij} \left( \frac{\partial^2 g_i}{\partial x_k \partial x_\ell} \cdot g_j + \frac{\partial g_i}{\partial x_\ell} \cdot \frac{\partial g_j}{\partial x_k} + \frac{\partial g_i}{\partial x_k} \cdot \frac{\partial g_j}{\partial x_\ell} + g_i \cdot \frac{\partial^2 g_j}{\partial x_k \partial x_\ell} \right).$$

Under the null-hypothesis, the above expression simplifies to

$$\nabla^2 f(x)_{k\ell} = \sum_{i,j=1}^{r} h_{ij} \left( \frac{\partial g_i}{\partial x_\ell} \cdot \frac{\partial g_j}{\partial x_k} + \frac{\partial g_i}{\partial x_k} \cdot \frac{\partial g_j}{\partial x_\ell} \right).$$

Assuming this matrix is not identically zero, the delta method implies that

$$2n \cdot f(\overline{X}_n) \Rightarrow Y^{T}\, \nabla^2 f(p)\, Y$$

where $Y \sim N(0, \Sigma)$. The right-hand side is a quadratic form of a multivariate normal distribution. We will investigate properties of such random variables in the next section.

6.5 Quadratic Forms of Multivariate Normal Distributions

In this chapter, we are considering statistical models for various algebraic submodels of the collection of all discrete random variables with a fixed number of outcomes. Any discrete random variable can be thought of as a point lying in the probability simplex. For instance, suppose the random variable X has k possible outcomes labeled by $[k] = \{1, 2, \ldots, k\}$. The law of X can be defined by specifying $P(X = i)$ for each $1 \leq i \leq k$. If we write $p_i$ for $P(X = i)$, the condition that the pushforward measure is a probability measure just tells us $\sum_{i=1}^{k} p_i = 1$, i.e. the point $p = (p_1, \ldots, p_k)^{T}$ lies on the standard $k-1$ simplex. Now consider a collection of independent, identically distributed (IID) random variables whose common sampling distribution is such a discrete random variable. The empirical distribution of the samples, $Y_n = \sum_{i=1}^{n} X_i$, is a multinomial distribution. The central limit theorem tells us that as $n \to \infty$,
$$\sqrt{n}\left(\frac{1}{n} Y_n - p\right) \Rightarrow N(0, \Sigma).$$

Here, ⇒ denotes convergence in distribution, and $\Sigma = \mathrm{diag}(p) - pp^{T}$. Thus, to understand the asymptotic properties of statistics defined as quadratic forms of the algebraic invariants, we need to understand distributions which are quadratic forms of multivariate normal random variables. In 1961, Imhof developed a method for computing the survival function of a quadratic form of a multivariate normal random vector under the assumption that the covariance matrix Σ is non-singular [70]. Several alternative numerical algorithms have also been developed [30, 47, 91]. Under these assumptions, any quadratic form Q of a multivariate normal random vector with non-singular covariance matrix can be written in the form

$$Q = \sum_{r=1}^{n} \lambda_r\, \chi^2_{h_r; \delta_r}$$

where $\chi^2_{h_r;\delta_r}$ is a non-central chi-squared distribution, i.e. $\chi^2_{h_r;\delta_r} = (x_1 + \delta_r)^2 + \sum_{i=2}^{h_r} x_i^2$, where the $x_i$ are IID standard normal random variables [118]. Let $Y \sim N(p, \Sigma)$ be the asymptotic distribution above. For a multinomial distribution, the covariance matrix Σ is positive semi-definite of rank $k - 1$. As such, Σ has a decomposition $\Sigma = LL^{T}$ where L is $k \times (k-1)$. Thus, we may decompose $Y = p + LX$ where $X \sim N(0, I_{(k-1)\times(k-1)})$. Then

$$Y^{T}HY = (p + LX)^{T} H (p + LX) = p^{T}Hp + X^{T}L^{T}Hp + p^{T}HLX + X^{T}L^{T}HLX = p^{T}Hp + 2X^{T}L^{T}Hp + X^{T}L^{T}HLX.$$

By the spectral theorem, there exists an orthogonal matrix U such that $D = U^{T}L^{T}HLU$ is a diagonal $(k-1) \times (k-1)$ matrix. The random variable $Z = U^{T}X \sim N(0, I_{(k-1)\times(k-1)})$. Thus, we may write

$$Y^{T}HY = p^{T}Hp + 2Z^{T}U^{T}L^{T}Hp + Z^{T}DZ.$$

We can break this up into components to find a representation of this statistic as a linear combination of generalized chi-squared random variables. Note that $c = p^{T}Hp$ is just a constant. Let $u_i$ denote the i-th component of $p^{T}HLU$.

$$Y^{T}HY = c + 2\sum_{i=1}^{k-1} u_i Z_i + \sum_{i=1}^{k-1} \lambda_i Z_i^2 = \left( c - \sum_{i=1}^{k-1} \frac{u_i^2}{\lambda_i} \right) + \sum_{i=1}^{k-1} \lambda_i \left( Z_i + \frac{u_i}{\lambda_i} \right)^2$$

as long as $L^{T}HL \neq 0$. Here, the $\lambda_i$'s are the eigenvalues of D. Letting $K = c - \sum_{i=1}^{k-1} \frac{u_i^2}{\lambda_i}$, a constant, we see that

$$Y^{T}HY - K = \sum_{i=1}^{k-1} \lambda_i \left( Z_i + \frac{u_i}{\lambda_i} \right)^2,$$

which is of the form necessary to apply Imhof's method for computing the survival function of a quadratic form of a multivariate normal random vector.

6.6 Estimation of Parameters for the Asymptotic Distribution

6.6.1 Using an MLE

Our goal is to choose the point of our algebraic submodel of the saturated probability simplex which best approximates the data. In other words, we would like to choose a $q \in V$, where V is the variety defined by $\{g_1 = 0\} \cap \cdots \cap \{g_r = 0\} \cap \{\sum_i q_i = 1\}$, which minimizes the information loss
$$K(p \,\|\, q) = \sum_{i=1}^{k} p_i \log\left(\frac{p_i}{q_i}\right)$$
or, equivalently, maximizes the likelihood. One approach to solving this problem is to use the technique of Lagrange multipliers to solve for the collection of critical points. Here we have the function $K(q) = K(q_1, \ldots, q_k) = \sum_{i=1}^{k} p_i \log\left(\frac{p_i}{q_i}\right)$. Observe

$$\nabla K(q)_i = -\frac{p_i}{q_i}.$$

Thus, by clearing denominators in the Lagrange multiplier equations, we obtain the likelihood equations. We can attempt to solve the resulting system of polynomial equations via Gröbner basis techniques. As long as we end up with a finite collection of solutions to the Lagrange multiplier equations, we can evaluate the information loss at each prospective minimizer to find our estimate. Notice that $K(q)$ is continuous on the probability simplex, and V is closed (in the Euclidean topology), being the zero locus of a family of polynomials, and bounded, because V is a subset of the probability simplex. Thus, by the extreme value theorem, our optimization problem indeed has a solution. To construct a hypothesis test based on the invariants, we can say that under the null hypothesis (our data belong to the given algebraic submodel)

$$2n \cdot f(\overline{X}_n) \Rightarrow Y^{T}\, \nabla^2 f(\tilde{p})\, Y$$

where $Y \sim N(\tilde{p}, \tilde{\Sigma})$, $f(\overline{X}_n)$ is the quadratic form evaluated on the invariants, and $\tilde{p}$ and $\tilde{\Sigma}$ are MLEs for the given algebraic submodel. Let $\tilde{\Sigma} = LL^{T}$ be a Cholesky decomposition of the estimated covariance matrix. Assuming $L^{T} \nabla^2 f(\tilde{p}) L \neq 0$, the results in the previous paragraph imply that we have the following asymptotic distribution:

$$2n \cdot f(\overline{X}_n) - \tilde{p}^{T} \nabla^2 f(\tilde{p})\, \tilde{p} + \sum_{i=1}^{k-1} \frac{u_i^2}{\lambda_i} \;\sim\; \sum_{i=1}^{k-1} \lambda_i \left( Z_i + \frac{u_i}{\lambda_i} \right)^2$$

where $u_i$ is the i-th component of $\tilde{p}^{T} \nabla^2 f(\tilde{p}) LU$, U is a unitary matrix which diagonalizes $L^{T} \nabla^2 f(\tilde{p}) L$, and the $\lambda_i$'s are the diagonal elements of this matrix. The survival function of $Q = \sum_{i=1}^{k-1} \lambda_i \left( Z_i + \frac{u_i}{\lambda_i} \right)^2$ may then be evaluated in R using the method of Imhof in the CompQuadForm package.
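A numerical sketch of this evaluation (R; the helper below is hypothetical, and it assumes CompQuadForm's imhof(q, lambda, delta) interface with the upper-tail probability returned in the Qq component, with delta holding the non-centrality parameters $(u_i/\lambda_i)^2$). It follows the formulas above and simply drops any zero eigenvalues, which the derivation implicitly assumes away.

```r
# A minimal sketch (R, hypothetical names): approximate p-value for the energy
# statistic via the asymptotic distribution derived above and Imhof's method.
library(CompQuadForm)

quadform_pvalue <- function(f_obs, n, Hess, p_tilde, Sigma_tilde, tol = 1e-10) {
  eig <- eigen(Sigma_tilde, symmetric = TRUE)            # factor Sigma_tilde = L L^T
  pos <- eig$values > tol
  L   <- eig$vectors[, pos, drop = FALSE] %*% diag(sqrt(eig$values[pos]), sum(pos))
  spec   <- eigen(t(L) %*% Hess %*% L, symmetric = TRUE) # D = U^T L^T H L U
  lambda <- spec$values
  u      <- as.vector(t(p_tilde) %*% Hess %*% L %*% spec$vectors)
  keep   <- abs(lambda) > tol                            # discard zero eigenvalues
  lambda <- lambda[keep]; u <- u[keep]
  q_obs  <- 2 * n * f_obs - as.numeric(t(p_tilde) %*% Hess %*% p_tilde) + sum(u^2 / lambda)
  imhof(q_obs, lambda = lambda, delta = (u / lambda)^2)$Qq
}
```

Here f_obs is the energy statistic evaluated at the empirical frequencies, Hess is the Hessian $\nabla^2 f(\tilde{p})$, and Sigma_tilde the estimated multinomial covariance; the Davies, Farebrother, or Liu routines in the same package could be substituted for imhof.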

6.6.2 Using Normalized Count Data

Using an MLE in the computation of the statistic relies on all the standard assumptions about asymptotic normality of the MLE, which are known to break down for singular statistical models [141, 142]. If instead we use the normalized empirical counts, we know by the strong law of large numbers that these converge pointwise to the true frequencies, and thus also converge in probability to them. The continuous mapping theorem then implies that our energy statistic converges in probability to the true energy of the model. As such, the technique of plugging in empirical counts gives an asymptotically consistent test for whether or not samples are drawn from a distribution in the Zariski closure of the parametric model. It should be noted that the Zariski closure of the parametrization may contain points which are not part of the original model, so some caution must be used in applying this test.

6.7 The Independence Model for a 2 × 2 Contingency Table

In this section, we illustrate the method outlined above in the simplest possible example: the independence model for a 2 × 2 contingency table. We will investigate how the method outlined previously gives us an alternate test for independence.

We can imagine our observed counts are collected in a matrix of the form
$$N = \begin{pmatrix} n_{11} & n_{12} \\ n_{21} & n_{22} \end{pmatrix},$$
from which we can form a matrix of normalized counts
$$P = \frac{1}{n_{++}} N,$$
where $n_{++} = \sum_{i,j=1}^{2} n_{ij}$. The independence model is defined by the vanishing of the determinant of P, i.e.

$$p_{11}p_{22} - p_{12}p_{21} = 0.$$

We let p = (p11, p12, p21, p22) be the vector of normalized counts of the 2 × 2 contingency table. As there is only one invariant, all quadratic form statistics constructed based on such an invariant will be proportional to

$$f(p) = (p_{11}p_{22} - p_{12}p_{21})^2.$$

The sampling distribution is a multinomial distribution, and the maximum likelihood estimate of the saturated model is simply the frequency of counts in each outcome category, i.e. the $p_{ij}$'s above. Following the MLE method described earlier in this chapter, we introduce the Kullback-Leibler function
$$K(q) = p_{11}\log\left(\frac{p_{11}}{q_1}\right) + p_{12}\log\left(\frac{p_{12}}{q_2}\right) + p_{21}\log\left(\frac{p_{21}}{q_3}\right) + p_{22}\log\left(\frac{p_{22}}{q_4}\right).$$

We see that the gradient is then

$$\nabla K(q) = \left\langle -\frac{p_{11}}{q_1}, -\frac{p_{12}}{q_2}, -\frac{p_{21}}{q_3}, -\frac{p_{22}}{q_4} \right\rangle$$
and the gradient of the constraint function $g(q) = q_1 q_4 - q_2 q_3$ is just

$$\nabla g(q) = \langle q_4, -q_3, -q_2, q_1 \rangle,$$
and clearly $\nabla h(q) = \langle 1, 1, 1, 1 \rangle$ where $h(q) = q_1 + q_2 + q_3 + q_4$. Thus, the method of Lagrange multipliers yields the system of polynomial equations:

$$\begin{cases} -p_{11} = q_1\lambda + \mu q_1 q_4 \\ -p_{12} = q_2\lambda - \mu q_2 q_3 \\ -p_{21} = q_3\lambda - \mu q_3 q_2 \\ -p_{22} = q_4\lambda + \mu q_1 q_4 \\ 1 = q_1 + q_2 + q_3 + q_4 \\ 0 = q_1 q_4 - q_2 q_3. \end{cases}$$

Eliminating λ and µ, one can find the solution:
$$\tilde{q} = \langle (p_{11}+p_{12})(p_{11}+p_{21}),\ (p_{11}+p_{12})(p_{12}+p_{22}),\ (p_{11}+p_{21})(p_{21}+p_{22}),\ (p_{12}+p_{22})(p_{21}+p_{22}) \rangle.$$

The Hessian of this statistic under the null hypothesis is then:

 2  2q4 −2q3q4 −2q2q4 −2q1q4  2  2  −2q3q4 2q3 −2q1q4 −2q1q3  ∇ f (q˜) =    −2q q −2q q 2q2 −2q q   2 4 1 4 2 1 2  2 −2q1q4 −2q1q3 −2q1q2 2q1 which is clearly non-zero. Under the null-hypothesis,

$$2n \cdot f(p) \Rightarrow Y^{T} \nabla^2 f(\tilde{q})\, Y$$
where $Y \sim N(p, \Sigma)$ and
$$\Sigma = \begin{pmatrix} (1-q_1)q_1 & -q_1q_2 & -q_1q_3 & -q_1q_4 \\ -q_1q_2 & (1-q_2)q_2 & -q_2q_3 & -q_2q_4 \\ -q_1q_3 & -q_2q_3 & (1-q_3)q_3 & -q_3q_4 \\ -q_1q_4 & -q_2q_4 & -q_3q_4 & (1-q_4)q_4 \end{pmatrix}.$$

Using the symbolic decomposition of Tanabe and Sagae [135], we obtain the following formula for L:
$$L = \begin{pmatrix} \sqrt{(1-q_1)q_1} & 0 & 0 \\[4pt] -q_2\sqrt{\dfrac{q_1}{1-q_1}} & \sqrt{\dfrac{q_2(1-q_1-q_2)}{1-q_1}} & 0 \\[4pt] -q_3\sqrt{\dfrac{q_1}{1-q_1}} & -q_3\sqrt{\dfrac{q_2}{(1-q_1)(1-q_1-q_2)}} & \sqrt{\dfrac{q_3(1-q_1-q_2-q_3)}{1-q_1-q_2}} \\[4pt] -q_4\sqrt{\dfrac{q_1}{1-q_1}} & -q_4\sqrt{\dfrac{q_2}{(1-q_1)(1-q_1-q_2)}} & -q_4\sqrt{\dfrac{q_3}{(1-q_1-q_2)(1-q_1-q_2-q_3)}} \end{pmatrix}.$$

This is about as far as we can go with a symbolic expression for the statistic in this case. We do know that the Hessian of f is non-zero, and we can see that $L^{T}HL$ is not identically zero. In order to evaluate the statistic, we need to find a spectral decomposition of $L^{T}HL$ to get the $u_i$'s and $\lambda_i$'s in the formula derived earlier in this chapter. In the next section, we implement a numerical analysis of this statistic in the R programming language.

6.8 Behavior of Statistic on the Boundary of the Probability Simplex

Neither the G² statistic nor the chi-squared statistic is defined if any row sum or column sum is zero. Our method extends to such cases, although computing a maximum likelihood estimate (MLE) in this situation is subject to a multitude of problems in general. If we use the technique of applying the energy functional to the normalized counts, we can obtain a quick estimate of a p-value. This can be used when designing model architecture or fitting a number of hidden nodes, provided the invariants are known.

6.9 A Test for the Rank of a Contingency Table

We may use the method outlined in Section 6.3 to construct a statistical test to determine whether or not our data were sampled from a table of rank r. Another way to think about the contingency table having rank r is to ask whether or not the observed relationship between the random variables may be explained by a latent variable with r possible outcomes. This observation is formalized in the following

lemma.

Lemma. A contingency table has non-negative rank r if and only if the joint distribution factors as
$$P(X = i, Y = j) = \sum_{h=1}^{r} P(X = i \mid Z = h)\, P(Y = j \mid Z = h)\, P(Z = h).$$

Proof. Suppose our contingency table, M, is an $m \times n$ matrix with rank r. Then M can be written as a sum of r rank-1 matrices, i.e. $M = \sum_{i=1}^{r} M_i$ where each $M_i$ is rank 1. For each $M_i$, define $\alpha_i = \sum_{j,k=1}^{m,n} (M_i)_{jk}$; since the entries of M sum to 1, the $\alpha_i$ sum to 1, so we may define a random variable Z whose law is $P(Z = i) = \alpha_i$. As each $M_i$ is rank 1, it can be written in the form $X_i Y_i^{T}$, and after normalizing we define $X_{i,j} = P(X = j \mid Z = i)$ for $1 \leq j \leq m$ and $Y_{i,j} = P(Y = j \mid Z = i)$ for $1 \leq j \leq n$. For the other direction, suppose

$$P(X = i, Y = j) = \sum_{h=1}^{r} P(X = i \mid Z = h)\, P(Y = j \mid Z = h)\, P(Z = h).$$

Then define αi = P (Z = i) for each 1 ≤ i ≤ r, and

$$M_i = P(X = \cdot \mid Z = i)\, P(Y = \cdot \mid Z = i)^{T}.$$

This expresses M as a linear combination of rank 1 matrices.

Thus, we may generalize the statistic defined in Section 6.3 to test for the presence of latent variables in a contingency table. Note that the number of hidden nodes depends on the non-negative rank of the matrix, while this test finds the actual rank. Thus, when using this technique, follow-up analysis should be performed, such as using the EM algorithm to fit the mixture model and subsequently re-evaluating the goodness of fit.

6.10 Simulation Techniques and Results

The calculation in Section 6.5 allows us to manipulate the asymptotic distribution for our energy statistic into a form where the asymptotic distribution is represented as a sum of non-central chi-squared distributions. The survival function of such a random variable can be computed using the CompQuadForm package [38]. This package implements several numerical algorithms for computing this survival function [30,47,70,91].

Figure 6.1. A scatterplot showing values of the survival function of the invariant-based quadratic form vs. the chi-squared distribution for samples drawn from a uniform distribution.

The idea is to treat the evaluation of this model as a classification problem. By drawing samples from a known true distribution which either belongs or does not belong to the model under consideration, we can repeatedly sample from that distribution and compare the values of the survival functions. Specifying a significance level allows us to score performance based on the accuracy of the classifier, given that we know the true distribution and whether or not it originated from the model. Rather than providing accuracy scores for our results, we produce scatter plots of p-values computed via different methods, i.e. a point in the scatter plot corresponds to the p-values, for the same sample, evaluated according to the methods on the two axes. This allows us to visually inspect the classifier performance in a way whose dependence on the significance level can be understood. In order to get a benchmark for the performance of this statistic, we can do simple comparisons with the chi-squared test. For each run of our experiment, we simulated 1000 samples drawn from a uniform distribution and computed p-values using these methods, for 1000 runs of the experiment. We created a scatter plot of the p-values for the chi-squared test versus the p-values computed using the invariants technique with the Imhof method for computing the survival function. The results of this experiment are presented in Figure 6.1.

Figure 6.2. A scatterplot showing values of the survival function of the invariant-based quadratic form vs. the chi-squared distribution for samples drawn from the distribution $(q_{00}, q_{01}, q_{10}, q_{11}) = (0.1, 0.3, 0.2, 0.4)$.

Running the same experiment with the true distribution

$$(q_{00}, q_{01}, q_{10}, q_{11}) = (0.1, 0.3, 0.2, 0.4)$$

yields a scatter plot which appears to branch off from the chi-squared distribution. This scatter plot is shown in Figure 6.2. In order to demonstrate the potential advantages of our techniques, we consider

the four-cycle model and the degenerate distribution discussed in [57]. Let $q_{ijkl}$ denote the joint probability mass of four binary random variables. Consider the true distribution which satisfies
$$q_{0100} = q_{0111} = q_{1001} = q_{1010} = \frac{1}{4}$$

and all other $q_{ijkl} = 0$. This degenerate distribution belongs to the four-cycle model. From this distribution, we generated a small random noise $\epsilon$ perturbing the true distribution $q$. The exact distribution we used in this simulation is given by

Figure 6.3. A scatterplot of p-values computed for the perturbed degenerate distribution on the binary 4-cycle, comparing the likelihood ratio test vs. the survival function of the invariant-based quadratic form computed via Imhof's method.

q_{0000} = 0.01832664,  q_{0001} = 0.005581024,  q_{0010} = 0.004061217,  q_{0011} = 0.009203675,
q_{0100} = 0.2263769,   q_{0101} = 0.01443055,   q_{0110} = 0.0007282847, q_{0111} = 0.2281081,
q_{1000} = 0.009627385, q_{1001} = 0.2327531,    q_{1010} = 0.2288592,    q_{1011} = 0.005308429,
q_{1100} = 0.005519297, q_{1101} = 0.002346809,  q_{1110} = 0.002494027,  q_{1111} = 0.006275445.

The correct classification according to this distribution is to reject the model. A scatter plot of the likelihood ratio test vs. the Imhof test with a sample size of 400 and 1000 trials is displayed in Figure 6.3. More detailed analysis of this experiment is presented in Appendix B.4.2. Notice that the invariant-based statistic (x-axis) has lower p-values on average than the likelihood ratio test and so can be viewed as the more accurate classifier in this experiment. A similar experiment performed by comparing the chi-squared test with the invariant-based statistic computed using the Davies method is displayed in Figure 6.4.

Figure 6.4. A scatterplot of p-values computed for the perturbed degenerate distribution on the binary 4-cycle, comparing the chi-squared test vs. the survival function of the invariant-based quadratic form computed via Davies' method.

We see again that the invariant-based statistic is better able to reject the model near this degenerate case because it frequently assigns p-values lower than the corresponding values computed with the chi-squared test. In Figure 6.4, these correspond to points lying above the diagonal. For this reason, we believe the statistic developed in this chapter, based on an energy functional of the invariants, can outperform classical techniques in degenerate cases. We studied the performance of p-values computed with this technique versus p-values computed by the chi-squared test and the likelihood ratio test. The results of this simulation are presented in Appendix B.4. These tables show the percentage deviation from the nominal significance level of the test. One result where our statistic performs noticeably better than both the likelihood ratio and the chi-squared test is presented

below. The exact distribution used in this simulation is:

q_{0000} = 0.07453062,  q_{0001} = 0.08634001,  q_{0010} = 0.041189657, q_{0011} = 0.0640731,
q_{0100} = 0.008735056, q_{0101} = 0.1015492,   q_{0110} = 0.001641928, q_{0111} = 0.1536179,
q_{1000} = 0.0690093,   q_{1001} = 0.05180355,  q_{1010} = 0.02624617,  q_{1011} = 0.01191531,
q_{1100} = 0.06465253,  q_{1101} = 0.04332119,  q_{1110} = 0.122514,    q_{1111} = 0.07886045.

The table below shows the percentage deviation from the nominal significance level for several methods. More tables like the one below for different sample sizes and distributions can be found in Appendix B.4.

n = 100
α                    0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ²                    4.7   12.8   18.7   23.0   24.4   26.3   26.2   27.5
G²                   28.4   46.0   52.6   55.0   55.4   55.9   53.5   50.9
Ours - Davies        -1.0   -4.7   -5.4   -2.3    4.5   11.9   17.2   24.7
Ours - Farebrother   -1.0   -4.9   -6.7   -6.0   -1.4    4.2    8.7   14.6
Ours - Imhof         -1.0   -4.7   -5.4   -2.3    4.5   11.9   17.2   24.7
Ours - Liu           -1.0   -4.7   -5.7   -3.0    3.5   11.9   17.9   25.9

6.11 Future Work

6.11.1 Application to Mixture Models

A mixture model is a convex combination of two or more statistical models. Given a model, we can create a mixture model by taking convex combinations of distributions from the model. The method of algebraic hypothesis testing discussed here can be used to determine the number $k$ of components in a mixture of algebraic statistical models. Suppose $\{f_i\}_{i=1}^{r}$ and $\{g_i\}_{i=1}^{s}$ are lists of invariants for a model of interest. Given the empirical distribution $p^*$, we can consider the possible convex decompositions $p_\lambda = \lambda p_1 + (1 - \lambda) p_2$ where

$p_\lambda = p^*$. Then $p^*$ belongs to the mixture model if there exists a $\lambda \in [0, 1]$ such that

$$\lambda \langle f(p_1) \mid H \mid f(p_1) \rangle + (1 - \lambda) \langle g(p_2) \mid H \mid g(p_2) \rangle = 0.$$

As such, we can find the λ for which the left-hand side is minimized and use the Imhof method to calculate a p-value as before. This allows us to perform a hypothesis test without first fitting model parameters, for instance via the EM algorithm. Notice that the size (the dimension of the domain) of the optimization problem scales linearly with the number of hidden variables.

6.11.2 Application To Restricted Boltzmann Machines

Example 145. We review an example from [26]. The restricted Boltzmann machine (RBM) is a statistical model for binary random variables with $n$ visible nodes and $k$ hidden nodes. The model consists of $nk + n + k$ model parameters. Let $W$ be a $k \times n$ matrix, $b$ an $n$-dimensional vector, and $c$ a $k$-dimensional vector. We can define the following energy function:

$$\psi(v, h) = \exp\left(h^T W v + b^T v + c^T h\right).$$

This energy function can be used to define a probability distribution on the visible nodes by
$$p(v) = \frac{1}{Z} \sum_{h \in \{0,1\}^k} \psi(v, h),$$
where $Z = \sum_{v,h} \psi(v, h)$. We can apply the following reparametrization to the RBM:

$$\gamma_i = \exp(c_i), \qquad \omega_{ij} = \exp(W_{ij}), \qquad \beta_j = \exp(b_j),$$

which translates the energy function into the following form:

$$\psi(v, h) = \prod_{i=1}^{k} \gamma_i^{h_i} \cdot \prod_{i=1}^{k} \prod_{j=1}^{n} \omega_{ij}^{h_i v_j} \cdot \prod_{j=1}^{n} \beta_j^{v_j}.$$

We can then express the probability distribution on the visible nodes as:

$$p(v) = \frac{1}{Z}\, \beta_1^{v_1} \beta_2^{v_2} \cdots \beta_n^{v_n} \prod_{i=1}^{k} \left(1 + \gamma_i\, \omega_{i,1}^{v_1} \omega_{i,2}^{v_2} \cdots \omega_{i,n}^{v_n}\right).$$

A proposition from [26] shows that when k = 1, the RBM can be rewritten as a mixture of the independence models for n binary random variables:

$$p(v) = \lambda \prod_{i=1}^{n} \delta_i^{1 - v_i} (1 - \delta_i)^{v_i} + (1 - \lambda) \prod_{i=1}^{n} \epsilon_i^{1 - v_i} (1 - \epsilon_i)^{v_i}.$$

Note that $p(v) = \sum_h p(v, h) = \sum_h p(v \mid h)\, p(h)$, where the conditionals $p(v \mid h)$ are product distributions. This means that the

restricted Boltzmann machine in general is a submodel of $\mathcal{M}_{n,2^k}$, the independence mixture model with $2^k$ components. Using the technique of the previous section, we can tune the hyperparameter $k$ for the RBM using the following procedure:

• Start with $k = 1$.

• Check the $2^k$-component mixture model using algebraic hypothesis testing.

• If the p-value is below the threshold, increase $k$ and repeat the previous step. (A sketch of this loop appears below.)
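The procedure above can be phrased as a short search loop. The Python sketch below is illustrative only; the callable pvalue_for_mixture, which stands in for the algebraic hypothesis test of the 2^k-component mixture model, is a hypothetical placeholder rather than an implemented routine.

def tune_hidden_nodes(pvalue_for_mixture, alpha=0.05, k_max=10):
    # pvalue_for_mixture(m) is assumed to return the algebraic-hypothesis-test
    # p-value for the m-component independence mixture model on the data.
    k = 1
    while k < k_max:
        if pvalue_for_mixture(2 ** k) >= alpha:
            break              # the 2^k-component mixture is not rejected
        k += 1                 # model rejected: add another hidden node
    return k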

Chapter 7 | A Monadic Approach to Missing and Conflicting Data

Monads have been previously applied to database theory in [129], where monads are used to allow databases to contain generalized values such as lists, sets of values, exceptions, and various types of nulls. Spivak uses the framework of monads to show how concepts such as Markov chains, graphs, and finite state automata can all be modeled as different monads. In this chapter, we focus specifically on the Giry monad [59] and discuss techniques for using it to handle conflicting and missing data. The statistical properties of missing data have been studied extensively [58, 90, 116, 117]. Our goal in this chapter is to discuss how an implementation of the Giry monad can be a potentially useful tool for thinking about computations involving missing or conflicting data. In this chapter, we will see that being able to use the unit and multiplication operations of the monad allows us more flexibility with probabilistic computations and can help us design more robust techniques for working with messy data. In particular, we place emphasis on how the Giry monad can be used to represent data frames as collections of empirical measures and how multiple imputation can be naturally expressed in this framework. We discuss the problem of lifting ordinary statistics to these data frames and give an example of how this affects analysis by examining the effect of monadic imputation on k-nearest neighbors clustering. This chapter is speculative and mainly intended to explore how an implementation of the Giry monad could affect probabilistic and statistical calculations. The main results of this chapter are a lemma establishing that measurable statistics lift to the Giry monad and the use of the Giry monad to combine conflicting data

in a way that does not destroy information about the conflicting records. This construction is potentially useful in statistical decision-making situations where we would like to design systems which select more conservative actions in the presence of conflict, such as in target recognition in sensor networks.

7.1 Pullbacks, Maximum Entropy Distributions, and Independence Joins

The categorical structure of independence and conditional independence was first established in [121, 122]. In this section, we examine the relationship of these constructions with the maximum entropy principle. In a later section, we discuss the implications for multiple imputation schemes implemented through the Giry monad, to show why an implementation of the monad should not treat the columns in a table independently, as this destroys any information about correlation between the columns. Let $X$ and $Y$ be discrete random variables with finite outcome spaces. We know that $H(X, Y) \le H(X) + H(Y)$, where $H(X) := -\sum_{x \in X} p_x \ln p_x$ is the entropy of the random variable [25]. Furthermore, equality holds in the above expression if and only if $X$ and $Y$ are independent. This theorem can be interpreted in the language of the Giry monad, which will allow us to create a new technique for multiple imputation of missing data.

A discrete random variable is a map from some probability space $(S, \mathcal{F}_S, \mu)$ into a measurable space isomorphic to $([k], 2^{[k]})$. Note that a random variable $X : (S, \mathcal{F}_S, \mu) \to ([k], 2^{[k]})$ selects a point $G(X) : 1 \to G([k]) \cong \triangle^{k-1}$. Entropy can then be viewed as a map $H : \triangle^{k-1} \to \mathbb{R}_{\ge 0}$ defined by $H : (p_1, \ldots, p_k) \mapsto -\sum_{i \in [k]} p_i \ln p_i$. Let $p \in \triangle^{n-1}$ and $q \in \triangle^{m-1}$ be two probability measures on discrete outcome spaces of size $n$ and $m$, respectively. There is a canonical embedding of $\triangle^{n-1} \times \triangle^{m-1}$ into $\triangle^{mn-1}$. If we give $\triangle^{mn-1}$ coordinates of the form $(r_{ij})$ where $i \in [n]$ and $j \in [m]$, then with respect to these coordinates the embedding is given by $\phi : \triangle^{n-1} \times \triangle^{m-1} \to \triangle^{nm-1}$ defined by $\phi(p_i, q_j) = p_i q_j$. A one-sided inverse to this embedding is given by the product of the marginalization

mappings. Note that the map $\phi$ takes a pair of marginal distributions and constructs the maximum entropy distribution subject to those marginals. A similar construction also works for conditionally independent probability distributions. Two joint probability density functions, $p \in \triangle^{mk-1}$ and $q \in \triangle^{nk-1}$, for which $\sum_i p_{ij} = \sum_k q_{jk}$ for all $j$ determine a co-span:

$$\triangle^{mk-1} \longrightarrow \triangle^{k-1} \longleftarrow \triangle^{nk-1}.$$

The independence join of $p$ and $q$, denoted by $p \amalg q \in \triangle^{mnk-1}$, is defined to be

$$\left(p \amalg q\right)_{ijk} := \frac{p_{ij}\, q_{jk}}{r_j}.$$

This construction produces a commutative square:

$$\begin{array}{ccc}
\triangle^{mnk-1} & \longrightarrow & \triangle^{nk-1} \\
\downarrow & & \downarrow \\
\triangle^{mk-1} & \longrightarrow & \triangle^{k-1}.
\end{array}$$

The usual fibered product is defined to be

$$\triangle^{m} \otimes_{\triangle^{k}} \triangle^{n} \cong \left\{ (p_{ij}, q_{jk}) \mid m_{+j}(p_{ij}) = m_{j+}(q_{jk}) \right\}.$$

As such, the independence join can be seen as a solution to the universal problem solved by pullbacks via the map $\phi : \triangle^{m} \otimes_{\triangle^{k}} \triangle^{n} \to \triangle^{m+n-k}$ defined by $\phi(p_{ij}, q_{jk}) \mapsto \frac{p_{ij}\, q_{jk}}{r_j}$. Note that this expression is well-defined because we can take $r_j = m_{+j}(p_{ij}) = m_{j+}(q_{jk})$. As such, for finite probability measures, the independence join can be seen as a weaker form of the pullback construction. The typical approach to finding maximum entropy distributions for continuous random variables involves setting up constraints on the distribution [17,

19, 20, 72–74]. For instance, the uniform distribution is the maximum entropy distribution subject to the constraint that the support of the density is $[a, b]$. Alternatively, the normal distribution is the maximum entropy distribution subject to a constraint on the variance. This observation suggests a method for imputing missing values where we maximize the entropy of the missing columns in a row subject to the constraint that we match the empirical distributions of the respective columns.
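To make these constructions concrete for finitely supported distributions, the short Python sketch below computes the maximum entropy product of two marginals (the map $\phi$) and the independence join over a shared variable. The function names and the NumPy array representation are our own illustrative conventions, not part of the constructions above.

import numpy as np

def max_entropy_product(p, q):
    # phi(p, q)_{ij} = p_i * q_j: the maximum entropy joint distribution
    # whose marginals are p and q.
    return np.outer(p, q)

def independence_join(p, q):
    # p has entries p[i, j], q has entries q[j, l]; j indexes the shared
    # variable.  The join is (p join q)[i, j, l] = p[i, j] * q[j, l] / r[j].
    r = p.sum(axis=0)                       # shared marginal computed from p
    assert np.allclose(r, q.sum(axis=1))    # must agree with the marginal from q
    joint = np.einsum('ij,jl->ijl', p, q)
    return joint / np.where(r > 0, r, 1.0)[None, :, None]

p = np.array([[0.2, 0.1], [0.3, 0.4]])      # joint distribution of (X, J)
q = np.array([[0.25, 0.25], [0.1, 0.4]])    # joint distribution of (J, Y)
print(independence_join(p, q).sum())        # 1.0: the join is a distribution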

7.2 Merging Conflicting Tables

In chapter four (Section 4.4.3), we saw how merging two records which agree on their overlap can be understood as a weaker form of the pullback construction in the category of sets. Given two records $r_1 : 1 \to X \times Y$ and $r_2 : 1 \to Y \times Z$, the merged record $r_1 \bowtie r_2$ is the universal solution to the following pullback diagram:

$$\begin{array}{ccc}
X \times Y \times Z & \longrightarrow & X \times Y \\
\downarrow & & \downarrow \pi_Y \\
Y \times Z & \xrightarrow{\;\pi_Y\;} & Y,
\end{array}$$

where the record $1$ maps to $X \times Y$ by $r_1$ and to $Y \times Z$ by $r_2$, and $r_1 \bowtie r_2 : 1 \to X \times Y \times Z$ is the unique induced map into the pullback.

When two tables are connected by a primary key and there is disagreement about the value in a shared column between the two tables, an analyst must make some decision about how to complete the merge. For columns whose data can be modeled by real numbers, a common choice is to take the average. If we assume the measurement disagreement is due to some random noise in our instrumentation, this is a reasonable choice that reduces the variance in many circumstances. One such situation is analyzed in the example below.

Example 146. Let $X_1$ and $X_2$ be random variables. Assume $X_1 = c + N_1$ and $X_2 = c + N_2$, where the $N_i$ are independent random variables representing noise, each assumed to have mean 0. Let $\sigma_i^2 < \infty$ denote the variance of $N_i$. Then

$$\mathbb{V}\left[\frac{1}{2} X_1 + \frac{1}{2} X_2\right] = \frac{1}{4}\sigma_1^2 + \frac{1}{4}\sigma_2^2.$$

Without loss of generality suppose $\sigma_1^2 < \sigma_2^2$; then $\frac{1}{4}\sigma_1^2 + \frac{1}{4}\sigma_2^2 < \sigma_1^2$ if and only if $\sigma_2^2 < 3\sigma_1^2$. Thus, averaging reduces the variance of the estimator as long as $\frac{1}{4}(\sigma_1^2 + \sigma_2^2) \le \min(\sigma_1^2, \sigma_2^2)$, or alternatively as long as $\max(\sigma_1^2, \sigma_2^2) \le 3 \min(\sigma_1^2, \sigma_2^2)$.

For other types of random variables, such as categorical random variables, the outcome space does not have the type of algebraic structure that is amenable to averaging. By allowing our tables to take values that are probability distributions, we can extend this idea of averaging to random variables whose outcome space does not naturally have that structure. This is possible because the space of probability measures on a sigma-algebra is a convex space, i.e. given any finite collection of probability measures, any convex combination of them also defines a probability measure. We briefly discussed Giry tables in Section 4.5.2. These were tables obtained by applying the Giry endofunctor to a table whose outcome space was equipped with the structure of a sigma-algebra. This gives us a table where each record is a probability distribution on the outcome space. Given a data table $t : N \to X^A$, the corresponding Giry table, $t_G$, is defined by applying the unit $\delta$ to the image of $t$, i.e. $t_G(i) = \delta_{t(i)}$. Recall that the unit of the Giry monad is the natural transformation $1 \Rightarrow G$ whose component functions $\delta_X : X \to G(X)$ take a point $x$ to the probability measure concentrated at $x$. Here, we choose to use the notation $\delta$ rather than the traditional $\eta$ because of the analogy with the Dirac delta function.

Given two tables $t_1 : N \to K \times \prod_{a \in A} X_a$ and $t_2 : M \to K \times \prod_{b \in B} X_b$ with overlapping column sets that are connected by a primary key, $K$, we can use Giry tables to merge the columns by averaging the probability measures on the overlapping columns. We assume the indexing sets in the products $\prod_{a \in A} X_a$ and $\prod_{b \in B} X_b$ are the column names. As such, the joint column space $A \cup B$ leads to decompositions of $A$ and $B$. Note that $A = (A \cap B^C) \cup (A \cap B)$ and $B = (A \cap B) \cup (A^C \cap B)$. Thus, first decompose $t_1$ and $t_2$ as $t^1_K \times t^1_{A \cap B^C} \times t^1_{A \cap B} : N \to K \times X^{A \cap B^C} \times X^{A \cap B}$ and $t^2_K \times t^2_{A^C \cap B} \times t^2_{A \cap B} : M \to K \times X^{A^C \cap B} \times X^{A \cap B}$, respectively. Let $F = t^1_K(N) \cap t^2_K(M)$ denote the set of keys common to both tables. Let $t_K : F \to K$ be defined by $t_K(f) = f$. Define $\left(t^1_{A \cap B^C}\right)_G : F \to G\!\left(X^{A \cap B^C}\right)$ and $\left(t^2_{A^C \cap B}\right)_G : F \to G\!\left(X^{A^C \cap B}\right)$ by post-composition with the unit $\delta$ of the Giry monad. Lastly, define $\left(t^{1 \bowtie 2}_{A \cap B}\right)_G : F \to G\!\left(X^{A \cap B}\right)$ by $f \mapsto \frac{1}{2}\delta_{t_1(f)} + \frac{1}{2}\delta_{t_2(f)}$. Then using the universal property of products we can define the Giry merge of $t_1$ and $t_2$,

denoted $(t_1 \bowtie t_2)_G$, as

$$(t_1 \bowtie t_2)_G = t_K \times \left(t^1_{A \cap B^C}\right)_G \times \left(t^{1 \bowtie 2}_{A \cap B}\right)_G \times \left(t^2_{A^C \cap B}\right)_G.$$

As constructed, this operation is non-associative. Associativity can be restored by also storing, for each column, a natural number counting the number of entries that have been merged together. This would mean column spaces in Giry tables have the form $G(X) \times \mathbb{N}$ where $X$ is some measurable space. An entry in a column is then a tuple $(\mu, n)$ where $\mu$ is a probability measure on $X$ and $n$ is a count tracking the number of merges that have already occurred. Two tuples $(\mu, n)$ and $(\nu, m)$ can be merged as $\left(\frac{n}{n+m}\mu + \frac{m}{n+m}\nu,\; n + m\right)$. This construction allows us to merge conflicting records in an associative manner. Using the Giry monad, we have seen how to average out conflicts for data types which do not come equipped with an algebraic structure amenable to averaging. This gives us a technique for handling conflicts between columns containing categorical data, for example. In machine learning, many algorithms require you to perform one-hot encoding on categorical variables before passing them to the function which actually fits a model to your data. This process involves constructing an enumeration $e : [k] \to X$ of the outcome space for the categorical variable and embedding the outcomes into $\mathbb{R}^{k-1}$. This can be achieved by sending $e(1) \mapsto 0$ in $\mathbb{R}^{k-1}$ and $e(i) \mapsto e_{i-1}$ for $2 \le i \le k$, where $e_j$ denotes the $j$-th standard basis vector of $\mathbb{R}^{k-1}$. This observation allows us to extend one-hot encoding to Giry data frames. Note that an enumeration also determines an embedding of the probability simplex $\triangle^{k-1}$ as the convex hull of the images. As such, given a probability measure $\mu \in G(X)$, we can determine a vector in $\mathbb{R}^{k-1}$ whose components are determined by the enumeration $e$, namely $(\mu(e(2)), \ldots, \mu(e(k)))$. We denote the extension of this enumeration to the Giry data frame as $e_G$. In order to represent Giry tables on a computer, we need to discuss what type of data structure could be used as a model for a probability distribution on an outcome space. For categorical random variables whose outcome space is finite, this can be achieved via a dictionary whose keys are the various outcomes and whose values are the probabilities of the corresponding outcomes. This technique also works for outcomes of a real-valued random variable, as any table must contain only a finite number of data points. Note that the size of the dictionary is now a

random variable whose expected size grows proportionally to the number of merges performed, where entries in this dictionary have a particular outcome as key and the number of records with that outcome as value.
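As a minimal illustration of the count-augmented merge just described, the Python sketch below stores a measure as an outcome-to-probability dictionary paired with a merge count. The representation and function names are our own choices for illustration.

from collections import defaultdict

def delta(x):
    # Unit of the Giry monad on a record: a point mass, with merge count 1.
    return ({x: 1.0}, 1)

def merge_entries(entry1, entry2):
    # Merge (mu, n) and (nu, m) as (n/(n+m)*mu + m/(n+m)*nu, n+m).
    # Tracking the counts n and m is what makes the merge associative.
    mu, n = entry1
    nu, m = entry2
    total = n + m
    merged = defaultdict(float)
    for outcome, prob in mu.items():
        merged[outcome] += (n / total) * prob
    for outcome, prob in nu.items():
        merged[outcome] += (m / total) * prob
    return dict(merged), total

# Two conflicting categorical records are averaged rather than overwritten.
print(merge_entries(delta("mustard"), delta("spicy mustard")))
# ({'mustard': 0.5, 'spicy mustard': 0.5}, 2)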

7.3 Imputing Missing Data with Giry Tables

Imputation is the process by which null values in a column are replaced with some estimated value. One popular technique is replacing a null entry with its column mean, median, or some other fixed value [86]. This technique reduces the variance of that column because it adds additional rows with $X_i = \mu$. By first counting the number of missing values in the original series and subtracting this number from the denominator when estimating the variance, we can remove the effect of these imputed values. Unfortunately, when preparing data to be fit with a machine learning algorithm, the result of preprocessing is usually represented by a multi-dimensional array and no information about how many records were imputed is seen by the procedure fitting the model. One consequence of this particular implementation is that when using the fitted model to forecast confidence intervals we may systematically under-predict the width of these intervals due to the reduction in spread introduced by the imputation process. Other techniques for imputation involve attempting to predict the missing values. We will use $n$ to denote the number of records in our table and $p$ to denote the number of predictor columns. When predicting real-valued columns, linear regression and gradient boosting are common options. Linear regression achieves this in $O(p^2 n + p^3)$, which thus scales linearly for a fixed number of predictors. If we additionally cap the number of trees, gradient boosting will also scale linearly with the size of our data and thus provides an approach for prediction with fewer assumptions about the nature of the predictor function [64]. Many other techniques, such as k-nearest neighbors, scale super-linearly and so may be impractical for large data sets. Moreover, even if we assume that these techniques provide consistent estimates for $\mathbb{E}[Y \mid X]$, they still decrease the variance of the conditional distribution of $Y \mid X$, which will make predicted confidence intervals thinner in the same manner we discussed previously. For categorical data, the analog of fixed value imputation could be simply encoding null values as an additional category and incorporating the null values as

part of your model. Other approaches typically rely on various multiple imputation schemes or maximum likelihood estimates using the EM algorithm. Multiple imputation techniques make distributional assumptions, typically of multivariate normality, and the scalability of the EM algorithm is directly influenced by the techniques used to compute the probabilities of the latent variables and to estimate values for the next iteration. Discussing these existing techniques in complete depth would be too much of a distraction, as these are the subjects of several books, cf. [115, 116]. A brief survey of the history and further references can be found in [117]. Giry data frames are a general framework that can incorporate aspects of likelihood estimation and multiple imputation. As such, an implementation of this design pattern could provide some engineering benefits in certain applications. For instance, imputing by measures and rewriting model fitting algorithms as functionals would allow us to perform multiple imputation without parallelization. We will discuss the theory of Giry data frames and the problem of lifting statistics to them in this section.

7.3.1 Imputing by Empirical Probability Measure of a Column

Using Giry tables allows us to generalize pre-existing imputation techniques and provides the analyst greater flexibility than other common techniques. Potentially, this can help reduce undesirable side-effects in our imputation schemes. In particular, using Giry tables we can develop both parametric and non-parametric techniques for imputing data. We will start with the Giry analog of imputing by mean, which involves replacing the single estimate of the column mean with the empirical probability measure of the column. This particular scheme is obtained by lifting the mean function to the Giry monad, as discussed in Section 7.3.2.

Definition 147. (Empirical Probability Measure) Let $(X, \mathcal{F})$ be a measurable space such that all singleton sets are measurable. The empirical probability measure associated to a collection of points $s : [n] \to X$ is defined to be

$$\mu_s := \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$$

where $x_i = s(i)$ and $\delta_{x_i}$ is the Dirac probability measure concentrated at $x_i$, i.e.
$$\delta_{x_i}(B) = \begin{cases} 1 & x_i \in B \\ 0 & \text{otherwise.} \end{cases}$$

Given a column in a table $t_a : N \to X_a$, we can construct the empirical probability measure of the column by defining

$$\mu_a := \frac{1}{|N|} \sum_{i=1}^{|N|} \delta_{t_a(i)}.$$

To construct the imputation scheme we need to define a mapping $i_{\mu_s} : G(\tilde{X}_a) \to G(\tilde{X}_a)$, where $\tilde{X}_a$ denotes the outcome space $X_a$ extended by the missing value symbol $\mathrm{NA}$. Any probability measure $\nu \in G(\tilde{X}_a)$ can be decomposed as $\nu = \alpha \mu_X + (1 - \alpha)\,\delta_{\{\mathrm{NA}\}}$. Thus, we can define
$$i_{\mu_s}\!\left(\alpha \mu_X + (1 - \alpha)\,\delta_{\{\mathrm{NA}\}}\right) = \alpha \mu_X + (1 - \alpha)\,\mu_s.$$

Note that if our Giry table is obtained by applying the unit of the Giry monad to a regular table, post-composition with $i_{\mu_s}$ will result in a probability measure concentrated at a single point if the row was observed and the empirical measure $\mu_s$ if the row is unobserved. This is the Giry monad analogue of imputing by mean. The advantage of using the Giry monad is that we can also preserve the empirical variance of the column with this imputation scheme. In order to see this, we must discuss the problem of lifting statistics to the Giry monad. Assuming for the moment that this procedure works, note that imputing by the empirical distribution can eliminate correlations between observations across columns. Moreover, if multiple columns are missing and each column is imputed by the empirical probability measure associated to its respective column, this will yield a maximum entropy join of the two empirical column measures and as such will move the estimated value of the covariance between the columns towards zero.
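A minimal Python sketch of the imputation map $i_{\mu_s}$ for finitely supported measures follows; the dictionary representation and the treatment of the NA symbol are our own illustrative conventions.

def impute_by_empirical_measure(nu, mu_s, na="NA"):
    # Decompose nu as alpha*mu_X + (1 - alpha)*delta_NA and replace the NA
    # mass with the empirical column measure mu_s.
    na_mass = nu.get(na, 0.0)
    imputed = {x: p for x, p in nu.items() if x != na}
    for x, p in mu_s.items():
        imputed[x] = imputed.get(x, 0.0) + na_mass * p
    return imputed

mu_s = {"a": 0.25, "b": 0.75}                          # empirical column measure
print(impute_by_empirical_measure({"NA": 1.0}, mu_s))  # {'a': 0.25, 'b': 0.75}
print(impute_by_empirical_measure({"a": 1.0}, mu_s))   # observed rows unchanged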

7.3.2 Lifting Statistics to the Giry Monad

A statistic, $t : X \to V$, is said to lift to $G(X)$ if there exists a function $t^G : G(X) \to V$ such that the triangle formed by the unit $\delta : X \to G(X)$, the lift $t^G : G(X) \to V$, and $t : X \to V$ commutes, i.e. $t^G \circ \delta = t$.

In this section, we discuss lifting common statistics to the Giry monad. We begin by establishing lemmas that illustrate the algebra of extendable statistics. We then use these results to show how using monads can aid us in our design of imputation schemes that better preserve the statistical properties of our data.

Lemma 148. Let t : X → V be a statistic which extends to the Giry monad and let f : V → U be any measurable function. Then f ◦ t extends to the Giry monad.

Proof. Define $(f \circ t)^G := f \circ t^G$. Since $t^G \circ \delta = t$, post-composing with $f$ gives $(f \circ t^G) \circ \delta = f \circ t$, which is exactly the commutativity required for $f \circ t$ to lift: the commuting triangle for $t$ composed with $f : V \to U$ yields the commuting triangle for $f \circ t$.

The main result of this section is characterizing a large class of statistics which admit lifts to the Giry monad. In order to extend the definition of standard statistics to the Giry monad, we need to augment the space of values. For the remainder of this section we focus on statistics taking values in the real numbers. In order to extend statistics to the Giry monad, we need to define them as integrals against probability measures. As such, we need to allow for the possibility that the values of our statistics are infinite or undefined. Let $\mathbb{R}$ be the real numbers equipped with the standard Borel sigma algebra. We define an extended real number system $\tilde{\mathbb{R}} := \mathbb{R} \amalg \{\infty\} \amalg \{-\infty\} \amalg \{\mathrm{NA}\}$ equipped with the coproduct sigma algebra. Note that if $t : X \to \mathbb{R}$ is measurable, then $t : X \to \tilde{\mathbb{R}}$ is also measurable.

Proposition 149. Any measurable $f : X \to \mathbb{R}$ lifts to $f^G : G(X) \to \tilde{\mathbb{R}}$.

Proof. Define $f^G(\mu) = \int f(x)\, d\mu$. Let $x_0 \in X$ be arbitrary and note $f^G(\delta(x_0)) = \int f(x)\, d\delta_{x_0} = f(x_0)$. An arbitrary measurable function $f : X \to \mathbb{R}$ can be decomposed as $f = f^+ - f^-$ where both $f^+$ and $f^-$ are measurable. We can then define the extension to the Giry monad as follows:
$$f^G(\mu) = \begin{cases} \int f(x)\, d\mu & \text{if } \int f^+ d\mu < \infty \text{ and } \int f^- d\mu < \infty \\ \infty & \text{if } \int f^+ d\mu = \infty \text{ and } \int f^- d\mu < \infty \\ -\infty & \text{if } \int f^+ d\mu < \infty \text{ and } \int f^- d\mu = \infty \\ \mathrm{NA} & \text{otherwise.} \end{cases}$$

Remark. The above proposition establishes that essentially any function one would be interested in computing with will lift to the Giry monad. To construct examples of functions which do not lift to the Giry monad, we either have to construct very weak sigma algebras (e.g. given a set $X$, take $\mathcal{F} = \{\emptyset, X\}$) or invoke the axiom of choice. Given that real numbers are only approximately represented on a computer, these considerations should not arise in any practical application of the ideas presented in this section.
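For the finitely supported measures used throughout this chapter, the lift of a statistic reduces to a weighted sum, as the following Python sketch illustrates; the dictionary representation is again our own convention.

def lift_statistic(f):
    # Lift a real-valued statistic f : X -> R to finitely supported measures
    # represented as outcome->probability dictionaries: f^G(mu) = sum f(x) mu(x).
    def f_G(mu):
        return sum(f(x) * p for x, p in mu.items())
    return f_G

mean_G = lift_statistic(lambda x: x)
second_moment_G = lift_statistic(lambda x: x ** 2)

mu = {1.0: 0.5, 3.0: 0.5}             # empirical measure of a column
m = mean_G(mu)                        # 2.0
print(m, second_moment_G(mu) - m**2)  # mean 2.0 and variance 1.0 are preserved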

7.3.3 A Simple Example of Giry Imputation

A method which imputes data by estimating the expected value of a missing record introduces a bias into the second moments of our data, even if the assumptions of the imputation are correct, because it lowers the observed variability in the data and as such reduces the magnitude of the higher order empirical moments. This presents a problem to the data analyst who wants to estimate confidence intervals or predict ranges of possible outcomes of future observations, as their predictions will lead to narrower prediction intervals than can be reasonably justified. To combat this problem, statisticians have proposed multiple imputation, whereby multiple values are drawn from the predictive distribution to offset the bias that imputing a single mean would introduce into the data. The data is split into multiple copies resampled from the predictive distributions for the missing entries, analysis is performed on each sampled copy of the original data, typically in parallel, and the results are pooled together and then returned [21, 90, 115–117].

Giry tables have the benefit of being able to provide the same theoretical benefits as multiple imputation without requiring parallelization. In situations where memory is limited, the additional complexity of implementing the Giry monad may be worthwhile. The Giry monad implementation differs in that the missing values are replaced by their actual predictive distributions. Thus, when writing functions to fit models in the framework of Giry monads, the methods to fit models can be lifted to constructions on the Giry monad. This requires different ways of thinking about designing model fitting criteria, but it is a tradeoff an analyst may be willing to make if memory considerations prohibit storing multiple copies of the original data or limitations in computing resources prohibit running multiple imputations in parallel. An exact implementation of the monad design pattern would involve implementing the method to lift functions to the Giry monad and thus would be compatible with previously implemented functions such as those that fit models or make predictions. To illustrate how imputing by distribution affects existing algorithms, we can investigate the effect that Giry monad imputation has on the Euclidean distance between points. This means that imputation via the Giry monad can potentially affect clustering decisions in algorithms such as k-nearest neighbors. For sparse higher dimensional data, the differences in the computed distance can be a significant consideration when selecting a machine learning algorithm.

Lemma 150. Imputation via any non-point measure increases Euclidean distance between incomplete records and preserves the distance between completely observed records.

Proof. Let $(p_1, p_2, \ldots, p_n)$ be an arbitrary point in $\mathbb{R}^n$. Suppose $(z_1, \ldots, z_n)$ represents a completely observed record. The squared distance is a statistic $t_p = \sum_{i=1}^{n} (p_i - x_i)^2$ whose lift to the Giry monad can be expressed as

$$\mathbb{E}_{t_p}[\mu] = \int \left( \sum_{i=1}^{n} (p_i - x_i)^2 \right) d\mu = \sum_{i=1}^{n} \int (p_i - x_i)^2\, d\mu.$$

Suppose the $j$-th column in this record was observed; then

$$\int (p_j - x_j)^2\, d\mu = \int (p_j - x_j)^2\, d\delta_{z_j} = p_j^2 - 2 p_j \int x_j\, d\delta_{z_j} + \int x_j^2\, d\delta_{z_j} = p_j^2 - 2 p_j z_j + z_j^2 = (p_j - z_j)^2.$$

However, if instead the $j$-th column were imputed by some empirical distribution $\mu_j$ with $\mathbb{E}[\mu_j] = \int x_j\, d\mu_j = z_j$ and $\mathbb{V}[\mu_j] = \int (x_j - z_j)^2\, d\mu_j = \sigma^2 \ne 0$, then
$$\int (p_j - x_j)^2\, d\mu = \int (p_j - x_j)^2\, d\mu_j = p_j^2 - 2 p_j \int x_j\, d\mu_j + \int x_j^2\, d\mu_j = p_j^2 - 2 p_j z_j + z_j^2 + \sigma^2 = (p_j - z_j)^2 + \sigma^2 > (p_j - z_j)^2.$$

Thus, imputing via the Giry monad and lifting the Euclidean distance to a functional on probability measures increases the distance to imputed records as long as the imputed distribution has non-zero variance.

7.4 Future Work

7.4.1 Implementation of Giry Tables

In order to further explore the computational benefits and potential tradeoffs of designing algorithms with the Giry monad, we need to discuss a method of representing this structure. Giry tables can be implemented in practice via dictionaries. The unit natural transformation can simply take a record to a dictionary with a single entry. The key corresponds to the values of the observables and

the value associated to this key is 1, representing the fact that the dictionary is a point mass assigned to that value. Probability measures on finite outcome spaces can be implemented similarly. The keys of the dictionary correspond to the different possibilities for the outcome, while the values corresponding to the keys represent the probability that a particular outcome will be observed. For real-valued random variables, we could simply dynamically add keys based on new observed values and adjust the probabilities at each step. A more efficient way would be to store counts of the observations, also tabulate the total number of observations, and compute probabilities only when needed, so that we do not have to re-normalize the dictionary every time a new entry is added. This approach is fine in theory, but the expected growth of the dictionary is linear assuming we are tabulating a random variable with infinite support. To save storage space, we could use a binning technique to approximate the distribution, similar to how histograms are computed. While histograms will typically use equally spaced bins, such a technique would not be as robust for arbitrary distributions of values, as there could potentially be empty bins depending on how the values are clustered. One principled way of binning a collection of points $\{x_1, \ldots, x_n\}$ would be to divide the points into $k$ clusters such that the total sum of the variances of the clusters is minimized. To achieve this we can first sort the data and divide it into clusters consisting of bins with approximately $n/k$ samples per bin. We can then select the bin with the largest within-cluster variance and see if shifting its largest value to the cluster immediately to the right or shifting its smallest value to the cluster immediately to the left decreases the total variance, choosing the action that results in the greatest decrease in total variance. The algorithm will terminate when there is no choice of shift that will decrease the variance. For samples of real-valued random vectors, the clustering problem is NP-hard [137]; however, many approximate algorithms exist and can be found in many machine learning textbooks, e.g. [64].
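The greedy one-dimensional binning heuristic described above can be sketched in Python as follows; the exact stopping rule and tie-breaking below are our own choices made for illustration.

import numpy as np

def variance_binning(points, k):
    # Start from k contiguous, roughly equal-count bins of the sorted data,
    # then repeatedly shift a boundary value out of the highest-variance bin
    # whenever doing so lowers the total within-bin variance.
    xs = np.sort(np.asarray(points, dtype=float))
    bins = [list(chunk) for chunk in np.array_split(xs, k)]

    def total_var(bs):
        return sum(np.var(b) for b in bs if len(b) > 0)

    while True:
        current = total_var(bins)
        worst = max(range(k), key=lambda i: np.var(bins[i]) if bins[i] else -1.0)
        candidates = []
        if worst + 1 < k and len(bins[worst]) > 1:    # move largest value right
            trial = [list(b) for b in bins]
            trial[worst + 1].insert(0, trial[worst].pop())
            candidates.append(trial)
        if worst - 1 >= 0 and len(bins[worst]) > 1:   # move smallest value left
            trial = [list(b) for b in bins]
            trial[worst - 1].append(trial[worst].pop(0))
            candidates.append(trial)
        best = min(candidates, key=total_var) if candidates else None
        if best is None or total_var(best) >= current:
            break
        bins = best
    return bins

print(variance_binning([1, 2, 3, 10, 11, 12, 40], k=3))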

7.4.2 The Giry Monad and Contextuality

When attempting to resolve a contextual database through its set of maximal classical contextual covers, we saw that proper analysis must take into account the

full range of explanations. Each of these individual tables can be thought of as representing some marginal probability on the outcome space of a table. As such, each maximal factor corresponds to a subspace of the classical outcome space which is consistent with the marginals associated to the various tables. In the case where all outcomes are categorical, this can be identified with the intersection of some linear subspace with the probability simplex on the full space of outcomes. As such, analysis of a maximal factor can be seen as making some decision about how to place a probability measure on this linear subspace. A maximum entropy principle would correspond to the choice of a uniform measure in this case. The multiplication operation of the Giry monad can be used to integrate over the probability measure placed on the subspace of probability measures to produce a probability measure on the original outcome space. As the cautionary examples in the previous chapter prove, care has to be taken not to construct propositions in a way that leads to a self-fulfilling prophecy, e.g. using maximum entropy joins and then concluding mutual independence based only on the marginal independence of the contexts.

Example 151. Recall the Bell marginals discussed in chapter 5. These were a collection of tables on the outcome spaces $A$, $B$, $A'$, $B'$ whose distributions are given by the tables below:

t_{AB}      A = 0   A = 1        t_{A'B}     A' = 0   A' = 1
B = 0         4       0          B = 0          3        1
B = 1         0       4          B = 1          1        3

t_{AB'}     A = 0   A = 1        t_{A'B'}    A' = 0   A' = 1
B' = 0        3       1          B' = 0         1        3
B' = 1        1       3          B' = 1         3        1

When motivating contextuality in chapter 5, we discussed in Example 104 the problem of attempting to forecast the table for $A'$ and $B'$ based on the tables $t_{AB}$, $t_{A'B}$, and $t_{AB'}$ by considering the collection of tables on the outcome space $A, B, A', B'$ and then marginalizing to $A', B'$. The table $t_{A'B'}$ did not belong to any of those tables. This suggests that if we are analyzing outcomes which we believe to be contextual, we should generate the predictive space in a different manner. We can instead consider the collection of tables which will contextually join with the

observed tables $t_{AB}$, $t_{A'B}$, and $t_{AB'}$. This yields the constraint satisfaction problem

produced by requiring the counts and marginal distributions of $t_{A'B'}$ to agree with the marginal counts for $t_{A'B}$ and $t_{AB'}$, respectively. In other words, we are looking at the collection of $(n_{00}, n_{01}, n_{10}, n_{11}) \in \mathbb{N}^4$ which solve the following constraint satisfaction problem:
$$n_{00} + n_{01} = 4, \qquad n_{00} + n_{10} = 4, \qquad n_{00} + n_{01} + n_{10} + n_{11} = 8.$$

By inspection, we can see that there are five possible solutions: (0, 4, 4, 0), (1, 3, 3, 1), (2, 2, 2, 2), (3, 1, 1, 3), and (4, 0, 0, 4). Contextual modelling can be a way of forecasting the possibilities for outcomes in situations where contextuality may play an important role. For instance, when analyzing human behavior, such as the counts of products that may be purchased on a display page, using this technique results in a wider range of possibilities and can account for potential interference effects. As an example of how this type of situation could emerge, imagine we had tested a product landing page displaying mustard and ketchup for users and counted the different combinations of orders. In another test we displayed the same landing page but used ketchup and spicy mustard for the users. Interference effects from the fact that someone is less likely to buy both mustard and spicy mustard could result in a contextual distribution. If we attempted to analyze this situation by treating it as a marginal distribution of one of the joins of the two initial tables, we would not have produced a distribution with the possibility of these interference effects.
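The five solutions listed above can be verified with a brute-force enumeration in a few lines of Python, independently of the python-constraint package used in Appendix A.

from itertools import product

solutions = [
    (n00, n01, n10, n11)
    for n00, n01, n10, n11 in product(range(9), repeat=4)
    if n00 + n01 == 4 and n00 + n10 == 4 and n00 + n01 + n10 + n11 == 8
]
print(solutions)
# [(0, 4, 4, 0), (1, 3, 3, 1), (2, 2, 2, 2), (3, 1, 1, 3), (4, 0, 0, 4)]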

7.4.3 Generalizations of Interval Time Models

The TrueTime [24] model discussed briefly in Section 4.2.7 represents time as a collection of intervals $[a, b]$ where the correct time is guaranteed to belong to the interval. One possible generalization of this would be to treat time as a compactly supported random variable. As such, the timestamps could be represented as a Giry data frame on the datetime objects implemented in the underlying language. A very broad future project could explore how to adapt time series methods to such structures and how to design data visualization techniques for these types of time series.

Appendix A | Supplemental Code for Chapter 5

A.1 Introduction

This appendix contains supplemental code from chapter 5 which generates all solutions to the various constraint satisfaction problems determined by the Bell marginal tables. We present the block of code to generate all solutions to each constraint satisfaction problem along with terminal output displaying the various solutions referenced in chapter 5. All code in this appendix is written in the Python programming language. The focus in this section is making the code easily readable for a large audience. As such, the code is very repetitive and makes no use of type hints or abstract base classes as would be more appropriate in a more reusable implementation.

A.2 Code

A.2.1 CSP for All Bell Marginals

This first code snippet verifies that there is no way to join all four tables in the Bell marginals.

from constraint import *

problem = Problem()

# constrain values for the exhaustive search: since 4 is the largest
# observed count in any Bell table, this is the maximum size of
# any entry in the joined tables
vals = [0, 1, 2, 3, 4]

# nijkl denotes the number of times A=i, B=j, A'=k, B'=l
problem.addVariable("n0000", vals)
problem.addVariable("n0001", vals)
problem.addVariable("n0010", vals)
problem.addVariable("n0011", vals)
problem.addVariable("n0100", vals)
problem.addVariable("n0101", vals)
problem.addVariable("n0110", vals)
problem.addVariable("n0111", vals)
problem.addVariable("n1000", vals)
problem.addVariable("n1001", vals)
problem.addVariable("n1010", vals)
problem.addVariable("n1011", vals)
problem.addVariable("n1100", vals)
problem.addVariable("n1101", vals)
problem.addVariable("n1110", vals)
problem.addVariable("n1111", vals)

# constraints for table AB
# constraint for A=0, B=0
problem.addConstraint(lambda n0000, n0001, n0010, n0011:
                      n0000 + n0001 + n0010 + n0011 == 4,
                      ("n0000", "n0001", "n0010", "n0011"))
# constraint for A=0, B=1
problem.addConstraint(lambda n0100, n0101, n0110, n0111:
                      n0100 + n0101 + n0110 + n0111 == 0,
                      ("n0100", "n0101", "n0110", "n0111"))
# constraint for A=1, B=0
problem.addConstraint(lambda n1000, n1001, n1010, n1011:
                      n1000 + n1001 + n1010 + n1011 == 0,
                      ("n1000", "n1001", "n1010", "n1011"))
# constraint for A=1, B=1
problem.addConstraint(lambda n1100, n1101, n1110, n1111:
                      n1100 + n1101 + n1110 + n1111 == 4,
                      ("n1100", "n1101", "n1110", "n1111"))

# constraints for table A'B
# constraint for B=0, A'=0
problem.addConstraint(lambda n0000, n0001, n1000, n1001:
                      n0000 + n0001 + n1000 + n1001 == 3,
                      ("n0000", "n0001", "n1000", "n1001"))
# constraint for B=0, A'=1
problem.addConstraint(lambda n0010, n0011, n1010, n1011:
                      n0010 + n0011 + n1010 + n1011 == 1,
                      ("n0010", "n0011", "n1010", "n1011"))
# constraint for B=1, A'=0
problem.addConstraint(lambda n0100, n0101, n1100, n1101:
                      n0100 + n0101 + n1100 + n1101 == 1,
                      ("n0100", "n0101", "n1100", "n1101"))
# constraint for B=1, A'=1
problem.addConstraint(lambda n0110, n0111, n1110, n1111:
                      n0110 + n0111 + n1110 + n1111 == 3,
                      ("n0110", "n0111", "n1110", "n1111"))

# constraints for table AB'
# constraint for A=0, B'=0
problem.addConstraint(lambda n0000, n0010, n0100, n0110:
                      n0000 + n0010 + n0100 + n0110 == 3,
                      ("n0000", "n0010", "n0100", "n0110"))
# constraint for A=0, B'=1
problem.addConstraint(lambda n0001, n0011, n0101, n0111:
                      n0001 + n0011 + n0101 + n0111 == 1,
                      ("n0001", "n0011", "n0101", "n0111"))
# constraint for A=1, B'=0
problem.addConstraint(lambda n1000, n1010, n1100, n1110:
                      n1000 + n1010 + n1100 + n1110 == 1,
                      ("n1000", "n1010", "n1100", "n1110"))
# constraint for A=1, B'=1
problem.addConstraint(lambda n1001, n1011, n1101, n1111:
                      n1001 + n1011 + n1101 + n1111 == 3,
                      ("n1001", "n1011", "n1101", "n1111"))

# constraints for table A'B'
# constraint for A'=0, B'=0
problem.addConstraint(lambda n0000, n0100, n1000, n1100:
                      n0000 + n0100 + n1000 + n1100 == 1,
                      ("n0000", "n0100", "n1000", "n1100"))
# constraint for A'=0, B'=1
problem.addConstraint(lambda n0001, n0101, n1001, n1101:
                      n0001 + n0101 + n1001 + n1101 == 3,
                      ("n0001", "n0101", "n1001", "n1101"))
# constraint for A'=1, B'=0
problem.addConstraint(lambda n0010, n0110, n1010, n1110:
                      n0010 + n0110 + n1010 + n1110 == 3,
                      ("n0010", "n0110", "n1010", "n1110"))
# constraint for A'=1, B'=1
problem.addConstraint(lambda n0011, n0111, n1011, n1111:
                      n0011 + n0111 + n1011 + n1111 == 1,
                      ("n0011", "n0111", "n1011", "n1111"))

# total count constraint
problem.addConstraint(lambda n0000, n0001, n0010, n0011,
                             n0100, n0101, n0110, n0111,
                             n1000, n1001, n1010, n1011,
                             n1100, n1101, n1110, n1111:
                      n0000 + n0001 + n0010 + n0011 +
                      n0100 + n0101 + n0110 + n0111 +
                      n1000 + n1001 + n1010 + n1011 +
                      n1100 + n1101 + n1110 + n1111 == 8,
                      ("n0000", "n0001", "n0010", "n0011",
                       "n0100", "n0101", "n0110", "n0111",
                       "n1000", "n1001", "n1010", "n1011",
                       "n1100", "n1101", "n1110", "n1111"))

csp_solutions = problem.getSolutions()
print(len(csp_solutions))

The above code results in the output below:

0

This indicates that the full constraint satisfaction problem involving all of the marginal Bell tables does not have any solution. In the rest of this appendix, we explore the solutions to the other partial constraint satisfaction problems involving the Bell marginal tables and present code for producing all solutions belonging to the poset of solutions discussed in chapter 5 of this dissertation.

A.2.2 CSPs Involving Three Bell Marginals

The remainder of this appendix contains various code snippets and their resulting output. The remaining code is largely reminiscent of the code presented in the previous subsection, mutatis mutandis. In this subsection, we explore the four constraint satisfaction problems resulting from merging the different combinations of three overlapping tables. The next code snippet generates all solutions to the constraint satisfaction problem involving tables AB, A'B, and AB': from constraint import * vals = [0,1,2,3,4] problem = Problem() # nijkl is the number of times A=i, B=j, A'=k, B'=l problem.addVariable("n0000",vals) problem.addVariable("n0001",vals) problem.addVariable("n0010",vals) problem.addVariable("n0011", vals) problem.addVariable("n0100", vals) problem.addVariable("n0101", vals)

180 problem.addVariable("n0110", vals) problem.addVariable("n0111", vals) problem.addVariable("n1000", vals) problem.addVariable("n1001", vals) problem.addVariable("n1010", vals) problem.addVariable("n1011", vals) problem.addVariable("n1100", vals) problem.addVariable("n1101", vals) problem.addVariable("n1110", vals) problem.addVariable("n1111", vals)

# constraints for A, B problem.addConstraint(lambda n0000, n0001, n0010, n0011: n0000 + n0001 + n0010 + n0011==4, ("n0000", "n0001", "n0010", "n0011")) problem.addConstraint(lambda n0100, n0101, n0110, n0111: n0100 + n0101 + n0110 + n0111==0, ("n0100", "n0101", "n0110", "n0111")) problem.addConstraint(lambda n1000, n1001, n1010, n1011: n1000 + n1001 + n1010 + n1011==0, ("n1000", "n1001", "n1010", "n1011")) problem.addConstraint(lambda n1100, n1101, n1110, n1111: n1100 + n1101 + n1110 + n1111 == 4, ("n1100", "n1101", "n1110", "n1111"))

# constraints for A’ B problem.addConstraint(lambda n0000, n0001, n1000, n1001: n0000+n0001+n1000+n1001==3, ("n0000", "n0001", "n1000", "n1001")) problem.addConstraint(lambda n0010, n0011, n1010, n1011: n0010 + n0011 + n1010 + n1011==1, ("n0010", "n0011", "n1010", "n1011")) problem.addConstraint(lambda n0100, n0101, n1100, n1101: n0100 + n0101 + n1100 + n1101==1,

181 ("n0100", "n0101", "n1100", "n1101")) problem.addConstraint(lambda n0110, n0111, n1110, n1111: n0110 + n0111 + n1110 + n1111 == 3, ("n0110", "n0111", "n1110", "n1111"))

# constraints for A, B' problem.addConstraint(lambda n0000, n0010, n0100, n0110: n0000 + n0010 + n0100 + n0110 == 3, ("n0000", "n0010", "n0100", "n0110")) problem.addConstraint(lambda n0001, n0011, n0101, n0111: n0001 + n0011 + n0101 + n0111 == 1, ("n0001", "n0011", "n0101", "n0111")) problem.addConstraint(lambda n1000, n1010, n1100, n1110: n1000 + n1010 + n1100 + n1110 == 1, ("n1000", "n1010", "n1100", "n1110")) problem.addConstraint(lambda n1001, n1011, n1101, n1111: n1001 + n1011 + n1101 + n1111 == 3, ("n1001", "n1011", "n1101", "n1111"))

problem.addConstraint(lambda n0000, n0001, n0010, n0011, n0100, n0101, n0110, n0111, n1000, n1001, n1010, n1011, n1100, n1101, n1110, n1111 : n0000 + n0001 + n0010 + n0011 + n0100 + n0101 + n0110 + n0111 + n1000 + n1001 + n1010 + n1011 + n1100 + n1101 + n1110 + n1111 == 8, ("n0000", "n0001", "n0010", "n0011", "n0100", "n0101", "n0110", "n0111", "n1000", "n1001", "n1010", "n1011", "n1100", "n1101", "n1110", "n1111"))

csp_solutions = problem.getSolutions() print("Number of solutions to constraint satisfaction problem:")

print(len(csp_solutions)) print(" ") for s in csp_solutions: print(s) print(" ") The above code results in the following terminal output: Number of solutions to constraint satisfaction problem: 4

{’n0000’: 3, ’n0001’: 0, ’n0010’: 0, ’n0011’: 1, ’n0100’: 0, ’n0110’: 0, ’n0101’: 0, ’n0111’: 0, ’n1000’: 0, ’n1001’: 0, ’n1010’: 0, ’n1011’: 0, ’n1100’: 1, ’n1101’: 0, ’n1110’: 0, ’n1111’: 3}

{’n0000’: 3, ’n0001’: 0, ’n0010’: 0, ’n0011’: 1, ’n0100’: 0, ’n0110’: 0, ’n0101’: 0, ’n0111’: 0, ’n1000’: 0, ’n1001’: 0, ’n1010’: 0, ’n1011’: 0, ’n1100’: 0, ’n1101’: 1, ’n1110’: 1, ’n1111’: 2}

{’n0000’: 2, ’n0001’: 1, ’n0010’: 1, ’n0011’: 0, ’n0100’: 0, ’n0110’: 0, ’n0101’: 0, ’n0111’: 0, ’n1000’: 0, ’n1001’: 0, ’n1010’: 0, ’n1011’: 0, ’n1100’: 1, ’n1101’: 0, ’n1110’: 0, ’n1111’: 3}

{'n0000': 2, 'n0001': 1, 'n0010': 1, 'n0011': 0, 'n0100': 0, 'n0110': 0, 'n0101': 0, 'n0111': 0, 'n1000': 0, 'n1001': 0, 'n1010': 0, 'n1011': 0, 'n1100': 0, 'n1101': 1, 'n1110': 1, 'n1111': 2}

Similar code can be written to obtain the other three combinations of constraint satisfaction problems involving three of the marginal tables. Note that the above code block can be obtained from the code block in the previous section by deleting the constraints corresponding to A'B'. By similarly deleting constraints, we can use similar blocks to produce the solutions for the other three ways of joining three of the Bell marginals.

A.2.3 CSPs Involving Two Bell Marginals

We first present code producing the solution to the constraint satisfaction problem involving tables AB and A'B. For this constraint satisfaction problem, we only have eight count variables because we don't have a column corresponding to B'. from constraint import * vals = [0,1,2,3,4] problem = Problem() # nijk is # of times A=i, B=j, A'=k problem.addVariable("n000",vals) problem.addVariable("n001",vals) problem.addVariable("n010",vals) problem.addVariable("n011", vals) problem.addVariable("n100", vals) problem.addVariable("n101", vals) problem.addVariable("n110", vals) problem.addVariable("n111", vals)

# constraints for A, B problem.addConstraint(lambda n000, n001: n000 + n001 == 4, ("n000", "n001")) problem.addConstraint(lambda n010, n011: n010 + n011 == 0, ("n010", "n011")) problem.addConstraint(lambda n100, n101: n100 + n101 == 0, ("n100", "n101")) problem.addConstraint(lambda n110, n111: n110 + n111 == 4, ("n110", "n111"))

# constraints for A’ B problem.addConstraint(lambda n000, n100: n000 + n100 == 3, ("n000", "n100"))

184 problem.addConstraint(lambda n001, n101: n001 + n101 == 1, ("n001", "n101")) problem.addConstraint(lambda n010, n110: n010 + n110 == 1, ("n010", "n110")) problem.addConstraint(lambda n011, n111: n011 + n111 == 3, ("n011", "n111"))

problem.addConstraint(lambda n000, n001, n010, n011, n100, n101, n110, n111: n000 + n001 + n010 + n011 + n100 + n101 + n110 + n111 == 8, ("n000", "n001", "n010", "n011", "n100", "n101", "n110", "n111"))

csp_solutions = problem.getSolutions() for s in csp_solutions: print(s) The terminal output for the above code snippet is given below. {'n000': 3, 'n001': 1, 'n100': 0, 'n101': 0, 'n010': 0, 'n011': 0, 'n110': 1, 'n111': 3} The code block for generating the unique solution to the constraint satisfaction problem involving AB and AB' is similar to the code block above. This is also the case for the constraint satisfaction problem involving A'B and A'B', which has four solutions, and for the constraint satisfaction problem involving AB' and A'B', which also has four solutions. The code snippet below generates the number of solutions to the constraint satisfaction problem involving AB and A'B'. from constraint import * vals = [0,1,2,3,4] problem = Problem() # nijkl is # of times A=i, B=j, A'=k, B'=l problem.addVariable("n0000",vals)

185 problem.addVariable("n0001",vals) problem.addVariable("n0010",vals) problem.addVariable("n0011", vals) problem.addVariable("n0100", vals) problem.addVariable("n0101", vals) problem.addVariable("n0110", vals) problem.addVariable("n0111", vals) problem.addVariable("n1000", vals) problem.addVariable("n1001", vals) problem.addVariable("n1010", vals) problem.addVariable("n1011", vals) problem.addVariable("n1100", vals) problem.addVariable("n1101", vals) problem.addVariable("n1110", vals) problem.addVariable("n1111", vals)

# constraints for A, B problem.addConstraint(lambda n0000, n0001, n0010, n0011: n0000 + n0001 + n0010 + n0011 == 4, ("n0000", "n0001", "n0010", "n0011")) problem.addConstraint(lambda n0100, n0101, n0110, n0111: n0100 + n0101 + n0110 + n0111 == 0, ("n0100", "n0101", "n0110", "n0111")) problem.addConstraint(lambda n1000, n1001, n1010, n1011: n1000 + n1001 + n1010 + n1011 == 0, ("n1000", "n1001", "n1010", "n1011")) problem.addConstraint(lambda n1100, n1101, n1110, n1111: n1100 + n1101 + n1110 + n1111 == 4, ("n1100", "n1101", "n1110", "n1111"))

# constraints for A’, B’ problem.addConstraint(lambda n0000, n0100, n1000, n1100: n0000 + n0100 + n1000 + n1100 == 1, ("n0000", "n0100", "n1000", "n1100"))

186 problem.addConstraint(lambda n0001, n0101, n1001, n1101: n0001 + n0101 + n1001 + n1101 == 3, ("n0001", "n0101", "n1001", "n1101")) problem.addConstraint(lambda n0010, n0110, n1010, n1110: n0010 + n0110 + n1010 + n1110 == 3, ("n0010", "n0110", "n1010", "n1110")) problem.addConstraint(lambda n0011, n0111, n1011, n1111: n0011 + n0111 + n1011 + n1111 == 1, ("n0011", "n0111", "n1011", "n1111"))

problem.addConstraint(lambda n0000, n0001, n0010, n0011, n0100, n0101, n0110, n0111, n1000, n1001, n1010, n1011, n1100, n1101, n1110, n1111 : n0000 + n0001 + n0010 + n0011 + n0100 + n0101 + n0110 + n0111 + n1000 + n1001 + n1010 + n1011 + n1100 + n1101 + n1110 + n1111 == 8, ("n0000", "n0001", "n0010", "n0011", "n0100", "n0101", "n0110", "n0111", "n1000", "n1001", "n1010", "n1011", "n1100", "n1101", "n1110", "n1111")) solns = problem.getSolutions() print(len(solns)) The above code snippet results in the terminal output displayed below. 14 A similar code snippet can be used to produce the number of solutions to the constraint satisfaction problem for A0B and AB0 which has 46 solutions.

Appendix B | Code for Producing Figures in Chapter 6

B.1 Introduction

This appendix contains the code used to generate the scatterplots appearing in chapter 6 of the dissertation. The next section contains the code testing the invariant-based method against the standard chi-squared test for the 2×2 case. The section after that contains the code for generating the simulations involving the binary four-cycle model. All code in this appendix is implemented in the R programming language.

B.2 Invariants vs. Chi-Squared 2 × 2 Case

The code below produces scatterplots comparing p-values computed with the invariants for the independence model on a 2 × 2 table versus p-values computed with both the chi-squared test and the likelihood-ratio test. This code produced Figure 6.1 and Figure 6.2. library("CompQuadForm") library("MASS") library("Deducer")

# Create a vector of probabilities corresponding to the # unwinding (p_11 p_12 p_21 p_22) of the table

# This probability vector does not satisfy the # independence condition. This probability vector was # used for Figure 6.2 #p=c(.1 ,.3 ,.2 ,.4)

# This probability vector satisfies the independence # condition. This probability vector was used for # Figure 6.1 p=c(rep(.25 ,4))

# Number of samples to be used in test n=1000 # Number of trials to run to produce scatter plot of p−values numTrials=1000 im=rep(0,numTrials) chi=rep(0,numTrials) G=rep(0 ,numTrials) for(i in 1:numTrials) { # Construct a normalized vector of samples from # the above distribution x=rmultinom(1,n,p)/n invariant1=x[1 ,1] ∗ x [4 ,1] − x [ 2 , 1 ] ∗ x [ 3 , 1 ] sigma=matrix(rep(0 ,16) ,nrow=4,ncol=4) #mean vector for mle xhat=rep(0 ,4) xhat[1]=(x[1,1]+x[2 ,1]) ∗ (x[1,1]+x[3 ,1]) xhat[2]=(x[1,1]+x[2 ,1]) ∗ (x[2,1]+x[4 ,1]) xhat[3]=(x[1,1]+x[3 ,1]) ∗ (x[3,1]+x[4 ,1]) xhat[4]=(x[2,1]+x[4 ,1]) ∗ (x[3,1]+x[4 ,1]) #covariance matrix for mle sigma[1,1]=xhat[1]∗(1 − xhat [ 1 ] ) sigma[1,2]= − xhat [ 1 ] ∗ xhat [ 2 ] sigma[1,3]= − xhat [ 1 ] ∗ xhat [ 3 ] sigma[1,4]= − xhat [ 1 ] ∗ xhat [ 4 ]

189 sigma[2,1]= − xhat [ 2 ] ∗ xhat [ 1 ] sigma[2,2]=xhat[2]∗(1 − xhat [ 2 ] ) sigma[2,3]= − xhat [ 2 ] ∗ xhat [ 3 ] sigma[2,4]= − xhat [ 2 ] ∗ xhat [ 4 ] sigma[3,1]= − xhat [ 3 ] ∗ xhat [ 1 ] sigma[3,2]= − xhat [ 3 ] ∗ xhat [ 2 ] sigma[3,3]=xhat[3]∗(1 − xhat [ 3 ] ) sigma[3,4]= − xhat [ 3 ] ∗ xhat [ 4 ] sigma[4,1]= − xhat [ 4 ] ∗ xhat [ 1 ] sigma[4,2]= − xhat [ 4 ] ∗ xhat [ 2 ] sigma[4,3]= − xhat [ 4 ] ∗ xhat [ 3 ] sigma[4,4]=xhat[4]∗(1 − xhat [ 4 ] ) #Hessian matrix for quadratic form hess=matrix(rep(0 ,16) ,nrow=4,ncol=4) hess[1,1]=2∗ xhat [ 4 ] ∗ xhat [ 4 ] hess [1 ,2]= −2∗ xhat [ 3 ] ∗ xhat [ 4 ] hess [1 ,3]= −2∗ xhat [ 2 ] ∗ xhat [ 4 ] hess [1 ,4]= −2∗ xhat [ 1 ] ∗ xhat [ 4 ] hess [2 ,1]= −2∗ xhat [ 3 ] ∗ xhat [ 4 ] hess[2,2]=2∗ xhat [ 3 ] ∗ xhat [ 3 ] hess [2 ,3]= −2∗ xhat [ 1 ] ∗ xhat [ 4 ] hess [2 ,4]= −2∗ xhat [ 1 ] ∗ xhat [ 3 ] hess [3 ,1]= −2∗ xhat [ 2 ] ∗ xhat [ 4 ] hess [3 ,2]= −2∗ xhat [ 1 ] ∗ xhat [ 4 ] hess[3,3]=2∗ xhat [ 2 ] ∗ xhat [ 2 ] hess [3 ,4]= −2∗ xhat [ 1 ] ∗ xhat [ 2 ] hess [4 ,1]= −2∗ xhat [ 1 ] ∗ xhat [ 4 ] hess [4 ,2]= −2∗ xhat [ 1 ] ∗ xhat [ 3 ] hess [4 ,3]= −2∗ xhat [ 1 ] ∗ xhat [ 2 ] hess[4,4]=2∗ xhat [ 1 ] ∗ xhat [ 1 ] #Cholesky Decomposition of Covariance Matrix at MLE L=matrix(rep(0 ,12) ,nrow=4,ncol=3) L[1,1]=sqrt((1− xhat [ 1 ] ) ∗ xhat [ 1 ] ) L[1 ,2]=0

  L[1,3]=0
  L[2,1]=-xhat[2]*sqrt(xhat[1]/(1-xhat[1]))
  L[2,2]=sqrt(xhat[2]*(1-xhat[1]-xhat[2])/(1-xhat[1]))
  L[2,3]=0
  L[3,1]=-xhat[3]*sqrt(xhat[1]/(1-xhat[1]))
  L[3,2]=-xhat[3]*sqrt(xhat[2]/((1-xhat[1])*(1-xhat[1]-xhat[2])))
  L[3,3]=sqrt((xhat[3])*(1-xhat[1]-xhat[2]-xhat[3])/(1-xhat[1]-xhat[2]))
  L[4,1]=-xhat[4]*sqrt(xhat[1]/(1-xhat[1]))
  L[4,2]=-xhat[4]*sqrt(xhat[2]/((1-xhat[1])*(1-xhat[1]-xhat[2])))
  L[4,3]=-xhat[4]*sqrt(xhat[3]/((1-xhat[1]-xhat[2])*(1-xhat[1]-xhat[2]-xhat[3])))
  temp=t(L)%*%hess%*%L
  spectralDecomposition=eigen(temp)
  U=t(spectralDecomposition$vectors)
  D=diag(spectralDecomposition$values)
  b=t(xhat)%*%hess%*%L%*%U
  c=t(xhat)%*%hess%*%xhat
  K=c-b%*%solve(D)%*%t(b)
  stat=2*n*invariant1*invariant1-K
  # compute survival function P(Q>stat) to get p-value
  C=hess%*%sigma
  Cdiag=eigen(C)
  lambda=c(Cdiag$values[1],Cdiag$values[2],Cdiag$values[3])
  delta=solve(diag(lambda))%*%t(b)
  imTemp=imhof(stat,lambda,rep(1,length(lambda)),delta)
  row1=c(n*x[1,1],n*x[2,1])
  row2=c(n*x[3,1],n*x[4,1])
  dataTable=rbind(row1,row2)
  chiTemp=chisq.test(dataTable)
  GTemp=likelihood.test(dataTable)

  im[i]=imTemp$Qq
  chi[i]=chiTemp$p.value
  G[i]=GTemp$p.value
}
plot(chi,G)
plot(chi,im)
plot(G,im)

B.3 P-values on a Degenerate Distribution in the Binary 4-Cycle Model

The code below generates Figure 6.3 and Figure 6.4. The distribution studied is based on a degenerate distribution studied in [57]. The first block of code contains the contents of a file of helper functions used during the simulated trials; the comments before each function explain its inputs and outputs.

# Filename: invariantsNoMLE.R

library("CompQuadForm")
library("numDeriv")
library("MASS")
library("poLCA")
# value for tolerance comes from default tolerance for
# matrixRank method
tol=3.552714e-15
#
#
# Accepts vector of counts
# Returns Pval for invariants based on invariants for the binary
# 4-cycle model discussed on page 1475 of "On the Toric Algebra
# of Graphical Models" by Geiger, Meek, and Sturmfels
pVal4Cycle1475 = function (df,H=diag(8))
{
  H <<- H

  num_samples=sum(df$freq)
  normalizedCounts=(1/sum(df$freq))*df$freq
  invariants=chainInvariants1475(normalizedCounts)
  stat=2*num_samples*t(invariants)%*%H%*%invariants
  hess=hessian(stat1475,normalizedCounts)
  sigma=diag(normalizedCounts)-normalizedCounts%*%t(normalizedCounts)
  temp=eigen(sigma)
  V=temp$vectors
  #Lambda=diag(temp$values)
  for (i in 1:length(temp$values))
  {
    if (temp$values[i]

    {
      i=i+1
    }
  }
  k=0
  delta=rep(0,length(lambda))
  for (i in 1:length(lambda))
  {
    k=k+b[i]*b[i]/lambda[i]
    delta[i]=b[i]/lambda[i]
  }
  c=t(normalizedCounts)%*%hess%*%normalizedCounts
  correction=c-k
  delta=abs(delta)
  lambda=abs(lambda)
  #rootDelta=sqrt(delta)
  #sqDelta=delta*delta
  stat=stat-correction
  #print(delta)
  if (length(lambda)==0)
  {
    resultI=1
    resultD=1
    resultF=1
    resultL=1
  } else {
    resultI=imhof(stat,lambda,rep(1,length(lambda)),delta)
    resultD=davies(stat,lambda,rep(1,length(lambda)),delta)
    resultF=farebrother(stat,lambda,rep(1,length(lambda)),delta)
    resultL=liu(stat,lambda,rep(1,length(lambda)),delta)
    pI=resultI$Qq
    pD=resultD$Qq
    pF=resultF$Qq

    pL=resultL[1]
  }
  results=c(pI,pD,pF,pL)
  return(results)
}
#

# This function returns the uncorrected value of the statistic
# In the paper we show that there is a term which must be
# subtracted from the energy in order to compute the survival
# function of the generalized quadratic form representing the
# asymptotic form of this statistic.
#
# Inputs : p - a 16 dimensional probability vector
# Outputs: value of uncorrected statistic evaluated at p
stat1475 = function (p)
{
  invariants=chainInvariants1475(p)
  return( t(invariants)%*%H%*%invariants )
}
#

# The function below implements the invariants for the binary 4-cycle
#
# Inputs : p - a length 16 probability vector
# Outputs: a vector representing the value of the invariants evaluated
#          at the point p in the probability simplex
chainInvariants1475 = function (p)
{
  invariants=c(p[12]*p[15]-p[11]*p[16],
               p[8]*p[14]-p[6]*p[16],
               p[10]*p[13]-p[8]*p[14],
               p[7]*p[13]-p[5]*p[15],
               p[4]*p[10]-p[2]*p[12],
               p[4]*p[7]-p[3]*p[8],
               p[2]*p[5]-p[1]*p[6],
               p[3]*p[9]-p[1]*p[11])
  return(invariants)

}

# Accepts : n  - an integer representing the size of the
#                square matrix
#           ev - a list of n eigenvalues for the random matrix.
#                By default we select them uniformly between
#                0 and 10.
# Returns : Z  - a positive definite matrix whose eigenvalues
#                are given by the vector ev
randPosDefMat = function (n, ev = runif(n,0,10))
{
  Z = matrix(ncol=n, rnorm(n^2))
  decomp = qr(Z)
  Q = qr.Q(decomp)
  R = qr.R(decomp)
  d = diag(R)
  ph = d/abs(d)
  O = Q%*%diag(ph)
  Z = t(O)%*%diag(ev)%*%O
  return(Z)
}

The last block of code contains the contents of a file simulation.R, which was the driver behind the simulations.
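Before the driver, a quick check, not part of the original files, that randPosDefMat returns a matrix with exactly the requested spectrum; the eigenvalue vector below is illustrative only.

# Sanity check for randPosDefMat: requested vs. realized eigenvalues
ev_test=c(1,2,5)
Z=randPosDefMat(3,ev_test)
sort(eigen(Z)$values)   # should be approximately 1 2 5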

startTime=Sys.time()
# Code to generate histogram of statistics
# compares 4 different numerical methods for computing
# the invariant based statistic against the chi-squared
# statistic and likelihood ratio statistic
#
# We omit the step of computing an MLE for the invariant
# based statistic in this simulation

#
# By commenting out different qx lines you can choose
# which degenerate distribution is used
# first one: limit of points in model
# second one: not
#
# This parameter controls the name of the folder the results
# will be written in
simLabel="test0"
#
# change these parameters to change simulation
#
# parameter that controls magnitude of noise
eps=1e-3
#
#
#
# The loop will perform numTrials test simulations for each
# sample size in the numSamples vector
numTrials=1000
numSamples=(1:10)*100
#
#
# creates degenerate distribution which is limit of models
#qx=c(1,1,0,1,0,0,0,1,1,0,0,0,1,0,1,1)
#qx=(1/8)*qx
# creates degenerate model which is not a limit point
qx=c(0,0,0,0,1,0,0,1,0,1,1,0,0,0,0,0)
qx=(1/4)*qx
qOriginal=qx

#
# loads file containing helper functions for the simulation
# including tools for computing the p-values

source('invariantsNoMLE.R')
library("MASS")
library("msm")
library("lmtest")
#
# initialize covariates for simulation
x1=c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)
x2=c(0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1)
x3=c(0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1)
x4=c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1)
#
#
#
resultsImhof=matrix(rep(0,numTrials*length(numSamples)),
                    ncol=length(numSamples),nrow=numTrials)
resultsDavies=matrix(rep(0,numTrials*length(numSamples)),
                     ncol=length(numSamples),nrow=numTrials)
resultsFarebrother=matrix(rep(0,numTrials*length(numSamples)),
                          ncol=length(numSamples),nrow=numTrials)
resultsLiu=matrix(rep(0,numTrials*length(numSamples)),
                  ncol=length(numSamples),nrow=numTrials)
resultsChiSq=matrix(rep(0,numTrials*length(numSamples)),
                    ncol=length(numSamples),nrow=numTrials)
resultsLR=matrix(rep(0,numTrials*length(numSamples)),
                 ncol=length(numSamples),nrow=numTrials)
#
accuracyImhof=rep(0,length(numSamples))
accuracyDavies=rep(0,length(numSamples))
accuracyFarebrother=rep(0,length(numSamples))
accuracyLiu=rep(0,length(numSamples))
accuracyChiSq=rep(0,length(numSamples))
accuracyLR=rep(0,length(numSamples))
#
# changes amplitude of noise added to data
#
noise=abs(rnorm(16))
tol=1e-14
q=0:15
#
# build some noise
qx=qx+eps*noise
qx=qx/sum(qx)
# generates a random pos def matrix
# scale is based on param at top of code
H=randPosDefMat(8,ev)
#
for (j in 1:length(numSamples))
{
  for (i in 1:numTrials)
  {
    draws = sample(q, size = numSamples[j], replace = TRUE, prob = qx)
    counts=rep(0,16)
    for (k in 1:16)
    {
      counts[k]=sum(draws==(k-1))

    }
    freq=rep(0,16)
    for (k in 0:15)
    {
      freq[k+1]=sum(draws==k)
    }
    df=data.frame(x1,x2,x3,x4,freq)
    n=16
    #H=randPosDefMat(8)
    result=pVal4Cycle1475(df,H)
    model=glm(freq~x1*x2+x2*x3+x3*x4+x4*x1,family=poisson(),data=df)
    satModel=glm(freq~x1*x2+x1*x3+x1*x4+x2*x3+x2*x4+x3*x4,
                 family=poisson(),data=df)
    mle=predict(model,type="response")
    mle=mle/sum(mle)
    mle=unname(mle)
    obs=df$freq
    chiSq=chisq.test(obs, p=mle)
    LR=lrtest(model,satModel)
    resultsImhof[i,j]=result[1]
    resultsDavies[i,j]=result[2]
    resultsFarebrother[i,j]=result[3]
    resultsLiu[i,j]=result[4]
    resultsChiSq[i,j]=chiSq$p.value
    resultsLR[i,j]=LR$`Pr(>Chisq)`[2]
  }
}
#
# Count fraction of trials that reject the null hypothesis
#
for (j in 1:length(numSamples))
{
  accuracyImhof[j]=sum(resultsImhof[,j]<0.05)/numTrials

  accuracyDavies[j]=sum(resultsDavies[,j]<0.05)/numTrials
  accuracyFarebrother[j]=sum(resultsFarebrother[,j]<0.05)/numTrials
  accuracyLiu[j]=sum(resultsLiu[,j]<0.05)/numTrials
  accuracyChiSq[j]=sum(resultsChiSq[,j]<0.05)/numTrials
  accuracyLR[j]=sum(resultsLR[,j]<0.05)/numTrials
}
#
# Assess accuracy of each method
#

#
# Code for Exporting Results
#
setwd("simulations")
#
folder=paste("sim",simLabel,sep="_")
dir.create(folder)
setwd(folder)
write(qx,"sampling_distribution.txt")
write.csv(resultsChiSq,"chi_squared_sim_results.csv")
write.csv(resultsLR,"LR_sim_results.csv")
write.csv(resultsDavies,"Davies_sim_results.csv")
write.csv(resultsFarebrother,"Farebrother_sim_results.csv")
write.csv(resultsImhof,"Imhof_sim_results.csv")
write.csv(resultsLiu,"Liu_sim_results.csv")
write.csv(H,"Hamiltonian.csv")
write(qOriginal,"degenerate_distribution.txt")
write(eps,"noise_parameter.txt")
write(ev,"Hamiltonian_eigenvalues.txt")
#
# code to generate plots
#
dir.create("figures")

setwd("figures")
for (j in 1:length(numSamples))
{
  temp=toString(numSamples[j])

  filename=paste("chiSquared_v_Imhof",temp,sep="_")
  filename=paste(filename,".svg",sep="")
  svg(filename)
  plot(resultsImhof[,j],resultsChiSq[,j],xlim=c(0,1),ylim=c(0,1),asp=1)
  dev.off()

  filename=paste("LR_v_Imhof",temp,sep="_")
  filename=paste(filename,".svg",sep="")
  svg(filename)
  plot(resultsImhof[,j],resultsLR[,j],xlim=c(0,1),ylim=c(0,1),asp=1)
  dev.off()

  filename=paste("chiSquared_v_Davies",temp,sep="_")
  filename=paste(filename,".svg",sep="")
  svg(filename)
  plot(resultsDavies[,j],resultsChiSq[,j],xlim=c(0,1),ylim=c(0,1),asp=1)
  dev.off()

  filename=paste("LR_v_Davies",temp,sep="_")
  filename=paste(filename,".svg",sep="")
  svg(filename)
  plot(resultsDavies[,j],resultsLR[,j],xlim=c(0,1),ylim=c(0,1),asp=1)
  dev.off()

  filename=paste("chiSquared_v_Farebrother",temp,sep="_")
  filename=paste(filename,".svg",sep="")
  svg(filename)
  plot(resultsFarebrother[,j],resultsChiSq[,j],xlim=c(0,1),ylim=c(0,1),asp=1)
  dev.off()

  filename=paste("LR_v_Farebrother",temp,sep="_")
  filename=paste(filename,".svg",sep="")
  svg(filename)
  plot(resultsFarebrother[,j],resultsLR[,j],xlim=c(0,1),ylim=c(0,1),asp=1)
  dev.off()

  filename=paste("chiSquared_v_Liu",temp,sep="_")
  filename=paste(filename,".svg",sep="")
  svg(filename)
  plot(resultsLiu[,j],resultsChiSq[,j],xlim=c(0,1),ylim=c(0,1),asp=1)
  dev.off()

  filename=paste("LR_v_Liu",temp,sep="_")
  filename=paste(filename,".svg",sep="")
  svg(filename)
  plot(resultsLiu[,j],resultsLR[,j],xlim=c(0,1),ylim=c(0,1),asp=1)
  dev.off()

}
setwd("..")
temp = "Chi Squared Test Results: "
temp = paste(temp, toString(accuracyChiSq), sep="\n")
temp = paste(temp, "Davies Results: ", sep="\n")
temp = paste(temp, toString(accuracyDavies), sep="\n")
temp = paste(temp, "Farebrother Results:", sep="\n")
temp = paste(temp, toString(accuracyFarebrother), sep="\n")
temp = paste(temp, "Imhof Results: ", sep="\n")
temp = paste(temp, toString(accuracyImhof), sep="\n")
temp = paste(temp, "Liu Results: ", sep="\n")
temp = paste(temp, toString(accuracyLiu), sep="\n")
temp = paste(temp, "LR Results: ", sep="\n")
temp = paste(temp, toString(accuracyLR), sep="\n")
write(temp, file="summary.txt")
setwd("..")
endTime=Sys.time()
setwd("..")
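For reference, a minimal, hypothetical way to run the driver: the listing above references an eigenvalue vector ev (used by randPosDefMat and written to Hamiltonian_eigenvalues.txt) whose definition does not appear in the reproduced code, so it must be supplied beforehand; one plausible choice, matching randPosDefMat's default range, is shown. Both files are assumed to sit in the working directory with the required packages installed.

# Hypothetical invocation; the definition of 'ev' is an assumption,
# since it does not appear in the listing above
ev=runif(8,0,10)
source('simulation.R')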

B.4 Tables of Percentage Deviation from Significance Level

In this section, we present the results of the simulations produced by the code displayed in the previous section. The simulations were obtained by taking a singular point on the statistical model associated to the binary 4-cycle undirected graphical model. The tables below report the observed percentage deviation of each test's rejection rate from the nominal significance level.
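The driver only stores rejection rates at the 0.05 level, so the table entries are presumably recomputed from the saved p-value matrices over a grid of levels; on that reading, each entry is the empirical rejection percentage minus the nominal level, in percentage points. A minimal sketch of that computation (resultsImhof and the column index j come from the driver above; the formula is inferred from the tables rather than taken from the dissertation's code):

# Percentage-point deviation of the empirical rejection rate from the
# nominal level alpha, for one method and one sample size (column j)
alphas=c(0.01,0.05,0.10,0.15,0.20,0.25,0.30,0.35)
deviation=sapply(alphas, function(a) 100*mean(resultsImhof[,j]<a) - 100*a)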

B.4.1 Noise Parameter ε = 0.1

For this simulation we took a singular model on the binary 4-cycle and perturbed it by adding noise generated by sampling from a normal distribution, taking the absolute value of the result, and rescaling by the parameter ε = 0.1. The specific sampling distribution used for the simulations in this subsection is: q0000 = 0.07453062, q0001 = 0.08634001, q0010 = 0.041189657,

q0011 = 0.0640731, q0100 = 0.008735056, q0101 = 0.1015492,

q0110 = 0.001641928, q0111 = 0.1536179, q1000 = 0.0690093,

q1001 = 0.05180355, q1010 = 0.02624617, q1011 = 0.01191531,

q1100 = 0.06465253, q1101 = 0.04332119, q1110 = 0.122514,

q1111 = 0.07886045

n = 100
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                      4.7   12.8   18.7   23.0   24.4   26.3   26.2   27.5
G2                     28.4   46.0   52.6   55.0   55.4   55.9   53.5   50.9
Ours - Davies          -1.0   -4.7   -5.4   -2.3    4.5   11.9   17.2   24.7
Ours - Farebrother     -1.0   -4.9   -6.7   -6.0   -1.4    4.2    8.7   14.6
Ours - Imhof           -1.0   -4.7   -5.4   -2.3    4.5   11.9   17.2   24.7
Ours - Liu             -1.0   -4.7   -5.7   -3.0    3.5   11.9   17.9   25.9

n = 200
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                     51.1   74.0   77.6   76.5   74.8   71.6   67.3   63.1
G2                     65.2   79.8   81.3   78.6   75     71.5   67.3   62.5
Ours - Davies           1.0   30.3   56.9   68.3   71.0   69.8   66.5   62.7
Ours - Farebrother      0.8   29.2   55.6   67.2   69.9   68.7   65.4   61.6
Ours - Imhof            1.0   30.3   56.9   68.3   71.0   69.8   66.5   62.7
Ours - Liu              1.0   29.4   54.8   67.1   70.7   69.7   66.8   63.0

n = 300
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                     92.5   94.0   89.7   84.8   79.8   74.9   69.9   64.9
G2                     86.4   90.3   87.8   83.9   79.2   74.6   69.8   64.8
Ours - Davies          32.7   83.5   87.5   84.0   79.6   74.7   69.7   64.9
Ours - Farebrother     32.5   83.0   87.0   83.5   79.1   74.2   69.2   64.4
Ours - Imhof           32.7   83.5   87.5   84.0   79.6   74.7   69.7   64.9
Ours - Liu             32.9   82.5   87.3   83.6   79.5   74.7   69.7   64.9

n = 400
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                     98.7   95.0   90.0   85.0   80.0   75.0   70.0   65.0
G2                     93.7   94.0   89.2   84.8   79.8   74.8   69.9   64.9
Ours - Davies          73.7   93.7   89.7   84.8   79.8   74.9   69.9   64.9
Ours - Farebrother     73.0   92.9   89.0   84.1   79.4   74.5   69.5   64.5
Ours - Imhof           73.7   93.7   89.7   84.8   79.8   74.9   69.9   64.9
Ours - Liu             73.9   93.3   89.7   84.8   79.8   74.9   69.9   64.9

B.4.2 Noise Parameter ε = 0.01

In this subsection, we perform the same simulation using a noise parameter of ε = 0.01. The specific sampling distribution used for the simulations in this subsection is:

q0000 = 0.1199417, q0001 = 0.1139999, q0010 = 0.005630032,

q0011 = 0.1102922, q0100 = 0.008179367, q0101 = 0.004033609,

q0110 = 0.005865577, q0111 = 0.1120079, q1000 = 0.1106506,

q1001 = 0.01478819, q1010 = 0.01170206, q1011 = 0.01567811,

q1100 = 0.1263541, q1101 = 0.006702296, q1110 = 0.11685,

q1111 = 0.1173245

n = 100
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                     -0.9   -4.7   -9.3  -13.5  -18.1  -22.4  -26.7  -31.1
G2                      2.1    5.1    8.2    9.4   10.1   10.0    9.6    9.8
Ours - Davies          -1.0   -5.0  -10.0  -14.6  -18.5  -21.8  -24.5  -24.7
Ours - Farebrother     -1.0   -5.0  -10.0  -14.9  -19.8  -24.6  -29.1  -32.9
Ours - Imhof           -1.0   -5.0  -10.0  -14.6  -18.5  -21.8  -24.5  -24.7
Ours - Liu             -1.0   -5.0  -10.0  -14.7  -18.7  -21.8  -24.5  -24.3

n = 200
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                     -0.6   -4.0   -8.2  -12.1  -16.1  -19.9  -23.9  -27.5
G2                      4.0    8.1   11.3   14.4   15.5   15.4   16.2   15.6
Ours - Davies          -1.0   -4.9   -8.9  -11.7  -14.1  -13.9  -14.0  -13.4
Ours - Farebrother     -1.0   -5.0   -9.4  -13.2  -16.6  -18.2  -19.4  -20.4
Ours - Imhof           -1.0   -4.9   -8.9  -11.7  -14.1  -13.9  -14.0  -13.4
Ours - Liu             -1.0   -4.9   -8.9  -11.8  -14.3  -13.9  -14.1  -12.5

n = 300
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                     -0.8   -3.3   -6.7   -9.8  -13.7  -16.9  -19.9  -22.6
G2                      5.6   12.4   17.9   20.3   21.4   22.4   22.8   22.6
Ours - Davies          -1.0   -4.7   -6.2   -7.5   -6.7   -4.8   -2.2   -0.3
Ours - Farebrother     -1.0   -4.8   -6.9   -8.8   -8.2   -7.2   -5.2   -3.7
Ours - Imhof           -1.0   -4.7   -6.2   -7.5   -6.7   -4.8   -2.2   -0.3
Ours - Liu             -1.0   -4.7   -6.6   -7.9   -6.8   -4.9   -2.0    0.4

n = 400
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                      0.3   -1.3   -3.7   -6.5   -9.3  -11.3  -12.9  -14.9
G2                      6.3   14.6   20.5   22.5   23.6   23.1   24.4   23.8
Ours - Davies          -0.9   -2.3   -0.7    1.7    4.1    5.3    7.2   10.2
Ours - Farebrother     -0.9   -2.7   -1.6    0.6    2.9    4.2    6.0    8.9
Ours - Imhof           -0.9   -2.3   -0.7    1.7    4.1    5.3    7.2   10.2
Ours - Liu             -0.8   -2.6   -1.1    1.6    3.7    5.2    7.5   10.4

B.4.3 Noise Parameter ε = 0.001

In this subsection, we perform the same simulation using a noise parameter of ε = 0.001. The specific sampling distribution used for the simulations in this

subsection is:

q0000 = 0.1246213, q0001 = 0.124153, q0010 = 0.0009961762,

q0011 = 0.1256715, q0100 = 0.0002068291, q0101 = 0.00068493,

q0110 = 0.001658084, q0111 = 0.1237989, q1000 = 0.1245288,

q1001 = 0.0009192271, q1010 = 0.0002931132, q1011 = 0.0002904846,

q1100 = 0.1235315, q1101 = 0.0005560014, q1110 = 0.1235039,

q1111 = 0.1245863

n = 100
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                     -1.0   -5.0  -10.0  -14.9  -19.8  -24.6  -29.6  -34.4
G2                     -1.0   -4.2   -7.6  -11.6  -12.4   -4.2    2.6    3.6
Ours - Davies          -1.0   -5.0  -10.0  -15.0  -20.0  -25.0  -29.8  -34.0
Ours - Farebrother     -1.0   -5.0  -10.0  -15.0  -20.0  -25.0  -30.0  -35.0
Ours - Imhof           -1.0   -5.0  -10.0  -15.0  -20.0  -25.0  -29.8  -34.0
Ours - Liu             -1.0   -5.0  -10.0  -15.0  -20.0  -25.0  -29.8  -34.0

n = 200
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                     -1.0   -4.8   -9.8  -14.8  -19.6  -24.2  -29.1  -34.1
G2                     -0.5   -2.4   -1.8   -4.9   -8.0    8.0   25.0   23.9
Ours - Davies          -1.0   -5.0  -10.0  -14.9  -19.5  -23.3  -25.9  -28.7
Ours - Farebrother     -1.0   -5.0  -10.0  -15.0  -20.0  -25.0  -30.0  -35.0
Ours - Imhof           -1.0   -5.0  -10.0  -14.9  -19.5  -23.3  -25.9  -28.7
Ours - Liu             -1.0   -5.0  -10.0  -14.9  -19.5  -23.5  -25.7  -28.7

n = 300
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                     -1.0   -4.7   -9.6  -14.1  -18.5  -23.2  -27.8  -32.7
G2                     -0.6    0.6    6.9    5.3    1.1   18.3   31.6   28.9
Ours - Davies          -1.0   -5.0  -10.0  -14.5  -18.6  -16.9  -17.5  -21.4
Ours - Farebrother     -1.0   -5.0  -10.0  -15.0  -20.0  -25.0  -30.0  -34.9
Ours - Imhof           -1.0   -5.0  -10.0  -14.5  -18.6  -16.9  -17.5  -21.4
Ours - Liu             -1.0   -5.0  -10.0  -14.7  -18.6  -16.9  -17.3  -21.2

n = 400
α                      0.01   0.05   0.10   0.15   0.20   0.25   0.30   0.35
χ2                     -0.9   -4.7   -9.6  -14.3  -19.0  -23.8  -28.4  -32.7
G2                      0.5    3.9   11.1   13.6    9.8   22.2   32.5   29.8
Ours - Davies          -1.0   -5.0   -9.8  -13.1  -16.5  -15.3  -13.4  -15.8
Ours - Farebrother     -1.0   -5.0  -10.0  -15.0  -20.0  -25.0  -30.0  -35.0
Ours - Imhof           -1.0   -5.0   -9.8  -13.1  -16.5  -15.3  -13.4  -15.8
Ours - Liu             -1.0   -5.0   -9.8  -13.2  -16.5  -15.0  -13.2  -15.8

Bibliography

[1] S. Abramsky. Relational databases and Bell's theorem. In In Search of Elegance in the Theory and Practice of Computation, pages 13-35. Springer, 2013.

[2] S. Abramsky, R. S. Barbosa, K. Kishida, R. Lal, and S. Mansfield. Contextuality, cohomology and paradox. arXiv preprint arXiv:1502.03097, 2015.

[3] S. Abramsky, R. S. Barbosa, and S. Mansfield. Quantifying contextuality via linear programming. Informal Proceedings of Quantum Physics & Logic, 2016.

[4] S. Abramsky and A. Brandenburger. The sheaf-theoretic structure of non-locality and contextuality. New Journal of Physics, 13(11):113036, 2011.

[5] S. Abramsky, S. Mansfield, and R. S. Barbosa. The cohomology of non-locality and contextuality. arXiv preprint arXiv:1111.3620, 2011.

[6] A. H. Andersen. Multidimensional contingency tables. Scandinavian Journal of Statistics, 1974.

[7] B. C. Arnold, E. Castillo, and J. M. Sarabia. Conditionally Specified Distributions: An Introduction. Statistical Science, 2001.

[8] B. C. Arnold and S. J. Press. Compatible Conditional Distributions. Journal of the American Statistical Association, 1989.

[9] M. Artin, A. Grothendieck, and J. L. Verdier. Theorie des topos et cohomologie etale des schemas (known as SGA4). Springer, 1972.

[10] R. J. Aumann. Borel Structures for Function Spaces. Illinois Journal of Mathematics, vol. 5, pp. 614-630, 1961.

[11] J. Awan and A. Slavkovic. Differentially private uniformly most powerful tests for binomial data. NIPS 2018. https://arxiv.org/abs/1801.09236, 2018.

[12] S. Awodey. Category theory. Oxford University Press, 2010.

[13] K. Baclawski, D. Simovici, and W. White. A categorical approach to database semantics. Mathematical Structures in Computer Science, vol 4, pp. 147-183. 1994.

[14] M. Barr and C. Wells. Toposes, Triples, and Theories. Springer, 1985.

[15] M. Barr and C. Wells. Category Theory for Computing Science. Prentice Hall, 1999.

[16] Y. Baryshnikov and R. Ghrist. Target enumeration via integration over planar sensor networks. Conference: Robotics: Science and Systems IV, 2008.

[17] A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra. A maximum entropy approach to natural language processing. Association for Computational Linguistics, 1996.

[18] F. Borceux. Handbook of Categorical Algebra. Volumes 1-3. Cambridge University Press, 1994.

[19] Z. I. Botev and D. P. Kroese. Non-asymptotic bandwidth selection for density estimation of discrete data. Methodology and Computing in Applied Probability. 10 (3): 435. 2008.

[20] Z. I. Botev and D. P. Kroese. The generalized cross entropy method, with applications to probability density estimation. Methodology and Computing in Applied Probability. 13 (1): 1–27, 2011.

[21] S. van Buuren and K. Groothuis-Oudshoorn. mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 2011.

[22] O. Caramello. Atomic toposes and countable categoricity. Applied Categorical Structures. 20 no. 4 pp.379-391. (arXiv:0811.3547), 2012.

[23] E. F. Codd. A relational model of data for large shared data banks. Communications of the ACM. 13 (6): 377–387, 1970.

[24] J.C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, et al. Spanner: Google’s globally distributed database. ACM Transactions on Computer Systems, 31(3):8, 2013.

[25] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 2006.

[26] M. Cueto, J. Morton, and B. Sturmfels. Geometry of the restricted Boltzmann machine. Contemporary Mathematics. 516. 10.1090/conm/516/10172, 2009.

[27] J. Culbertson and K. Sturtz. A categorical foundation for Bayesian probability. Applied Categorical Structures 22: 647, 2014.

[28] H. B. Curry. Functionality in combinatory logic. Proceedings of the National Academy of Sciences of the United States of America. 20 (11): 584–90, 1934.

[29] B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 2002.

[30] R.B. Davies. Algorithm AS 155: the distribution of a linear combination of chi-2 random variables. Journal of the Royal Statistical Society. Series C (Applied Statistics), 29(3), p. 323-333, 1980.

[31] B. de Finetti. La prevision : ses lois logiques, ses sources subjectives. Annales de l’institute Henri Poincare, vol. 7, 1937.

[32] T. de la Rue. Espaces de Lebesgue. Seminaire de Probabilites XXVII, Lecture Notes in Mathematics, 1557, Springer, pp. 15–21, 1993.

[33] J. L. Doob. Stochastic Processes. Wiley, 1953.

[34] E. Doberkat. Eilenberg–Moore algebras for stochastic relations. Information and Computation 204 pp. 1756–1781, 2006.

[35] A. Dobra, S. E. Fienberg, A. Rinaldo, A. Slavkovic, and Y. Zhou. Algebraic Statistics and Contingency Table Problems: Log-Linear Models, Likelihood Estimation, and Disclosure Limitation. In: M. Putinar and S. Sullivant (eds) Emerging Applications of Algebraic Geometry. The IMA Volumes in Mathematics and its Applications, vol 149. Springer, New York, NY, 2009.

[36] M. Drton, B. Sturmfels, and S. Sullivant. Lectures on Algebraic Statistics. Oberwolfach Seminars. Birkhauser, 2008.

[37] L. E. Dubins and D. A. Freedman, Exchangeable processes need not be mixtures of independent, identically distributed random variables. Probability Theory and Related Fields 48(2):115-132, 1979.

[38] P. Duchesne, P. Lafaye de Micheaux. Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods, Computational Statistics and Data Analysis, Volume 54, pp. 858-862, 2010.

[39] H. L. Dunn. Record Linkage. American Journal of Public Health, 1946.

[40] R. Durrett. Probability: Theory and Examples. Second edition. Cambridge University Press, 1996.

[41] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating Noise to Sensitivity in Private Data Analysis. In: S. Halevi and T. Rabin (eds) Theory of Cryptography, TCC 2006. Lecture Notes in Computer Science, vol 3876. Springer, Berlin, Heidelberg, 2006.

[42] E. B. Dynkin and A. A. Yushkevich. Controlled Markov Processes. Springer-Verlag, 1979.

[43] A. Edalat. Semi-pullbacks and bisimulation in categories of Markov processes. Mathematical Structures in Computer Science, 9(5):523–543, 1999.

[44] A. W. F. Edwards. The measure of association in a 2 × 2 table. Journal of the Royal Statistical Society. A (General). 126 (1): 109–114. doi:10.2307/2982448. JSTOR 2982448, 1963.

[45] N. Eriksson. Using Invariants for Phylogenetic Tree Construction. In: M. Putinar and S. Sullivant (eds) Emerging Applications of Algebraic Geometry. The IMA Volumes in Mathematics and its Applications, vol 149. Springer, 2009.

[46] N. Eriksson and Y. Yao. Metric learning for phylogenetic invariants. arXiv:q- bio/0703034 [q-bio.PE], 2007.

[47] R. W. Farebrother. Algorithm AS 204: The distribution of a positive linear combination of chi-squared random variables. Journal of the Royal Statistical Society, Series C (applied Statistics), Vol. 33, No. 3, p. 332-339, 1984.

[48] I. Fellegi and A. Sunter. A Theory for Record Linkage. Journal of the American Statistical Association, 1969.

[49] M. Fleming, R. Gunther, and R. Rosebrugh. A database of categories. Journal of Symbolic Computation 35.2, pp. 127-135, 2003.

[50] H. Follmer. Phase transition and Martin boundary. In: Lecture Notes in Mathematics Vol. 465, Springer, 1975.

[51] H. Forssell, H. R. Gylterud, and D. I. Spivak. Type Theoretical Databases. In: Logical Foundations of Computer Science, pp. 117-129, 2016.

[52] D. H. Fremlin. Measure Theory, volume 4. Torres Fremlin, 2000.

[53] H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems: The Complete Book. Pearson, 2008.

[54] R. Gauthier. Algebraic Stochastic Calculus. arXiv 1407.6784v1, 2014.

[55] A. Gelman and M. Betancourt. Does quantum uncertainty have a place in everyday applied statistics? Behavioral and Brain Sciences, 2013.

[56] A. Gelman and T. E. Raghunathan. Using conditional distributions for missing-data imputation. Statistical Science, 2001.

[57] D. Geiger, C. Meek, and B. Sturmfels. On the toric algebra of graphical models. The Annals of Statistics. Volume 34, Number 3, 1463-1492, 2006.

[58] Z. Ghahramani and M. I. Jordan. Learning from incomplete data, 1994.

[59] M. Giry [original paper manuscript from 1982]. A categorical approach to probability theory. In book: Categorical Aspects of Topology and Analysis, pp. 68-85, 2006.

[60] R. Goldblatt. Topoi: the categorial analysis of logic. North-Holland, 1984.

[61] M. Gromov. In a search for a structure, part 1: on entropy. 2012.

[62] S. Gutmann, J. H. B. Kemperman, J. A. Reeds, and L. A. Shepp. Existence of probability measures with given marginals. The Annals of Probability, 1991.

[63] S. J. Haberman. The Analysis of Frequency Data. University of Chicago Press, 1974.

[64] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer, 2016.

[65] C. Heunen, O. Kammar, S. Staton, and H. Yang. A convenient category for higher order probability theory. arXiv:1701.02547, 2017.

[66] C. Heunen, O. Kammar, S. Moss, A. Scibior, S. Staton, M. Vakar, and H. Yang. The semantic structure of quasi-Borel spaces. PPS18, 2018.

[67] E. Hewitt and L. J. Savage. Symmetric measures on cartesian products. Transactions of the American Mathematical Society, vol. 80, pp. 470–501, 1955.

[68] P. Honeyman, R. E. Ladner, and M. Yannakakis. Testing the universal instance assumption. Information Processing Letters, 1980.

[69] W. A. Howard [original paper manuscript from 1969]. The formulae-as-types notion of construction, in J. P. Seldin and J. R. Hindley (eds.), To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, Academic Press, pp. 479–490, 1980.

[70] J. P. Imhof. Computing the distribution of quadratic forms in normal variables. Biometrika, Vol. 48, No. 3/4, pp. 419-426, 1961.

[71] M. Jackson. A Sheaf Theoretic Approach to Measure Theory. Unpublished doctoral dissertation, 2006.

[72] E. T. Jaynes. Information theory and statistical mechanics. In K. Ford (ed.) Statistical Physics. New York, 1963.

[73] E. T. Jaynes. Prior Probabilities. IEEE Transactions on Systems Science and Cybernetics. 4 (3): 227–241, 1968.

[74] E. T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, 2003.

[75] M. Johnson and R. Rosebrugh. Sketch data models, relational schema and data specifications. Electronic Notes in Theoretical Computer Science 61, pp. 51-63, 2002.

[76] P. T. Johnstone. Another condition equivalent to de Morgan’s law. Commu- nications in Algebra, 1979.

[77] P. T. Johnstone. Sketches of an Elephant: A Topos Theory Compendium. Volume 1 and 2. Clarendon Press, 2002.

[78] P. T. Johnstone. Topos Theory. Academic Press, 1977.

[79] O. Kallenberg. Foundations of Modern Probability, 2nd ed. Springer, 2002.

[80] A. S. Kechris. Classical Descriptive Set Theory. Springer, 1995.

[81] J. Kropko, B. Goodrich, A. Gelman, and J. Hill. Multiple Imputation for Continuous and Categorical Data: Comparing Joint and Conditional Approaches. Political Analysis, 2014.

[82] J. Lambek. Deductive Systems and Categories. Theory of Computing Systems, 1968.

[83] J. Lambek. Deductive systems and categories II. Standard constructions and closed categories. Category theory, homology, and their applications. 1969.

[84] J. Lambek. Deductive systems and categories III. Cartesian closed categories, intuitionist , and combinatory logic. Toposes, algebraic geometry, and logic, 1972.

[85] F. W. Lawvere and R. Rosebrugh. Sets for Mathematics. Cambridge University Press, 2003.

[86] G. Lebanon and M. El-Geish. Computing with Data: An Introduction to the Data Industry. Springer, 2019.

[87] B. Li, V. Karwa, A. Slavkovic, and B. Steorts. A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms. Journal of Privacy and Confidentiality 8 (1), 2018.

[88] A. J. Lindenhovius. Grothendieck Topologies on Posets. arXiv:1405.4408, 2014.

[89] T. Litak, S. Mikulas, and J. Hidders. Relational Lattices. In: P. Hofner, P. Jipsen, W. Kahl, and M. E. Muller (eds) Relational and Algebraic Methods in Computer Science. RAMICS 2014. Lecture Notes in Computer Science, vol 8428. Springer, Cham, 2014.

[90] R. J. A. Little and D. B. Rubin. Statistical Analysis with Missing Data, 3rd Ed. John Wiley & Sons, 2019.

[91] H. Liu, Y. Tang, H. H. Zhang. A new chi-square approximation to the distribution of non-negative definite quadratic forms in non-central normal variables. Computational Statistics and Data Analysis, Volume 53, 853-856, 2009.

[92] F. Loregian. This is the (co)end, my only (co)friend. arXiv:1501.02503, 2017.

[93] S. Mac Lane. Categories for the Working Mathematician. Springer, 1998.

[94] S. Mac Lane and I. Moerdijk. Sheaves in Geometry and Logic: A First Introduction to Topos Theory. Springer, 1994.

[95] G. W. Mackey. Borel structure in groups and their duals. Transactions of the American Mathematical Society, 85, 134-165, 1957.

[96] E. Manes. Algebraic Theories. Springer-Verlag, 1976.

[97] P. McCullagh. What is a statistical model? Annals of Statistics. Volume 30, Number 5, 1225-1310, 2002.

[98] C. McLarty. Elementary Categories, Elementary Toposes. Oxford University Press, 1995.

[99] J. A. Mingo and R. Speicher. Free Probability and Random Matrices. Springer, 2017.

[100] E. Moggi. Notions of computations and monads. Information and Computation Volume 93, Issue 1, pp. 55-92, 1991.

[101] J. Morton. Contextuality from Missing and versioned data. arXiv:1708.03264, 2017.

[102] J. Morton. Relations among conditional probabilities. arXiv:0808.1149, 2018.

[103] D. Mumford. The dawning of the age of stochasticity. Mathematics: Frontiers and Perspectives, 2000

[104] L. Narens. Alternative probability theories for cognitive psychology. Topics in cognitive science, 6(1):114-120, 2014.

[105] H. B. Newcombe, J. M. Kennedy, S. J. Axford, and A. P. James. Automatic Linkage of Vital Records. Science, 1959.

[106] J. Nestruev. Smooth Manifolds and Observables. Springer, 2003.

[107] B. Oksendal. Stochastic Differential Equations: An Introduction with Applications (Sixth ed.). Springer, 2003.

[108] L. Pachter and B. Sturmfels. Algebraic Statistics for Computational Biology. Cambridge University Press, 2005.

[109] K. R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, 1967.

[110] K. Petersen. Ergodic Theory. Cambridge University Press, 1983.

[111] C. Preston. Some Notes on Standard Borel and Related Spaces. arXiv:0809.3066, 2008.

[112] D. Pollard. A User’s Guide to Measure Theoretic Probability, Cambridge University Press, 2001.

[113] V. A. Rokhlin [original paper manuscript from 1949] On the fundamental ideas of measure theory. American Mathematical Society Translations, 71, pp. 1–54. Translated from Russian. 25 (67): 107–150, 1952.

[114] R. Rosebrugh and R. J. Wood. Relational Databases and Indexed Categories. In Category Theory 1991: Proceedings of an International Summer Category Theory Meeting, Held June 23-30, 1991 (Conference Proceedings, Vol 13), 1992.

[115] D. B. Rubin. Multiple Imputation for Nonresponse in Surveys. Wiley, 1987.

[116] J. L. Schafer. Analysis of Incomplete Multivariate Data. Chapman & Hall, 1997.

[117] J. L. Schafer and J. W. Graham. Missing Data: Our View of the State of the Art. DOI: 10.1037/1082-989X.7.2.147, 2002.

[118] H. Scheffe. The Analysis of Variance. John Wiley & Sons, 1959.

[119] M. J. Schervish. Theory of Statistics. Springer, 1995.

[120] P. Schultz, D. I. Spivak, C. Vasilakopoulou, and R. Wisnesky. Algebraic Databases. arXiv:1602.03501, 2016.

[121] A. Simpson. Category-theoretic Structure for Independence and Conditional Independence. Electronic Notes in Theoretical Computer Science. 336: 281-297, 2018.

[122] A. Simpson. Conditional Independence in Categories. TACL 2013 : 9, 2013.

[123] A. Simpson. Probability Sheaves and the Giry Monad. 7th Conference on Algebra and Coalgebra in Computer Science, 2017.

[124] A. B. Slavkovic and S. Sullivant. The space of compatible full conditionals is a unimodular toric variety. Journal of Symbolic Computation, 2006.

[125] A. B. Slavkovic, X. Zhu, and S. Petrovic. Fibers of multi-way contingency tables given conditionals: relation to marginals, cell bounds and Markov bases. arXiv:1401.1397, 2014.

[126] J. Snoke, T. Brick, A. Slavkovic, and M. Hunter. Providing Accurate Models across Private Partitioned Data: Secure Maximum Likelihood Estimation. Annals of Applied Statistics - Accepted. https://arxiv.org/abs/1710.06933, 2018.

[127] M. Souslin. Sur une definition des ensembles mesurables B sans nombres transfinis. Comptes rendus de l’Academie des Sciences de Paris, 164: 88–91, 1917.

[128] D.I. Spivak. Simplicial Databases. arXiv:0904.2012, 2009.

[129] D. I. Spivak. Kleisli Database Instances. arXiv:1209.1011, 2012.

[130] S. M. Srivastava. A Course on Borel Sets, Springer-Verlag, 1991.

[131] V. Strassen. The existence of probability measures with given marginals. The Annals of Mathematical Statistics, 1965.

[132] K. Sturtz. The factorization of the Giry monad. Advances in Mathematics, 2018.

[133] J. G. Sumner, A. Taylor, B. R. Holland, and P. D. Jarvis. Developing a statistically powerful measure for quartet tree inference using phylogenetic identities and Markov invariants. arXiv:1608.04761 [q-bio.QM], 2016.

[134] R. Sundberg. Some Results about Decomposable (or Markov-type) Models for Multidimensional Contingency Tables: Distribution of Marginals and Partitioning of Tests. Scandinavian Journal of Statistics, 1975.

[135] K. Tanabe and M. Sagae. An Exact Cholesky Decomposition and the Generalized Inverse of the Variance-Covariance Matrix of the Multinomial Distribution, with Applications. Journal of the Royal Statistical Society. Series B (Methodological), Vol. 54, No. 1, pp. 211-219, 1992.

[136] T. Tao. Topics in Random Matrix Theory. American Mathematical Society, 2012.

[137] A. Vattani. k-means Requires Exponentially Many Iterations Even in the Plane. Special Issue of Discrete and Computational Geometry, Volume 45, Issue 4, 2011.

[138] H. C. von Bayer. QBism: The Future of Quantum Physics. Harvard University Press, 2016.

[139] M. L. Wachs. Poset Topology: Tools and Applications. arXiv:math/0602226, 2006.

[140] Y. J. Wang. Compatibility among marginal densities. Biometrika, 2004.

[141] S. Watanabe. Algebraic Geometry and Statistical Learning Theory. Cambridge University Press, 2009.

[142] S. Watanabe. Mathematical Theory of Bayesian Statistics. CRC Press, 2018.

[143] H. Wickham. Tidy Data. Journal of Statistical Software, 2014.

[144] G. Winskel. The Formal Semantics of Programming Languages: An Introduction. MIT Press, 1993.

[145] O. Wyler. Lecture Notes on Topoi and Quasitopoi. World Scientific, 1991.

Vita
William Wright

Education
• Ph.D. in Mathematics with minor in Statistics, The Pennsylvania State University, Advisor: Jason Morton (expected: August 2019)

• B.S. Mathematics with highest honors, University of Texas at Austin (2012)

Honors and Awards
• Winner (as member of team NDL) Kaggle University Club Winter Hackathon (December 2018)

• Cada R. and Susan Wynn Grove Mathematics Enhancement Endowment (Spring 2018)

• Department Teaching Award (Fall 2017)

• Jack and Eleanor Petit Scholarship (2016, 2017)

• ZZRQ Award (2016)

• NSF GRFP Honorable Mention (2013)

Activities and Leadership
• Nittany Data Labs (2018/19 Academic Year)

• GPSA Eberly College of Science Delegate (2015/16 and 2016/17 terms)

• Organizer for Algebraic Geometry Reading Group (2014/15 Academic Year)

• Penn State Outreach Program Volunteer (Fall 2013)