A Survey on Structure Learning and Causal Discovery
Total pages: 16
File type: PDF, size: 1020 KB
Recommended publications
The Penrose and Hawking Singularity Theorems Revisited
Slides by Roland Steinbauer (Faculty of Mathematics, University of Vienna), presented in Prague, October 2016. The talk reports on a long-term project on Lorentzian geometry and general relativity with metrics of low regularity, carried out jointly with a 'theoretical branch' (Vienna & U.K.): Melanie Graf, James Grant, Günther Hörmann, Mike Kunzinger, Clemens Sämann, James Vickers; and an 'exact solutions branch' (Vienna & Prague): Jiří Podolský, Clemens Sämann, Robert Švarc. Contents: (1) the classical singularity theorems; (2) interlude: low regularity in GR; (3) the low regularity singularity theorems; (4) key issues of the proofs; (5) outlook. The talk is organised around a pattern singularity theorem [Senovilla, 1998]: in a spacetime, the following are incompatible: (i) an energy condition, (ii) a causality condition, (iii) an initial or boundary condition, and (iv) causal geodesic completeness. Proof sketch: the initial condition (iii) makes causal geodesics start focussing; the energy condition (i) keeps the focussing going, producing a focal point; the causality condition (ii) forbids focal points; the only way out is that some causal geodesic is incomplete, i.e. not (iv).
Common Quandaries and Their Practical Solutions in Bayesian Network Modeling
Bruce G. Marcot, U.S. Forest Service, Pacific Northwest Research Station, 620 S.W. Main Street, Suite 400, Portland, OR, USA. Ecological Modelling 358 (2017) 1-9. Received 21 December 2016; received in revised form 12 May 2017; accepted 13 May 2017; available online 24 May 2017. Published by Elsevier B.V.

Abstract: Use and popularity of Bayesian network (BN) modeling has greatly expanded in recent years, but many common problems remain. Here, I summarize key problems in BN model construction and interpretation, along with suggested practical solutions. Problems in BN model construction include parameterizing probability values, variable definition, complex network structures, latent and confounding variables, outlier expert judgments, variable correlation, model peer review, tests of calibration and validation, model overfitting, and modeling wicked problems. Problems in BN model interpretation include objective creep, misconstruing variable influence, conflating correlation with causation, conflating proportion and expectation with probability, and using expert opinion. Solutions are offered for each problem and researchers are urged to innovate and share further solutions.

Keywords: Bayesian networks; modeling problems; modeling solutions; bias; machine learning; expert knowledge.

From the introduction: Bayesian network (BN) models are essentially graphs of variables depicted and linked by probabilities (Koski and Noble, 2011).
Bayesian Network Modelling with Examples
Slides by Marco Scutari ([email protected]), Department of Statistics, University of Oxford, November 28, 2016.

What are Bayesian networks? Bayesian networks (BNs) are defined by: a network structure, a directed acyclic graph G = (V, A), in which each node v_i ∈ V corresponds to a random variable X_i; and a global probability distribution over X with parameters Θ, which can be factorised into smaller local probability distributions according to the arcs a_ij ∈ A present in the graph. The main role of the network structure is to express the conditional independence relationships among the variables in the model through graphical separation, thus specifying the factorisation of the global distribution:

P(X) = ∏_{i=1}^p P(X_i | Π_{X_i}; Θ_{X_i}), where Π_{X_i} = {parents of X_i}.

How the DAG maps to the probability distribution: formally, the DAG is an independence map of the probability distribution of X, with graphical separation (⊥⊥_G) implying probabilistic independence (⊥⊥_P). (A slide figure, omitted here, shows an example DAG over nodes A-F next to its factorisation.) Graphical separation in DAGs is first characterised on the fundamental connections (the serial, divergent and convergent patterns A → B → C, A ← B → C and A → B ← C, contrasted with separation in undirected graphs); in the general case, these patterns are extended and applied to every possible path between A and B for a given C, which is how d-separation is defined.
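The factorisation into local distributions is easy to make concrete. The sketch below is my own illustration, not from the slides: a hypothetical three-node DAG A → C ← B with binary variables, whose invented local tables multiply into a proper joint.

```python
from itertools import product

# Sketch of the factorised joint P(X) = prod_i P(X_i | parents(X_i)) for a
# hypothetical three-node DAG A -> C <- B; all probability values are invented.
P_A = {True: 0.3, False: 0.7}
P_B = {True: 0.6, False: 0.4}
P_C = {  # P(C | A, B), one row per parent configuration
    (True, True): {True: 0.9, False: 0.1},
    (True, False): {True: 0.7, False: 0.3},
    (False, True): {True: 0.5, False: 0.5},
    (False, False): {True: 0.1, False: 0.9},
}

def joint(a, b, c):
    """P(A=a, B=b, C=c) as a product of the local distributions."""
    return P_A[a] * P_B[b] * P_C[(a, b)][c]

# The local tables define a proper joint: it sums to 1 over all configurations.
total = sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3))
print(total)
```

The point of the factorisation is economy: here three small tables (1 + 1 + 4 rows of free parameters) stand in for a full joint table over 8 configurations.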
Closed Timelike Curves, Singularities and Causality: a Survey from Gödel to Chronological Protection
Jean-Pierre Luminet, Aix-Marseille Université, CNRS, Laboratoire d'Astrophysique de Marseille (France); Centre de Physique Théorique de Marseille (France); Observatoire de Paris, LUTH (France). [email protected]

Abstract: I give a historical survey of the discussions about the existence of closed timelike curves in general relativistic models of the universe, opening the physical possibility of time travel into the past, as first recognized by K. Gödel in his rotating universe model of 1949. I emphasize that journeying into the past is intimately linked to spacetime models devoid of timelike singularities. Since such singularities arise as an inevitable consequence of the equations of general relativity given physically reasonable assumptions, time travel into the past becomes possible only when one or another of these assumptions is violated, as is the case with wormhole-type solutions. S. Hawking and other authors have tried to ward off the paradoxical consequences of time travel into the past by advocating physical mechanisms of chronological protection; however, such mechanisms remain presently unknown, even when quantum fluctuations near horizons are taken into account. I close the survey with a brief and pedestrian discussion of Causal Dynamical Triangulations, an approach to quantum gravity in which causality plays a seminal role.

Keywords: time travel; closed timelike curves; singularities; wormholes; Gödel's universe; chronological protection; causal dynamical triangulations.

From the introduction: In 1949, the mathematician and logician Kurt Gödel, who had previously demonstrated the incompleteness theorems that broke ground in logic, mathematics, and philosophy, became interested in the theory of general relativity of Albert Einstein, of which he became a close colleague at the Institute for Advanced Study at Princeton.
Light Rays, Singularities, and All That
Edward Witten, School of Natural Sciences, Institute for Advanced Study, Einstein Drive, Princeton, NJ 08540 USA.

Abstract: This article is an introduction to causal properties of General Relativity. Topics include the Raychaudhuri equation, singularity theorems of Penrose and Hawking, the black hole area theorem, topological censorship, and the Gao-Wald theorem. The article is based on lectures at the 2018 summer program Prospects in Theoretical Physics at the Institute for Advanced Study, and also at the 2020 New Zealand Mathematical Research Institute summer school in Nelson, New Zealand.

Contents: 1. Introduction; 2. Causal Paths; 3. Globally Hyperbolic Spacetimes (definition; some properties of globally hyperbolic spacetimes; more on compactness; Cauchy horizons; causality conditions; maximal extensions); 4. Geodesics and Focal Points (the Riemannian case; Lorentz signature analog; Raychaudhuri's equation; Hawking's big bang singularity theorem); 5. Null Geodesics and Penrose's Theorem (promptness; promptness and focal points; more on the boundary of the future; the null Raychaudhuri equation; trapped surfaces; Penrose's theorem); 6. Black Holes (cosmic censorship; the black hole region; the horizon and its generators); 7. Some Additional Topics (topological censorship; the averaged null energy condition).
Empirical Evaluation of Scoring Functions for Bayesian Network Model Selection
Zhifa Liu, Brandon Malone, Changhe Yuan. BMC Bioinformatics 2012, 13(Suppl 15):S14. http://www.biomedcentral.com/1471-2105/13/S15/S14. Open Access proceedings of the Ninth Annual MCBIOS Conference ('Dealing with the Omics Data Deluge'), Oxford, MS, USA, 17-18 February 2012.

Abstract: In this work, we empirically evaluate the capability of various scoring functions of Bayesian networks for recovering true underlying structures. Similar investigations have been carried out before, but they typically relied on approximate learning algorithms to learn the network structures. The suboptimal structures found by the approximation methods have unknown quality and may affect the reliability of their conclusions. Our study uses an optimal algorithm to learn Bayesian network structures from datasets generated from a set of gold standard Bayesian networks. Because all optimal algorithms always learn equivalent networks, this ensures that only the choice of scoring function affects the learned networks. Another shortcoming of the previous studies stems from their use of random synthetic networks as test cases. There is no guarantee that these networks reflect real-world data. We use real-world data to generate our gold-standard structures, so our experimental design more closely approximates real-world situations. A major finding of our study suggests that, in contrast to results reported by several prior works, the Minimum Description Length (MDL) score (or equivalently, the Bayesian information criterion, BIC) consistently outperforms other scoring functions such as Akaike's information criterion (AIC), the Bayesian Dirichlet equivalence score (BDeu), and the factorized normalized maximum likelihood (fNML) in recovering the underlying Bayesian network structures.
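To make the scoring concrete, here is a minimal sketch, assuming fully observed discrete data, of a decomposable BIC/MDL score of the kind the paper evaluates: maximized log-likelihood minus (log N)/2 times the number of free parameters. The implementation and the toy dataset are my own illustration, not the paper's code.

```python
from collections import Counter
from math import log

def bic_score(data, parent_sets, arities):
    """BIC/MDL of a structure: data is a list of dicts var -> value,
    parent_sets maps each variable to a tuple of its parents,
    arities gives each variable's number of states."""
    n = len(data)
    score = 0.0
    for var, parents in parent_sets.items():
        joint = Counter((tuple(r[p] for p in parents), r[var]) for r in data)
        marg = Counter(tuple(r[p] for p in parents) for r in data)
        # Maximized log-likelihood term: sum N(x, pa) * log(N(x, pa) / N(pa))
        score += sum(c * log(c / marg[pa]) for (pa, _), c in joint.items())
        # Penalty: (arity - 1) free parameters per parent configuration
        # (observed configurations only: a common simplification).
        n_params = (arities[var] - 1) * max(1, len(marg))
        score -= 0.5 * log(n) * n_params
    return score

# Toy data in which B is a deterministic copy of A: the structure with the
# edge A -> B should score higher than the empty structure.
data = [{"A": i % 2 == 0, "B": i % 2 == 0} for i in range(8)]
arities = {"A": 2, "B": 2}
with_edge = bic_score(data, {"A": (), "B": ("A",)}, arities)
no_edge = bic_score(data, {"A": (), "B": ()}, arities)
print(with_edge, no_edge)
```

Because the score decomposes over families (a variable and its parents), structure search can cache and reuse per-family scores, which is what makes optimal algorithms such as the one the paper relies on feasible.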
Bayesian Networks
Lecture slides. Read R&N Ch. 14.1-14.2 (next lecture: read R&N 18.1-18.4). You will be expected to know:
- Basic concepts and vocabulary of Bayesian networks: nodes represent random variables; directed arcs represent (informally) direct influences; conditional probability tables give P(X_i | Parents(X_i)).
- Given a Bayesian network, write down the full joint distribution it represents.
- Given a full joint distribution in factored form, draw the Bayesian network that represents it.
- Given a variable ordering and some background assertions of conditional independence among the variables, write down the factored form of the full joint distribution, as simplified by the conditional independence assertions.

Computing with probabilities, the law of total probability (a.k.a. "summing out" or marginalization): P(a) = Σ_b P(a, b) = Σ_b P(a | b) P(b), where B is any random variable. Why is this useful? Given a joint distribution, e.g. P(a, b, c, d), we can obtain any "marginal" probability, e.g. P(b), by summing out the other variables: P(b) = Σ_a Σ_c Σ_d P(a, b, c, d). Less obvious: we can also compute any conditional probability of interest from a joint distribution, e.g. P(c | b) = Σ_a Σ_d P(a, c, d | b) = (1 / P(b)) Σ_a Σ_d P(a, c, d, b), where (1 / P(b)) is just a normalization constant. Thus, the joint distribution contains the information we need to compute any probability of interest.

Computing with probabilities, the chain rule (factoring): we can always write P(a, b, c, ..., z) = P(a | b, c, ..., z) P(b | c, ..., z) ⋯ P(z).
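The "summing out" and normalization identities above can be sketched numerically; the joint table below and its probability values are invented for illustration.

```python
# A numerical sketch of marginalization and conditioning from a full joint
# P(a, b) over two binary variables (numbers invented).
joint = {
    (True, True): 0.2, (True, False): 0.1,
    (False, True): 0.4, (False, False): 0.3,
}

# Law of total probability: P(a) = sum_b P(a, b)
P_a = {a: sum(joint[(a, b)] for b in (True, False)) for a in (True, False)}

# Conditioning via normalization: P(a | b=True) = P(a, b=True) / P(b=True)
P_b_true = sum(joint[(a, True)] for a in (True, False))
P_a_given_b = {a: joint[(a, True)] / P_b_true for a in (True, False)}

print(P_a[True], P_a_given_b[True])  # 0.3 and 1/3, up to floating point
```

This is exactly the sense in which the joint distribution "contains all the information": every marginal and conditional above was derived from the four joint entries alone.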
Deep Regression Bayesian Network and Its Applications Siqi Nie, Meng Zheng, Qiang Ji Rensselaer Polytechnic Institute
Siqi Nie, Meng Zheng, Qiang Ji, Rensselaer Polytechnic Institute.

Abstract: Deep directed generative models have attracted much attention recently due to their generative modeling nature and powerful data representation ability. In this paper, we review different structures of deep directed generative models and the learning and inference algorithms associated with them. We focus on a specific structure consisting of layers of Bayesian networks, chosen for its ability to capture inherent and rich dependencies among latent variables. The major difficulty of learning and inference in deep directed models with many latent variables is that inference is intractable, due to the dependencies among the latent variables and the exponential number of latent variable configurations. Current solutions use variational methods, often through an auxiliary network, to approximate the posterior probability inference. In contrast, inference can also be performed directly, without any auxiliary network, to maximally preserve the dependencies among the latent variables. Specifically, by exploiting the sparse representation of the latent space, a max-max instead of a max-sum operation can be used to overcome the exponential number of latent configurations. Furthermore, the max-max operation and augmented coordinate ascent are applied to both supervised and unsupervised learning as well as to various inference tasks. Quantitative evaluations of the different models on benchmark datasets are given for both data representation and feature learning tasks.

From the introduction: Deep learning has become a major enabling technology for computer vision. By exploiting its multi-level representation and the availability of big data, deep learning has led to dramatic performance improvements for certain tasks.
A Widely Applicable Bayesian Information Criterion
Sumio Watanabe ([email protected]), Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Mailbox G5-19, 4259 Nagatsuta, Midori-ku, Yokohama, Japan 226-8502. Journal of Machine Learning Research 14 (2013) 867-897. Submitted 8/12; revised 2/13; published 3/13. Editor: Manfred Opper.

Abstract: A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite; otherwise, it is called singular. In regular statistical models, the Bayes free energy, which is defined as the minus logarithm of the Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such an approximation does not hold. Recently, it was proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula using a birational invariant, the real log canonical threshold (RLCT), instead of half the number of parameters in BIC. Theoretical values of RLCTs in several statistical models are now being discovered based on algebraic-geometrical methodology. However, it has been difficult to estimate the Bayes free energy using only training samples, because an RLCT depends on an unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) as the average log likelihood function over the posterior distribution with the inverse temperature 1/log n, where n is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if a statistical model is singular for, or unrealizable by, the true distribution.
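The definition lends itself to a direct numerical check. The sketch below is my own illustration, not code from the paper: it estimates WBIC for a one-parameter Bernoulli model with a uniform prior on a parameter grid, at inverse temperature 1/log n, and compares it with the exact Bayes free energy (minus the log marginal likelihood, available in closed form for this model); the sample counts are invented.

```python
import math

n, k = 200, 120            # n samples, k successes (synthetic counts)
beta = 1.0 / math.log(n)   # WBIC's inverse temperature

def nll(theta):
    """Minus log likelihood of the whole sample at parameter theta."""
    return -(k * math.log(theta) + (n - k) * math.log(1 - theta))

# Tempered posterior on a grid (uniform prior): weights proportional
# to exp(-beta * nll(theta)); shift by the max for numerical stability.
grid = [(i + 0.5) / 1000 for i in range(1000)]
logw = [-beta * nll(t) for t in grid]
m = max(logw)
w = [math.exp(lw - m) for lw in logw]
wbic = sum(wi * nll(t) for wi, t in zip(w, grid)) / sum(w)

# Exact free energy: -log integral of theta^k (1-theta)^(n-k) dtheta,
# i.e. -log B(k+1, n-k+1), computed via log-gamma.
free_energy = -(math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2))
print(wbic, free_energy)  # the two agree to within a few nats
```

For this regular model the agreement is expected; the theorem's force is that the same estimator keeps tracking the free energy in singular models, where BIC's (d/2) log n penalty is wrong.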
Problem Set 4 (Causal Structure)
From the course "Singularities, Black Holes, Thermodynamics in Relativistic Spacetimes": Problem Set 4 (causal structure).

1. Wald (1984): ch. 8, problems 1-2.
2. Prove or give a counter-example: there is a spacetime with two points simultaneously connectable by a timelike geodesic, a null geodesic and a spacelike geodesic.
3. Prove or give a counter-example: if two points are causally related, there is a spacelike curve connecting them.
4. Prove or give a counter-example: any two points of a spacetime can be joined by a spacelike curve. (Hint: there are three cases to consider: first that the temporal futures of the two points intersect, second that their temporal pasts do, and third that neither their temporal futures nor their temporal pasts intersect; recall that the manifold must be connected.)
5. Prove or give a counter-example: every flat Lorentzian metric is temporally orientable.
6. Prove or give a counter-example: if the temporal futures of two points do not intersect, then neither do their temporal pasts.
7. Prove or give a counter-example: for a point p in a spacetime, the temporal past of the temporal future of the temporal past of the temporal future ... of p is the entire spacetime. (Hint: if two points can be joined by a spacelike curve, then they can be joined by a curve consisting of some finite number of null geodesic segments glued together; recall that the manifold is connected.)
8. Prove or give a counter-example: if the temporal past of p contains the entire temporal future of q, then there is a closed timelike curve through p.
Theoretical Computer Science Causal Set Topology
Sumati Surya, Raman Research Institute, Bangalore, India. Theoretical Computer Science 405 (2008) 188-197. Published by Elsevier B.V.

Keywords: space-time; causality; topology; posets; quantum gravity.

Abstract: The Causal Set Theory (CST) approach to quantum gravity is motivated by the observation that, associated with any causal spacetime (M, g) is a poset (M, ≺), with the order relation ≺ corresponding to the spacetime causal relation. Spacetime in CST is assumed to have a fundamental atomicity or discreteness, and is replaced by a locally finite poset, the causal set. In order to obtain a well defined continuum approximation, the causal set must possess the requisite intrinsic topological and geometric properties that characterise a continuum spacetime in the large. The study of causal set topology is thus dictated by the nature of the continuum approximation. We review the status of causal set topology and present some new results relating poset and spacetime topologies. The hope is that in the process, some of the ideas and questions arising from CST will be made accessible to the larger community of computer scientists and mathematicians working on posets.

From the introduction ('The spacetime poset and causal sets'): In Einstein's theory of gravity, space and time are merged into a single entity represented by a pseudo-Riemannian or Lorentzian geometry g of signature (−, +, +, +) on a differentiable four-dimensional manifold M.
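The poset structure in question is easy to play with in code. A minimal sketch (mine, not from the paper): a finite "causal set" built by transitively closing a hypothetical set of links between events, then checked against the partial-order axioms.

```python
# A finite causal set as a locally finite poset: the order relation x < y is
# the transitive closure of a set of "links" (covering relations). The event
# names and links below are invented for illustration.
links = {("p", "q"), ("q", "r"), ("p", "s")}

def transitive_closure(pairs):
    """Smallest transitive relation containing the given pairs."""
    rel = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(rel):
            for (c, d) in list(rel):
                if b == c and (a, d) not in rel:
                    rel.add((a, d))
                    changed = True
    return rel

prec = transitive_closure(links)  # the causal relation, x precedes y

# Partial-order axioms: irreflexive and acyclic (hence antisymmetric),
# transitive by construction; local finiteness is automatic on a finite set.
assert all(a != b for (a, b) in prec)              # irreflexivity
assert not any((b, a) in prec for (a, b) in prec)  # acyclicity
print(sorted(prec))
```

Here the closure adds the implied relation p precedes r, while s remains causally unrelated to q and r, mirroring how spacelike-separated events sit in a causal set.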
Checking Individuals and Sampling Populations with Imperfect Tests
Giulio D'Agostini and Alfredo Esposito.

Abstract: In the last months, due to the emergency of Covid-19, questions related to belonging or not to a particular class of individuals ('infected or not infected') after being tagged as 'positive' or 'negative' by a test have never been so popular. Similarly, there has been strong interest in estimating the proportion of a population expected to hold a given characteristic ('having or having had the virus'). Taking the cue from the many related discussions in the media, in addition to those in which we took part, we analyze these questions from a probabilistic ('Bayesian') perspective, considering several effects that play a role in evaluating the probabilities of interest. The resulting paper, written with didactic intent, is rather general and not strictly related to pandemics: the basic ideas of Bayesian inference are introduced and the uncertainties on the performances of the tests are treated using the metrological concept of 'systematics' and are propagated into the quantities of interest following the rules of probability theory; the separation of 'statistical' and 'systematic' contributions to the uncertainty on the inferred proportion of infectees allows one to optimize the sample size; the role of 'priors', often overlooked, is stressed, while recommending the use of 'flat priors', since the resulting posterior distribution can be 'reshaped' by an 'informative prior' in a later step; details of the calculations are given, also deriving useful approximated formulae, the tough work being however done with the help of direct Monte Carlo simulations and Markov Chain Monte Carlo, implemented in R and JAGS (relevant code provided in appendix).
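The simplest instance of the paper's first question (infected or not, given a test result) can be sketched in a few lines of Bayes' theorem; the sensitivity, specificity and prevalence values below are invented for illustration.

```python
# P(infected | positive) from sensitivity, specificity and prevalence:
# the "imperfect test" problem in its simplest, uncertainty-free form.
def p_infected_given_positive(prevalence, sensitivity, specificity):
    # P(positive) mixes true positives and false positives.
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_pos

# With a rare condition, even a good test yields many false positives.
post = p_infected_given_positive(prevalence=0.01, sensitivity=0.95, specificity=0.98)
print(round(post, 3))  # roughly 0.32: most positives are false positives here
```

The paper goes well beyond this point estimate, treating sensitivity and specificity themselves as uncertain ('systematics') and propagating those uncertainties by MCMC; this sketch only shows the base-rate effect that motivates the whole analysis.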