Graphical Models, Exponential Families, and Variational Inference


Foundations and Trends® in sample, Vol. xx, No xx (xxxx) 1–294
© xxxx xxxxxxxxx
DOI: xxxxxx

Martin J. Wainwright (1) and Michael I. Jordan (2)

(1) Department of Statistics, and Department of Electrical Engineering and Computer Science, Berkeley 94720, USA, wainwrig@stat.berkeley.edu
(2) Department of Statistics, and Department of Electrical Engineering and Computer Science, Berkeley 94720, USA, jordan@stat.berkeley.edu

Abstract

The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances, including the key problems of computing marginals and modes of probability distributions, are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing likelihoods, marginal probabilities and most probable configurations. We describe how a wide variety of algorithms (among them sum-product, cluster variational methods, expectation-propagation, mean field methods, and max-product) can all be understood in terms of exact or approximate forms of these variational representations. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
Contents

1 Introduction
2 Background
  2.1 Probability distributions on graphs
    2.1.1 Directed graphical models
    2.1.2 Undirected graphical models
    2.1.3 Factor graphs
  2.2 Conditional independence
  2.3 Statistical inference and exact algorithms
  2.4 Applications
    2.4.1 Hierarchical Bayesian models
    2.4.2 Contingency table analysis
    2.4.3 Constraint satisfaction and combinatorial optimization
    2.4.4 Bioinformatics
    2.4.5 Language and speech processing
    2.4.6 Image processing and spatial statistics
    2.4.7 Error-correcting coding
  2.5 Exact inference algorithms
    2.5.1 Message-passing on trees
    2.5.2 Junction tree representation
  2.6 Message-passing algorithms for approximate inference
3 Graphical models as exponential families
  3.1 Exponential representations via maximum entropy
  3.2 Basics of exponential families
  3.3 Examples of graphical models in exponential form
  3.4 Mean parameterization and inference problems
    3.4.1 Mean parameter spaces and marginal polytopes
    3.4.2 Role of mean parameters in inference problems
  3.5 Properties of A
    3.5.1 Derivatives and convexity
    3.5.2 Forward mapping to mean parameters
  3.6 Conjugate duality: Maximum likelihood and maximum entropy
    3.6.1 General form of conjugate dual
    3.6.2 Some simple examples
  3.7 Challenges in high-dimensional models
4 Sum-product, Bethe-Kikuchi, and expectation-propagation
  4.1 Sum-product and Bethe approximation
    4.1.1 A tree-based outer bound to M(G)
    4.1.2 Bethe entropy approximation
    4.1.3 Bethe variational problem and sum-product
    4.1.4 Inexactness of Bethe and sum-product
    4.1.5 Bethe optima and reparameterization
    4.1.6 Bethe and loop series expansions
  4.2 Kikuchi and hypertree-based methods
    4.2.1 Hypergraphs and hypertrees
    4.2.2 Kikuchi and related approximations
    4.2.3 Generalized belief propagation
  4.3 Expectation-propagation algorithms
    4.3.1 Entropy approximations based on term decoupling
    4.3.2 Optimality in terms of moment-matching
5 Mean field methods
  5.1 Tractable families
  5.2 Optimization and lower bounds
    5.2.1 Generic mean field procedure
    5.2.2 Mean field and Kullback-Leibler divergence
  5.3 Examples of mean field algorithms
    5.3.1 Naive mean field updates
    5.3.2 Structured mean field
  5.4 Non-convexity of mean field
6 Variational methods in parameter estimation
  6.1 Estimation in fully observed models
    6.1.1 Maximum likelihood for triangulated graphs
    6.1.2 Iterative methods for computing MLEs
  6.2 Partially observed models and expectation-maximization
    6.2.1 Exact EM algorithm in exponential families
    6.2.2 Variational EM
  6.3 Variational Bayes
7 Convex relaxations and upper bounds
  7.1 Generic convex combinations and convex surrogates
  7.2 Variational methods from convex relaxations
    7.2.1 Tree-reweighted sum-product and Bethe
    7.2.2 Reweighted Kikuchi approximations
    7.2.3 Convex forms of expectation-propagation
    7.2.4 Planar graph decomposition
  7.3 Other convex variational methods
    7.3.1 Semidefinite constraints and log-determinant bound
    7.3.2 Other entropy approximations and polytope bounds
  7.4 Algorithmic stability
  7.5 Convex surrogates in parameter estimation
    7.5.1 Surrogate likelihoods
    7.5.2 Optimizing surrogate likelihoods
    7.5.3 Penalized surrogate likelihoods
8 Max-product and LP relaxations
  8.1 Variational formulation of computing modes
  8.2 Max-product and linear programming for trees
  8.3 Max-product for Gaussians and other convex problems
  8.4 First-order LP relaxation and reweighted max-product
    8.4.1 Basic properties of first-order LP relaxation
    8.4.2 Connection to max-product message-passing
    8.4.3 Modified forms of message-passing
    8.4.4 Examples of the first-order LP relaxation
  8.5 Higher-order LP relaxations
9 Moment matrices and conic relaxations
  9.1 Moment matrices and their properties
    9.1.1 Multinomial and indicator bases
    9.1.2 Marginal polytopes for hypergraphs
  9.2 Semidefinite bounds on marginal polytopes
    9.2.1 Lasserre sequence
    9.2.2 Tightness of semidefinite outer bounds
  9.3 Link to LP relaxations and graphical structure
  9.4 Second-order cone relaxations
10 Discussion
A Background material
  A.1 Background on graphs and hypergraphs
  A.2 Basics of convex sets and functions
    A.2.1 Convex sets and cones
    A.2.2 Convex and affine hulls
    A.2.3 Affine hulls and relative interior
    A.2.4 Polyhedra and their representations
    A.2.5 Convex functions
    A.2.6 Conjugate duality
B Proofs for exponential families and duality
  B.1 Proof of Theorem 1
  B.2 Proof of Theorem 2
  B.3 General properties of M and A∗
    B.3.1 Properties of M
    B.3.2 Properties of A∗
  B.4 Proof of Theorem 3(b)
  B.5 Proof of Theorem 5
C Variational principles for multivariate Gaussians
  C.1 Gaussian with known covariance
  C.2 Gaussian with unknown covariance
  C.3 Gaussian mode computation
D Clustering and augmented hypergraphs
  D.1 Covering augmented hypergraphs
  D.2 Specification of compatibility functions
E Miscellaneous results
  E.1 Möbius inversion
  E.2 Pairwise Markov random fields
References

1 Introduction

Graphical models bring together graph theory and probability theory in a powerful formalism for multivariate statistical modeling. In various applied fields including bioinformatics, speech processing, image processing and control theory, statistical models have long been formulated in terms of graphs, and algorithms for computing basic statistical quantities such as likelihoods and score functions have often been expressed in terms of recursions operating on these graphs; examples include phylogenies, pedigrees, hidden Markov models, Markov random fields, and Kalman filters.
These ideas can be understood, unified and generalized within the formalism of graphical models. Indeed, graphical models provide a natural tool for formulating variations on these classical architectures, as well as for exploring entirely new families of statistical models. Accordingly, in fields that involve the study of large numbers of interacting variables, graphical models are increasingly in evidence.

Graph theory plays an important role in many computationally-oriented fields, including combinatorial optimization, statistical physics and economics. Beyond its use as a language for formulating models, graph theory also plays a fundamental role in assessing computational complexity and feasibility. In particular, the running time of an algorithm or the magnitude of an error bound can often be characterized in terms of structural properties of a graph. This statement is also true in the context of graphical models. Indeed, as we discuss, the computational complexity of a fundamental method known as the junction tree algorithm, which generalizes many of the recursive algorithms on graphs cited above, can be characterized in terms of a natural graph-theoretic measure of interaction among variables. For suitably sparse graphs, the junction tree algorithm provides a systematic solution to the problem of computing likelihoods and other statistical quantities associated with a graphical model.

Unfortunately, many graphical models of practical interest are not "suitably sparse," so that the junction tree algorithm no longer provides a viable computational framework. One popular source of methods for attempting to cope with such cases is the Markov chain Monte Carlo (MCMC) framework, and indeed there is a significant literature on the application of MCMC methods to graphical models [e.g., 26, 90].
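
The role that graph structure plays in computation can be made concrete with a small sketch. The Python code below is an illustration only, not taken from the monograph: it assumes a hypothetical two-state hidden Markov model (the names `init`, `trans`, `emit`, `obs` and all numerical values are invented for the example) and computes the likelihood of an observation sequence in two ways. The brute-force version sums over every hidden-state sequence, which costs on the order of k^T terms for k states and T time steps, while the forward recursion exploits the chain structure and needs only about T·k² operations.

```python
import itertools
import numpy as np

def chain_likelihood_brute_force(init, trans, emit, obs):
    """Sum the joint probability over every hidden-state sequence (k**T terms)."""
    k, T = len(init), len(obs)
    total = 0.0
    for states in itertools.product(range(k), repeat=T):
        p = init[states[0]] * emit[states[0], obs[0]]
        for t in range(1, T):
            p *= trans[states[t - 1], states[t]] * emit[states[t], obs[t]]
        total += p
    return total

def chain_likelihood_forward(init, trans, emit, obs):
    """Forward recursion along the chain (about T * k**2 operations)."""
    # alpha[j] = P(observations so far, current hidden state = j)
    alpha = init * emit[:, obs[0]]
    for o in obs[1:]:
        # Propagate one step along the chain, then weight by the new observation.
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()

# Hypothetical 2-state, 2-symbol model; all numbers are illustrative assumptions.
init = np.array([0.6, 0.4])                   # initial state distribution
trans = np.array([[0.7, 0.3], [0.2, 0.8]])    # trans[i, j] = P(next = j | current = i)
emit = np.array([[0.9, 0.1], [0.3, 0.7]])     # emit[i, o] = P(observation = o | state = i)
obs = [0, 1, 1, 0, 1]

brute = chain_likelihood_brute_force(init, trans, emit, obs)
fwd = chain_likelihood_forward(init, trans, emit, obs)
print(brute, fwd, np.isclose(brute, fwd))
```

Both functions compute the same quantity; only the second remains practical as the sequence grows, which is the kind of saving that recursions on sparse graphs provide. For graphs that are not chains or trees, the same idea requires the junction tree machinery discussed in the text, and its cost grows with how densely the variables interact.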