Graphical Models


Zoubin Ghahramani
Department of Engineering, University of Cambridge, UK
[email protected]
http://learning.eng.cam.ac.uk/zoubin/
MLSS 2012, La Palma

Representing knowledge through graphical models

[Figure: a directed graph over five random variables A, B, C, D, E]

• Nodes correspond to random variables.
• Edges represent statistical dependencies between the variables.

Why do we need graphical models?

• Graphs are an intuitive way of representing and visualising the relationships between many variables. (Examples: family trees, electric circuit diagrams, neural networks.)
• A graph allows us to abstract out the conditional independence relationships between the variables from the details of their parametric forms. Thus we can answer questions like "Is A dependent on B given that we know the value of C?" just by looking at the graph.
• Graphical models allow us to define general message-passing algorithms that implement probabilistic inference efficiently. Thus we can answer queries like "What is p(A|C = c)?" without enumerating all settings of all variables in the model.

Graphical models = statistics × graph theory × computer science.

Directed Acyclic Graphical Models (Bayesian Networks)

[Figure: a DAG with edges A → C, B → C, B → D, C → D, C → E, D → E]

A DAG model / Bayesian network [1] corresponds to a factorization of the joint probability distribution:

  p(A, B, C, D, E) = p(A) p(B) p(C|A, B) p(D|B, C) p(E|C, D)

In general:

  p(X_1, ..., X_n) = ∏_{i=1}^n p(X_i | X_pa(i))

where pa(i) are the parents of node i.

[1] "Bayesian networks" can be, and often are, learned using non-Bayesian (i.e. frequentist) methods; Bayesian networks (i.e. DAGs) do not require parameter or structure learning using Bayesian methods. Also called "belief networks".

Directed Acyclic Graphical Models (Bayesian Networks)

Semantics: X ⊥⊥ Y | V if V d-separates X from Y.

Definition: V d-separates X from Y if every undirected path [2] between X and Y is blocked by V. A path is blocked by V if there is a node W on the path such that either:

1. W has converging arrows along the path (→ W ←) [3] and neither W nor its descendants are observed (in V), or
2. W does not have converging arrows along the path (→ W → or ← W →) and W is observed (W ∈ V).

Corollary: the Markov boundary for X is {parents(X) ∪ children(X) ∪ parents-of-children(X)}.

[2] An undirected path ignores the direction of the edges.
[3] Note that converging arrows along the path only refers to what happens on that path. Such a node is also called a collider.

Directed Graphs for Statistical Models: Plate Notation

Consider the following simple model. A data set of N points is generated i.i.d. from a Gaussian with mean μ and standard deviation σ:

  p(x_1, ..., x_N, μ, σ) = p(μ) p(σ) ∏_{n=1}^N p(x_n | μ, σ)

[Figure: plate notation. A single node x_n inside a plate labelled N, with parents μ and σ, is equivalent to drawing x_1, x_2, ..., x_N explicitly, each with parents μ and σ.]

Inference in a graphical model

Consider the five-variable DAG above, which represents:

  p(A, B, C, D, E) = p(A) p(B) p(C|A, B) p(D|B, C) p(E|C, D)

Inference: evaluate the probability distribution over some set of variables, given the values of another set of variables. For example, how can we compute p(A|C = c)? Assume each variable is binary.

Naive method:

  p(A, C = c) = Σ_{B,D,E} p(A, B, C = c, D, E)    [16 terms]
  p(C = c) = Σ_A p(A, C = c)    [2 terms]
  p(A|C = c) = p(A, C = c) / p(C = c)    [2 terms]

Total: 16 + 2 + 2 = 20 terms have to be computed and summed.
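To make the operation count concrete, here is a minimal Python sketch of the naive method for this five-variable binary network. It is an illustration only: the CPT numbers and helper names are invented, and only the factorization p(A, B, C, D, E) = p(A) p(B) p(C|A, B) p(D|B, C) p(E|C, D) comes from the text. The inner sum enumerates the 16 joint terms, the marginal p(C = c) adds 2 more, and the final division normalises the remaining 2.

```python
import itertools

# Hypothetical CPTs for the DAG with parents pa(C)={A,B}, pa(D)={B,C}, pa(E)={C,D}.
# All numbers below are made up purely for illustration.
p_A = {0: 0.6, 1: 0.4}
p_B = {0: 0.7, 1: 0.3}
p_C_given_AB = {(a, b): {0: 0.9 - 0.3 * a - 0.2 * b, 1: 0.1 + 0.3 * a + 0.2 * b}
                for a in (0, 1) for b in (0, 1)}
p_D_given_BC = {(b, c): {0: 0.8 - 0.4 * c, 1: 0.2 + 0.4 * c}
                for b in (0, 1) for c in (0, 1)}
p_E_given_CD = {(c, d): {0: 0.5 + 0.3 * c - 0.2 * d, 1: 0.5 - 0.3 * c + 0.2 * d}
                for c in (0, 1) for d in (0, 1)}

def joint(a, b, c, d, e):
    """p(A,B,C,D,E) under the DAG factorization."""
    return (p_A[a] * p_B[b] * p_C_given_AB[(a, b)][c]
            * p_D_given_BC[(b, c)][d] * p_E_given_CD[(c, d)][e])

def naive_posterior_A_given_C(c):
    # p(A, C=c): sum the joint over B, D, E for each value of A (2 x 8 = 16 terms)
    p_A_c = {a: sum(joint(a, b, c, d, e)
                    for b, d, e in itertools.product((0, 1), repeat=3))
             for a in (0, 1)}
    p_c = sum(p_A_c.values())                     # p(C=c): 2 terms
    return {a: p_A_c[a] / p_c for a in (0, 1)}    # normalise: 2 terms

print(naive_posterior_A_given_C(c=1))
```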
Inference in a graphical model (continued)

A more efficient method uses the same factorization p(A, B, C, D, E) = p(A) p(B) p(C|A, B) p(D|B, C) p(E|C, D) to compute p(A|C = c):

  p(A, C = c) = Σ_{B,D,E} p(A) p(B) p(C = c|A, B) p(D|B, C = c) p(E|C = c, D)
              = Σ_B p(A) p(B) p(C = c|A, B) Σ_D p(D|B, C = c) Σ_E p(E|C = c, D)
              = Σ_B p(A) p(B) p(C = c|A, B)    [4 terms]

(each inner sum equals 1). Total: 4 + 2 + 2 = 8 terms.

Belief propagation methods use the conditional independence relationships in a graph to do efficient inference (for singly connected graphs, exponential gains in efficiency!).

Factor graph propagation

Algorithmically and implementationally, it's often easier to convert directed and undirected graphs into factor graphs, and run factor graph propagation.

  p(x) = p(x_1) p(x_2|x_1) p(x_3|x_2) p(x_4|x_2) ≡ f_1(x_1, x_2) f_2(x_2, x_3) f_3(x_2, x_4)

Singly connected vs multiply connected factor graphs:

[Figure: two factor graphs on x_1, ..., x_4: a singly connected one with factors f_1, f_2, f_3, and a multiply connected one in which an additional factor f_4 creates a loop]

Factor Graphs

In a factor graph, the joint probability distribution is written as a product of factors. Consider a vector of variables x = (x_1, ..., x_n):

  p(x) = p(x_1, ..., x_n) = (1/Z) ∏_j f_j(x_{S_j})

where Z is the normalisation constant, S_j denotes the subset of {1, ..., n} which participate in factor f_j, and x_{S_j} = {x_i : i ∈ S_j}.

[Figure: a factor graph with variable nodes x_1, ..., x_4 and factor nodes f_1, f_2, f_3, f_4]

Variable nodes: we draw an open circle for each variable x_i in the distribution.
Factor nodes: we draw a filled dot for each factor f_j in the distribution.

Propagation in Factor Graphs

Let n(x) denote the set of factor nodes that are neighbors of x. Let n(f) denote the set of variable nodes that are neighbors of f. We can compute probabilities in a factor graph by propagating messages from variable nodes to factor nodes and vice versa.

Message from variable x to factor f:

  μ_{x→f}(x) = ∏_{h ∈ n(x)\{f}} μ_{h→x}(x)

Message from factor f to variable x:

  μ_{f→x}(x) = Σ_{x_S \ x} [ f(x_S) ∏_{y ∈ n(f)\{x}} μ_{y→f}(y) ]

where x_S denotes the set of variables that factor f depends on, and Σ_{x_S \ x} is a sum over all variables neighboring factor f except x.

If a variable has only one factor as a neighbor, it can initiate message propagation. Once a variable has received all messages from its neighboring factor nodes, one can compute the probability of that variable by multiplying all the messages and renormalising:

  p(x) ∝ ∏_{h ∈ n(x)} μ_{h→x}(x)

Inference in Hidden Markov Models and Linear Gaussian State-Space Models

[Figure: a chain-structured DAG with hidden states X_1, X_2, X_3, ..., X_T and observations Y_1, Y_2, Y_3, ..., Y_T]

  p(X_{1:T}, Y_{1:T}) = p(X_1) p(Y_1|X_1) ∏_{t=2}^T p(X_t|X_{t-1}) p(Y_t|X_t)

• In HMMs, the states X_t are discrete.
• In linear Gaussian SSMs, the states are real-valued Gaussian vectors.
• Both HMMs and SSMs can be represented as singly connected DAGs.
• The forward-backward algorithm in hidden Markov models (HMMs) and the Kalman smoothing algorithm in SSMs are both instances of belief propagation / factor graph propagation (see the sketch below).
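As a concrete illustration of how the generic messages specialise on a chain, here is a minimal Python sketch of forward-backward in a tiny discrete HMM. It is not from the slides: the two-state transition matrix, emission matrix, and observation sequence are invented. The forward messages alpha and backward messages beta play the role of the factor-to-variable messages flowing in the two directions along the chain; multiplying them and renormalising gives the posterior marginals p(X_t | Y_{1:T}), exactly as in the p(x) ∝ ∏ μ_{h→x}(x) rule above.

```python
import numpy as np

# Hypothetical 2-state HMM; all numbers are invented for illustration.
pi = np.array([0.6, 0.4])            # p(X_1)
A = np.array([[0.9, 0.1],            # A[i, j] = p(X_t = j | X_{t-1} = i)
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],            # B[i, k] = p(Y_t = k | X_t = i)
              [0.1, 0.9]])
y = [0, 0, 1, 1, 0]                  # an observed sequence Y_{1:T}

T, S = len(y), len(pi)
alpha = np.zeros((T, S))             # forward messages (unnormalised)
beta = np.ones((T, S))               # backward messages

# Forward pass: alpha_t(x) = p(Y_{1:t}, X_t = x)
alpha[0] = pi * B[:, y[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, y[t]]

# Backward pass: beta_t(x) = p(Y_{t+1:T} | X_t = x)
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, y[t + 1]] * beta[t + 1])

# Posterior marginals: multiply incoming messages and renormalise
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma)                         # gamma[t, x] = p(X_t = x | Y_{1:T})
```

The same two-pass schedule, with the sums replaced by Gaussian integrals, gives Kalman smoothing in a linear Gaussian state-space model.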
Software for Graphical Models

• BUGS and WinBUGS: inference via Gibbs sampling, not very scalable.
• HUGIN: widely used, commercial, focus on exact inference.
• Kevin Murphy's Bayes Net Toolbox: Matlab, widely used.
• Microsoft's Infer.NET: advanced scalable libraries implementing factor graph propagation, EP, and variational message passing.
• Jeff Bilmes' GMTK: very good at HMMs and related time series models.
• Many others; see http://people.cs.ubc.ca/~murphyk/Software/bnsoft.html

Summary

• Inference consists of the problem of computing p(variables of interest | observed variables).
• For singly connected graphs, belief propagation / factor graph propagation solves this problem exactly.
• Well-known algorithms such as Kalman smoothing and forward-backward are special cases of these general propagation algorithms.
• For multiply connected graphs, the junction tree algorithm solves the exact inference problem, but can be very slow (exponential in the cardinality of the largest clique).
• One approximate inference algorithm is "loopy belief propagation": run propagation as if the graph were singly connected; it often works well in practice.

Learning parameters

[Figure: a DAG with edges x_1 → x_2, x_1 → x_3, x_2 → x_4, together with an example conditional probability table θ_2 = p(x_2 | x_1) whose rows are (0.2, 0.3, 0.5) and (0.1, 0.6, 0.3)]

  p(x_1) p(x_2|x_1) p(x_3|x_1) p(x_4|x_2)

Assume each variable x_i is discrete and can take on K_i values. The parameters of this model can be represented as 4 tables: θ_1 has K_1 entries, θ_2 has K_1 × K_2 entries, etc. These are called conditional probability tables (CPTs) with the following semantics:

  p(x_1 = k) = θ_{1,k}        p(x_2 = k' | x_1 = k) = θ_{2,k,k'}

If node i has M parents, θ_i can be represented either as an (M + 1)-dimensional table, or as a 2-dimensional table with (∏_{j ∈ pa(i)} K_j) × K_i entries by collapsing all the states of the parents of node i. Note that Σ_{k'} θ_{i,k,k'} = 1.

Assume a data set D = {x^(n)}_{n=1}^N. How do we learn θ from D?

  p(x|θ) = p(x_1|θ_1) p(x_2|x_1, θ_2) p(x_3|x_1, θ_3) p(x_4|x_2, θ_4)

Likelihood:

  p(D|θ) = ∏_{n=1}^N p(x^(n)|θ)

Log likelihood:

  log p(D|θ) = Σ_{n=1}^N Σ_i log p(x_i^(n) | x_pa(i)^(n), θ_i)

This decomposes into a sum of functions of θ_i (one term per CPT), so each θ_i can be estimated separately.
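Because the log likelihood decomposes into a sum over nodes, each CPT can be fitted independently by normalised counting. The sketch below is illustrative only: the function name, the random toy data, and the variable cardinalities are invented; only the DAG structure x_1 → x_2, x_1 → x_3, x_2 → x_4 and the CPT parameterisation come from the text.

```python
import numpy as np

def fit_cpt(data, child, parents, sizes):
    """Maximum likelihood CPT p(x_child | x_parents) from complete discrete data.

    data: (N, num_vars) integer array, one row per data point x^(n)
    sizes: list of K_i, the number of states of each variable
    Returns a table of shape (product of parent sizes, K_child) whose rows sum to 1.
    Assumes every parent configuration appears at least once in the data.
    """
    parent_sizes = [sizes[j] for j in parents]
    counts = np.zeros((int(np.prod(parent_sizes)) if parents else 1, sizes[child]))
    for row in data:
        # collapse the joint parent state into a single row index
        idx = 0
        for j in parents:
            idx = idx * sizes[j] + row[j]
        counts[idx, row[child]] += 1
    return counts / counts.sum(axis=1, keepdims=True)   # normalise each row

# Toy data for the DAG x1 -> x2, x1 -> x3, x2 -> x4 (all values invented).
sizes = [2, 3, 2, 2]                                    # K_1, K_2, K_3, K_4
rng = np.random.default_rng(0)
data = rng.integers(low=0, high=[2, 3, 2, 2], size=(100, 4))

theta_1 = fit_cpt(data, child=0, parents=[], sizes=sizes)    # p(x1)
theta_2 = fit_cpt(data, child=1, parents=[0], sizes=sizes)   # p(x2 | x1)
theta_4 = fit_cpt(data, child=3, parents=[1], sizes=sizes)   # p(x4 | x2)
print(theta_2)    # rows index the state of x1, columns the state of x2
```

Each call touches only the columns for one node and its parents, which corresponds to the per-θ_i term in the decomposed log likelihood.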