Gaussian Graphical Models Arguable Points


Jiang Guo
January 9th, 2013

Contents

1. Gaussian Graphical Models
   - Undirected Graphical Model
   - Gaussian Graphical Model
   - Precision matrix estimation
   - Main approaches: Sparsity, Graphical Lasso, CLIME, TIGER, Greedy methods
   - Measure methods
   - Non-Gaussian scenario
   - Applications
   - Project
2. Arguable points

Undirected Graphical Model

Markov properties:
- pairwise: X_u ⊥ X_v | X_{V \ {u,v}} whenever {u, v} ∉ E
- local
- global
Related notions: conditional independence, partial correlation.

Gaussian Graphical Model

Multivariate Gaussian: X ∼ N_d(ξ, Σ). If Σ is positive definite, the distribution has a density on R^d:

    f(x | ξ, Σ) = (2π)^{-d/2} (det Ω)^{1/2} exp(−(x − ξ)^T Ω (x − ξ) / 2)

where Ω = Σ^{-1} is the precision matrix of the distribution.

Marginal distribution: X_s ∼ N_{d'}(ξ_s, Σ_{s,s}).
Conditional distribution: X_1 | X_2 ∼ N(ξ_{1|2}, Σ_{1|2}), where

    ξ_{1|2} = ξ_1 + Σ_{12} Σ_{22}^{-1} (x_2 − ξ_2)
    Σ_{1|2} = Σ_{11} − Σ_{12} Σ_{22}^{-1} Σ_{21}

Sample covariance matrix:

    Σ̃ = (1 / (n − 1)) Σ_{i=1}^{n} (x_i − ξ)(x_i − ξ)^T

Precision matrix: Ω = Σ^{-1}. In high-dimensional settings where p ≫ n, Σ̃ is not invertible (it is only positive semi-definite).
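As a quick numeric check of the block formulas above, here is a minimal NumPy sketch with X = (X1, X2) partitioned into two scalar blocks; the covariance matrix and observed value are made-up illustrative numbers, not from the talk:

```python
import numpy as np

# Minimal sketch of the conditional-Gaussian formulas. All numbers
# here are illustrative, chosen only to exercise the formulas.
xi = np.array([0.0, 0.0])                  # mean (xi_1, xi_2)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])             # positive-definite covariance

Omega = np.linalg.inv(Sigma)               # precision matrix

# Condition X1 on an observed value x2 of X2:
#   xi_{1|2}    = xi_1 + Sigma_12 Sigma_22^{-1} (x2 - xi_2)
#   Sigma_{1|2} = Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
x2 = 1.5
xi_1_given_2 = xi[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - xi[1])
Sigma_1_given_2 = Sigma[0, 0] - Sigma[0, 1] * Sigma[1, 0] / Sigma[1, 1]
```

Note that conditioning always shrinks the variance of X1 (here from 2.0 to 1.36), since Σ_{12} Σ_{22}^{-1} Σ_{21} is non-negative.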
Every multivariate Gaussian distribution can be represented by a pairwise Gaussian Markov Random Field (GMRF).

GMRF: an undirected graph G = (V, E) with
- vertex set V = {1, ..., p}, corresponding to the random variables
- edge set E = {(i, j) ∈ V × V | i ≠ j, Ω_ij ≠ 0}

Goal: estimate the sparse graph structure given n ≪ p iid observations.

Precision matrix estimation

Graph recovery, also known as "graph structure learning/estimation": for each pair of nodes (variables), decide whether there should be an edge. Edge (α, β) does not exist ⟺ α ⊥ β | V \ {α, β} ⟺ Ω_{α,β} = 0. Since the precision matrix is sparse, graph recovery turns out to be a non-zero pattern detection problem. Example:

      x y z w
    x 0 1 0 1
    y 1 0 0 1
    z 0 0 0 0
    w 1 1 0 0

Parameter estimation: once the pattern is recovered, the values of the non-zero entries remain to be estimated.

      x y z w
    x 0 ? 0 ?
    y ? 0 0 ?
    z 0 0 0 0
    w ? ? 0 0

Sparsity

The word "sparsity" has (at least) four related meanings in NLP (Noah Smith et al.):
1. Data sparsity: N is too small to obtain a good estimate for w
(usually bad) 2 "Probability" sparsity: most of events receive zero probability 3 Sparsity in the dual: Associated with SVM and other kernel-based methods. 4 Model sparsity: Most dimensions of f is not needed for a good hw; those dimensions of w can be zero, leading to a sparse w (model) We focus on sense #4. Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project Sparsity Linear regression d X f(~x) = w0 + wi ∗ xi i=1 sparsity means some of the wi(s) are zero 1 problem 1: why do we need sparse solution? feature/variable selection better interprete the data shrinkage the size of model computatioal savings discourage overfitting 2 problem 2: how to achieve sparse solution? solutions to come... Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project Sparsity Ordinary Least Square Objective function to minimize: N X 2 L = jyi − f(xi)j i=1 Jiang Guo Gaussian Graphical Models Gaussian Graphical Models arguable points Undirected Graphical Model Gaussian Graphical Model Precision matrix estimation Main approaches Measure methods non-gaussian scenario Applications Project Sparsity Ridge: L2 norm Regularization N X λ min L = jy − f(x )j2 + k~wk2 i i 2 2 i=1 equaivalent form (constrained optimization): N X 2 min L = jyi − f(xi)j i=1 2 subject to k~wk2 6 C Corresponds to zero-mean Gaussian prior ~w ∼ N (0; σ2), i.e. 
p(w) ∝ exp(−λ ‖w‖_2^2).

Lasso: L1-norm regularization

    min L = Σ_{i=1}^{N} |y_i − f(x_i)|^2 + (λ/2) ‖w‖_1

Equivalent form (constrained optimization):

    min L = Σ_{i=1}^{N} |y_i − f(x_i)|^2   subject to ‖w‖_1 ≤ C

This corresponds to a zero-mean Laplace prior w ∼ Laplace(0, b), i.e. p(w_i) ∝ exp(−λ |w_i|), and yields a sparse solution.

Why is the lasso sparse? An intuitive geometric interpretation contrasts the constraint regions of the two penalties (the slide's figure comparing the L1 diamond and the L2 ball is omitted here).

Algorithms for the Lasso
- Subgradient methods
- Interior-point methods (Boyd et al., 2007)
- Least Angle Regression (LARS) (Efron et al., 2004): computes the entire path of solutions; state of the art until 2008
- Pathwise coordinate descent (Friedman, Hastie et al., 2007)
- Proximal gradient (projected gradient)

Pathwise Coordinate Descent for the Lasso

Coordinate descent: optimize one parameter (coordinate) at a time. How? Suppose we had only one predictor; the problem is to minimize

    Σ_i (y_i − x_i β)^2 + λ |β|

The solution is the soft-thresholded estimate

    sign(β̂)(|β̂| − λ)_+

where β̂ is the least-squares solution.
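The one-predictor case can be checked numerically, and contrasting it with ridge shows why the L1 penalty yields exact zeros while L2 only shrinks. A small sketch with made-up numbers, assuming the penalized objectives Σ(y_i − x_i β)² + λβ² for ridge and Σ(y_i − x_i β)² + λ|β| for the lasso (the closed forms below follow from setting the derivatives to zero):

```python
import numpy as np

# One-predictor ridge vs lasso in closed form. Illustrative data only:
# the signal here is weak enough that the lasso zeroes it out.
x = np.array([1.0, 2.0, -1.0, 0.5])
y = np.array([0.2, 0.1, 0.0, 0.3])

xtx = x @ x          # 6.25
z = x @ y            # 0.55, the unnormalized least-squares correlation
lam = 2.0

beta_ols = z / xtx                                           # least squares
beta_ridge = z / (xtx + lam)                                 # shrunk, non-zero
beta_lasso = np.sign(z) * max(abs(z) - lam / 2, 0.0) / xtx   # exactly zero here
```

Ridge divides the least-squares estimate by a larger denominator, so it never reaches zero for finite λ; the soft-threshold subtracts λ/2 from |z| and clips at zero, producing exact sparsity.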
with the soft-thresholding operator defined piecewise as

    sign(z)(|z| − λ)_+ =  z − λ   if z > 0 and λ < |z|
                          z + λ   if z < 0 and λ < |z|
                          0       if λ ≥ |z|

With multiple predictors, cycle through each predictor in turn. We compute the partial residuals r_i = y_i − Σ_{j≠k} x_ij β̂_j and apply univariate soft-thresholding, pretending our data is (x_ik, r_i). Start with a large value of λ (high sparsity) and slowly decrease it, exploiting the current estimate as a warm start; this leads to a more stable solution path.

Variable selection

Lasso: L1 regularization

    L = Σ_{i=1}^{N} |y_i − f(x_i)|^2 + (λ/2) ‖w‖_1

Subset selection: L0 regularization

    L = Σ_{i=1}^{N} |y_i − f(x_i)|^2 + (λ/2) ‖w‖_0

- Greedy forward
- Greedy backward
- Greedy forward-backward (Tong Zhang, 2009 and 2011)

Greedy forward-backward algorithm
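The two steps above (univariate soft-thresholding plus cycling over predictors) can be sketched as follows. This is an illustrative implementation for the objective 0.5·‖y − Xw‖² + λ‖w‖₁, not the talk's code, and the function names are mine:

```python
import numpy as np

def soft_threshold(z, lam):
    """sign(z) * (|z| - lam)_+, the univariate lasso solution."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iters=200):
    """Coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1.

    Cycles through coordinates, forming the partial residual for each
    and applying univariate soft-thresholding. No pathwise warm starts
    here; a path version would call this repeatedly with decreasing lam,
    reusing the previous w as the starting point.
    """
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)           # ||x_k||^2 for each column
    for _ in range(n_iters):
        for k in range(d):
            # partial residual: remove every predictor's contribution but k's
            r = y - X @ w + X[:, k] * w[k]
            w[k] = soft_threshold(X[:, k] @ r, lam) / col_sq[k]
    return w
```

On a noise-free sparse problem the irrelevant coordinates stay at (or very near) zero while the active ones are recovered up to the usual λ-induced shrinkage.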
Graphical Lasso

Recall the Gaussian graphical model (multivariate Gaussian):

    f(x | ξ, Σ) = (2π)^{-d/2} (det Ω)^{1/2} exp(−(x − ξ)^T Ω (x − ξ) / 2)
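The graphical lasso estimates a sparse Ω by maximizing the L1-penalized Gaussian log-likelihood, log det Ω − tr(SΩ) − λ‖Ω‖₁ over the off-diagonal entries. As a sketch of that objective (the functional form is standard; the data, λ, and function name below are made up for illustration):

```python
import numpy as np

def glasso_objective(Omega, S, lam):
    # Penalized Gaussian log-likelihood: log det(Omega) - tr(S Omega),
    # minus an L1 penalty on the off-diagonal entries of Omega.
    sign, logdet = np.linalg.slogdet(Omega)
    if sign <= 0:
        return -np.inf                       # Omega must be positive definite
    off_diag_l1 = np.abs(Omega).sum() - np.abs(np.diag(Omega)).sum()
    return logdet - np.trace(S @ Omega) - lam * off_diag_l1

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))                # made-up data
S = np.cov(X, rowvar=False)                  # sample covariance
```

With λ = 0 the maximizer is the unpenalized MLE Ω = S^{-1} (when S is invertible); the penalty then trades likelihood for off-diagonal sparsity.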
Recommended publications
  • Graphical Models for Discrete and Continuous Data (arXiv:1609.05551v3)
    Graphical Models for Discrete and Continuous Data. Rui Zhuang, Department of Biostatistics, University of Washington; Noah Simon, Department of Biostatistics, University of Washington; and Johannes Lederer, Department of Mathematics, Ruhr-University Bochum. June 18, 2019. Abstract: We introduce a general framework for undirected graphical models. It generalizes Gaussian graphical models to a wide range of continuous, discrete, and combinations of different types of data. The models in the framework, called exponential trace models, are amenable to estimation based on maximum likelihood. We introduce a sampling-based approximation algorithm for computing the maximum likelihood estimator, and we apply this pipeline to learn simultaneous neural activities from spike data. Keywords: Non-Gaussian Data, Graphical Models, Maximum Likelihood Estimation. [arXiv:1609.05551v3 [math.ST] 15 Jun 2019] 1 Introduction. Gaussian graphical models (Drton & Maathuis 2016, Lauritzen 1996, Wainwright & Jordan 2008) describe the dependence structures in normally distributed random vectors. These models have become increasingly popular in the sciences, because their representation of the dependencies is lucid and can be readily estimated. For a brief overview, consider a random vector X ∈ R^p that follows a centered normal distribution with density f_Σ(x) = (2π)^{-p/2} |Σ|^{-1/2} exp(−x^T Σ^{-1} x / 2)  (1) with respect to Lebesgue measure, where the population covariance Σ ∈ R^{p×p} is a symmetric and positive definite matrix. Gaussian graphical models associate these densities with a graph (V, E) that has vertex set V := {1, ..., p} and edge set E := {(i, j) : i, j ∈ {1, ..., p}, i ≠ j, (Σ^{-1})_ij ≠ 0}. The graph encodes the dependence structure of X in the sense that any two entries X_i, X_j, i ≠ j, are conditionally independent given all other entries if and only if (i, j) ∉ E.
  • Graphical Models
    Graphical Models Michael I. Jordan Computer Science Division and Department of Statistics University of California, Berkeley 94720 Abstract Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in large-scale data analysis problems. We also present examples of graphical models in bioinformatics, error-control coding and language processing. Keywords: Probabilistic graphical models; junction tree algorithm; sum-product algorithm; Markov chain Monte Carlo; variational inference; bioinformatics; error-control coding. 1. Introduction The fields of Statistics and Computer Science have generally followed separate paths for the past several decades, with each field providing useful services to the other, but with the core concerns of the two fields rarely appearing to intersect. In recent years, however, it has become increasingly evident that the long-term goals of the two fields are closely aligned. Statisticians are increasingly concerned with the computational aspects, both theoretical and practical, of models and inference procedures. Computer scientists are increasingly concerned with systems that interact with the external world and interpret uncertain data in terms of underlying probabilistic models. One area in which these trends are most evident is that of probabilistic graphical models.
  • Graphical Modelling of Multivariate Spatial Point Processes
    Graphical modelling of multivariate spatial point processes Matthias Eckardt Department of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany July 26, 2016 Abstract This paper proposes a novel graphical model, termed the spatial dependence graph model, which captures the global dependence structure of different events that occur randomly in space. In the spatial dependence graph model, the edge set is identified by using the conditional partial spectral coherence. Thereby, nodes are related to the components of a multivariate spatial point process and edges express orthogonality relations between the single components. This paper introduces an efficient approach towards pattern analysis of highly structured and high dimensional spatial point processes. Unlike all previous methods, our new model permits the simultaneous analysis of all multivariate conditional interrelations. The potential of our new technique to investigate multivariate structural relations is illustrated using data on forest stands in Lansing Woods as well as monthly data on crimes committed in the City of London. Keywords: Crime data; Dependence structure; Graphical model; Lansing Woods; Multivariate spatial point pattern arXiv:1607.07083v1 [stat.ME] 24 Jul 2016 1 Introduction The analysis of spatial point patterns is a rapidly developing field and of particular interest to many disciplines. Here, a main concern is to explore the structures and relations generated by a countable set of randomly occurring points in some bounded planar observation window. Generally, these randomly occurring points could be of one type (univariate) or of two and more types (multivariate). In this paper, we consider the latter type of spatial point patterns.
  • Decomposable Principal Component Analysis Ami Wiesel, Member, IEEE, and Alfred O
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 57, NO. 11, NOVEMBER 2009 4369 Decomposable Principal Component Analysis Ami Wiesel, Member, IEEE, and Alfred O. Hero, Fellow, IEEE Abstract—In this paper, we consider principal component analysis (PCA) in decomposable Gaussian graphical models. We exploit the prior information in these models in order to distribute PCA computation. For this purpose, we reformulate the PCA problem in the sparse inverse covariance (concentration) domain and address the global eigenvalue problem by solving a sequence of local eigenvalue problems in each of the cliques of the decomposable graph. We illustrate our methodology in the context of decentralized anomaly detection in the Abilene backbone network. Based on the topology of the network, we propose an approximate statistical graphical model and distribute the computation of PCA. Index Terms—Anomaly detection, graphical models, principal component analysis. I. INTRODUCTION computer networks [6], [12], [15]. In particular, [9] and [18] proposed to approximate the global PCA using a sequence of conditional local PCA solutions. Alternatively, an approximate solution which allows a tradeoff between performance and communication requirements was proposed in [12] using eigenvalue perturbation theory. DPCA is an efficient implementation of distributed PCA based on a prior graphical model. Unlike the above references it does not try to approximate PCA, but yields an exact solution up to any given error tolerance. DPCA assumes additional prior knowledge in the form of a graphical model that is not taken into account by previous distributed PCA methods. Such models are very common in signal processing and communications.
  • A Probabilistic Interpretation of Canonical Correlation Analysis
    A Probabilistic Interpretation of Canonical Correlation Analysis Francis R. Bach Michael I. Jordan Computer Science Division and Department of Statistics, University of California, Berkeley, CA, USA [email protected] [email protected] April 21, 2005 Technical Report 688 Department of Statistics University of California, Berkeley Abstract We give a probabilistic interpretation of canonical correlation analysis (CCA) as a latent variable model for two Gaussian random vectors. Our interpretation is similar to the probabilistic interpretation of principal component analysis (Tipping and Bishop, 1999, Roweis, 1998). In addition, we cast Fisher linear discriminant analysis (LDA) within the CCA framework. 1 Introduction Data analysis tools such as principal component analysis (PCA), linear discriminant analysis (LDA) and canonical correlation analysis (CCA) are widely used for purposes such as dimensionality reduction or visualization (Hotelling, 1936, Anderson, 1984, Hastie et al., 2001). In this paper, we provide a probabilistic interpretation of CCA and LDA. Such a probabilistic interpretation deepens the understanding of CCA and LDA as model-based methods, enables the use of local CCA models as components of a larger probabilistic model, and suggests generalizations to members of the exponential family other than the Gaussian distribution. In Section 2, we review the probabilistic interpretation of PCA, while in Section 3, we present the probabilistic interpretation of CCA and LDA, with proofs presented in Section 4. In Section 5, we provide a CCA-based probabilistic interpretation of LDA. [Figure 1: Graphical model for factor analysis.] 2 Review: probabilistic interpretation of PCA Tipping and Bishop (1999) have shown that PCA can be seen as the maximum likelihood solution of a factor analysis model with isotropic covariance matrix.
  • Canonical Autocorrelation Analysis and Graphical Modeling for Human Trafficking Characterization
    Canonical Autocorrelation Analysis and Graphical Modeling for Human Trafficking Characterization Qicong Chen Maria De Arteaga Carnegie Mellon University Carnegie Mellon University Pittsburgh, PA 15213 Pittsburgh, PA 15213 [email protected] [email protected] William Herlands Carnegie Mellon University Pittsburgh, PA 15213 [email protected] Abstract The present research characterizes online prostitution advertisements by human trafficking rings to extract and quantify patterns that describe their online operations. We approach this descriptive analytics task from two perspectives. One, we develop an extension to Sparse Canonical Correlation Analysis that identifies autocorrelations within a single set of variables. This technique, which we call Canonical Autocorrelation Analysis, detects features of the human trafficking advertisements which are highly correlated within a particular trafficking ring. Two, we use a variant of supervised latent Dirichlet allocation to learn topic models over the trafficking rings. The relationship of the topics over multiple rings characterizes general behaviors of human traffickers as well as behaviors of individual rings. 1 Introduction Currently, the United Nations Office on Drugs and Crime estimates there are 2.5 million victims of human trafficking in the world, with 79% of them suffering sexual exploitation. Increasingly, traffickers use the Internet to advertise their victims' services and establish contact with clients. The present research characterizes online advertisements by particular human traffickers (or trafficking rings) to develop quantitative descriptions of their online operations. As such, prediction is not the main objective. This is a task of descriptive analytics, where the objective is to extract and quantify patterns that are difficult for humans to find.
  • Sequential Mean Field Variational Analysis
    Computer Vision and Image Understanding 101 (2006) 87–99 www.elsevier.com/locate/cviu Sequential mean field variational analysis of structured deformable shapes Gang Hua *, Ying Wu Department of Electrical and Computer Engineering, 2145 Sheridan Road, Northwestern University, Evanston, IL 60208, USA Received 15 March 2004; accepted 25 July 2005 Available online 3 October 2005 Abstract A novel approach is proposed to analyzing and tracking the motion of structured deformable shapes, which consist of multiple correlated deformable subparts. Since this problem is high dimensional in nature, existing methods are plagued either by the inability to capture the detailed local deformation or by the enormous complexity induced by the curse of dimensionality. Taking advantage of the structured constraints of the different deformable subparts, we propose a new statistical representation, i.e., the Markov network, to structured deformable shapes. Then, the deformation of the structured deformable shapes is modelled by a dynamic Markov network which is proven to be very efficient in overcoming the challenges induced by the high dimensionality. Probabilistic variational analysis of this dynamic Markov model reveals a set of fixed point equations, i.e., the sequential mean field equations, which manifest the interactions among the motion posteriors of different deformable subparts. Therefore, we achieve an efficient solution to such a high-dimensional motion analysis problem. Combined with a Monte Carlo strategy, the new algorithm, namely sequential mean field Monte Carlo, achieves very efficient Bayesian inference of the structured deformation with close-to-linear complexity. Extensive experiments on tracking human lips and human faces demonstrate the effectiveness and efficiency of the proposed method.
  • Simultaneous Clustering and Estimation of Heterogeneous Graphical Models
    Journal of Machine Learning Research 18 (2018) 1-58 Submitted 1/17; Revised 11/17; Published 4/18 Simultaneous Clustering and Estimation of Heterogeneous Graphical Models Botao Hao [email protected] Department of Statistics Purdue University West Lafayette, IN 47906, USA Will Wei Sun [email protected] Department of Management Science University of Miami School of Business Administration Miami, FL 33146, USA Yufeng Liu [email protected] Department of Statistics and Operations Research Department of Genetics Department of Biostatistics Carolina Center for Genome Sciences Lineberger Comprehensive Cancer Center University of North Carolina at Chapel Hill Chapel Hill, NC 27599, USA Guang Cheng [email protected] Department of Statistics Purdue University West Lafayette, IN 47906, USA Editor: Koji Tsuda Abstract We consider joint estimation of multiple graphical models arising from heterogeneous and high-dimensional observations. Unlike most previous approaches which assume that the cluster structure is given in advance, an appealing feature of our method is to learn cluster structure while estimating heterogeneous graphical models. This is achieved via a high dimensional version of Expectation Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993). A joint graphical lasso penalty is imposed on the conditional maximization step to extract both homogeneity and heterogeneity components across all clusters. Our algorithm is computationally efficient due to fast sparse learning routines and can be implemented without unsupervised learning knowledge. The superior performance of our method is demonstrated by extensive experiments and its application to a Glioblastoma cancer dataset reveals some new insights in understanding the Glioblastoma cancer.
    In theory, a non-asymptotic error bound is established for the output directly from our high dimensional ECM algorithm, and it consists of two quantities: statistical error (statistical accuracy) and optimization error (computational complexity).
  • The Multiple Quantile Graphical Model
    The Multiple Quantile Graphical Model Alnur Ali J. Zico Kolter Ryan J. Tibshirani Machine Learning Department Computer Science Department Department of Statistics Carnegie Mellon University Carnegie Mellon University Carnegie Mellon University [email protected] [email protected] [email protected] Abstract We introduce the Multiple Quantile Graphical Model (MQGM), which extends the neighborhood selection approach of Meinshausen and Bühlmann for learning sparse graphical models. The latter is defined by the basic subproblem of modeling the conditional mean of one variable as a sparse function of all others. Our approach models a set of conditional quantiles of one variable as a sparse function of all others, and hence offers a much richer, more expressive class of conditional distribution estimates. We establish that, under suitable regularity conditions, the MQGM identifies the exact conditional independencies with probability tending to one as the problem size grows, even outside of the usual homoskedastic Gaussian data model. We develop an efficient algorithm for fitting the MQGM using the alternating direction method of multipliers. We also describe a strategy for sampling from the joint distribution that underlies the MQGM estimate. Lastly, we present detailed experiments that demonstrate the flexibility and effectiveness of the MQGM in modeling heteroskedastic non-Gaussian data. 1 Introduction We consider modeling the joint distribution Pr(y_1, ..., y_d) of d random variables, given n independent draws from this distribution y^(1), ..., y^(n) ∈ R^d, where possibly d ≫ n. Later, we generalize this setup and consider modeling the conditional distribution Pr(y_1, ..., y_d | x_1, ..., x_p), given n independent pairs (x^(1), y^(1)), ..., (x^(n), y^(n)) ∈ R^{p+d}.
  • Probabilistic Graphical Model Explanations for Graph Neural Networks
    PGM-Explainer: Probabilistic Graphical Model Explanations for Graph Neural Networks Minh N. Vu My T. Thai University of Florida University of Florida Gainesville, FL 32611 Gainesville, FL 32611 [email protected] [email protected] Abstract In Graph Neural Networks (GNNs), the graph structure is incorporated into the learning of node representations. This complex structure makes explaining GNNs' predictions much more challenging. In this paper, we propose PGM-Explainer, a Probabilistic Graphical Model (PGM) model-agnostic explainer for GNNs. Given a prediction to be explained, PGM-Explainer identifies crucial graph components and generates an explanation in the form of a PGM approximating that prediction. Different from existing explainers for GNNs where the explanations are drawn from a set of linear functions of explained features, PGM-Explainer is able to demonstrate the dependencies of explained features in the form of conditional probabilities. Our theoretical analysis shows that the PGM generated by PGM-Explainer includes the Markov blanket of the target prediction, i.e. including all its statistical information. We also show that the explanation returned by PGM-Explainer contains the same set of independence statements in the perfect map. Our experiments on both synthetic and real-world datasets show that PGM-Explainer achieves better performance than existing explainers in many benchmark tasks. 1 Introduction Graph Neural Networks (GNNs) have been emerging as powerful solutions to many real-world applications in various domains where the datasets are in the form of graphs, such as social networks, citation networks, knowledge graphs, and biological networks [1, 2, 3]. Notable GNN architectures with high predictive performance include ChebNets [4], Graph Convolutional Networks [5], GraphSage [6], Graph Attention Networks [7], among others [8, 9, 10, 11, 12, 13].
  • High Dimensional Semiparametric Latent Graphical Model for Mixed Data
    J. R. Statist. Soc. B (2017) 79, Part 2, pp. 405–421 High dimensional semiparametric latent graphical model for mixed data Jianqing Fan, Han Liu and Yang Ning Princeton University, USA and Hui Zou University of Minnesota, Minneapolis, USA [Received April 2014. Final revision December 2015] Summary. We propose a semiparametric latent Gaussian copula model for modelling mixed multivariate data, which contain a combination of both continuous and binary variables. The model assumes that the observed binary variables are obtained by dichotomizing latent variables that satisfy the Gaussian copula distribution. The goal is to infer the conditional independence relationship between the latent random variables, based on the observed mixed data. Our work has two main contributions: we propose a unified rank-based approach to estimate the correlation matrix of latent variables; we establish the concentration inequality of the proposed rank-based estimator. Consequently, our methods achieve the same rates of convergence for precision matrix estimation and graph recovery, as if the latent variables were observed. The methods proposed are numerically assessed through extensive simulation studies, and real data analysis. Keywords: Discrete data; Gaussian copula; Latent variable; Mixed data; Non-paranormal; Rank-based statistic 1. Introduction Graphical models (Lauritzen, 1996) have been widely used to explore the dependence structure of multivariate distributions, arising in many research areas including machine learning, image analysis, statistical physics and epidemiology. In these applications, the data that are collected often have high dimensionality and low sample size. Under this high dimensional setting, parameter estimation and edge structure learning in the graphical model attract increasing attention in statistics.
  • High-Dimensional Inference for Cluster-Based Graphical Models
    Journal of Machine Learning Research 21 (2020) 1-55 Submitted 6/18; Revised 12/19; Published 3/20 High-Dimensional Inference for Cluster-Based Graphical Models Carson Eisenach [email protected] Department of Operations Research and Financial Engineering Princeton University Princeton, NJ 08544, USA Florentina Bunea [email protected] Yang Ning [email protected] Claudiu Dinicu [email protected] Department of Statistics and Data Science Cornell University Ithaca, NY 14850, USA Editor: Nicolas Vayatis Abstract Motivated by modern applications in which one constructs graphical models based on a very large number of features, this paper introduces a new class of cluster-based graphical models, in which variable clustering is applied as an initial step for reducing the dimension of the feature space. We employ model assisted clustering, in which the clusters contain features that are similar to the same unobserved latent variable. Two different cluster-based Gaussian graphical models are considered: the latent variable graph, corresponding to the graphical model associated with the unobserved latent variables, and the cluster-average graph, corresponding to the vector of features averaged over clusters. Our study reveals that likelihood based inference for the latent graph, not analyzed previously, is analytically intractable. Our main contribution is the development and analysis of alternative estimation and inference strategies, for the precision matrix of an unobservable latent vector Z. We replace the likelihood of the data by an appropriate class of empirical risk functions, that can be specialized to the latent graphical model and to the simpler, but under-analyzed, cluster-average graphical model. The estimators thus derived can be used for inference on the graph structure, for instance on edge strength or pattern recovery.