Geometric Representations of Hypergraphs Applied to Prior Specification and Posterior Sampling
Simón Lunagómez1, Sayan Mukherjee1,2, and Robert L. Wolpert1,3

Department of Statistical Science1
Department of Computer Science2
Institute for Genome Sciences & Policy2
Nicholas School of the Environment3
Duke University, Durham, NC 27708, USA

e-mail: [email protected]; [email protected]; [email protected]

Abstract: We formulate a novel approach to infer conditional independence models or Markov structure of a multivariate distribution. Specifically, our objective is to place informative prior distributions over graphs (decomposable and unrestricted) and sample efficiently from the induced posterior distribution. We also explore the idea of factorizing according to complete sets of a graph, which implies working with a hypergraph that cannot be retrieved from the graph alone. The key idea we develop in this paper is a parametrization of hypergraphs using the geometry of points in $\mathbb{R}^m$. This induces informative priors on graphs from specified priors on finite sets of points. Constructing hypergraphs from finite point sets has been well studied in the fields of computational topology and random geometric graphs. We develop the framework underlying this idea and illustrate its efficacy using simulations.

AMS 2000 subject classifications: Primary 60K35, 60K35; secondary 60K35.
Keywords and phrases: Abstract simplicial complex, Computational topology, Factor models, Graphical models, Random geometric graphs.

1. Introduction

It is common to model the joint probability distribution of a family of $d$ random variables $\{X_1, \dots, X_d\}$ in two stages: first to specify the conditional dependence structure of the distribution, then to specify details of the conditional distributions of the variables within that structure [see p. 1274 of Dawid and Lauritzen 7, or p. 180 of Besag 3, for example]. The structure may be summarized in a variety of ways in the form of a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ whose vertices $\mathcal{V} = \{1, \dots, d\}$ index the variables $\{X_i\}$ and whose edges $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ in some way encode conditional dependence. We follow the Hammersley-Clifford approach [2, 13], in which $(i, j) \in \mathcal{E}$ if and only if the conditional distribution of $X_i$ given all other variables $\{X_k : k \neq i\}$ depends on $X_j$, i.e., differs from the conditional distribution of $X_i$ given $\{X_k : k \neq i, j\}$. In this case the distribution is said to be Markov with respect to the graph. One can show that this graph is symmetric or undirected, i.e., all the elements of $\mathcal{E}$ are unordered pairs.

Our primary goal is to infer this undirected graph from data, although our approach also yields estimates of the conditional distributions given the graph. The model space of undirected graphs grows quickly with the dimension of the vector (there are $2^{d(d-1)/2}$ undirected graphs on $d$ vertices) and is difficult to parametrize. We propose a novel parametrization and a simple, flexible family of prior distributions on $\mathcal{G}$ and on Markov probability distributions with respect to $\mathcal{G}$ [7]; this parametrization is based on computing the intersection pattern of a system of convex sets in $\mathbb{R}^m$.
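In symbols, the Hammersley-Clifford edge criterion just described is the pairwise conditional-independence statement (a restatement of the condition above, not additional material):

\[
(i, j) \notin \mathcal{E}
\quad\Longleftrightarrow\quad
X_i \perp\!\!\!\perp X_j \;\bigm|\; \{X_k : k \neq i, j\},
\]

so that absent edges correspond exactly to conditional independences given all remaining variables.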
The novelty and main contribution of this paper is structural inference for graphical models; specifically, the proposed representation of graph spaces allows for flexible priors and new MCMC algorithms. Simultaneous inference of a decomposable graph and marginals in a fully Bayesian framework was developed in [12], using local proposals to sample graph space. A promising extension of this approach, called Shotgun Stochastic Search (SSS), takes advantage of parallel computing to select from a batch of local moves [16]. A stochastic search method that incorporates local moves as well as more aggressive moves in graph space has been developed by Scott and Carvalho [28]. These stochastic search methods are intended to identify regions with high posterior probability, but their convergence properties are still not well understood. Bayesian models for non-decomposable graphs have been proposed by Roverato [27] and Wong, Carter and Kohn [29]. These two approaches focus on Monte Carlo sampling of the posterior distribution from appropriate hyper Markov prior laws. Their emphasis is on the problem of Monte Carlo simulation, not on that of constructing interesting informative priors on graphs; we believe there is a need for methodology that balances efficient exploration of the model space against the task of designing simple and flexible distributions on graphs able to encode meaningful prior information.

Erdős-Rényi random graphs (those in which each of the $\binom{d}{2}$ possible edges $(i, j)$ is included in $\mathcal{E}$ independently with some specified probability $p \in [0, 1]$), and variations where the edge inclusion probabilities $p_{ij}$ are allowed to be edge-specific, have been used to place informative priors on decomposable graphs [14, 18]. The number of parameters in this prior specification can be enormous if the inclusion probabilities are allowed to vary, and some interesting features of graphs may be hard to express solely by specifying edge probabilities. Mukherjee and Speed [19] developed methodology for placing informative distributions on directed graphs by using what they call concordance functions: a concordance function increases as the graph agrees more closely with a given feature. This is a conceptually sound approach; however, it is still not clear how to encode certain assumptions within such a framework, e.g., how to work on an unrestricted graph space while encouraging decomposability.

For the special case of jointly Gaussian variables $\{X_j\}$, or those with arbitrary marginal distributions $F_j(\cdot)$ whose dependence is adequately represented in the form $X_j = F_j^{-1}\bigl(\Phi(Z_j)\bigr)$ for jointly Gaussian $\{Z_j\}$ with zero mean and unit-diagonal covariance matrix $C_{ij}$, the problem of studying conditional independence reduces to a search for zeros in the precision matrix $C^{-1}$; this approach [see 15, for example] is faster and easier to implement than ours in cases where both are applicable, but is far more limited in the range of dependencies it allows. For example, consider a factorization with respect to the complete sets of a graph: using a potential function for a complete set of size three or greater may result in a different dependence structure from using a potential function for each edge (we discuss this in Section 5.3).
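As a concrete illustration of the Gaussian special case referenced above (a minimal sketch, not part of the paper's methodology; the covariance values are hypothetical), conditional independence corresponds to zeros in the precision matrix:

```python
import numpy as np

# Hypothetical unit-diagonal covariance for three jointly Gaussian variables
# forming a Markov chain X0 -- X1 -- X2 (values chosen for illustration only).
C = np.array([[1.00, 0.50, 0.25],
              [0.50, 1.00, 0.50],
              [0.25, 0.50, 1.00]])

K = np.linalg.inv(C)  # precision matrix C^{-1}

# The (0, 2) entry is (numerically) zero, reflecting X0 _||_ X2 | X1;
# the nonzero off-diagonal entries correspond to the edges (0, 1) and (1, 2).
print(np.round(K, 6))
```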
In this paper we establish a novel approach to parametrize spaces of graphs. For any integers $d, m \in \mathbb{N}$, the geometrical configuration of a set $\{v_i\}$ of $d$ points in Euclidean space $\mathbb{R}^m$ can be used (see Section 2) to determine a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ on $\mathcal{V} = \{1, \dots, d\}$. Any prior distribution on point sets $\{v_i\}$ induces a prior distribution on graphs, and sampling from the posterior distribution of graphs is reduced to sampling from spatial configurations of point sets, a standard problem in spatial modeling. The relation between graphs and finite sets of points has arisen in the fields of computational topology [8] and random geometric graphs [22]. From the former we borrow the idea of nerves, i.e., simplicial complexes computed from intersection patterns of subsets of $\mathbb{R}^m$; their 1-skeletons (the collections of 1-dimensional simplices) are geometric graphs (a small illustrative sketch of this construction appears at the end of this section). From the random geometric graph approach we gain understanding about the induced distribution on graph features when certain parameters of a geometric graph (or hypergraph) are made stochastic.

1.1. Graphical models

We define a graph $\mathcal{G}$ as a pair $(\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is a finite set of objects (the vertices) and $\mathcal{E}$ is a collection of pairs of those objects (the edges). If all elements of $\mathcal{E}$ are unordered (ordered), the graph is said to be undirected (directed). In this paper it will be assumed that all graphs are undirected, unless stated otherwise. A set $\mathcal{W} \subseteq \mathcal{V}$ is called complete if for every $v_i, v_j \in \mathcal{W}$ there is an edge that connects them; a complete set that is maximal with respect to inclusion is said to be a clique. We denote by $\mathcal{C}(\mathcal{G})$ and $\mathcal{Q}(\mathcal{G})$, respectively, the collection of complete sets and the collection of cliques of $\mathcal{G}$. A path between $v_i, v_j \in \mathcal{V}$ is a sequence of edges and vertices of the form $v_{k_1}, e_1, v_{k_2}, \dots, e_{s-1}, v_{k_s}$ such that $e_m = (v_{k_m}, v_{k_{m+1}})$, $v_{k_1} = v_i$, and $v_{k_s} = v_j$; if $v_i = v_j$ the path is called a cycle. A graph such that any pair of vertices can be joined by a unique path is called a tree. Let $(A, B, S)$ be disjoint subsets of $\mathcal{V}$; if every path that starts from an element of $A$ and ends in an element of $B$ contains an element of $S$, then $S$ is said to separate $A$ and $B$. A concept related to that of a graph is the hypergraph, denoted $\mathcal{H}$, which consists of a vertex set $\mathcal{V}$ and a collection $\mathcal{K}$ of unordered subsets of $\mathcal{V}$ (known as hyperedges).

A weak decomposition of a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a partition of $\mathcal{V}$ into nonempty sets $(A, B, S)$ such that 1.
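The sketch below makes the geometric construction from the opening paragraph of this section concrete under a simplifying assumption: purely for illustration, the convex sets are taken to be Euclidean balls of a common radius r centred at the points (the paper's actual construction is given in Section 2). Two such balls intersect exactly when their centres are within distance 2r, so the 1-skeleton of the resulting nerve is the geometric graph computed here.

```python
import itertools
import numpy as np

def geometric_graph(points, r):
    """1-skeleton of the nerve of radius-r balls centred at the given points.

    points : (d, m) array of d points in R^m
    r      : common ball radius (illustrative choice of convex sets)
    Returns the edge set on vertices {0, ..., d-1}: (i, j) is an edge exactly
    when the balls around points i and j intersect, i.e. ||v_i - v_j|| <= 2r.
    """
    d = len(points)
    edges = set()
    for i, j in itertools.combinations(range(d), 2):
        if np.linalg.norm(points[i] - points[j]) <= 2 * r:
            edges.add((i, j))
    return edges

# Example: a random configuration of d = 5 points in R^2.
rng = np.random.default_rng(0)
v = rng.uniform(size=(5, 2))
print(geometric_graph(v, r=0.25))
```

Placing a prior on the point configuration (and, if desired, on r) then induces a prior on the resulting graphs, which is the mechanism the paper exploits.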