Removing Phase Transitions from Gibbs Measures
Ian E. Fellows (Fellows Statistics)    Mark S. Handcock (University of California, Los Angeles)

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017, Fort Lauderdale, Florida, USA. JMLR: W&CP volume 54. Copyright 2017 by the author(s).

Abstract

Gibbs measures are a fundamental class of distributions for the analysis of high dimensional data. Phase transitions, which are also known as degeneracy in the network science literature, are an emergent property of these models that well describe many physical systems. However, the reach of the Gibbs measure is now far outside the realm of physical systems, and in many of these domains multiphase behavior is a nuisance. This nuisance often makes distribution fitting impossible due to failure of the MCMC sampler, and even when an MLE fit is possible, if the solution is near a phase transition point, the plausibility of the fit can be highly questionable. We introduce a modification to the Gibbs distribution that reduces the effects of phase transitions, and with properly chosen hyper-parameters, provably removes all multiphase behavior. We show that this new distribution is just as easy to fit via MCMC-MLE as the Gibbs measure, and provide examples in the Ising model from statistical physics and ERGMs from network science.

1 Introduction

The Gibbs measure is a class of probability distributions describing the configuration of a (usually high dimensional) set of random variables. Initially it was motivated primarily by statistical mechanics [Gibbs et al., 1902], though in more recent decades it has found widespread usage in many areas of statistics and machine learning, including computer vision [Li, 2012], neuroscience [Stephens et al., 2011] and the modeling of social networks [Frank and Strauss, 1986], to name a few.

While the Gibbs measure has shown wide applicability as a general modeling framework, the presence of multiphase behavior has caused some practical difficulty. Nowhere is this more apparent than in the network science field, where the popular Exponential-family Random Graph Model (ERGM) has been repeatedly shown to suffer from model frailty due to the existence of multiphase parameter configurations. This is known as "model degeneracy" in the network science literature, and is perhaps the most fundamental challenge facing ERGMs both in theory [Schweinberger, 2011, Handcock, 2003, Bhamidi et al., 2008, Chatterjee et al., 2013] and practice [Robins et al., 2007, Goldenberg et al., 2010].

The contribution of this work is to introduce a new class of statistical models for high dimensional data that retains the flexibility of the Gibbs measure, without the undesirable multiphase properties. The form of the model class allows us to ensure that it is not multiphase at any parameter values, and that the likelihood puts significant probability mass around the configuration that is being modeled.

Let $X$ be a set of random variables with realization $x$ and $\mathcal{N}$ be the sample space in which $X$ may exist. A Gibbs measure is defined by the probability mass function

$$p_{\text{gibbs}}(x \mid \theta) = \frac{1}{Z(\theta)} e^{\sum_k \theta_k g_k(x)}, \qquad x \in \mathcal{N},$$

where $\theta$ is a vector of $d$ parameters, $g$ is a vector valued set of sufficient statistics, and $Z(\theta) = \sum_{x \in \mathcal{N}} e^{\sum_i \theta_i g_i(x)}$ is the partition function assuring that the probabilities sum to unity. For notational convenience we will assume throughout that $x$ is discrete. The continuous case may be handled with ease by replacing sums over $x$ by integrals.

Since the likelihood is only a function of the sufficient statistics, the probability of observing a set of statistics is

$$p_{\text{gibbs}}(g(x) \mid \theta) = N(g(x)) \frac{1}{Z(\theta)} e^{\sum_k \theta_k g_k(x)}, \qquad x \in \mathcal{N},$$
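The definition above can be made concrete with a brute-force sketch (not from the paper; the function and variable names are illustrative, and explicit enumeration is only feasible for tiny sample spaces):

```python
import itertools
import math

def gibbs_pmf(theta, stats, configs):
    """Brute-force Gibbs probabilities p(x) proportional to
    exp(sum_k theta_k * g_k(x)) over an explicitly enumerated
    sample space `configs` (only feasible for tiny systems)."""
    weights = [math.exp(sum(t * g for t, g in zip(theta, stats(x))))
               for x in configs]
    Z = sum(weights)  # partition function: normalizes probabilities to unity
    return [w / Z for w in weights]

# Example: four +/-1 spins with a single sufficient statistic g(x) = sum of spins.
configs = list(itertools.product([-1, 1], repeat=4))
probs = gibbs_pmf([0.5], lambda x: (sum(x),), configs)
```

Setting `theta` to zero recovers the uniform distribution over the sample space, matching the maximum-entropy interpretation discussed below.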
where $N(g(x))$ is the count of configurations with statistics equal to $g(x)$.

$p_{\text{gibbs}}$ is an exponential family distribution [Barndorff-Nielsen, 1978] with natural parameters $\theta$. An alternate way to define the model is in terms of the mean value parameters, which are defined as the expected values of the sufficient statistics $\mu(\theta) = E_\theta(g(X))$. There is a one-to-one relationship between the natural and mean value parameters, so a model may be expressed in either terms [Barndorff-Nielsen, 1978].

One of the most attractive features of $p_{\text{gibbs}}$ as a class of general modeling distributions is that it satisfies the maximum entropy principle, in that among all possible probability distributions $p'$ for $X$ with identical mean values (i.e. $E_{p'}(g(X)) = \mu(\theta)$), $p_{\text{gibbs}}$ has the largest entropy [Jaynes, 1957]. Because entropy is a measure of the informativeness of the distribution, the Gibbs measure may be thought of as the least informative distribution consistent with the mean value parameters.

1.1 Example: The Ising Model

A canonical example of a Gibbs measure is the Ising [1925] model of ferromagnetism, which in this case we will define over an n-by-n toroidal lattice. Each $x_{ij}$ may either have magnetic spin up or down ($x_{ij} \in \{-1, 1\}$ for all $i, j \in \{1, \ldots, n\}$), and is influenced by its four neighbors $x_{(i-1)j}$, $x_{(i+1)j}$, $x_{i(j-1)}$, and $x_{i(j+1)}$. Here we allow the subscripts to wrap, such that a subscript of 0 refers to index $n$ and a subscript of $n + 1$ refers to index 1. The Ising probability model has two sufficient statistics. The first models the general tendency for up or down spins and is defined as

$$g_1(x) = \sum_{i,j} x_{ij}.$$

The second is the number of neighbors sharing the same spin and is defined as

$$g_2(x) = \sum_{i,j} x_{ij}\left(x_{(i+1)j} + x_{i(j+1)}\right).$$
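The two sufficient statistics above, with toroidal wrap-around, can be computed in a few lines. This is an illustrative sketch (the function name is ours, not the authors'), using `np.roll` to implement the periodic boundary:

```python
import numpy as np

def ising_stats(x):
    """Sufficient statistics of the n-by-n toroidal Ising model:
    g1 = sum of all spins;
    g2 = sum_{i,j} x_ij * (x_{(i+1)j} + x_{i(j+1)}),
    where indices wrap around the torus."""
    x = np.asarray(x)
    g1 = x.sum()
    # np.roll(x, -1, axis=0) is the (i+1)j neighbor; axis=1 gives i(j+1).
    # Each neighboring pair is counted exactly once.
    g2 = (x * (np.roll(x, -1, axis=0) + np.roll(x, -1, axis=1))).sum()
    return g1, g2
```

For example, a fully aligned 3-by-3 lattice of +1 spins gives $g_1 = 9$ and $g_2 = 18$ (each of the 9 sites contributes two same-spin pairs).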
While the Ising model was developed in the context of statistical physics, the tendency of like to be connected to like is a more universal concept, and thus the model has found extensive use in domains far afield from magnetism. To name a few examples, it is used in computer vision for image denoising [Cohen et al., 2015]; in network modeling, the classic two-star model can be viewed as an Ising model [Palla et al., 2004, Park and Newman, 2004]; and in sociology it is useful in understanding group choice [Galam, 1997] and urban segregation [Stauffer, 2008].

Performing likelihood-based inference on Gibbs measures is a non-trivial task because, for even moderately multivariate $X$, the sum in the normalizing constant $Z$ is typically computationally intractable, and thus it is impossible to evaluate the likelihood directly. However, indirect sampling methods may be used by expressing the likelihood and its derivatives in terms of expectations and covariances. The log likelihood of the distribution is

$$\ell_{\text{gibbs}}(\theta \mid x) = \sum_k \theta_k g_k(x) - \log Z(\theta),$$

and the first and second derivatives may then be found to be

$$\frac{\partial \ell_{\text{gibbs}}}{\partial \theta_i} = g_i(x) - \mu_i(\theta)$$

and

$$\frac{\partial^2 \ell_{\text{gibbs}}}{\partial \theta_i \partial \theta_j} = -\mathrm{cov}(g_i(X), g_j(X)).$$

It is clear from setting the first derivative to zero that at the maximum likelihood estimate (MLE) of $\theta$, the mean value parameters must match the observed sufficient statistics (i.e. $g(x) = \mu(\hat{\theta}_{\text{mle}})$).

Since $p_{\text{gibbs}}$ is known up to a constant, we may use Markov Chain Monte Carlo (MCMC) methods to generate a sample from it. Using this sample, the above equations may be approximated using weighted sample means and covariances in place of the population expectations and covariances. The precise mechanics of using these approximations to find the maximum likelihood estimate of $\theta$ is known as Markov Chain Monte Carlo Maximum Likelihood Estimation (MCMC-MLE), and has been used successfully across many disparate types of Gibbs measures [Geyer and Thompson, 1992, Hunter and Handcock, 2012, Descombes et al., 1997].

2 Phases In Gibbs Measures

The Gibbs distribution captures many of the characteristics of typical physical systems, even those which in other domains may lead to computational difficulties and implausible models. In particular, Gibbs measures often exhibit the phenomenon of a phase transition point. This has been extensively studied in the physics literature [Georgii, 2011]. In the context of social network modeling, multiphase behavior has resulted in significant difficulty in finding good models for large complex networks [Handcock, 2003, Schweinberger, 2011, Chatterjee et al., 2013, Horvát et al., 2015].

Consider an Ising model where $\theta_1 = 0$ and $\theta_2$ is very large. If one were to start an MCMC chain at a configuration where most vertices are set to 1, at equilibrium we would find almost perfect alignment to 1. Similarly, if the starting point for the chain was mostly $-1$, then the equilibrium state would be a global $-1$ alignment.

Such behavior has important implications for prac-

by Horvát et al. [2015], is a mathematical convenience removing the effects of small local fluctuations and assuring the existence of a Hessian. While the choice of N* could result in a probability distribution, this is not a hard requirement of its construction, and in many cases, N* could be made to be arbitrarily close to N while still maintaining 2nd order differentiability.

Following Horvát et al. [2015] we define uniphase and multiphase as:

Definition 1 A distribution $p(g(x))$ is uniphase with respect to $p^*$ if it contains no local minima or saddle points, and is multiphase otherwise.

Horvát et al. [2015] developed an insightful connection between multiphase behavior and the configuration count function $N$.
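The metastable behavior described above can be reproduced with a simple heat-bath (Gibbs) sampler. This is a sketch under our own assumptions, not the authors' implementation: the full conditional of each spin given its four neighbors follows from the Gibbs form, with log-odds of $+1$ versus $-1$ equal to $2(\theta_1 + \theta_2 s)$, where $s$ is the sum of the four neighboring spins:

```python
import numpy as np

def gibbs_sweep(x, theta1, theta2, rng):
    """One sequential heat-bath sweep over an n-by-n toroidal Ising lattice.
    Each spin is resampled from its full conditional:
    P(x_ij = +1 | neighbors) = 1 / (1 + exp(-2 * (theta1 + theta2 * s))),
    where s is the sum of the four neighboring spins (indices wrap)."""
    n = x.shape[0]
    for i in range(n):
        for j in range(n):
            s = (x[(i - 1) % n, j] + x[(i + 1) % n, j]
                 + x[i, (j - 1) % n] + x[i, (j + 1) % n])
            p_up = 1.0 / (1.0 + np.exp(-2.0 * (theta1 + theta2 * s)))
            x[i, j] = 1 if rng.random() < p_up else -1
    return x

# With theta1 = 0 and theta2 large, the chain is metastable: started near
# all +1 it stays magnetized at +1, and started near all -1 it stays at -1,
# illustrating the two phases discussed in the text.
rng = np.random.default_rng(0)
up = gibbs_sweep(np.ones((10, 10), dtype=int), 0.0, 1.0, rng)
down = gibbs_sweep(-np.ones((10, 10), dtype=int), 0.0, 1.0, rng)
```

Running many sweeps from each starting point and comparing the magnetization $g_1(x)$ of the two chains is a quick practical diagnostic for this kind of multiphase behavior.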