Ising distribution as a latent variable model

Adrien Wohrer
Université Clermont Auvergne, CNRS, SIGMA Clermont, Institut Pascal, F-63000 Clermont-Ferrand, France.
[email protected]

Published in Physical Review E 99(4), 042147 (2019). doi:10.1103/PhysRevE.99.042147. Open archive: hal-02170202, https://hal.uca.fr/hal-02170202.
During the past decades, the Ising distribution has attracted interest in many applied disciplines, as the maximum entropy distribution associated to any set of correlated binary (`spin') variables with observed means and covariances. However, numerically speaking, the Ising distribution is impractical, so alternative models are often preferred to handle correlated binary data. One popular alternative, especially in the life sciences, is the Cox distribution (or the closely related dichotomized Gaussian distribution and log-normal Cox point process), where the spins are generated independently conditioned on the drawing of a latent variable with a multivariate normal distribution. This article explores the conditions for a principled replacement of the Ising distribution by a Cox distribution. It shows that the Ising distribution itself can be treated as a latent variable model, and it explores when this latent variable has a quasi-normal distribution. A variational approach to this question reveals a formal link with classic mean field methods, especially Opper and Winther's adaptive TAP approximation. This link is confirmed by weak coupling (Plefka) expansions of the different approximations, and then by numerical tests. Overall, this study suggests that an Ising distribution can be replaced by a Cox distribution in practical applications, precisely when its parameters lie in the `mean field domain'.

I. INTRODUCTION

During the last decades, the Ising distribution has been used in several disciplines such as statistics (under the name quadratic exponential model) [1, 2], machine learning (under the name Boltzmann machine) [3], information processing [4-6], biology [7] and neuroscience, where it has been proposed as a natural model for the spike-based activities of interconnected neural populations [8, 9]. In most of these applications, classic assumptions from statistical physics do not hold (e.g., arrangement on a rectangular lattice, uniform couplings, independently distributed couplings, zero external fields, etc.), and even old problems have to be revisited, such as efficiently simulating the Ising distribution [10, 11] or inferring its parameters from data [12-15].

In this article I will consider the Ising probability distribution over a set of spins s = (s_1, ..., s_N) ∈ {-1, 1}^N, defined as

P(s | h, J) = \frac{1}{Z_I} \exp\left( \sum_{i=1}^{N} h_i s_i + \frac{1}{2} \sum_{i,j=1}^{N} J_{ij} s_i s_j \right),   (1)

with parameters h ∈ R^N (the external fields) and J an N × N symmetric matrix (the coupling weights), Z_I(h, J) being the corresponding partition function. In this formulation, diagonal elements J_ii can be nonzero without influencing the distribution; they simply add a constant term \frac{1}{2} \sum_i J_{ii} to both the exponent and log(Z_I). I will denote by (m, C) the first two centered moments of the distribution, that is, for all indices i, j,

m_i = E(s_i), \qquad C_{ij} = E(s_i s_j) - E(s_i) E(s_j).

The essential interest of the Ising distribution, in all the disciplines mentioned above, is its maximum entropy property: whenever a dataset of N binary variables has measured moments (m, C), a single distribution of the form (1) is guaranteed to exist which matches these moments, and furthermore it has maximal entropy under this constraint.

Unfortunately, the Ising distribution is numerically unwieldy. The simple act of drawing samples from the distribution already requires setting up lengthy Markov Chain Monte Carlo (MCMC) schemes.
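To make this concrete, here is a minimal sketch of the kind of MCMC scheme involved: a single-site Gibbs sampler for distribution (1), written in Python/NumPy. It is an illustration under my own conventions (the function name and defaults are hypothetical), not code from the article.

```python
import numpy as np

def ising_gibbs_sample(h, J, n_sweeps=10_000, rng=None):
    """Single-site Gibbs sampler for the Ising distribution (1).

    Illustrative sketch. The conditional law of spin i given the others is
        P(s_i = +1 | s_rest) = sigmoid(2 * (h_i + sum_{j != i} J_ij s_j)),
    the diagonal term J_ii dropping out of the conditional.
    """
    rng = np.random.default_rng(rng)
    N = len(h)
    s = rng.choice([-1.0, 1.0], size=N)     # random initial configuration
    samples = np.empty((n_sweeps, N))
    for t in range(n_sweeps):
        for i in range(N):
            field = h[i] + J[i] @ s - J[i, i] * s[i]   # local field on spin i
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field))
            s[i] = 1.0 if rng.random() < p_up else -1.0
        samples[t] = s
    return samples
```

A long run of such a chain is the generic way to solve the direct problem discussed next: (m, C) are then estimated as the empirical mean and covariance of the samples, e.g. `samples.mean(axis=0)` and `np.cov(samples.T)`.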
Besides, there is no simple analytical link between the parameters (h, J) and the resulting moments (m, C). Given natural parameters (h, J), the direct Ising problem of estimating (m, C) can only be solved by numerical sampling from MCMC chains. Given (m, C), the inverse Ising problem of retrieving (h, J) can only be solved by gradient descent based on numerous iterations of the direct problem, a procedure known as Boltzmann learning. In practice, this means that the Ising distribution cannot be parametrized easily from observed data.

For this reason, in spite of the Ising model's theoretical attractiveness when dealing with binary variables, alternative models are generally preferred which are numerically more convenient. In one such family of alternative models, N latent variables r = (r_1, ..., r_N) ∈ R^N are drawn from a multivariate normal distribution

\mathcal{N}(r | \mu, \Sigma) = |2\pi\Sigma|^{-1/2} \exp\left( -\frac{1}{2} (r - \mu)^\top \Sigma^{-1} (r - \mu) \right)

and then used to generate N spins independently. The simplest option, setting s_i = sign(r_i) deterministically, yields the dichotomized Gaussian distribution [16, 17], which has enjoyed recent popularity as a replacement for the Ising distribution when modeling neural spike trains, as it is easy to sample and to parametrize from data [18, 19].

Slightly more generally, each variable r_i can serve as an intensity to draw the corresponding spin s_i following a Bernoulli distribution:

B(s | r) = \frac{1}{Z_B} \exp\left( \sum_{i=1}^{N} r_i s_i \right)

with partition function

Z_B(r) = \prod_i \left( e^{r_i} + e^{-r_i} \right).

In statistics, this is the model underlying logistic regression, as introduced by Cox [20], so I will refer to it as the Cox distribution:

Q(r | \mu, \Sigma) = \mathcal{N}(r | \mu, \Sigma),   (2)
Q(s | r) = B(s | r),   (3)

with \mu any vector in R^N and \Sigma any N × N symmetric, positive definite matrix.
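By contrast with the Ising case, sampling from the Cox distribution (2)-(3) requires no Markov chain: one multivariate Gaussian draw followed by N independent Bernoulli draws. A minimal sketch, under the same hypothetical conventions as above:

```python
import numpy as np

def cox_sample(mu, Sigma, n_samples=10_000, rng=None):
    """Draw spin samples from the Cox distribution, eqs. (2)-(3).

    Illustrative sketch. Each latent vector r ~ N(mu, Sigma); each spin is
    then drawn independently with P(s_i = +1 | r) = sigmoid(2 r_i), which
    is exactly the Bernoulli distribution B(s | r) above.
    """
    rng = np.random.default_rng(rng)
    r = rng.multivariate_normal(mu, Sigma, size=n_samples)   # latent fields
    p_up = 1.0 / (1.0 + np.exp(-2.0 * r))                    # P(s_i = +1 | r_i)
    return np.where(rng.random(r.shape) < p_up, 1.0, -1.0)
```

The absence of any MCMC step is precisely what makes Cox-type models easy to sample and, ultimately, easy to parametrize from data.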
Note that the dichotomized Gaussian corresponds to a limiting case of the Cox distribution, when the scaling of the variables r_i tends to +∞.

Both the Ising (eq. (1)) and Cox (eqs. (2)-(3)) distributions can be generalized to point processes, by taking a suitable limit when the N indexed variables tend to a continuum [21]. These are respectively known as the Gibbs process [21] and the log Gaussian Cox process [22, 23]. The latter, much simpler to handle in practice, is used in various applied fields such as epidemiology, geostatistics [24] and neuroscience, to model neural spike trains [25, 26].

To summarize, in practical applications, Cox models (including the dichotomized Gaussian and Cox point processes) are often preferred to the corresponding maximum entropy distributions (Ising distribution, Gibbs point process) because they are easier to sample, and to parametrize from a set of observed data. However, to date, we have little analytical insight into the link between the two families of distributions. For example, given some dataset, we cannot tell in advance how similar the Cox and Ising models fitting these data would be. The goal of this article is to investigate this link.

I first show that the Ising distribution itself can be viewed as a latent variable model, which differs from a Cox distribution only because of the non-Gaussian distribution of its latent variable (Section II). This allows us to derive simple relations between an Ising distribution and its `best-fitting' Cox distributions, in two possible senses (Section III). In particular, the variational approach to this question reveals formal similarities with classic mean-field methods which aim at approximating the Ising moments (m, C) (Section IV). Numerical simulations reveal that both types of approximations, despite their seemingly different goals, are efficient in roughly the same domain of parameters (Section V). Thus, an Ising distribution can be replaced in practical applications by a Cox distribution, precisely if its parameters lie in the `mean field domain'.

II. THE ISING LATENT FIELD

Given any vector h ∈ R^N and any N × N symmetric, positive definite matrix J, we will consider the following probability distribution over r ∈ R^N and s ∈ {-1, 1}^N:

P(s, r) = \frac{1}{Z} \exp\left( -\frac{1}{2} (r - h)^\top J^{-1} (r - h) + r^\top s \right),   (4)

with Z ensuring proper normalization. Marginalizing out variable s yields

P(r) = \frac{Z_B(r)}{Z} \exp\left( -\frac{1}{2} (r - h)^\top J^{-1} (r - h) \right),   (5)
P(s | r) = B(s | r).   (6)

Conversely, completing the square in eq. (4) and marginalizing out variable r yields

P(s) = \frac{|2\pi J|^{1/2}}{Z} \exp\left( h^\top s + \frac{1}{2} s^\top J s \right),   (7)
P(r | s) = \mathcal{N}(r | h + Js, J).   (8)

From eq. (7), the resulting spins s are distributed according to the Ising distribution with parameters (h, J).

The introduction of the field variables r_i has long been known in statistical physics, as a mathematical construct to express the Ising partition function Z_I in integral form [5]. Indeed, equating the respective expressions for Z imposed by eqs. (5) and (7), we obtain the elegant formula

Z_I(h, J) = \int_{r \in R^N} Z_B(r) \, \mathcal{N}(r | h, J) \, dr   (9)

expressing Z_I as the convolution of Z_B with a Gaussian kernel of covariance J. This formula can be used as a justification of classic mean field equations [5], and more generally to derive the diagrammatic (i.e., Taylor) expansion of Z_I as a function of J [27].
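For small N, eq. (9) can be checked numerically: compute Z_I exactly by enumerating all 2^N spin configurations, and estimate the right-hand side by Monte Carlo averaging of Z_B(r) under r ~ N(h, J). The sketch below is such a sanity check, with hypothetical names; it is not part of the original article.

```python
import itertools
import numpy as np

def check_eq9(h, J, n_mc=200_000, rng=None):
    """Numerically compare both sides of eq. (9) for small N.

    Illustrative sketch. Left side: Z_I(h, J) by exact enumeration of the
    2^N spin configurations in eq. (1). Right side: E[Z_B(r)] for
    r ~ N(h, J), estimated by plain Monte Carlo. J must be symmetric
    positive definite.
    """
    rng = np.random.default_rng(rng)
    N = len(h)

    # Exact partition function Z_I of eq. (1).
    Z_I = 0.0
    for spins in itertools.product([-1.0, 1.0], repeat=N):
        s = np.array(spins)
        Z_I += np.exp(h @ s + 0.5 * s @ J @ s)

    # Monte Carlo estimate of the Gaussian integral of eq. (9).
    r = rng.multivariate_normal(h, J, size=n_mc)
    Z_B = np.prod(np.exp(r) + np.exp(-r), axis=1)   # Z_B(r), sample by sample
    return Z_I, Z_B.mean()

# Example on a small random model; both returned values should agree closely.
# rng = np.random.default_rng(0)
# N = 5
# A = 0.3 * rng.standard_normal((N, N))
# J = A @ A.T + 0.1 * np.eye(N)    # symmetric positive definite couplings
# h = 0.2 * rng.standard_normal(N)
# print(check_eq9(h, J))
```

The same construction also gives, via eq. (8), an exact one-shot way to draw the latent field r conditioned on a spin configuration s, since P(r | s) is simply Gaussian with mean h + Js and covariance J.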