Arxiv:0712.2437V2 [Q-Bio.QM] 15 Dec 2007 Fully) Simpler Set of Interactions

Faster solutions of the inverse pairwise Ising problem Tamara Broderick,a Miroslav Dud´ık,b Gaˇsper Tkaˇcik,c Robert E. Schapireb and William Bialekc;d aDepartment of Mathematics, bDepartment of Computer Science, cJoseph Henry Laboratories of Physics, cLewis{Sigler Institute for Integrative Genomics, and dPrinceton Center for Theoretical Physics, Princeton University, Princeton, NJ 08544 (Dated: December 15, 2007) Recent work has shown that probabilistic models based on pairwise interactions|in the simplest case, the Ising model|provide surprisingly accurate descriptions of experiments on real biological networks ranging from neurons to genes. Finding these models requires us to solve an inverse problem: given experimentally measured expectation values, what are the parameters of the underlying Hamiltonian? This problem sits at the intersection of statistical physics and machine learning, and we suggest that more efficient solutions are possible by merging ideas from the two fields. We use a combination of recent coordinate descent algorithms with an adaptation of the histogram Monte Carlo method, and implement these techniques to take advantage of the sparseness found in data on real neurons. The resulting algorithm learns the parameters of an Ising model describing a network of forty neurons within a few minutes. This opens the possibility of analyzing much larger data sets now emerging, and thus testing hypotheses about the collective behaviors of these networks. I. INTRODUCTION from statistical mechanics could provide a model for the emergence of function in biological systems, and this gen- In the standard problems of statistical mechanics, we eral class of ideas has been explored most fully for net- begin with the definition of the Hamiltonian and pro- works of neurons [6, 7]. Recent work shows how these ceed to calculate the expectation values or correlation ideas can be linked much more directly to experiment functions of various observable quantities. In the in- [9, 10] by searching for maximum entropy models that verse problem, we are given the expectation values and capture some limited set of measured correlations. At a try to infer the underlying Hamiltonian. The history of practical level, implementing this program requires us to the inverse problems goes back (at least) to 1959, when solve a class of inverse problems for Ising models with Keller and Zumino [1] showed that, for a classical gas, the pairwise interactions among the spins, and this is the temperature dependence of the second virial coefficient problem that we consider here. determines the interaction potential between molecules To be concrete, we consider a network of neurons. uniquely, provided that this potential is monotonic. Sub- Throughout the brain, neurons communicate by gener- sequent work on classical gases and fluids considered the ating discrete, identical electrical pulses termed action connection between pair correlation functions and inter- potentials or spikes. If we look in a small window of action potentials in various approximations [2], and more time, each neuron either generates a spike or it does not, rigorous constructions of Boltzmann distributions consis- so that there is a natural description of the instantaneous tent with given spatial variations in density [3] or higher state of the network by a collection of binary or Ising vari- order correlation functions [4]. ables; σi = +1 indicates that neuron i generates a spike, and σ = −1 indicates that neuron i is silent. Knowing In fact the inverse problem of statistical mechanics i the average rate at which spikes are generated by each arises in many different contexts, with several largely in- cell is equivalent to knowing the expectation values hσ i dependent literatures. In computer science, there are a i for all i. Similarly, knowing the probabilities of coincident number of problems where we try to learn the probabil- spiking (correlations) among all pairs of neurons is equiv- ity distribution that describes the observed correlations alent to knowing the expectation values hσ σ i. Of course among a large set of variables in terms of some (hope- i j there are an infinite number of probability distributions arXiv:0712.2437v2 [q-bio.QM] 15 Dec 2007 fully) simpler set of interactions. Many algorithms for P (σ) over the states of the whole system (σ ≡ fσ g) solving these learning problems rest on simplifications i that are consistent with these expectation values, but if or approximations that correspond quite closely to es- we ask for the distribution that is as random as possible tablished approximation methods in statistical mechan- while still reproducing the data|the maximum entropy ics [5]. More explicitly, in the context of neural network distribution|then this has the form of an Ising model models [6, 7], a family of models referred to as `Boltz- with pairwise interactions: mann machines' lean directly on the mapping of probabilistic models into statistical physics problems, identi- 2 3 1 X 1 X fying the parameters of the probabilistic model with the P (σ) = exp − h σ − J σ σ : (1) Z 4 i i 2 ij i j5 coupling constants in an Ising{like Hamiltonian [8]. i i6=j Inverse problems in statistical mechanics have received new attention because of attempts to construct explicit The inverse problem is to find the \magnetic fields” fhig network models of biological systems. Physicists have and \exchange interactions" fJijg that reproduce the ob- long hoped that the collective behavior which emerges served values of hσii and hσiσji. 2 The surprising result of Ref [9] was that this Ising ments, σ ≡ fσ1; σ2; ··· ; σN g. As ingredients for deter- model provides an accurate quantitative description of mining this model, we use low order statistics computed the combinatorial patterns of spiking and silence ob- from a set of m samples fσ1; σ2; ··· ; σmg, which we can served in groups of order N = 10 neurons in the retina as think of as samples drawn from the distribution P (σ). it responds to natural sensory inputs, despite only taking The classical idea of maximum entropy models is that account of pairwise interactions. The Ising model allows we should construct P (σ) to generate the correct values us to understand how, in this system, weak correlations of certain average quantities (e.g., the energy in the case among pairs of neurons can coexist with strong collective of the Boltzmann distribution), but otherwise the distri- effects at the network level, and this is even clearer as one bution should be às random' as possible [21]. Formally extends the analysis to larger groups (using real data for this means that we find P (σ) as the solution of a con- N = 40, and extrapolating to N = 120), where there strained optimization problem, maximizing the entropy is a hint that the system is poised near a critical point of the distribution subject to conditions that enforce the in its dynamics [10]. Since the initial results, a number correct expectation values. We will refer to the quanti- of groups have found that the maximum entropy models ties whose averages are constrained as \features" of the provide surprisingly accurate descriptions of other neu- system, f ≡ ff1; f2; ··· ; fK g, where each fµ is a function ral systems [11, 12, 13] and similar approaches have been of the state σ, fµ(σ). used to look at biochemical and genetic networks [14, 15]. One special set of average features are just the The promise of the maximum entropy approach to bio- marginal distributions for subsets of the variables. Thus logical networks is that it builds a bridge from easily ob- we can construct the one{body marginals servable correlations among pairs of elements to a global X view of the collective behavior that can emerge from the Pi(σi) = P (σ1; σ2; ··· ; σN ); (2) network as a whole. Clearly this potential is greatest fσj6=ig in the context of large networks. Indeed, even for the retina, methods are emerging that make it possible to the two{body marginals, record simultaneously from hundreds of neurons [16, 17], X so just keeping up with the data will require methods Pij(σi; σj) = P (σ1; σ2; ··· ; σN ); (3) to deal with much larger instances of the inverse prob- fσk6=i;jg lem. The essential difficulty, of course, is that once we and so on for larger subsets. The maximum entropy have a large network, even checking that a given set of distributions consistent with marginal distributions up parameters fh ;J g reproduce the observed expectation i ij to K{body terms generates a hierarchy of models that values requires a difficult calculation. In Ref [10] we took capture increasingly higher{order correlations, monoton- an essentially brute force Monte Carlo approach to this ically reducing the entropy of the model as K increases, part of the problem, and then adjusted the parameters toward the true value [22]. to improve the match between observed and predicted Let P~ denote the empirical distribution expectation values using a relatively naive algorithm. In this work we combine several ideas|taken both m 1 X from statistical physics and from machine learning [18]| P~(σ) = δ(σ; σn); (4) m which seem likely to help arrive at more efficient solutions n=1 of the inverse problem for the pairwise Ising model. First, where δ(σ; σ0) is the Kronecker delta, equal to one when we adapt the histogram Monte Carlo method [19] to `re- 0 cycle' the Monte Carlo samples that we generate as we σ = σ and equal to zero otherwise. The maximum- make small changes in the parameters of the Hamilto- entropy problem is then nian. Second, we use a coordinate descent method to maxP S[P ] such that hf(σ)iP = hf(σ)i ~ ; (5) adjust the parameters [20]. Finally, we exploit the fact P that neurons use their binary states in a very asymmetric where S[P ] denotes the entropy of the distribution P , and fashion, so that silence is much more common that spik- h· · ·iP denotes an expectation value with respect to that ing.

Arxiv:0712.2437V2 [Q-Bio.QM] 15 Dec 2007 Fully) Simpler Set of Interactions

Facilitating Bayesian Continual Learning by Natural Gradients and Stein Gradients

Covariances, Robustness, and Variational Bayes

2018 Annual Report Alfred P

Martin Hairer: Fields Medalist

Parallel Streaming Wasserstein Barycenters

Quantifying Uncertainty and Robustness at Scale Tamara Broderick ITT Career Development Assistant Professor [email protected]

Ryan Giordano Runjing Liu Nelle Varoquaux Michael I. Jordan

Tamara Broderick Associate Professor Electrical Engineering & Computer Science MIT

Lester Mackey

Tamara Broderick Michael I. Jordan

Tamara Broderick ITT Career Development Assistant Professor, MIT

January 2007 Prizes and Awards