A Maximum Entropy Framework for Nonexponential Distributions
Jack Peterson (a,b), Purushottam D. Dixit (c), and Ken A. Dill (b,1)

(a) Department of Mathematics, Oregon State University, Corvallis, OR 97331; (b) Laufer Center for Physical and Quantitative Biology, Departments of Physics and Chemistry, State University of New York, Stony Brook, NY 11794; and (c) Department of Systems Biology, Columbia University, New York, NY 10032

Contributed by Ken A. Dill, November 7, 2013 (sent for review June 26, 2013)

Probability distributions having power-law tails are observed in a broad range of social, economic, and biological systems. We describe here a potentially useful common framework. We derive distribution functions $\{p_k\}$ for situations in which a “joiner particle” $k$ pays some form of price to enter a community of size $k-1$, where costs are subject to economies of scale. Maximizing the Boltzmann–Gibbs–Shannon entropy subject to this energy-like constraint predicts a distribution having a power-law tail; it reduces to the Boltzmann distribution in the absence of economies of scale. We show that the predicted function gives excellent fits to 13 different distribution functions, ranging from friendship links in social networks, to protein–protein interactions, to the severity of terrorist attacks. This approach may give useful insights into when to expect power-law distributions in the natural and social sciences.

heavy tail | fat tail | statistical mechanics | thermostatistics | social physics

Significance

Many statistical distributions, particularly among social and biological systems, have “heavy tails,” which are situations where rare events are not as improbable as would have been guessed from more traditional statistics. Heavy-tailed distributions are the basis for the phrase “the rich get richer.” Here, we propose a basic principle underlying systems with heavy-tailed distributions. We show that it is the same principle (maximum entropy) used in statistical physics and statistics to estimate probabilistic models from relatively few constraints. The heavy-tail principle can be expressed in terms of shared costs and economies of scale. The probability distribution we derive is a mathematical digamma function, and we show that it accurately fits 13 real-world data sets.

Author contributions: J.P. and K.A.D. designed research; J.P. and P.D.D. performed research; J.P. and K.A.D. contributed new reagents/analytic tools; J.P. analyzed data; and J.P., P.D.D., and K.A.D. wrote the paper.

The authors declare no conflict of interest.

(1) To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1320578110/-/DCSupplemental.

20380–20385 | PNAS | December 17, 2013 | vol. 110 | no. 51 | www.pnas.org/cgi/doi/10.1073/pnas.1320578110

Probability distributions are often observed to have power-law tails, particularly in social, economic, and biological systems. Examples include distributions of fluctuations in financial markets (1), the populations of cities (2), the distribution of Web site links (3), and others (4, 5). Such distributions have generated much popular interest (6, 7) because of their association with rare but consequential events, such as stock market bubbles and crashes.

If sufficient data are available, finding the mathematical shape of a distribution function can be as simple as curve-fitting, with a follow-up determination of the significance of the mathematical form used to fit it. However, it is often interesting to know if the shape of a given distribution function can be explained by an underlying generative principle. Principles underlying power-law distributions have been sought in various types of models. For example, the power-law distributions of node connectivities in social networks have been derived from dynamical network evolution models (8–17). A large and popular class of such models is based on the preferential attachment rule (18–27), wherein it is assumed that new nodes attach preferentially to the largest of the existing nodes. Explanations for power laws are also given by Ising models in critical phenomena (28–34), network models with thresholded “fitness” values (35), and random-energy models of hydrophobic contacts in protein interaction networks (36).

However, such approaches are often based on particular mechanisms or processes; they often predict particular power-law exponents, for example. Our interest here is in finding a broader vantage point, as well as a common language, for describing a range of distributions, from power law to exponential. For deriving exponential distributions, a well-known general principle is the method of maximum entropy (Max Ent) in statistical physics (37, 38). In such problems, you want to choose the best possible distribution from all candidate distributions that are consistent with a certain set of constrained moments, such as the average energy. For this type of problem, which is highly underdetermined, a principle is needed for selecting a “best” mathematical function from among alternative model distribution functions. To find the mathematical form of the distribution function $p_k$ over states $k = 1, 2, 3, \ldots$, the Max Ent principle asserts that you should maximize the Boltzmann–Gibbs–Shannon (BGS) entropy functional $S[\{p_k\}] = -\sum_k p_k \log p_k$ subject to constraints, such as the known value of the average energy $\langle E \rangle$. This procedure gives the exponential (Boltzmann) distribution, $p_k \propto e^{-\beta E_k}$, where $\beta$ is the Lagrange multiplier that enforces the constraint. This variational principle has been the subject of various historical justifications. It is now commonly understood as the approach that chooses the least-biased model that is consistent with the known constraint(s) (39).

Is there an equally compelling principle that would select fat-tailed distributions, given limited information? There is a large literature that explores this. Inferring nonexponential distributions can be done by maximizing a different mathematical form of entropy, rather than the BGS form. Examples of these nontraditional entropies include those of Tsallis (40), Rényi (41), and others (42, 43). For example, the Tsallis entropy is defined as $\frac{K}{1-q}\left(\sum_k p_k^q - 1\right)$, where $K$ is a constant and $q$ is a parameter for the problem at hand. Such methods otherwise follow the same strategy as above: maximizing the chosen form of entropy subject to an extensive energy constraint gives nonexponential distributions. The Tsallis entropy has been applied widely (44–53).

However, we adopt an alternative way to infer nonexponential distributions. To contrast our approach, we first switch from probabilities to their logarithms. Logarithms of probabilities can be parsed into energy-like and entropy-like components, as is standard in statistical physics. Said differently, a nonexponential distribution that is derived from a Max Ent principle requires that there be nonextensivity in either an energy-like or entropy-like term; that is, it is nonadditive over independent subsystems, not scaling linearly with system size. Tsallis and others have chosen to assign the nonextensivity to an entropy term, and retain extensivity in an energy term. Here, instead, we keep the canonical BGS form of entropy, and invoke a nonextensive energy-like term.
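The Max Ent procedure described above can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions, not the authors' code: the energy levels $E_k = k$ for $k = 1, \ldots, 10$, the target average energy of 3, and every function name are hypothetical choices made for this example. It finds the Lagrange multiplier $\beta$ by bisection, builds $p_k \propto e^{-\beta E_k}$, checks that a perturbed distribution satisfying the same constraints has lower BGS entropy, and evaluates the Tsallis form quoted above near $q = 1$.

```python
import math

def boltzmann(energies, beta):
    """Return p_k proportional to exp(-beta * E_k), normalized to sum to 1."""
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(p, energies):
    """Average energy <E> = sum_k p_k E_k."""
    return sum(pk * e for pk, e in zip(p, energies))

def bgs_entropy(p):
    """Boltzmann-Gibbs-Shannon entropy, -sum_k p_k log p_k."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)

def tsallis_entropy(p, q, K=1.0):
    """Tsallis entropy K/(1-q) * (sum_k p_k^q - 1), defined for q != 1."""
    return K / (1.0 - q) * (sum(pk ** q for pk in p) - 1.0)

def solve_beta(energies, target, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection on beta; the mean energy decreases monotonically with beta."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(boltzmann(energies, mid), energies) > target:
            lo = mid  # mean too high, so a larger beta is needed
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical example system (an assumption, not from the paper).
energies = [float(k) for k in range(1, 11)]  # E_k = k
target = 3.0                                  # constrained <E>

beta = solve_beta(energies, target)
p_star = boltzmann(energies, beta)

# Perturbation d = (1, -2, 1, 0, ...) keeps both sum_k p_k and sum_k p_k E_k
# unchanged when E_k = k, so p_pert satisfies the same constraints as p_star.
eps = 0.01
d = [1.0, -2.0, 1.0] + [0.0] * (len(energies) - 3)
p_pert = [pk + eps * dk for pk, dk in zip(p_star, d)]

print("beta =", beta)
print("S[p_star] =", bgs_entropy(p_star))
print("S[p_pert] =", bgs_entropy(p_pert))
print("Tsallis(q = 1 + 1e-6) =", tsallis_entropy(p_star, 1.0 + 1e-6))
```

In this sketch the constrained perturbation lowers the BGS entropy, as expected for the maximizer, and the Tsallis expression approaches the BGS value as $q \to 1$. Bisection is used here only because it needs no external libraries; any root finder would serve.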
Fig. 1. The joining cost for a particle to join a size $k-1$ community is $\mu_k$. This diagram can describe particles forming colloidal clusters, or social processes such as people joining cities, citations added to papers, or link creation in a social network.

In our view, only the latter approach (the BGS entropy combined with a nonextensive energy-like term) is consistent with the principles elucidated by Shore and Johnson (37) (reviewed in ref. 39). Shore and Johnson (37) showed that the BGS form of entropy is uniquely the mathematical function that ensures satisfaction of the addition and multiplication rules of probability. Shore and Johnson (37) assert that any form of entropy other than BGS will impart a bias that is unwarranted by the data it aims to fit. We regard the Shore and Johnson (37) argument as a compelling first-principles basis for defining a proper variational principle for modeling distribution functions. Here, we describe a variational approach based on the BGS entropy function, and we seek an explanation for power-law distributions in the form of an energy-like function instead.

Theory

Assembly of Simple Colloidal Particles. We frame our discussion in terms of a joiner particle that enters a cluster or community of particles, as shown in Fig. 1. However, this is a natural way to describe the classical problem of the colloidal clustering of physical particles; it is readily shown (reviewed below) to give an exponential distribution of cluster sizes.

Communal Assemblies and Economies of Scale. Now, we develop a general model of communal assembly based on economies of scale. Consider a situation where the joining cost for a particle depends on the size of the community it joins. In particular, consider situations in which the costs are lower for joining a larger community. Said differently, the cost-minus-benefit function $\mu_k$ is now allowed to be subject to economies of scale, which, as we note below, can also be interpreted instead as a form of discount in which the community pays down some of the joining costs for the joiner particle.

To see the idea of an economy-of-scale cost function, imagine building a network of telephones. In this case, a community of size 1 is a single unconnected phone. A community of size 2 is two connected phones, etc. Consider the first phone: The cost of creating the first phone is high because it requires initial investment in the phone assembly plant. And the benefit is low, because there is no value in having a single phone. Now, for the second phone, the cost-minus-benefit is lower. The cost of producing the second phone is lower than the first because the