
arXiv:1105.4118v2 [cond-mat.stat-mech] 31 May 2011

Maximum Entropy Principle, Equal Probability a Priori and Gibbs Paradox

Hao Ge^{1,2,∗} and Hong Qian^{3,†}
1 School of Mathematical Sciences and Centre for Computational Systems Biology, Fudan University, Shanghai 200433, PRC.
2 Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA.
3 Department of Applied Mathematics, University of Washington, Seattle, WA 98195, USA.
(Dated: June 1, 2011)

∗ Electronic address: [email protected]
† Electronic address: [email protected]

We show that the information-theoretic maximum entropy (MaxEnt) approach to deriving the canonical ensemble theory is mathematically equivalent to the classic approach of Boltzmann, Gibbs and Darwin-Fowler. The two approaches, however, "interpret" the same mathematical theorem differently; most notably observing mean-energy in the former and energy conservation in the latter. However, applying the same MaxEnt method to the grand canonical ensemble fails, while the correct statistics are obtained if one carefully follows the classic approach based on Boltzmann's microcanonical equal probability a priori. One does not need to invoke quantum mechanics, and there is no Gibbs paradox. MaxEnt and the related minimum relative entropy principle are based on the mathematical theorem concerning large deviations of rare fluctuations. As a scientific method, it requires classical mechanics or other assumptions to provide meaningful prior distributions for the expected-value based statistical inference. A naive assumption of uniform prior is not valid in statistical mechanics.

PACS numbers:

There exist several frameworks, based on information theory and/or statistical inferences [1–3], which have been put forward as possible alternatives to the Boltzmann-Gibbsian foundation of statistical thermodynamics [4, 5]. At the same time, there has been a growing body of "statistical mechanical" studies of processes with no thermal molecular origin, ranging from signal processing systems to neural networks, to combinatorial optimization [6]. Almost all these studies claim Shannon's information theory as their foundation [7]. These approaches are used to deduce probability distributions for fluctuating quantities which usually have no connection to Newtonian mechanics.

This paper reexamines these approaches while clarifying and contrasting the differences between the classic approach to statistical mechanics and the new ones. The principle of maximum entropy (MaxEnt), or minimum Kullback-Leibler relative entropy [8], which is at the heart of these information-based approaches, was first proposed by E.T. Jaynes as an alternative foundation for statistical mechanics [1]. On the other hand, following the collective work of many in information theory, it is now generally accepted that the minimum relative entropy principle is a mathematical theorem [9–11]. It is also known to probabilists as Gibbs conditioning [12].

Consider a sequence of identical, independently distributed (i.i.d.) discrete random variables $\{X_1, X_2, \cdots, X_n\}$ with the state space $S = \{\omega_1, \omega_2, ..., \omega_M\}$ ($M$ could be infinite) and prior distribution $\Pr\{X_i = \omega_m\} = p_m$. (In classical mechanics, the $\omega_i$ are called microstates.) The theorem states that

\[ \lim_{n\to\infty} \Pr\left( X_1 = \omega_m \,\middle|\, \sum_{i=1}^{n} h(X_i) = n\alpha \right) = p_m^*, \tag{1} \]

where $h(\cdot)$ is a function defined on the state space $S$ and $p_m^*$ has the "canonical" form $p_m^* = p_m e^{\lambda_0 + \lambda_1 h(\omega_m)}$. The two constants $\lambda_0$ and $\lambda_1$ are determined according to

\[ \sum_{m=1}^{M} p_m^* = 1, \qquad \sum_{m=1}^{M} h(\omega_m)\, p_m^* = \alpha. \tag{2} \]

The first condition enforces normalization; the second one is interpreted as "conditioned on observing the mean value for $h(\cdot)$". We shall call $\{p_m\}$ the prior. The form of $\{p_m^*\}$ is exactly the same as that derived through minimizing the relative entropy

\[ H\left(\{p_m^*\} \,\|\, \{p_m\}\right) = \sum_{m=1}^{M} p_m^* \ln\left(\frac{p_m^*}{p_m}\right), \tag{3} \]

under the constraints given in (2). It is important to point out that MaxEnt is really the minimum relative entropy with uniform prior.

This theorem is concerned with the conditional distribution of a collection of individual samples, given that some quantity averaged over the large number of individual samples shows highly unlikely behavior. Note that if the observed sample mean is the expected value of the prior, then $p_m^* = p_m$ and $\lambda_1 = 0$. This theorem is closely related to the large deviation theory of the empirical distribution (i.e., histogram)

\[ L_n(\omega_m) = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i=\omega_m}, \tag{4} \]

where the delta function $\delta_{X_i=\omega_m} = 1$ if $X_i = \omega_m$, and zero otherwise. According to the strong law of large numbers in probability theory, this empirical distribution converges to the prior distribution $\{p_m\}$ of $X_i$. Moreover, the level-2 Sanov theorem in large deviation theory [12, 14] states that for any set $C$ of probability distributions, we have

\[ \lim_{n\to\infty} \frac{1}{n} \ln \Pr\{L_n \in C\} = -\inf_{\mu\in C} \{I(\mu)\}, \tag{5} \]

where $I(\mu) = \sum_{i=1}^{M} \mu(\omega_i) \ln\left[\mu(\omega_i)/p_i\right]$ is the relative entropy of $\mu$ with respect to the prior distribution $\{p_i\}$. Therefore, the relative entropy can be interpreted as the "free energy" of deviation in the sense of a distribution [15]. And at this juncture, the two free energies, one from the theory of large deviation and one in the theory of Markov dynamics [16], agree.

If we set $C = \{\mu : \langle h(\cdot)\rangle_\mu = \alpha\}$ as the space of probability distributions with given constraints, then for any given distribution $\mu$,

\[ \Pr\left( L_n = \mu \,\middle|\, \frac{1}{n}\sum_{i=1}^{n} h(X_i) = \alpha \right) = \Pr\{L_n = \mu \mid C\} \to 0, \tag{6} \]

unless $\mu = \mu^*$, where $\mu^*$ satisfies $I(\mu^*) = \inf_{\mu\in C}\{I(\mu)\}$, i.e., has minimum relative entropy. It is implied that the empirical distribution $L_n$ is dominated by $\mu^*$ when $n \to \infty$. Furthermore, since the distributions of different $X_i$ under the constraint $\frac{1}{n}\sum_{i=1}^{n} h(X_i) = \alpha$ are identical, the limiting distribution $\mu^*$ for $L_n$ also holds for each $X_i$.

If one assumes a uniform prior distribution in the canonical ensemble due to ignorance, and conditions on the observed mean of the energy $h(\cdot)$, then the posterior distribution of $h(X_i)$ is just the exponential, canonical distribution. On the other hand, Jaynes [1] argued that the entropy of statistical mechanics and the entropy in information theory are principally the same thing, and that simply maximizing the entropy

\[ S(\{p_m^*\}) = -\sum_{m=1}^{\infty} p_m^* \ln p_m^* \tag{7} \]

under some constraint on the mean observations would give the correct canonical distribution. Hence such an optimizing argument is mathematically equivalent to the previous theorem, and consequently statistical mechanics could be re-interpreted as a particular application of a general theory of logical inference and information theory [17]. While Jaynes' approach to statistical mechanics, as well as the widely used minimum relative entropy principle in information theory, is based on observations of mean-energy, the classic approach of Boltzmann, Gibbs and Darwin-Fowler to statistical mechanics interprets the same theorem differently.

For the canonical ensemble, suppose it is a part of a larger microcanonical ensemble consisting of $N$ closed, identical canonical ensembles. Let $X_i$ represent the microstate of the $i$-th canonical ensemble, say momenta and positions $(p, q)$. Then the high-dimensional vector $X = (X_1, X_2, ..., X_N)$ represents a microstate in the microcanonical ensemble. Let the function $e(X_i)$ be the energy of the $i$-th canonical ensemble, and the total energy $E_{tot} = \sum_{i=1}^{N} e(X_i)$. The law of classical mechanical energy conservation says that $X$ is confined to the subspace $\{E_{tot} = H\}$, where $H$ is the given total energy of the larger microcanonical ensemble. The notion of equal a priori probability further assumes that the probability of $X$ is equally distributed on such a subspace [18]. The marginal distribution of each $X_i$ is then exponentially dependent on $e(X_i)$ when $N$ tends to infinity. Boltzmann's most probable state method and Darwin-Fowler's steepest descent method are both based on such a setup and are mathematically equivalent [19]. Note that Boltzmann, Gibbs and Darwin-Fowler deal with the convergence of empirical distributions as in Eq. (5) rather than the marginal distribution as in the theorem (1). However, when $N$ tends to infinity, the limiting empirical distribution is the same as the limiting marginal distribution.

The distribution for the high-dimensional microstate $X$ in Boltzmann's approach, subjected to the energy conservation $E_{tot} = H$, is exactly the same as that of the MaxEnt approach conditioned on observing $\{\bar{e} = H/N\}$. Hence they are mathematically equivalent. However, subtle differences exist in their interpretations. For the MaxEnt approach, one must first assume the existence of a prior distribution for the canonical ensemble even without a constraint on the mean energy. In the classic approach, the equal a priori probability of the entire microcanonical ensemble can be verified from such a uniform prior distribution of the independent subsystems without any constraint in the MaxEnt principle. That is why Jaynes called this framework "subjective thermodynamics" [1].

However, the reasoning behind using a uniform prior distribution as the most suitable one when one knows nothing of a random variable is only empirical, and one must be very careful when applying it to a specific scientific problem. For example, if one only knows the mean particle number in a grand-canonical ensemble, this principle would conclude that the particle number distribution is likewise exponential (i.e., geometric). But the experimentally observed distribution is Poissonian when the particles are nearly independent, whether distinguishable or not (see below for detailed discussion). Hence justifying only the form of the energy distribution in the canonical ensemble is not a sufficient proof for the validity of the MaxEnt principle as a substitute for classical statistical mechanics. In other words, the maximum entropy or minimum relative entropy principle, by itself, can never tell you the prior distribution. The prior distribution has to be supplied by the specific problem to which the principle is applied. Of course, for the purpose of data analysis exclusively, this technique could be quite useful in supplying a minimal model maximizing the degrees of freedom beyond the given constraints [20].
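The grand-canonical example above can be made concrete with a small numerical sketch. It assumes a uniform prior on the particle number $n = 0, 1, 2, \ldots$ and a single constraint on the mean; under those assumptions the canonical form is geometric. The function names, the truncation `n_max`, and the choice of mean are illustrative and not taken from the paper.

```python
import math

def maxent_geometric(mean, n_max):
    """MaxEnt pmf on {0, ..., n_max} for a uniform prior and a fixed mean.

    With a uniform prior the canonical form p_n ~ exp(lambda_1 * n) is geometric.
    q = exp(lambda_1) = mean / (1 + mean) matches the mean on the infinite
    support; the truncated tail is negligible when n_max is large.
    """
    q = mean / (1.0 + mean)
    return [(1.0 - q) * q**n for n in range(n_max + 1)]

def poisson(mean, n_max):
    """Poisson pmf on {0, ..., n_max}, built by the recursion p_n = p_{n-1} * mean / n."""
    pmf = [math.exp(-mean)]
    for n in range(1, n_max + 1):
        pmf.append(pmf[-1] * mean / n)
    return pmf

def moments(pmf):
    """Return (mean, variance) of a pmf indexed by n = 0, 1, 2, ..."""
    m = sum(n * p for n, p in enumerate(pmf))
    v = sum((n - m) ** 2 * p for n, p in enumerate(pmf))
    return m, v

def entropy(pmf):
    """Shannon entropy -sum p ln p, skipping zero entries."""
    return -sum(p * math.log(p) for p in pmf if p > 0.0)

if __name__ == "__main__":
    mean_n, n_max = 3.0, 200   # illustrative values
    geo, poi = maxent_geometric(mean_n, n_max), poisson(mean_n, n_max)
    print("geometric (mean, var):", moments(geo), "entropy:", entropy(geo))
    print("Poisson   (mean, var):", moments(poi), "entropy:", entropy(poi))
```

Both distributions share the same mean, but the geometric one has variance $\langle n\rangle(1+\langle n\rangle)$ whereas the Poisson one has variance $\langle n\rangle$. The geometric distribution also has the larger entropy, as it must, so maximizing entropy under a uniform prior cannot return the Poisson statistics.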

Professional statisticians would also use other methods to test the uniform prior hypothesis after the analysis of the data.

Now the central question arises: What exactly are the prior distributions of energy and particle number for the grand-canonical ensemble? Gibbs tried to answer this question more than one hundred years ago, starting from the equal probability a priori. His derivation for energy fluctuation was highly successful, but for the grand-canonical ensemble with fluctuating particle numbers, a difficulty known as the Gibbs paradox arises: whether or not the volume $\phi(E,v,n) = \int d^{3n}q\, d^{3n}p$ used in the grand-canonical ensemble should be divided by $n!$. It is now understood that for microcanonical or canonical ensembles, both with fixed particle number, the paradox is not a well-defined problem [22].

As in the derivation for canonical ensembles, suppose again a large microcanonical ensemble (a box) with total energy, volume and particle number invariant. The box further consists of $N$ open, identical small grand-canonical ensembles, each with fixed volume $v$ and mean particle number $\langle n\rangle$. They are statistically identical but not rigorously independent. The phase-space uniformity states that the high-dimensional microstate space consists of all the $N\langle n\rangle$ particles in the large box uniformly distributed in position and momentum [18], and we ask what is the distribution of particle numbers within a small grand-canonical ensemble. Hence the natural methodology is to calculate the number of high-dimensional microstates corresponding to a given energy $E$ and particle number $n$ in a grand-canonical volume. This number of high-dimensional microstates gives the weight (probability) of such a microstate in the smaller subsystem. The relation between a small subsystem and the rest of the "reservoir" is a rather subtle issue, which has been repeatedly emphasized in statistical physics.

Textbooks [21] often proceed in the same manner as in the treatment of the canonical ensemble through Boltzmann's most probable distribution method. This is a little misleading. The key to this problem lies in how to go about reconstructing the high-dimensional microstate from the low-dimensional microstates of each subsystem. There is no problem for the canonical ensemble, since one can obtain the high-dimensional microstate simply by linking all the microstates of each subsystem together. However, in the case of distinguishable particles, we must take into account the partition of all $N\langle n\rangle$ particles into the $N$ identical subsystems for the microcanonical ensemble. Let $m_i$ be the number of grand canonical ensembles whose microstates contain $n_i$ particles with energy $E_i$. Hence, for any possible distribution $\{m_i\}$ of the microstates in grand canonical ensembles, there are two kinds of partitions: one is a partition of these occupation numbers $\{m_i\}$ into a total of $N$ subsystems; the other is the partition of all the $N\langle n\rangle$ particles (i.e., labeling particles) into the possible set $\{n_i\}$. The canonical ensemble only deals with the former, and textbook treatments of grand canonical ensembles also consider only the former, while the factorial $n!$ comes from the latter partition of particles [22, 23]. Hence, the number of all high-dimensional microstates corresponding to $\{m_i\}$ is given by

\[ W(\{m_i\}) = \frac{N!}{\prod_i m_i!} \times \frac{(N\langle n\rangle)!}{\prod_i (n_i!)^{m_i}}, \]

subject to the three constraints:

\[ \sum_i m_i = N, \qquad \sum_i E_i m_i = E_{tot}, \qquad \sum_i n_i m_i = N\langle n\rangle, \]

which could be maximized to derive the correct statistics of grand canonical ensembles.

For indistinguishable particles, the weight for each high-dimensional microstate in the large microcanonical ensemble is already different from the distinguishable case, and the factorial naturally arises from consideration of the phase space volume. Hence it is well known that although the $n!$ would not appear because of the partition of particles into small subsystems, it would emerge from the phase space volume in this case. Therefore, the Gibbs paradox is definitely not related to quantum mechanics, and the partition function for the grand-canonical ensemble should be written as

\[ \Xi = \sum_{E,n} \frac{Q(E,v,n)}{n!}\, e^{-\mu n}, \tag{8} \]

where $Q(E,v,n)$ is the partition function for the canonical ensemble.

For independent distinguishable particles, one could understand the $n!$ from another perspective. Due to the phase space uniformity assumption, the position of each particle is uniformly distributed in this large system with total volume $Nv$. Then at a certain time, the probability for each particle to belong to a specific subsystem is $1/N$. Since the total number of particles is $N\langle n\rangle$, the distribution of the particle number in this subsystem is binomial with parameters $(N\langle n\rangle, 1/N)$. When $\langle n\rangle$ is fixed and $N$ tends to infinity, it converges to a Poisson distribution with mean $\langle n\rangle$. The factorial just comes out from the expression of the Poisson distribution

\[ p_n = \frac{\lambda^n}{n!}\, e^{-\lambda}, \tag{9} \]

where $\lambda = \langle n\rangle$. This is known as Poisson statistics for a point process, which has been experimentally verified in number fluctuation measurements based on fluorescence correlation spectroscopy (FCS) [24]. Furthermore, when $N$ tends to infinity, the positions of the particles must converge to the well-known Poisson point process, and the number of particles within a certain space is just its counting process.

Let us now come back to the maximum entropy or minimum relative entropy principle. It is worth noticing that the phase space uniformity is of course another form of maximum entropy for the microcanonical ensemble without any additional constraint [18], but it is different from Jaynes' framework. There is even a confused view that the derivation of the canonical ensemble distribution by Darwin-Fowler is an application of the maximum entropy approach. This is not the case. Although they are based on the same mathematical theorem, they are definitely different interpretations. What Darwin-Fowler did was to derive the distribution of the subsystem from the whole phase space uniformity assumption [19]. They did not mention anything like the uniform prior distribution of the subsystem. The most important element in Darwin-Fowler's interpretation is still the role of conservation of energy at the level of a whole, isolated system, the First Law of Thermodynamics. They actually justified a special version of the law of large numbers in the empirical distribution space for the canonical ensemble, and finally obtained the limiting distribution, which is exactly Boltzmann's most probable state [19]. We clarify a confusion regarding their terminology. The "mean" in their work is just the mean occupation number of each microstate of the subsystem, which is exactly the probability rather than the real mean of the fluctuating energy.

Jaynes' information approach to statistical mechanics is a method of statistical inference based on macroscopic observables, i.e., expected values, in contrast to mainstream statistics, whose inferences are often based on samples. In both approaches, a prior in the absence of any measurement can only be subjective. In the present paper, we have shown that the Principle of Maximum Entropy cannot fully replace the classical Boltzmann-Gibbs statistical mechanics precisely because the latter builds its "prior" on (1) uniformity in Newtonian mechanical phase space, and (2) conservation of energy, number, etc. These two assumptions are fundamentally outside any logical inference approach. The case in point is the grand canonical ensemble: mechanical phase space uniformity necessarily leads to a Poisson distribution as the prior for the number distribution of independent classical particles in an infinitesimal open box.

L. Szilard and B. Mandelbrot also advanced another line of interpretation for classical thermodynamics, called purely phenomenological theory, based on the theory of sufficient statistics [2, 3]. Interestingly, it is also based on the above-mentioned mathematical theorem, and it asserts that all the macroscopic thermodynamic quantities are exactly the sufficient statistics of their microscopic fluctuations. The theory gives the correct distribution when a given ensemble has been perturbed but the new system still has the conservation law. It implies that all of the distributions in statistical mechanics must belong to the exponential family of probability distributions [25].

In the present study, we clarified E.T. Jaynes' MaxEnt approach to statistical thermodynamics based on information theory, and its relation to classical statistical mechanics. It is found that correctly determining a prior distribution is the central issue, which could not be addressed in general from information theory or statistical inference alone. Of course, as a mathematical theory, the theorem of minimum relative entropy could be applied everywhere and need not be confined to mechanics or physics. It justifies the diverse use of "statistical mechanics", and explains why it works as a fundamental tool in information theory. More importantly, the mathematical theorem also tells us that the concepts of entropy and relative entropy are both mathematical constructions, both of which naturally arise in the asymptotic probability of large deviations [14].

It is arguable that information theory, at least in its mathematical presentation, is a statistical theory endowed with the concept of entropy. This perspective naturally resolves a nagging issue that troubles "information" as a more general theoretic concept: the relation between information and knowledge [26]. It is well understood that thermodynamics is about what is impossible (for macroscopic systems) and what is very unlikely. It provides constraints on molecular processes, but it cannot specify their mechanisms. Knowledge is ultimately in the mechanism. There seems to be a contradistinction between "statistics" and "knowledge."

There is another, dynamic origin of the concept of entropy and relative entropy (or free energy). It has been shown recently that they are emergent properties of any Markovian process [16]. Shannon's original information theory for coding, however, is a static one.

We thank Ken Dill, Jin Feng, Michael Fisher, Chris Jarzynski, Steve Presse, Jin Wang and Ziqing Zhao for stimulating discussions. H. Ge acknowledges support by NSFC 10901040 and the Specialized Research Fund for the Doctoral Program of Higher Education (New Teachers) 20090071120003. H. Ge thanks Prof. X.S. Xie and members of his group for hospitality and support.
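The binomial argument for the particle number in one subsystem, Binomial$(N\langle n\rangle, 1/N)$ converging to a Poisson distribution with mean $\langle n\rangle$ as $N \to \infty$, can be checked numerically. The sketch below is only an illustration: the function names, the cutoff `k_max`, and the choice $\langle n\rangle = 3$ are assumptions made here, and the total variation distance is used as one convenient measure of closeness.

```python
import math

def binomial_pmf(k, trials, p):
    """Binomial pmf computed through log-gamma to avoid overflow for large trials."""
    if k > trials:
        return 0.0
    log_pmf = (math.lgamma(trials + 1) - math.lgamma(k + 1)
               - math.lgamma(trials - k + 1)
               + k * math.log(p) + (trials - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k, lam):
    """Poisson pmf lam^k e^{-lam} / k!, also through log-gamma."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def tv_distance(num_subsystems, mean_n, k_max=60):
    """Total variation distance between Binomial(N<n>, 1/N) and Poisson(<n>).

    mean_n must be an integer so that N<n> is an integer particle count;
    terms beyond k_max are dropped (negligible for small mean_n).
    """
    trials = num_subsystems * mean_n      # total particle number N<n>
    p = 1.0 / num_subsystems              # chance of sitting in one subsystem
    return 0.5 * sum(abs(binomial_pmf(k, trials, p) - poisson_pmf(k, mean_n))
                     for k in range(k_max + 1))

if __name__ == "__main__":
    for n_sub in (10, 100, 1000):
        print(n_sub, tv_distance(n_sub, 3))
```

The distance shrinks as the number of subsystems grows, consistent with Le Cam's inequality, which bounds it by $\langle n\rangle/N$ for this binomial.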

[1] E.T. Jaynes, Phys. Rev. 106, 620 (1957); Phys. Rev. 108, 171 (1957).
[2] B.B. Mandelbrot, Ann. Math. Stat. 33, 1021 (1962).
[3] L. Szilard, Zeit. Physik. 32, 753 (1972); English translation in The Collected Works of Leo Szilard: Scientific Papers, B.T. Feld and G.W. Szilard eds., MIT Press, Cambridge, MA (1972) pp. 70-102.
[4] L. Boltzmann, Lectures on Gas Theory, translated by S.G. Brush (UC Press, Berkeley, 1964).
[5] J.W. Gibbs, The Scientific Papers of J. Willard Gibbs (Dover, New York, 1961).
[6] A.L. Yuille, Neural Comput. 2, 1 (1990); P.D. Simic, Network: Comp. Neur. Sys. 1, 89 (1990).
[7] T.M. Cover and J.A. Thomas, Elements of Information Theory (John Wiley & Sons, New York, 1991).
[8] A.I. Khinchin, Mathematical Foundations of Information Theory (Dover, New York, 1957); A. Hobson, J. Stat. Phys. 1, 383 (1969).
[9] J.E. Shore and E.W. Johnson, IEEE Tr. Info. Th. IT-26, 26 (1980).
[10] J.M. van Campenhout and T.M. Cover, IEEE Tr. Info. Th. IT-27, 483 (1981).
[11] J.R. Banavar, A. Maritan and I. Volkov, J. Phys.: Condens. Matter 22, 063101 (2010).
[12] Gibbs conditioning in the theory of probability started with Khinchin's [13] and O. Lanford's work, which had been taken up by Stroock and Zeitouni, and became a part of modern theories of large deviation. For a brief historical account, see A. Dembo and O. Zeitouni, Large Deviation Techniques and Applications, 2nd ed. (Springer-Verlag, New York, 1998); J. Feng, Lecture Notes (unpublished) (2011).
[13] A.I. Khinchin, Mathematical Foundations of Statistical Mechanics (Dover, New York, 1960).
[14] H. Touchette, Phys. Rep. 478, 1-69 (2009).
[15] H. Qian, Phys. Rev. E. 63, 042103 (2001); P.G. Bergmann and J.L. Lebowitz, Phys. Rev. 99, 578 (1955).
[16] H. Ge, Phys. Rev. E. 80, 021137 (2009); H. Ge and H. Qian, Phys. Rev. E. 81, 051133 (2010); M. Esposito and C. van den Broeck, Phys. Rev. Lett. 104, 090601 (2010); M. Santillán and H. Qian, Phys. Rev. E. 83, 041130 (2011).
[17] A. Ben-Naim, A Farewell to Entropy: Statistical Thermodynamics based on Information (World Scientific, Singapore, 2008).
[18] The rigorous definition of equal a priori probability and entropy of a microcanonical ensemble for continuous cases are riddled with mathematical subtleties. We refer the reader to Khinchin's book [13] for details.
[19] P.L. Ponczek and C.C. Yan, Revista Brasileira de Fisica 6, 471 (1976).
[20] E. Schneidman, M.J. Berry, R. Segev and W. Bialek, Nature 440, 1007 (2006).
[21] D.A. McQuarrie, Statistical Mechanics (Viva Books Private Limited, New Delhi, 2005).
[22] J.R. Ray, Eur. J. Phys. 5, 219 (1984); R. Baker, Theory of Heat, 2nd ed. (Springer, New York, 1967).
[23] The history and different approaches to resolving Gibbs paradox is controversial. A more detailed account of the subject will be published elsewhere.
[24] D. Magde, E.L. Elson and W.W. Webb, Phys. Rev. Lett. 29, 705 (1972); H. Qian, Biophys. Chem. 38, 49 (1990).
[25] H. Jeffreys, Proc. Cambridge Philos. Soc. 54, 393 (1960).
[26] J. Gleick, Information: A History. A Theory. A Flood (Pantheon Books, New York, 2011); G. Nunberg, NYT Rev. Book March 20, 10 (2011).