arXiv:1902.09536v1 [cond-mat.stat-mech] 25 Feb 2019

Ternary Representation of Stochastic Change and the Origin of Entropy and Its Fluctuations

Hong Qian,* Yu-Chen Cheng,† and Lowell F. Thompson‡
Department of Applied Mathematics, University of Washington, Seattle, WA 98195, U.S.A.
* [email protected]   † [email protected]   ‡ [email protected]

A change in a stochastic system has three representations: probabilistic, statistical, and informational: (i) is based on the random variable u(ω) → ũ(ω); this induces (ii) the probability distributions F_u(x) → F_ũ(x), x ∈ R^n; and (iii) a change in the probability measure P → P̃ under the same observable u(ω). In the informational representation a change is quantified by the Radon-Nikodym derivative ln(dP̃/dP)(ω) = −ln(dF_u/dF_ũ)(x) when x = u(ω). Substituting a random variable into its own density function creates a fluctuating entropy whose expectation has been given by Shannon. Informational representation of a deterministic transformation on R^n reveals entropic and energetic terms, and the notions of configurational entropy of Boltzmann and Gibbs, and potential of mean force of Kirkwood. Mutual information arises for correlated u(ω) and ũ(ω); the nonequilibrium thermodynamic entropy balance equation is identified.

I. INTRODUCTION

A change according to classical physics is simple: if one observes that x₁ ∈ R^n is changed to x₂ ∈ R^n, which are traits, in real numbers, of the system under study, then the change is ∆x = x₂ − x₁. How to characterize a change in a complex, stochastic world? To represent a complex, stochastic world, A. N. Kolmogorov developed the theory of probability [1], which envisions an abstract probability space (Ω, F, P) as the mathematical representation of the world. Similar to the Hilbert space underlying quantum mechanics [2], one does not see or touch the objects in the probability space. Rather, one observes the probability space through functions, called random variables, which map Ω → R^n [3–5]; a random variable u(ω) maps the probability measure P to a cumulative probability distribution function (cdf) F_u(x) on R^n. In the current statistical data science, one simply works with the functions F_u and F_ũ, with neither the probability space nor the measure P in sight.

Now a change occurs; and based on observation(s) the random variable u(ω) is changed to ũ(ω), while the probability distribution F_u(x) is changed to F_ũ(x). In fact, the more complete description in the statistical representation is a joint probability distribution F_{uũ}(x, y) whose marginal distributions are F_u(x) and F_ũ(y). If, however, one explores a little more on the "origin" of the change, one realizes that there are two possible sources of the change: a change in the random variable u(ω) itself, or a change in the probability measure P → P̃ under the same observable u(ω). If the two measures are of a "same" type, then one can characterize this "change of measure" in terms of the Radon-Nikodym (RN) derivative dP̃/dP, which we call the informational representation. What is the relationship between this representation of the change and the observed statistics? The informational and probabilistic representations of stochastic changes echo the Schrödinger and Heisenberg pictures in quantum dynamics [12]: in terms of wave functions in the abstract, invisible Hilbert space, and in terms of self-adjoint operators as observables.

Substituting a random variable into the logarithm of its own density function, the fundamental idea of self-information [6, 7], creates a new, fluctuating stochastic entropy [8–10]; its expected value then becomes the Shannon entropy or the intimately related relative entropy [11].

In this paper, we present key results based on this informational representation of stochastic change, and show that the various types of entropy are unified under the single theory. The discussions are restricted to very simple cases; we only touch upon the stochastic change with a pair of correlated u(ω) and ũ(ω), which have respective generated σ-algebras that are non-identical in general. The notion of "thermodynamic work" will then appear [5]. The results are generally valid for multidimensional u(ω) ∈ R^n. In the rest of this paper, we will assume that all the measures under consideration are absolutely continuous with respect to each other and that all measures on R^n are absolutely continuous with respect to the Lebesgue measure. This ensures that all RN derivatives are well-defined. Note that an RN derivative is a mathematical object that is itself actually defined on the invisible probability space: a random variable, with statistics such as expectation and variance.

II. INFORMATIONAL REPRESENTATION OF STOCHASTIC CHANGE

In the rest of the paper, we shall consider the change of measure P → P̃ under the same observable u(ω). How to characterize such a change in the informational representation? The answer is [52]:

    (dP̃/dP)(ω) = (dF_ũ/dF_u)(u(ω)) = [ (dF_u/dF_ũ)(u(ω)) ]^{−1},    (1)

which is only defined on the σ-algebra generated by the random variable u(ω). On the rhs of (1), (dF_u/dF_ũ)(x) is like a probability density function on R; however, substituting the random variable u(ω) into it, one obtains the lhs of (1), a random variable on Ω. The result can be generalized to u(ω) ∈ R^n, for which

    (dF_u/dF_ũ)(x) = [ ∂^n F_u(x)/∂x₁···∂xₙ ] / [ ∂^n F_ũ(x)/∂x₁···∂xₙ ].    (2)

Statistics and information: push-forward and pull-back. Consider a sequence of real-valued random variables of a same physical origin and their individual cdfs:
    F_{u1}(x), F_{u2}(x), ···, F_{uT}(x).    (3)

According to the axiomatic theory of probability built on (Ω, F, P), there is a sequence of random variables

    u1(ω), u2(ω), ···, uT(ω),    (4)

in which each uk(ω) maps the P(ω) to the push-forward measure Fk(x), x ∈ R. Eq. 5 illustrates this with u and ũ standing for any ui and uj:

    u(ω): P(ω) ↦ F_u(x),    ũ(ω): P(ω) ↦ F_ũ(x),    u(ω): P̃(ω) ↦ F_ũ(x),
    with  (dP̃/dP)(ω) = (dF_ũ/dF_u)(u(ω)).    (5)

Informational representation, thus, considers the (3) as the push-forward of a sequence of measures P1 = P, P2, ···, PT, under a single observable, say u1(ω). This sequence of Pk can be represented through the fluctuating entropy inside the [···] below:

    dPk(ω) = [ (dF_{u1}/dF_{uk})(u1(ω)) ] dP(ω).    (6)

Narratives based on information representation have rich varieties. We have discussed above the information representation cast with a single, common random variable u(ω): Pk(ω) ↦ Fk(x). Alternatively, one can cast the information representation with a single, given F*(x) on R: Pk(ω) ↦ F*(x) under uk(ω), for any sequence of uk(ω). Actually there is a corresponding sequence of measures Pk whose push-forwards are all F*(x) on the real line, independent of k. Then parallel to Eq. 5 we have a schematic:

    uk(ω): P(ω) ↦ F_{uk}(x),    uk(ω): Pk(ω) ↦ F*(x),
    with  (dPk/dP)(ω) = [ (dF_{uk}/dF*)(uk(ω)) ]^{−1}.    (7)

If the invariant F*(x) is the uniform distribution, e.g., when one chooses the Lebesgue measure on R with F*(x) = x, then one has [6]

    E^P[ −ln (dF_{uk}/dx)(uk(ω)) ] = −∫_R f_{uk}(x) ln f_{uk}(x) dx.

This is precisely the Shannon entropy!
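As a quick numerical illustration of this statement (a minimal sketch, not part of the original derivation; the Gaussian observable and the sample size are arbitrary choices), the sample average of the fluctuating entropy −ln f_u(u(ω)) converges to the differential Shannon entropy:

import numpy as np

rng = np.random.default_rng(0)
sigma = 1.3
u = rng.normal(0.0, sigma, size=200_000)             # samples of u(omega) under P

# fluctuating entropy xi(omega) = -ln f_u(u(omega)) for the Gaussian density f_u
xi = 0.5 * (u / sigma) ** 2 + 0.5 * np.log(2 * np.pi * sigma ** 2)

print(xi.mean())                                     # Monte Carlo estimate of E[-ln f_u(u)]
print(0.5 * np.log(2 * np.pi * np.e * sigma ** 2))   # exact differential (Shannon) entropy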
More generally, with a fixed F*(x) one has

    E^P[ ln (dF_{uk}/dF*)(uk(ω)) ] = ∫_R f_{uk}(x) ln [ f_{uk}(x)/f*(x) ] dx
        = −∫_R f_{uk}(x) ln f*(x) dx + ∫_R f_{uk}(x) ln f_{uk}(x) dx.    (8)

This is exactly the relative entropy w.r.t. the stationary f*(x). In statistical thermodynamics, this is called free energy. −β^{−1} ln f*(x) on the rhs of (8) is called internal energy, where β stands for the physical unit of energy. The first integral is then the mean internal energy.

Essential facts on information. Several key mathematical facts concerning the information, as a random variable defined in (1), are worth stating.

First, even though the u(ω) appears in the rhs of the equation, the resulting lhs is independent of the u(ω): it is a random variable created from P, P̃, and the random variable ũ(ω), as clearly shown in Eq. 5. This should be compared with a well-known result in elementary probability: for a real-valued random variable η(ω) and its cdf F_η(x), the constructed random variable F_η(η(ω)) has a uniform distribution on [0, 1], independent of the nature of η(ω).

Second, if we denote the logarithm of (1) as ξ(ω), ξ(ω) = ln (dP̃/dP)(ω), then one has a result based on the change of measure for integration:

    E^P[η(ω)] = ∫_Ω η(ω) dP(ω) = ∫_Ω η(ω) (dP/dP̃)(ω) dP̃(ω)
              = ∫_Ω η(ω) e^{−ξ(ω)} dP̃(ω) = E^P̃[ η(ω) e^{−ξ(ω)} ].    (9a)

And conversely,

    E^P̃[η(ω)] = E^P[ η(ω) e^{ξ(ω)} ].    (9b)

In particular, when η = 1, the log-mean-exponential of the fluctuating ξ is zero. The incarnations of this equality have been discovered numerous times in thermodynamics, such as Zwanzig's free energy perturbation method [13], the Jarzynski-Crooks equality [15, 36], and the Hatano-Sasa equality [16].

Third, one has an inequality,

    E^P[ ξ(ω) ] ≤ ln E^P[ e^{ξ(ω)} ] = 0.    (10)

As we have discussed in [5], this inequality is the mathematical origin of almost all inequalities in connection to entropy in thermodynamics and information theory.

Fourth, let us again consider a real-valued random variable η(ω), with probability density function f_η(x), x ∈ R, and its information, e.g., fluctuating entropy ξ(ω) = −ln f_η(η(ω)) [6]. Then one has a new measure P̃ whose (dP̃/dP)(ω) = e^{ξ(ω)}, and

    ∫_{x1<η(ω)≤x2} dP̃(ω) = ∫_{x1<η(ω)≤x2} e^{ξ(ω)} dP(ω)
        = ∫_{x1<η(ω)≤x2} [ f_η(η(ω)) ]^{−1} dP(ω)
        = ∫_{x1}^{x2} [ f_η(y) ]^{−1} f_η(y) dy = x2 − x1.    (11)
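Eq. 11 is an instance of the reweighting formula (9b) with η replaced by an indicator function. A minimal numerical sketch (the standard-normal η and the interval endpoints are illustrative choices, not from the text):

import numpy as np

rng = np.random.default_rng(1)
eta = rng.normal(0.0, 1.0, size=1_000_000)       # eta(omega) ~ N(0,1) under P (illustrative)
f_eta = np.exp(-0.5 * eta ** 2) / np.sqrt(2 * np.pi)
xi = -np.log(f_eta)                              # fluctuating entropy xi = -ln f_eta(eta)

x1, x2 = -0.3, 1.1
weight = np.exp(xi)                              # dP~/dP = e^xi, the reweighting of Eq. (9b)
estimate = np.mean(((eta > x1) & (eta <= x2)) * weight)
print(estimate, x2 - x1)                         # P~{x1 < eta <= x2} = x2 - x1, Eq. (11)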

This means that under the new measure P̃, the random variable η(ω) has an unbiased uniform distribution on R. Note that the measure P̃(ω) is non-normalizable if η(ω) is not a bounded function. Entropy is the greatest "equalizer" of random variables, as physical observables are inevitably biased!

The foregoing discussion leaves no doubt that entropy (or negative free energy) ξ(ω) is a quantity to be used in the form e^{ξ(ω)}. This clearly points to the origin of partition function computation in statistical mechanics. In fact, it is fitting to call the tangent space in the affine structure of the space of measures its "space of information" [5], which represents the change of information.

III. CONFIGURATIONAL ENTROPY IN CLASSICAL DYNAMICS

By classical dynamics, we mean the representation of dynamical change in terms of a deterministic mathematical description, with either discrete space-time or continuous space-time. The notion of "configurational entropy" arose in this context in the theories of statistical mechanics developed by L. Boltzmann and J. W. Gibbs, either as Boltzmann's Wahrscheinlichkeit W, the Jacobian matrix in a deterministic transformation [17] as a non-normalizable density function, or its cumulative distribution. Boltzmann's entropy emerges in connection to macroscopic observables, which are chosen naturally from the conserved quantities in a microscopic dynamics.

In our present approach, the classical dynamics is a deterministic map in the space of observables, R^n. The informational representation demands a description of the change via a change of measures, and we now show how the notion of configurational entropy arises.

Information in deterministic change. Let us now consider a one-to-one deterministic transformation R → R, which maps the random variable u(ω) to v(ω) = g^{−1}(u(ω)), g′(x) > 0. Then

    dF_v(x)/dF_u(x) = [ f_u(g(x)) / f_u(x) ] g′(x).    (12)

Applying the result in (1) and (5), the corresponding RN derivative

    −ln (dP̃/dP)(ω) = [ ln f_u(g(x)) − ln f_u(x) + ln g′(x) ]_{x=v(ω)}    (13a)
        = { −ln f_u(v(ω)) + ln (dg/dx)(v(ω)) }   [information of v under observable u]    (13b)
          − { −ln f_u(u(ω)) }   [information of u],    (13c)

in which the information of v = g^{−1}(u), under observable u(ω), has two distinctly different contributions: an energetic part and an entropic part.

The energetic part represents the "changing location", which characterizes movement in the classical dynamic sense: a point to a point. It experiences an "energy" change, where the internal energy is defined as −β^{−1} ln f_u(x) = ϕ(x).

The entropic part represents the resolution for measuring information. This is distinctly a feature of dynamics in a continuous space; it is related to the Jacobian matrix in a deterministic transformation. According to the concept of Markov partition developed by Kolmogorov and Sinai for chaotic dynamics [18], there is the possibility of continuous entropy production in dynamics with increasing "state space fineness". This term is ultimately related to Kolmogorov-Sinai entropy and Ruelle's folding entropy [19].

Entropy and Jacobian matrix. We note that the "information of v under observable u"

    −ln f_u(v(ω)) + ln (dg/dx)(v(ω))  ≠  −ln f_v(v(ω)) !    (14)

This difference precisely reflects the effect of "pull-back" from R to the Ω space; there is a breaking of symmetry between u(ω) and v(ω) in (13a), when setting x = v(ω). Actually, the full "information entropy change" associated with the deterministic map is

    −ln f_v(v(ω)) + ln f_u(u(ω)) = −ln (dg/dx)(v(ω)),    (15)

as expected. The rhs is called configurational entropy. Note this is an equation between three random variables, i.e., fluctuating entropies, that is valid for all ω. For a one-to-one map in R^n, the above |dg(x)/dx| becomes the absolute value of the determinant of the n × n Jacobian matrix, which has a paramount importance in the theory of functions, integrations, and deterministic transformations. The matrix is associated with an invertible local coordinate transform, y_i = g_i(x), 1 ≤ i ≤ n, x ∈ R^n:

    D[y₁, ···, yₙ]/D[x₁, ···, xₙ] = ( ∂g_i/∂x_j )_{1≤i,j≤n},    (16)

whose determinant is the local "density change". Classical Hamiltonian dynamics preserves the Lebesgue volume.

Measure-preserving transformation. A change with information preservation means the lhs of (13a) being zero for all ω ∈ Ω. This implies the function g(x) necessarily satisfies

    −ln f_u(x) + ln |dg(x)/dx| = −ln f_u(g(x)).    (17)

Eq. 17 is actually the condition for g preserving the measure F_u(x), with density f_u(x), on R [18]:

    f_u(x) = f_u(g(x)) |dg(x)/dx| = f_v(x),    (18a)
    F_u(x) = ∫_{−∞}^{x} f_u(z) dz = F_u(g(x)).    (18b)
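The identity (15) holds pathwise, i.e., for every ω. A minimal sketch (with an exponentially distributed u(ω) and the monotone map g(x) = x², both illustrative choices not taken from the text) checks it sample by sample:

import numpy as np

rng = np.random.default_rng(2)
u = rng.exponential(1.0, size=5)            # u(omega) ~ Exp(1): f_u(y) = exp(-y), y > 0
g = lambda x: x ** 2                        # monotone map on (0, inf); u = g(v)
dg = lambda x: 2.0 * x                      # dg/dx
v = np.sqrt(u)                              # v(omega) = g^{-1}(u(omega))

f_u = lambda y: np.exp(-y)
f_v = lambda x: f_u(g(x)) * dg(x)           # push-forward density of v(omega)

lhs = -np.log(f_v(v)) + np.log(f_u(u))      # lhs of Eq. (15), one value per omega
rhs = -np.log(dg(v))                        # configurational entropy, rhs of Eq. (15)
print(np.allclose(lhs, rhs))                # True: the identity holds for every sample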

In this case, the inequality in (14) becomes an equality. In terms of the internal energy ϕ(x), one then has

    β = [ ϕ(g(x)) − ϕ(x) ]^{−1} ln | dg(x)/dx |,    (19)

which should be heuristically understood as the ratio ∆S/∆E, where ∆S = ln|dg(x)| − ln|dx| and ∆E = ϕ(g(x)) − ϕ(x). For an infinitesimal change g(x) = x + ε(x), we have

    β = ε′(x) / [ ϕ′(x) ε(x) ].    (20)

Entropy balance equation. The expected value of the lhs of (13a) according to measure P is non-negative. In fact, consider the Shannon entropy change associated with F_ũ(x) → F_u(x):

    ∫_R f_ũ(x) ln f_ũ(x) dx − ∫_R f_u(x) ln f_u(x) dx    (21a)
        = ∫_R f_ũ(x) ln [ f_ũ(x)/f_u(x) ] dx + ∫_R [ f_ũ(x) − f_u(x) ] ln f_u(x) dx
          (entropy production ∆S(i))              (entropy exchange ∆S(e))
        = E^P̃[ ln (dP̃/dP)(ω) ]    (21b)
          + E^P[ ln (dF_u/dx)(ũ(ω)) − ln (dF_u/dx)(u(ω)) ].    (21c)

This equation in fact has the form of the fundamental equation of nonequilibrium thermodynamics [20, 21]: ∆S = ∆S(i) + ∆S(e). The entropy production ∆S(i) on the rhs is never negative; the entropy exchange ∆S(e) has no definitive sign. If f_u(x) = dF_u(x)/dx = C is a uniform distribution, then ∆S(e) = 0 and the entropy change is the same as the entropy production.
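The decomposition (21) can be checked numerically on a grid. The two Gaussian densities below are illustrative placeholders for f_u and f_ũ (not from the text); the point is only that ∆S = ∆S(i) + ∆S(e) with ∆S(i) ≥ 0:

import numpy as np

x = np.linspace(-12.0, 12.0, 20_001)
dx = x[1] - x[0]
gauss = lambda x, m, s: np.exp(-(x - m) ** 2 / (2 * s ** 2)) / np.sqrt(2 * np.pi * s ** 2)

f_u = gauss(x, 0.0, 1.0)                    # density of u(omega), "before"
f_ut = gauss(x, 1.0, 1.5)                   # density of u-tilde(omega), "after"

dS = np.sum(f_ut * np.log(f_ut) - f_u * np.log(f_u)) * dx       # lhs, Eq. (21a)
dS_i = np.sum(f_ut * np.log(f_ut / f_u)) * dx                   # entropy production >= 0
dS_e = np.sum((f_ut - f_u) * np.log(f_u)) * dx                  # entropy exchange, no sign
print(dS, dS_i + dS_e, dS_i >= 0.0)         # dS = dS_i + dS_e, and dS_i is never negative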

Contracted description, endomorphism, and matrix volume. We have discussed above the R^n → R^n deterministic invertible transformation y = g(x) and shown that the fluctuating configurational entropy involves the determinant of its Jacobian matrix. For a contracted description y = g(x): R^n → R^m with m < n, the analogous quantity is the matrix volume, e.g., the absolute value of the "determinant" of the rectangular matrix [23]

    det( D[y₁, ···, y_m] / D[x₁, ···, xₙ] ),    (23)

which can be computed from the product of the singular values of the non-square matrix. Boltzmann's Wahrscheinlichkeit, his thermodynamic probability, is when m = 1. In that case,

    W(y) = (d/dy) ∫_{g(x)≤y} dx = ∮_{g(x)=y} dΣ / ‖∇g(x)‖,    (24)

in which the integral on the rhs is the surface integral on the level surface of g(x) = y. One has a further, clear physical interpretation of −ln W(y₁, ···, y_m) as a potential of entropic force [24], with force

    ∂ ln W / ∂y_ℓ = (1/W) ∂W/∂y_ℓ,   (1 ≤ ℓ ≤ m).    (25)

In the polymer theory of rubber elasticity, a Gaussian density function emerges due to the central limit theorem, and Eq. 25 yields a three-dimensional Hookean linear spring [25].
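A sketch of the entropic force in Eq. 25 (the Gaussian W and its width are illustrative assumptions, standing in for ideal-chain end-to-end statistics): the force ∂ln W/∂y is linear in y, i.e., Hookean:

import numpy as np

sigma = 2.0                                   # width of the Gaussian W(y) (illustrative)
y = np.linspace(-5.0, 5.0, 1001)
W = np.exp(-y ** 2 / (2 * sigma ** 2))        # Gaussian "thermodynamic probability" W(y)

force = np.gradient(np.log(W), y, edge_order=2)   # entropic force of Eq. (25): d ln W / dy
print(np.allclose(force, -y / sigma ** 2))        # linear restoring force: a Hookean spring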

IV. CONDITIONAL PROBABILITY, MUTUAL INFORMATION AND FLUCTUATION THEOREM

In addition to describing change by ∆x = x₂ − x₁, another, more in-depth characterization of a pair of observables (x₁, x₂) is by their functional dependency x₂ = g(x₁), if any. In connection to stochastic change, this leads to the powerful notion of conditional probability, which we now discuss in terms of the informational representation.

First, for two random variables (u₁, u₂)(ω), the conditional probability distribution on R × R gives rise to a fluctuating mutual information ξ₁₂(ω): the random variable obtained by substituting (u₁, u₂)(ω) into the logarithm of the density ratio displayed in (29) below; its expected value is the mutual information MI(u₁, u₂). We note that ξ₁₂(ω) is actually symmetric w.r.t. u₁ and u₂; the term inside the logarithm satisfies

    f_{u1|u2}(x; y) / f_{u1}(x) = f_{u1,u2}(x, y) / [ f_{u1}(x) f_{u2}(y) ] = f_{u2|u1}(y; x) / f_{u2}(y).    (29)

In fact the equality ξ₁₂(ω) = ξ₂₁(ω) is the Bayes' rule. Furthermore, MI(u₁, u₂) ≥ 0. It has been transformed into a distance function that satisfies the triangle inequality [27, 28]. In statistics, "distance" between random variables is measured not only by their dissimilarity but also by their statistical dependence. This realization gives rise to the key notion of independent and identically distributed (i.i.d.) random variables. In the informational representation, this means u₁ and u₂ have the same amount of information on the probability space, and they have zero mutual information.
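A minimal numerical sketch of the fluctuating mutual information (the bivariate Gaussian with correlation ρ is an illustrative choice, not from the text): the sample mean of ξ₁₂(ω) recovers MI(u₁, u₂) = −½ ln(1 − ρ²):

import numpy as np

rng = np.random.default_rng(3)
rho = 0.6                                    # correlation of the bivariate Gaussian pair
n = 400_000
u1, u2 = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T

log_joint = (-0.5 * (u1 ** 2 - 2 * rho * u1 * u2 + u2 ** 2) / (1 - rho ** 2)
             - np.log(2 * np.pi * np.sqrt(1 - rho ** 2)))
log_marg = -0.5 * (u1 ** 2 + u2 ** 2) - np.log(2 * np.pi)

xi12 = log_joint - log_marg                  # fluctuating mutual information, per omega
print(xi12.mean())                           # Monte Carlo estimate of MI(u1, u2)
print(-0.5 * np.log(1 - rho ** 2))           # closed form for the bivariate Gaussian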

Finally, but not the least, for F_{u1u2}(x, y) one can introduce a F̃_{u1u2}(x, y) = F_{u1u2}(y, x). Then the entropy production

    ξ(ω) = ln [ (dF_{u1u2}/dF̃_{u1u2})(x, y) ]_{x=u1(ω), y=u2(ω)}    (30a)
         = ln [ (∂²F_{u1u2}(x, y)/∂x∂y) / (∂²F_{u1u2}(y, x)/∂x∂y) ]_{x=u1(ω), y=u2(ω)}    (30b)

satisfies the fluctuation theorem [35, 37]:

    P{ a < ξ(ω) ≤ a + da } / P{ −a − da < ξ(ω) ≤ −a } = e^a,    (31)

for any a ∈ R. Eq. 31 characterizes the statistical asymmetry between u₁(ω) and u₂(ω), which are not independent in general. Being identical and independent is symmetric, and so is a stationary Markov process with reversibility [35].
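Eq. 31 can be probed by histogramming ξ(ω). In the sketch below the joint law of (u₁, u₂) is an arbitrary asymmetric Gaussian choice (not from the text); the finite bin width da introduces an O(da) bias, so the ratio approaches e^a only as da → 0:

import numpy as np

rng = np.random.default_rng(4)
n = 2_000_000
u1 = rng.normal(0.0, 1.0, n)
u2 = u1 + rng.normal(0.5, 1.0, n)            # an asymmetric joint law: F(x, y) != F(y, x)

def log_f(x, y):                             # log joint density of (u1, u2), up to a constant
    return -0.5 * x ** 2 - 0.5 * (y - x - 0.5) ** 2

xi = log_f(u1, u2) - log_f(u2, u1)           # entropy production of Eq. (30) at (u1, u2)(omega)

a, da = 1.0, 0.02
p_plus = np.mean((xi > a) & (xi <= a + da))      # P{ a < xi <= a + da }
p_minus = np.mean((xi > -a - da) & (xi <= -a))   # P{ -a - da < xi <= -a }
print(p_plus / p_minus, np.exp(a))               # ratio approaches e^a as da -> 0 (Eq. 31)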

V. CONCLUSION AND DISCUSSION

The theory of information [26] as a discipline that exists outside the field of probability and statistics owes to its singular emphasis on the notion of entropy as a quantitative measure of information. Our theory shows that it is indeed a unique representation of stochastic change that is distinctly different from, but complementary to, the traditional mathematical theory of probability. In statistical physics, there is a growing awareness of a deep relation between the notion of thermodynamic free energy and information [29]. The present work clearly shows that it is possible to narrate statistical thermodynamics as a subject of theoretical physics in terms of the measure-theoretic information, as suggested by some scholars who studied deeply thermodynamics and the concept of entropy [30]. Just as differential calculus and Hilbert space provide the necessary language for classical and quantum mechanics, respectively, the Kolmogorovian probability, including the informational representation, provides many known results in statistical physics with a deeper understanding, such as phase transition and symmetry breaking [31], Gibbsian ensemble theory [32], nonequilibrium thermodynamics [35–38], and the unification of the theories of chemical kinetics and chemical thermodynamics [39]. In fact, the symmetry principle [40] and emergent probability distributions via limit laws [41] can both be understood as providing legitimate measures a priori, P(ω), for the physical world or the biological world. And stochastic kinematics dictates entropic force and its potential function, the free energy [42]. [53]

For a long time the field of information theory and the study of large deviations (LD) in probability were not integrated: A. Ya. Khinchin's book was published the same year as the Sanov theorem [43, 44]. Researchers now agree that Boltzmann's original approach to the canonical equilibrium energy distribution [45], which was based on a maximum entropy argument, is a part of the contraction of the Sanov theorem in the LD theory [46]. In probability, the contraction principle emphasizes the LD rate function for the sample mean of a random variable. [54] In statistics and data science, the same mathematics has been used to justify the maximum entropy principle, which emphasizes a bias, as a conditional probability, introduced by observing a sample mean [47]. In all these works, Shannon's entropy and its variant relative entropy, as a single numerical characteristic of a probability distribution, have a natural and logical role. Many approaches have been further advanced in applications, e.g., surprisal analysis and the maximum caliber principle [48, 49].

The idea that information can itself be a stochastic quantity originated in the work of Tribus and Kolmogorov [8, 9], and probably many other mathematically minded researchers [50]. In physics, fluctuating entropy and entropy production arose in the theory of nonequilibrium stochastic thermodynamics. This development has significantly deepened the concept of entropy, both for physics and for the theory of information. The present work further illustrates that the notion of information, together with fluctuating entropy, actually originates from a perspective that is rather different from the strict Kolmogorovian one, with complementarity and contradistinctions. In pure mathematics, the notion of change of measures goes back at least to the 1940s [51], if not earlier.

[1] Qian, H. (2016) Nonlinear stochastic dynamics of complex systems, I: A kinetic perspective with mesoscopic nonequilibrium thermodynamics. arXiv:1605.08070.
[2] Dirac, P. A. M. (1930) The Principles of Quantum Mechanics, Oxford Univ. Press, U.K.
[3] Kolmogorov, A. N. and Fomin, S. V. (1968) Introductory Real Analysis, Silverman, R. A. transl. Dover, New York.
[4] Ye, F. X.-F. and Qian, H. (2019) Stochastic dynamics II: Finite random dynamical systems, linear representation, and entropy production. Disc. Cont. Dyn. Sys. B, to appear.
[5] Hong, L., Qian, H. and Thompson, L. F. (2019) Representations and metrics in the space of probability measures and stochastic thermodynamics. J. Comp. Appl. Math., submitted.
[6] Qian, H. (2001) Mesoscopic nonequilibrium thermodynamics of single macromolecules and dynamic entropy-energy compensation. Phys. Rev. E 65, 016102.
[7] Seifert, U. (2005) Entropy production along a stochastic trajectory and an integral fluctuation theorem. Phys. Rev. Lett. 95, 040602.
[8] Tribus, M. (1961) Thermostatics and Thermodynamics: An Introduction to Energy, Information and States of Matter, with Engineering Applications. D. van Nostrand, New York.
[9] Kolmogorov, A. N. (1968) Three approaches to the quantitative definition of information. International Journal of Computer Mathematics 2, 157–168.
[10] Qian, H. (2017) Information and entropic force: Physical description of biological cells, chemical reaction kinetics, and information theory (In Chinese). Sci. Sin. Vitae 47, 257–261.
[11] Hobson, A. (1969) A new theorem of information theory. J. Stat. Phys. 1, 383–391.
[12] Louisell, W. H. (1973) Quantum Statistical Properties of Radiation, Wiley Interscience, New York.
[13] Zwanzig, R. W. (1954) High-temperature equation of state by a perturbation method. I. Nonpolar gases. J. Chem. Phys. 22, 1420–1426.
[14] Jarzynski, C. (1997) Nonequilibrium equality for free energy differences. Phys. Rev. Lett. 78, 2690–2693.
[15] Crooks, G. (1999) Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E 60, 2721–2726.
[16] Hatano, T. and Sasa, S.-I. (2001) Steady-state thermodynamics of Langevin systems. Phys. Rev. Lett. 86, 3463–3466.
[17] Gibbs, J. W. (1902) Elementary Principles in Statistical Mechanics. Scribner's Sons, New York.
[18] Mackey, M. C. (1989) The dynamic origin of increasing entropy. Rev. Mod. Phys. 61, 981–1015.
[19] Ruelle, D. (1996) Positivity of entropy production in nonequilibrium statistical mechanics. J. Stat. Phys. 85, 1–23.
[20] de Groot, S. R. and Mazur, P. (2011) Non-Equilibrium Thermodynamics, Dover, New York.
[21] Qian, H., Kjelstrup, S., Kolomeisky, A. B. and Bedeaux, D. (2016) Entropy production in mesoscopic stochastic thermodynamics: Nonequilibrium kinetic cycles driven by chemical potentials, temperatures, and mechanical forces. J. Phys. Cond. Matt. 28, 153004.
[22] Kutz, J. N. (2013) Data-Driven Modeling & Scientific Computation. Oxford Univ. Press, U.K.
[23] Ben-Israel, A. (1999) The change of variables formula using matrix volume. SIAM J. Matrix Anal. 21, 300–312.
[24] Kirkwood, J. G. (1935) Statistical mechanics of fluid mixtures. J. Chem. Phys. 3, 300–313.
[25] Hill, T. L. (1987) Statistical Mechanics: Principles and Selected Applications, Dover, New York.
[26] Cover, T. M. and Thomas, J. A. (1991) Elements of Information Theory. John Wiley & Sons, New York.
[27] Dawy, Z., Hagenauer, J., Hanus, P. and Mueller, J. C. (2005) Mutual information based distance measures for classification and content recognition with applications to genetics. In IEEE International Conf. Communication, vol. 2, pp. 820–824.
[28] Steuer, R., Daub, C. O., Selbig, J. and Kurths, J. (2005) Measuring distances between variables by mutual information. In Innovations in Classification, Data Science, and Information Systems. Baier, D. and Wernecke, K. D. eds., Springer, Berlin, pp. 81–90.
[29] Parrondo, J. M. R., Horowitz, J. M. and Sagawa, T. (2015) Thermodynamics of information. Nature Phys. 11, 131–139.
[30] Ben-Naim, A. (2008) A Farewell to Entropy: Statistical Thermodynamics Based on Information. World Scientific, Singapore.
[31] Ao, P., Qian, H., Tu, Y., and Wang, J. (2013) A theory of mesoscopic phenomena: Time scales, emergent unpredictability, symmetry breaking and dynamics across different levels. arXiv:1310.5585.
[32] Cheng, Y.-C., Wang, W. and Qian, H. (2018) Gibbsian statistical ensemble theory as an emergent limit law in probability. arXiv:1811.11321.
[33] Jaynes, E. T. (2003) Probability Theory: The Logic of Science, Cambridge Univ. Press, U.K.
[34] Wang, J. (2015) Landscape and flux theory of nonequilibrium dynamical systems with application to biology. Adv. Phys. 64, 1–137.
[35] Jiang, D.-Q., Qian, M. and Qian, M.-P. (2004) Mathematical Theory of Nonequilibrium Steady States, Lecture Notes in Mathematics, Vol. 1833. Springer-Verlag, Berlin.
[36] Jarzynski, C. (2011) Equalities and inequalities: Irreversibility and the second law of thermodynamics at the nanoscale. Ann. Rev. Cond. Matt. Phys. 2, 329.
[37] Seifert, U. (2012) Stochastic thermodynamics, fluctuation theorems, and molecular machines. Rep. Prog. Phys. 75, 126001.
[38] van den Broeck, C. and Esposito, M. (2015) Ensemble and trajectory thermodynamics: A brief introduction. Physica A 418, 6.
[39] Ge, H. and Qian, H. (2016) Mesoscopic kinetic basis of macroscopic chemical thermodynamics: A mathematical theory. Phys. Rev. E 94, 052150.
[40] Yang, C. N. (1996) Symmetry and physics. Proc. Am. Philos. Soc. 140, 267–288.
[41] Anderson, P. W. (1972) More is different. Science 177, 393–396.
[42] Qian, H. (2017) Kinematic basis of emergent energetic descriptions of general stochastic dynamics. arXiv:1704.01828.
[43] Khinchin, A. Ya. (1957) Mathematical Foundations of Information Theory, Dover, New York.
[44] Sanov, I. N. (1957) On the probability of large deviations of random variables. Matematicheskii Sbornik 42, 11–44.
[45] Sharp, K. and Matschinsky, F. (2015) Translation of Ludwig Boltzmann's paper "On the relationship between the second fundamental theorem of the mechanical theory of heat and probability calculations regarding the conditions for thermal equilibrium". Entropy 17, 1971–2009.
[46] Touchette, H. (2009) The large deviation approach to statistical mechanics. Phys. Rep. 478, 1–69.
[47] Shore, J. E. and Johnson, R. W. (1980) Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Info. Th. IT-26, 26–37.
[48] Levine, R. D. (1978) Information theory approach to molecular reaction dynamics. Annu. Rev. Phys. Chem. 29, 59–92.
[49] Pressé, S., Ghosh, K., Lee, J. and Dill, K. A. (2013) Principles of maximum entropy and maximum caliber in statistical physics. Rev. Mod. Phys. 85, 1115–1141.
[50] Downarowicz, T. (2011) Entropy in Dynamical Systems, Cambridge Univ. Press, U.K.
[51] Cameron, R. H. and Martin, W. T. (1944) Transformations of Wiener integrals under translations. Ann. Math. 45, 386–396.

[52] F̃_u(x) = ∫_{u(ω)≤x} dP̃(ω) = ∫_{u(ω)≤x} [ (dF_ũ/dF_u)(u(ω)) ] dP(ω) = ∫_{−∞}^{x} [ dF_ũ(u)/dF_u(u) ] f_u(u) du = ∫_{−∞}^{x} [ dF_ũ(u)/du ] du = F_ũ(x). The equality in (1) is restricted to all B ∈ σ(u^{−1}(B)) ⊆ F.
[53] There is a remarkable logic consistency between fundamental physics pursuing symmetries in appropriate spaces under appropriate group actions and the notion of invariant measures in a theory of stochastic dynamics as the legitimate prior probability [33, 34].
[54] Even though the mathematical steps in Boltzmann's work on the ideal gas are precisely those in the LD contraction computation, their interpretations are quite different: Total mechanical energy conservation figured prominently in Boltzmann's work, which is replaced by the conditioning on a given sample mean in the LD theory. The subtle relation between conservation law and conditional probability was discussed in [32].