<<

Common probability patterns arise from simple invariances

Steven A. Frank1 Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697–2525 USA Shift and stretch invariance lead to the exponential-Boltzmann probability distribution. Rotational invari- ance generates the Gaussian distribution. Particular scaling relations transform the canonical exponential and Gaussian patterns into the variety of commonly observed patterns. The scaling relations themselves arise from the fundamental invariances of shift, stretch, and , plus a few additional invariances. Prior work described the three fundamental invariances as a consequence of the equilibrium canonical ensemble of statis- tical mechanics or the Jaynesian maximization of information entropy. By contrast, I emphasize the primacy and sufficiency of invariance alone to explain the commonly observed patterns. Primary invariance naturally creates the array of commonly observed scaling relations and associated probability patterns, whereas the classical approaches derived from statistical mechanics or information theory require special assumptions to derive commonly observed scales.

1. Introduction 2 9.1. Statistical mechanics 10 9.2. Jaynesian maximum entropy 10 2. Background 3 9.3. Conclusion 11 2.1. Probability increments 3 2.2. Parametric scaling relations 3 10. References 11

3. Shift invariance and the exponential form 3 Appendix A: Technical issues and extensions 12 3.1. Conserved total probability 3 1. Conserved total probability 12 3.2. Shift-invariant canonical coordinates 4 2. Conserved average values: eqn 7 12 3.3. Entropy as a consequence of shift invariance 4 3. Primacy of invariance 12 3.4. Example: the gamma distribution 4 4. Measurement theory 13

4. Stretch invariance and average values 5 Appendix B: Invariance and the common 4.1. Conserved average values 5 canonical scales 13 4.2. Alternative measures 5 5. Rotational invariance of the base scale 13 4.3. Example: the gamma distribution 5 6. General form of base scale invariance 13 7. Example: linear-log invariance of the base scale 14 5. Consequences of shift and stretch invariance 5 8. The family of canonical scales 14 5.1. Relation between alternative measures 5 9. Example: extreme values 15 5.2. Entropy 5 10. Example: stretched exponential and L´evy 15 5.3. Cumulative measure 6 5.4. Affine invariance and the common scales 6

6. Rotational invariance and the Gaussian radial measure 6 6.1. Gaussian distribution 7 6.2. Radial shift and stretch invariance 7 6.3. Transforming distributions to canonical Gaussian form 7

arXiv:1602.03559v2 [math.PR] 1 Apr 2016 6.4. Example: the gamma distribution 7 6.5. Example: the beta distribution 8

7. Rotational invariance and partitions 8 7.1. Overview 8 7.2. Rotational invariance of conserved probability 8 7.3. Partition of the canonical scale 9 7.4. Partition into location and rate 9

8. Summary of invariances 9

9. The primacy of invariance and 10

a)web: http://stevefrank.org

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 2

It is increasingly clear that the symmetry [invari- What is primary in the relation between these two ance] group of nature is the deepest thing that we equations: equilibrium statistical mechanics or shift in- understand about nature today. I would like to variance? The perspective of statistical mechanics, with suggest something here that I am not really cer- eqn 2 as the primary equilibrium outcome, dominates treatises of physics. tain about but which is at least a possibility: that Jaynes4,5 questioned whether statistical mechanics is specifying the symmetry group of nature may be sufficient to explain why patterns of nature often follow all we need to say about the physical world, be- the form of eqn 2. Jaynes emphasized that the same yond the principles of . probability pattern often arises in situations for which The paradigm for symmetries of nature is of physical theories of particle dynamics make little sense. In Jaynes’ view, if most patterns in economics, biology, course the group symmetries of space and time. and other disciplines follow the same distributional form, These are symmetries that tell you that the laws then that form must arise from principles that transcend of nature don’t care about how you orient your the original physical interpretations of particles, energy, laboratory, or where you locate your laboratory, and statistical mechanics6. or how you set your clocks or how fast your lab- Jaynes argued that probability patterns derive from oratory is moving (Weinberg 1, p. 73). the inevitable tendency for systems to lose information. By that view, the equilibrium form expresses minimum information, or maximum entropy, subject to whatever For the description of processes taking place in constraints may act in particular situations. In maximum nature, one must have a system of reference (Lan- entropy, the shift invariance of the equilibrium distribu- dau and Lifshitz 2, p. 1). tion is a consequence of the maximum loss of information under the constraint that total probability is conserved. Here, I take the view that shift invariance is pri- 1 INTRODUCTION mary. My argument is that shift invariance and the conservation of total probability lead to the exponential- I argue that three simple invariances dominate much of Boltzmann form of probability distributions, without the observed pattern. First, probability patterns arise from need to invoke Boltzmann’s equilibrium statistical me- invariance to a shift in scaled measurements. Second, chanics or Jaynes’ maximization of entropy. Those sec- the scaling of measurements satisfies invariance to uni- ondary special cases of Boltzmann and Jaynes follow from form stretch. Third, commonly observed scales are often primary shift invariance and the conservation of proba- invariant to rotation. bility. The first part of this article develops the primacy Feynman3 described the shift invariant form of proba- of shift invariance. bility patterns as Once one adopts the primacy of shift invariance, one is faced with the interpretation of the measurement scale, q (E) q (E + a) E. We must abandon energy, because we have discarded = , (1) q E0 q E0 + a the primacy of statistical mechanics, and we must aban- don Jaynes’ information, because we have assumed that in which q (E) is the probability associated with a mea- we have only general invariances as our basis. surement, E. Here, the ratio of probabilities for two dif- We can of course end up with notions of energy and ferent measurements, E and E0, is invariant to a shift by information that derive from underlying invariance. But a. Feynman derived this invariant ratio as a consequence that leaves open the problem of how to define the canon- of Boltzmann’s equilibrium distribution of energy levels, ical scale, E, that sets the frame of reference for measure- E, that follows from statistical mechanics ment. We must replace the scaling relation E in the above q (E) = λe−λE . (2) equations by something that derives from deeper gener- ality: the invariances that define the commonly observed Here, λ = 1/ hEi is the inverse of the average measure- scaling relations. ment. In essence, we start with an underlying scale for obser- Feynman presented the second equation as primary, vation, z. We then ask what transformed scale, z 7→ Tz ≡ arising as the equilibrium from the underlying dynamics E, achieves the requisite shift invariance of probability of particles and the consequent distribution of energy, E. pattern, arising from the invariance of total probability. He then mentioned in a footnote that the first equation It must be that shift transformations, Tz 7→ a+Tz, leave of shift invariance follows as a property of equilibrium. the probability pattern invariant, apart from a constant However, one could take the first equation of shift invari- of proportionality. ance as primary. The second equation for the form of the Next, we note that a stretch of the scale, Tz 7→ bTz, probability distribution then follows as a consequence of also leaves the probability pattern unchanged, because shift invariance. the inverse of the average value in eqn 2 becomes λ =

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 3

1/b hTzi, which cancels the stretch in the term λE = λTz. 2.2 Parametric scaling relations Thus, the scale Tz has the property that the associated probability pattern is invariant to the affine transforma- A probability pattern, qzdψz, may be considered as a tion of shift and stretch, T 7→ a + bT . That affine z z parametric description of two scaling relations, qz and ψz, invariance generates the symmetry group of scaling rela- with respect to the parameter z. Geometrically, qzdψz tions that determine the commonly observed probability is a rectangular area defined by the parametric height, patterns7–9. qz, with respect to the parameter, z, and the parametric The final part of this article develops rotational in- width, dψz, with respect to the parameter, z. variance of conserved partitions. For example, the We may think of z as a parameter that defines a curve Pythagorean partition T = x2(s)+y2(s) splits the scaled z along the path (ψz, qz), relating a scaled input measure, measurement into components that add invariantly to T z ψz, to a scaled output probability, qz. The followings sec- for any value of s. The invariant quantity defines√ a circle tions describe how different invariances constrain these in the xy plane with a conserved radius Rz = Tz that scaling relations. is invariant to rotation around the circle, circumscribing 2 a conserved area πRz = πTz. Rotational invariance al- lows one to partition a conserved quantity into additive 3 SHIFT INVARIANCE AND THE EXPONENTIAL FORM components, which often provides insight into underlying process. I show that shift invariance and the conservation of If we can understand these simple shift, stretch, and ro- total probability lead to the exponential form of proba- tational invariances, we will understand much about the bility distributions in eqn 2. Thus, we may consider the intrinsic structure of pattern. An explanation of natu- main conclusions of statistical mechanics and maximum ral pattern often means an explanation of how particular entropy as secondary consequences that follow from the processes lead to particular forms of invariance. primacy of shift invariance and conserved total probabil- ity.

2 BACKGROUND 3.1 Conserved total probability This section introduces basic concepts and notation. I emphasize qualitative aspects rather than detailed math- This section relates shift invariance to the conservation ematics. The final section of this article provides histor- of total probability. Begin by expressing probability in ical background and alternative perspectives. terms of a transformed scale, z 7→ Tz, such that qz = k0f(Tz) and 2.1 Probability increments Z Z qzdψz = k0f(Tz)dψz = 1.

Define q(z) ≡ qz such that the probability associated with z is q ∆ψ . This probability is the area of a rect- The term k0 is independent of z and adjusts to satisfy z z the conservation of total probability. angle with height qz and incremental width ∆ψz. The total probability is constrained to be one, as the If we assume that the functional form f is invariant to sum of the rectangular areas over all values of z, which is a shift of the transformed scale by a constant, a, then by P the conservation of total probability qz∆ψz = 1. When the z values are discrete quantities or qualitative labels for events, then the incremental mea- Z Z sure is sometimes set to one everywhere, ∆ψz ≡ 1, with k0f(Tz)dψz = kaf(Tz + a)dψz = 1. (3) changes in the measure ∆ψz made implicitly by adjusting P qz. The conservation of probability becomes qz = 1. The proportionality constant, ka, is independent of z and If a quantitative scale z has values that are close to- changes with the magnitude of the shift, a, in order to gether, then the incremental widths are small, ∆ψz → satisfy the constraint on total probability. dψz, and the distribution becomes essentially continu- Probability expressions, q(z) ≡ qz, are generally not ous in the limit. The probability around each z value is shift invariant with respect to the scale, z. However, if qz dψz. Writing the limiting sum as a integral over z, the our transformed scale, z 7→ Tz is such that we can write R conservation of total probability is qz dψz = 1. eqn 3 for any magnitude of shift, a, solely by adjusting The increments may be constant-sized steps dψz = the constant, ka, then the fact that the conservation of dz on the z scale, with probabilities qzdψz = qzdz in total probability sets the adjustment for ka means that each increment. One may transform z in ways that alter the condition for Tz to be a shift invariant canonical scale the probability expression, qz, or the incremental widths, for probability is dψz, and study how those changes alter or leave invariant qz = k0f(Tz) = kaf(Tz + a), (4) properties associated with the total probability, qzdψz. which holds over the entire domain of z.

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 4

The key point here is that ka is an adjustable param- This logarithmic expression of probability leads to var- eter, independent of z, that is set by the conservation of ious classical definitions of entropy and information3,10. total probability. Thus, the conservation of total proba- Here, the linear relation between the logarithmic scale bility means that we are only required to consider shift and the canonical scale follows from the shift invariance invariance in relation to the proportionality constant ka of probability with respect to the canonical scale, Tz, and that changes with the magnitude of the shift, a, indepen- the conservation of total probability. dently of the value of z. Appendix A provides additional I interpret shift invariance and the conservation of to- detail about the conservation of total probability and the tal probability as primary aspects of probability patterns. shift-invariant exponential form. Entropy and information interpretations follow as sec- ondary consequences. One can of course derive shift invariance from physi- 3.2 Shift-invariant canonical coordinates cal or information theory perspectives. My only point is that such extrinsic concepts are unnecessary. One can This section shows the equivalence between shift in- begin directly with shift invariance and the conservation variance and the exponential form for probability distri- of total probability. butions. Let x ≡ Tz, so that we can write the shift invariance of f in eqn 4 as 3.4 Example: the gamma distribution

f(x + a) = αaf(x). Many commonly observed patterns follow the gamma probability distribution, which may be written as By the conservation of total probability, αa depends only on a and is independent of x. q = kzαλe−λz. If the invariance holds for any shift, a, then it must z hold for an infinitesimal shift, a = . By Taylor series, This distribution is not shift invariant with respect to z, we can write because z 7→ a + z alters the pattern 0 f(x + ) = f(x) + f (x) = αf(x). αλ −λz αλ −λ(a+z) qz = kz e 6= ka(a + z) e . Because  is small and independent of x, and α = 1, we 0 There is no value of k for which this expression holds can write α = 1−λ for a constant λ. Then the previous a  for all z. equation becomes If we write the distribution in canonical form f 0(x) = −λf(x). −λTz −λ(z−α log z) qz = ke = ke , (6) This differential equation has the solution then the distribution becomes shift invariant on the −λx f(x) = keˆ , canonical scale, Tz = z − α log z, because Tz 7→ a + Tz yields in which kˆ may be determined by an additional con- −λ(a+Tz ) −λTz straint. Using this general property for shift invariant f qz = ke = kae , in eqn 4, we obtain the classical exponential-Boltzmann −λa form for probability distributions in eqn 2 as with ka = ke . Thus, a shift by a leaves the pattern unchanged apart from an adjustment to the constant of −λTz qz = ke (5) proportionality that is set by the conservation of total probability. with respect to the canonical scale, Tz. Thus, express- The canonical scale, Tz = z − α log z, is log-linear. It ing observations on the canonical shift-invariant scale, is purely logarithmic for small z, purely linear for large z 7→ Tz, leads to the classical exponential form. If one z, and transitions between the log and linear domains accepts the primacy of invariance, the “energy,” E, of through a region determined by the parameter α. the Boltzmann form in eqn 2 arises as a particular in- The interpretation of process in relation to pattern al- terpretation of the generalized shift-invariant canonical most always reduces to understanding the nature of in- coordinates, Tz. variance. In this case, shift invariance associates with log- linear scaling. To understand the gamma pattern, one must understand how process creates a log-linear scaling 3.3 Entropy as a consequence of shift invariance relation that is shift invariant with respect to probability pattern7–9. The transformation to obtain the shift-invariant coor- dinate Tz follows from eqn 5 as

− log qz = λTz − log k.

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 5

4 STRETCH INVARIANCE AND AVERAGE VALUES 4.3 Example: the gamma distribution

4.1 Conserved average values The gamma distribution from the prior section pro- vides an example. If we transform the base scale by a Stretch invariance means that multiplying the canon- stretch factor, z 7→ bz, then ical scale by a constant, T 7→ bT , does not change z z q = k e−λ(bz−α log bz). probability pattern. This condition for stretch invariance z b associates with the invariance of the average value. There is no altered value of λ for which this expression To begin, note that for the incremental measure dψz = leaves qz invariant over all z. By contrast, if we stretch dTz, the constant in eqn 5 to satisfy the conservation of with respect to the canonical scale, Tz 7→ bTz, in which total probability is k = λ, because Tz = z − α log z for the gamma distribution, we obtain

Z ∞ −λbbTz −λTz −λTz qz = ke = ke λe dTz = 1, 0 for λb = λ/b. Thus, if we assume that the distribution is stretch invariant with respect to dz, then the average when integrating over Tz. Next, define hXi as the average value of X with re- value λ hTiz = λ hz − α log zi is a conserved quantity. ψ Alternatively, if we assume that the average value spect to the incremental measure dψz. Then the average of λTz with respect to dTz is λ hTiz = λ hz − α log zi = λ hzi − λα hlog zi Z 2 −λTz is a conserved quantity, then stretch invariance of the λ hTi = λ Tz e dTz = 1. (7) T canonical scale follows. In this example of the gamma distribution, conserva- The parameter λ must satisfy the equality. This invari- tion of the average value with respect to the canonical ance of λ hTiT implies that any stretch transformation scale is associated with conservation of a linear combina- Tz 7→ bTz will be canceled by λ 7→ λ/b. See Appendix A tion of the arithmetic mean, hzi, and the geometric mean, for further details. hlog zi, with respect to the underlying values, z. In sta- We may consider stretch invariance as a primary at- tistical theory, one would say that the arithmetic and tribute that leads to the invariance of the average value, geometric means are sufficient statistics for the gamma λ hTiT. Or we may consider invariance of the average distribution. value as a primary attribute that leads to stretch invari- ance. 5 CONSEQUENCES OF SHIFT AND STRETCH INVARIANCE

4.2 Alternative measures 5.1 Relation between alternative measures

Stretch invariance holds with respect to alternative We can relate alternative measures to the canonical measures, dψ 6= dT . Note that for q in eqn 5, the z z z scale by dT = T0dψ , in which T0 = |dT /dψ| is the conservation of total probability fixes the value of k, be- z z z absolute value of the rate of change of the canonical scale cause we must have with respect to the alternative scale. Starting with eqn 7 0 Z and substituting dTz = T dψz, we have ke−λTz dψ = 1. z Z 0 2 0 −λTz λ hTT iψ = λ TzT e dψz = 1. The average value of λTz with respect to dψz is Z Thus, we recover a universally conserved quantity with −λTz respect to any valid alternative measure, dψ . λTzke dψz = λ hTiψ . z

Here, we do not have any guaranteed value of λ hTiψ, 5.2 Entropy because it will vary with the choice of the measure dψz. If we assume that hTiψ is a conserved quantity, then λ must be chosen to satisfy that constraint, and, from the Entropy is defined as the average value of − log qz. From the canonical form of qz in eqn 2, we have fact that λTz occurs as a pair, λ hTiψ is a conserved quantity. The conservation of λ hTiψ leads to stretch − log qz = λTz − log k. (8) invariance, as in the prior section. Equivalently, stretch invariance leads to the conservation of the average value. Average values depend on the incremental measure, dψz, so we may write entropy11 as

h− log qziψ = hλTz − log kiψ = λ hTiψ − log kψ.

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 6

The value of log kψ is set by the conservation of total for some constants a and b. We can abbreviate this notion probability, and λ is set by stretch invariance. The value of affine invariance as of hTiψ varies according to the measure dψz. Thus, the entropy is simply an expression of the average value of T ◦ G ∼ T, (9) the canonical scale, T, with respect to some incremental measurement scale, ψ, adjusted by a term for the conser- in which “∼” means affine invariance in the sense of vation of total probability, k. equivalence for some constants a and b. We can apply the transformation G to both sides of When ψ ≡ T, then kψ = λ, and we have the classic result for the exponential distribution eqn 9, yielding the new invariance T ◦ G ◦ G ∼ T ◦ G. In general, we can apply the transformation G repeatedly to each side any number of times, so that h− log qziT = λ hTiT − log λ = 1 − log λ = log e/λ, T ◦ G n ∼ T ◦ G m in which the conserved value λ hTiT = 1 was given in eqn 7 as a consequence of stretch invariance. for any nonnegative integers n and m. Repeated applica- tion of G generates a group of invariances—a symmetry 5.3 Cumulative measure group. Often, in practical application, the base invari- ance in eqn 9 does not hold, but asymptotic invariance

Shift and stretch invariance lead to an interesting rela- T ◦ G(n+1) ∼ T ◦ G n tion between − log qz and the scale at which probability accumulates. From eqn 8, we have holds for large n. Asymptotic invariance is a key aspect of pattern12. 1 − d log q = dT = T0dψ . λ z z z 6 ROTATIONAL INVARIANCE AND THE GAUSSIAN RADIAL Multiplying both sides by qz, the accumulation of prob- MEASURE ability with each increment of the associated measure is The following sections provide a derivation of the 1 0 − qz d log qz = qzdTz = qzT dψz. Gaussian form and some examples. This section high- λ lights a few results before turning to the derivation. The logarithmic form for the cumulative measure of prob- Rotational invariance transforms the total probabil- ability simplifies to ity qzdTz from the canonical exponential form into the canonical Gaussian form 1 1 2 2 − qz d log qz = − dqz = qzdTz. −λTz −πv R λ λ λe dTz 7→ ve z dRz. (10)

This expression connects the probability weighting, qz, This transformation follows from the substitution λTz 7→ for each incremental measure, to the rate at which prob- 2 2 πv Rz, in which the stretch invariant canonical scale, ability accumulates in each increment, dq = −λq dT . 2 2 z z z λTz, becomes the stretch invariant circular area, πv Rz, This special relation follows from the expression for q in 2 2 z with squared radius v Rz. The new incremental scale, eqn 2, arising from shift and stretch invariance and the vdRz, is the stretch invariant Gaussian radial measure. consequent canonical exponential form. We can, without loss of generality, let v = 1, and write 2 Λ = πRz as the area of a circle. Thus the canonical Gaussian form 5.4 Affine invariance and the common scales −Λ qzdψz = e dRz (11) Probability patterns are invariant to shift and stretch describes the probability, − log qz = Λ, in terms of the of the canonical scale, Tz. Thus, affine transforma- area of a circle, Λ, and the incremental measurement tions Tz 7→ a + bTz define a group of related canoni- scale, dψz, in terms of the radial increments, dRz. cal scales. In previous work, we showed that essentially 3 all commonly observed probability patterns arise from a Feynman noted the relation between entropy, radial simple affine group of canonical scales7–9. This section measure, and circular area. In my notation, that relation briefly summarizes the concept of affine invariant canon- may be summarized as ical scales. Appendix B provides some examples. h− log q i = hΛi . A canonical scale T(z) ≡ T is affine invariant to a z R R transformation G(z) if However, Feynman considered the circular expression of entropy as a consequence of the underlying notion of sta- T[G(z)] = a + bT(z) tistical mechanics. Thus, his derivation followed from an underlying canonical ensemble of particles

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 7

By contrast, my framework derives from primary un- hRiR = 0, when evaluated over all positive and negative derlying invariances. An underlying invariance of rota- radial values. Shift invariance associates with no change tion leads to the natural Gaussian expression of circular in radial distance as the frame of reference shifts the lo- scaling. To understand how rotational invariance leads cation of the center of the circle to maintain constant to the Gaussian form, it is useful consider a second para- radii. metric input dimension, θ, that describes the angle of Stretch invariance associates with the conserved value rotation13. Invariance with respect to rotation means of the average circular area that the probability pattern that relates q(z, θ) to ψ(z, θ) 1 is invariant to the angle of rotation. λ hTi = πv2 R2 = πv2σ2 = , R R 2 in which the variance, σ2, is traditionally defined as the 6.1 Gaussian distribution average of the squared deviations from the central loca- tion. Here, we have squared radial deviations from the I now show that rotational invariance transforms the center of the circle averaged over the incremental radial canonical shift and stretch invariant exponential form measure, dRz. into the Gaussian form, as in eqn 10. To begin, express When λ = v2 = 1, we have σ2 = 1/2π, and we obtain the incremental measure in terms of the Gaussian radial the elegant expression of the Gaussian as the relation measure as between circular area and radial increments in eqn 11. 2 2 2 This result corresponds to an average circular area of λdTz = πv dRz = 2πv RzdRz, one, because 2πR2 = 2πσ2 = 1. from which the canonical exponential form qzdTz = It is common to express the Gaussian in the standard −λTz λe dTz may be expressed in terms of the radial mea- normal form, with σ2 = 1, which yields v2 = 1/2π, and sure as the associated probability expression obtained by substi- 2 2 −λTz 2 −πv R tuting this value into eqn 10. λe dTz = 2πv Rze z dRz. (12) Rotational invariance means that for each radial in- crement, vdRz, the total probability in that increment 6.3 Transforming distributions to canonical Gaussian form given in eqn 12 is spread uniformly over the circumfer- ence 2πvRz of the circle at radius vRz from a central Rotational invariance transforms the canonical expo- location. nential form into√ the Gaussian form, as in eqn 10. If Uniformity over the circumference implies that we can 2 we equate Rz = Tz and λ = πv , we can write the define a unit of incremental length along the circumfer- Gaussian form as ential path with a fraction 1/2πvR of the total probabil- z r ity in the circumferential shell of width vdR . Thus, the λ z −λTz p qzdRz = e d Tz, (13) probability along an increment vdRz of a radial vector π follows the Gaussian distribution in which −πv2R2 (1/2πvRz) qzdTz = ve z dRz σ˜2 = hTi√ invariantly of the angle of orientation of the radial vector. T Here, the total probability of the original exponential is a generalized notion of the variance. form, qzdTz, is spread evenly over the two-dimensional The expression in eqn 13 may require a shift of Tz so parameter space (z, θ) that includes all rotational orien- that Tz ∈ (0, ∞), with associated radial values Rz = tations. The Gaussian expression describes the distribu- √ ± Tz. The nature of the required shift is most easily tion of probability along each radial vector, in which a shown by example. vector intersects a constant-sized area of each circumfer- ential shell independently of distance from the origin. The Gaussian distribution varies over all positive and 6.4 Example: the gamma distribution negative values, Rz ∈ (−∞, ∞), corresponding to an ini- tial exponential distribution in squared radii, R2 = T ∈ z z The gamma distribution may be expressed as q dψ (0, ∞). We can think of radial vectors as taking positive z z with respect to the parameter z when we set T = z − or negative values according to their orientation in the z α log z and dψ = dz, yielding upper or lower half planes. z −λ(z−α log z) qzdz = ke dz,

6.2 Radial shift and stretch invariance for z ≥ 0. To transform this expression to the Gaussian radial scale, we must shift Tz so that the corresponding The radial value, Rz, describes distance from the cen- value of Rz describes a monotonically increasing radial tral location. Thus, the average radial value is zero, distance from a central location.

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 8

For the gamma distribution, if we use the shift Tz 7→ 7.1 Overview Tz − α = (z − α log z) − α for α ≥ 0, then the minimum of Tz and the associated maximum of qz correspond to Conserved quantities may arise from an underlying Rz = 0, which is what we need to transform into the combination of processes. For example, we might know Gaussian form. In particular, the parametric plot of the that a conserved quantity, R2 = x + y, arises as the sum points (±Rz, qz) with respect to the parameter z ∈ (0, ∞) of two underlying processes with values x and y. We follows the Gaussian pattern. do not know x and y, only that their conserved sum is In addition, the parametric plot of the points (Tz, qz) invariantly equal to R2. follows the exponential-Boltzmann pattern. Thus we The partition of an invariant quantity into a sum may have a parametric description of the probability pattern be interpreted as rotational invariance, because qz in terms of three alternative scaling relations for the √ 2 √ underlying parameter z: the measure dz corresponds to R2 = x + y = x + y2 the value of z itself and the gamma pattern, the measure dRz corresponds to the Gaussian radial measure, and the defines a circle with conserved radius R along the positive √ √  measure dTz corresponds to the logarithmic scaling of qz and negative values of the coordinates x, y . That and the exponential-Boltzmann pattern. Each measure form of rotational invariance explains much of observed expresses particular invariances of scale. pattern, many of the classical results in probability and dynamics, and the expression of those results in the con- text of mechanics. 6.5 Example: the beta distribution The partition can be extended to a multidimensional sphere of radius R as A common form of the beta distribution is 2 X √ 2 α−1 β−1 R = xi . (15) qzdz = kz (1 − z) dz for z ∈ (0, 1). We can express this distribution in canon- One can think of rotational invariance in two different ical exponential form ke−λTz by the scaling relation ways. First, one may start with a variety of different di- mensions, with no conservation in any particular dimen- −λTz = (α − 1) log z + (β − 1) log(1 − z), sion. However, the aggregate may satisfy a conserved with λ > 0. For α and β both greater than one, this total that imposes rotational invariance among the com- scaling defines a log-linear-log pattern8, in the sense that ponents. Second, every conserved quantity can be partitioned −λTz scales logarithmically near the endpoints of zero and one, and transitions to a linear scaling interiorly near into various additive components. That partition starts with a conserved quantity and then, by adding dimen- the minimum of Tz at sions that satisfy the total conservation, one induces a α − 1 z∗ = . (14) higher dimensional rotational invariance. Thus, every α + β − 2 conserved quantity associates with higher-dimensional rotational invariance. When 0 < α < 1, the minimum (extremum) of Tz is at z∗ = 0. For our purposes, it is useful to let α = λ for λ > 0, and assume β > 1. ∗ ∗ ∗ 7.2 Rotational invariance of conserved probability Define T as the value of Tz evaluated at z . Thus T is the minimum value of Tz, and Tz increases monoton- ically from its minimum. If we shift Tz by its minimum, In the probability expression qzdψz, suppose the in- ∗ Tz 7→ Tz −T , and use the shifted value of Tz, we obtain cremental measure dψz is constant, and we have a finite the three standard forms of a distribution in terms of the number of values of z with positive probability. We may P parameter z ∈ (0, 1), as follows. write the conserved total probability as z qz = 1. Then The measure dz and parametric plot (z, qz) is the stan- from eqn 15, we can write the conservation of total prob- 2 dard beta distribution form, the measure dRz and para- ability as a partition of R = 1 confined to the surface of metric plot (±Rz, qz) is the standard Gaussian form, and a multidimensional sphere the measure dT and parametric plot (T , q ) is the stan- z z z X √ 2 dard exponential-Boltzmann form. qz = 1. z There is a natural square root spherical coordinate sys- 7 ROTATIONAL INVARIANCE AND PARTITIONS √ tem, qz, in which to express conserved probability. Square roots of probabilities arise in a variety of funda- The Gaussian radial measure often reveals the further mental expressions of physics, statistics, and probability underlying invariances that shape pattern. Those invari- theory14,15. ances appear from the natural way in which the radial measure can be partitioned into additive components.

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 9

7.3 Partition of the canonical scale The Hamiltonian interpretation is, however, particu- larly useful. It leads to a natural expression of dynamics The canonical scale equals the square of the Gaus- with respect to underlying invariance. For example, we 2 can partition a probability pattern into its currently ob- sian radial scale, Tz = Rz. Thus, we can write a two- dimensional partition from eqn 15 as servable location and its rate of change

2 2 √ 2 √ 2 −λHz −λw −λw˙ Tz = x1 + x2 . e = e e . Define the two dimensions as The first component, w2, may be interpreted as the ob- √ servable state of the probability pattern at a particular x1 = w ≡ w(z, s) time. The second component,w ˙ 2, may be interpreted as √ x2 =w ˙ ≡ w˙ (z, s), the rate of change in the probability pattern. Invariance applies to the combination of location and rate of change, yielding the partition for the canonical scale as rather than to either component alone. Thus, invariance 2 2 does not imply equilibrium. Tz = w +w ˙ . (16) This expression takes the input parameter z and parti- 2 8 SUMMARY OF INVARIANCES tions the resulting value of Tz = Rz into a circle of radius Rz along the path (w, w˙ ) traced by the parameter s. Probability patterns, qz, express invariances of shift The radial distance, Rz, and associated canonical scale value, T = R2, are invariant with respect to s. In gen- and stretch with respect to a canonical scale, Tz. Those z z invariances lead to an exponential form eral, for each dimension we add to a partition of Tz, we can create an additional invariance with respect to a new q dψ = ke−λTz dψ , parameter. z z z

with respect to various incremental measures, dψz. This probability expression may be regarded parametrically 7.4 Partition into location and rate with respect to z. The parametric view splits the proba- bility pattern into two scaling relations, qz and ψz, with A common partition separates the radius into dimen- respect to z, forming the parametric curve defined by the sions of location and rate. Definew ˙ = ∂w/∂s as the points (ψz, qz). rate of change in the location w with respect to the pa- For the canonical scale, Tz, we may consider the sorts rameter s. Then we can use the notational equivalence of transformations that leave the scale shift and stretch 2 Hz ≡ Tz = Rz to emphasize the relation to a classic (affine) invariant, T ◦ G ∼ T, as in eqn 9. Essentially all expression in physics for a conserved Hamiltonian as of the canonical scales of common probability patterns7–9 arise from the affine invariance of T and a few simple H = w2 +w ˙ 2, (17) z types of underlying invariance with respect to z. in which this conserved square of the radial distance is For the incremental measure scale, dψz, four alter- partitioned into the sum of a squared location, w2, and natives highlight different aspects of probability pattern a squared rate of change in location,w ˙ 2. The squared and scale. rate, or velocity, arises as a geometric consequence of The scale dz leads to the traditional expression of prob- the Pythagorean partitioning of a squared radial distance ability pattern, qzdz, which highlights the invariances into squared component dimensions. Many extensions of that set the canonical scale, Tz. this Hamiltonian interpretation can be found in standard The scale dTz leads to the universal exponential- textbooks of physics. Boltzmann form, qzdTz, which highlights the fundamen- tal shift and stretch invariances in relation to the conser- With the Hamiltonian notation, Hz ≡ Tz, our canoni- cal exponential-Boltzmann distribution is vation of total probability. This conservation of total probability may alterna- −λHz qzdHz = λe dHz. tively be described by a cumulative probability measure, dqz = −λqzdTz. The value H is often interpreted as energy, with dH as Finally, rotational invariance leads to the Gaussian the Gibbs measure. For the simple circular partition of radial measure, dRz. That radial measure transforms 2 eqn 17, the total energy is often split into potential, w , many probability scalings, qz, into Gaussian distribu- 2 and kinetic,w ˙ , components. tions, qzdRz. In this article, I emphasize the underlying invariances Invariances typically associate with conserved and their geometric relations as fundamental. From my quantities16. For example, the rotational invariance perspective, the interpretation of energy and its compo- of the Gaussian radial measure is equivalent to the nents are simply one way in which to describe the funda- conservation of the average area circumscribed by the mental invariances.

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 10 radial measure. That average circular area is pro- underlying microscopic ensemble. portional to the traditional definition of the variance. For example, to derive the log-linear scaling pattern Thus, rotational invariance and conserved variance are that characterizes the commonly observed gamma dis- equivalent in the Gaussian form. tribution in eqn 6, a mechanical perspective must make The Gaussian radial measure often reveals the further special assumptions about the interactions between the underlying invariances that shape pattern. That insight underlying microscopic particles. follows from the natural way in which the radial measure Some may consider the demand for explicit mechanical can be partitioned into additive components. assumptions about the underlying particles to be a bene- fit. But in practice, those explicit assumptions are almost certainly false, and instead simply serve as a method by 9 THE PRIMACY OF INVARIANCE AND SYMMETRY which to point in the direction of the deeper underlying invariance that shapes the scaling relations and associ- It was Einstein who radically changed the way ated probability patterns. people thought about nature, moving away I prefer to start with the deeper abstract structure from the mechanical viewpoint of the nine- shaped by the key invariances. Then one may consider teenth century toward the elegant contempla- the variety of different particular mechanical assumptions tion of the underlying symmetry principles of that lead to the key invariances. Each set of particular the laws of physics in the twentieth century assumptions that are consistent with the key invariances (Lederman and Hill 17, p. 153). define a special case. There have been many powerful extensions to statis- The exponential-Boltzmann distribution in eqn 2 pro- tical mechanics in recent years. Examples include gen- vides the basis for statistical mechanics, Jaynesian maxi- eralized entropies based on assumptions about underly- mum entropy, and my own invariance framework. These ing particle mechanics18, superstatistics as the average approaches derive the exponential form from different as- over heterogeneous microscopic sets19, and invariance sumptions. The underlying assumptions determine how principles applied to the mechanical aspects of particle far one may extend the exponential-Boltzmann form to- interactions20. ward explaining the variety of commonly observed pat- My own invariance and scaling approach subsumes terns. essentially all of those results in a simple and elegant I claim that one must begin solely with the fundamen- way, and goes much further with regard to providing tal invariances in order to develop a proper understand- a systematic understanding of the commonly observed ing of the full range of common patterns. By contrast, patterns7–9. However, it remains a matter of opinion statistical mechanics and Jaynesian maximum entropy whether an underlying mechanical framework based on begin from particular assumptions that only partially re- an explicit microscopic ensemble is better or worse than flect the deeper underlying invariances. a more abstract approach based purely on invariances.

9.1 Statistical mechanics 9.2 Jaynesian maximum entropy

Statistical mechanics typically begins with an as- Jaynes4,5 replaced the old microscopic ensemble of par- sumed, unseen ensemble of microscopic particles. Each ticles and the associated mechanical entropy with a new particle is often regarded as identical in nature to the oth- information entropy. He showed that maximum entropy, ers. Statistical averages over the underlying microscopic in the sense of information rather particle mechanics, ensemble lead to a macroscopic distribution of measur- leads to the classic exponential-Boltzmann form. A large able quantities. The exponential-Boltzmann distribution literature extends the Jaynesian framework21. Axiomatic is the basic equilibrium macroscopic probability pattern. approaches transcend the original justifications based on In contrast with the mechanical perspective of sta- intuitive notions of information22. tistical physics, my approach begins with fundamen- Jaynes’ exponential form has a kind of canonical scale, tal underlying invariances (symmetries). Both ap- Tz. In Jaynes’ approach, one sets the average value over proaches arrive at roughly the same intermediate point the canonical scale to a fixed value, in our notation a of the exponential-Boltzmann form. That canonical form fixed value of hTiz. That conserved average value defines expresses essentially the same invariances, no matter a constraint—an invariance—that determines the associ- whether one begins with an underlying mechanical per- ated probability pattern23. The Jaynesian algorithm is spective or an underlying invariance perspective. the maximization of entropy, subject to a constraint on From my point of view, the underlying mechanical per- the average value of some quantity, Tz. spective happens to be one particular way in which to un- Jaynes struggled to go beyond the standard constraints cover the basic invariances that shape pattern. But the of the mean or the variance. Those constraints arise from 2 mechanical perspective has limitations associated with fixing the average values of Tz = z or Tz = z , which lead the unnecessarily particular assumptions made about the to the associated exponential or Gaussian forms. Jaynes

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 11 did discuss a variety of additional invariances6 and asso- ACKNOWLEDGMENTS ciated probability patterns. But he never achieved any systematic understanding of the common invariances and National Science Foundation grant DEB–1251035 sup- the associated commonly observed patterns and their re- ports my research. I began this work while on fellowship lations. at the Wissenschaftskolleg zu Berlin. I regarded Jaynes’ transcendence of the particle-based microscopic ensemble as a strong move in the right di- rection. I followed that direction for several years7–9,12. In my prior work, I developed the intrinsic affine in- 10 References variance of the canonical scale, Tz, with respect to the exponential-Boltzmann distribution of maximum en- 1S. Weinberg, “Towards the final laws of physics,” in Elementary tropy. The recognition of that general affine invariance Particles and the Laws of Physics, edited by R. P. Feynman and plus the variety of common invariances of scale24,25 led S. Weinberg (Cambridge Univ. Press, Cambridge, UK, 1999). to my systematic classification of the common probability 2L. D. Landau and E. M. Lifshitz, Mechanics, 4th ed., Course in 7–9 Theoretical Physics, Vol. 2 (Butterworth-Heinemann, London, patterns and their relationships . 1980). In this article, I have taken the next step by doing away 3R. P. Feynman, Statistical Mechanics: A Set Of Lectures, 2nd with the Jaynesian maximization of entropy. I replaced ed. (Westview Press, New York, 1998). that maximization with the fundamental invariances of 4E. T. Jaynes, “Information theory and statistical mechanics,” shift and stretch, from which I obtained the canonical Phys. Rev. 106, 620–630 (1957). 5E. T. Jaynes, “Information theory and statistical mechanics. II,” exponential-Boltzmann form. Phys. Rev. 108, 171–190 (1957). With the exponential-Boltzmann distribution derived 6E. T. Jaynes, Probability Theory: The Logic of Science (Cam- from shift and stretch invariance rather than Jaynesian bridge University Press, New York, 2003). maximum entropy, I added my prior work on the gen- 7S. A. Frank and E. Smith, “Measurement invariance, entropy, eral affine invariance of the canonical scale and the ad- and probability,” Entropy 12, 289–303 (2010). 8S. A. Frank and E. Smith, “A simple derivation and classification ditional particular invariances that define the common of common probability distributions based on information sym- scaling relations and probability patterns. We now have metry and measurement scale.” Journal of Evolutionary Biology a complete system based purely on invariances. 24, 469–484 (2011). 9S. A. Frank, “How to read probability distributions as statements about process,” Entropy 16, 6059–6098 (2014). 10T. M. Cover and J. A. Thomas, Elements of Information Theory 9.3 Conclusion (Wiley, New York, 1991). 11Wikipedia, “Partition (mathematics) — Wikipedia, The Shift and stretch invariance set the exponential- Free Encyclopedia,” (2015), [Online; accessed 19-December- Boltzmann form of probability patterns. Rotational 2015]. 12S. A. Frank, “The common patterns of nature,” Journal of Evo- invariance transforms the exponential pattern into the lutionary Biology 22, 1563–1585 (2009). Gaussian pattern. These fundamental forms define the 13W. Bryc, The Normal Distribution: Characterizations and Ap- abstract structure of pattern with respect to a canonical plications (Springer-Verlag, New York, 1995). scale. 14B. R. Frieden, Science from Fisher Information: A Unification (Cambridge University Press, Cambridge, UK, 2004). In a particular application, observable pattern arises 15S. A. Frank, “d’Alembert’s direct and inertial forces acting on by the scaling relation between the natural measurements populations: The Price equation and the fundamental theorem of that application and the canonical scale. The partic- of natural selection,” Entropy 17, 7087–7100 (2015). ular scaling relation derives from the universal affine in- 16D. E. Neuenschwander, Emmy Noether’s Wonderful Theorem variance of the canonical scale and from the additional (Johns Hopkins University Press Press, Baltimore, 2010). 17L. M. Lederman and C. T. Hill, Symmetry and the Beautiful invariances that arise in the particular application. Universe (Prometheus Books, Amherst, N.Y., 2004). Together, these invariances define the commonly ob- 18C. Tsallis, Introduction to Nonextensive Statistical Mechanics served scaling relations and associated probability pat- (Springer, New York, 2009). terns. The study of pattern often reduces to the study 19C. Beck and E. Cohen, “Superstatistics,” Physica A: Statistical of how particular generative processes set the particular Mechanics and its Applications 322, 267–275 (2003). 20R. Hanel, S. Thurner, and M. Gell-Mann, “Generalized entropies invariances that define scale. and the transformation group of superstatistics,” Proceeding of Diverse and seemingly unrelated generative processes the National Academy of Sciences USA 108, 6390–6394 (2011). may reduce to the same simple invariance, and thus to 21S. Presse, K. Ghosh, J. Lee, and K. A. Dill, “Principles of max- the same scaling relation and associated pattern. To test imum entropy and maximum caliber in statistical physics,” Re- hypotheses about generative process and to understand views of Modern Physics 85, 1115–1141 (2013). 22J. Shore and R. Johnson, “Axiomatic derivation of the princi- the diversity of natural pattern, one must understand ple of maximum entropy and the principle of minimum cross- the central role of invariance. Although that message entropy,” IEEE Transactions on Information Theory 26, 26–37 has been repeated many times, it has yet to be fully de- (1980). ciphered. 23S. A. Frank, “Maximum entropy production from d’Alembert’s separation of direct and inertial forces,” arXiv:1506.07312 (2015).

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 12

24D. J. Hand, Measurement Theory and Practice (Arnold, London, of primary invariance. There are many other ways of un- 2004). derstanding the fact that the foundational exponential- 25 R. D. Luce and L. Narens, “Measurement, theory of,” in The Boltzmann distribution expresses shift and stretch invari- New Palgrave Dictionary of Economics, edited by S. N. Durlauf and L. E. Blume (Palgrave Macmillan, Basingstoke, 2008). ance, and the Gaussian distribution expresses rotational 26B. V. Gnedenko and A. N. Kolmogorov, Limit Distributions for invariance. One can derive those invariances from other Sums of Independent Random Variables (Addison-Wesley, Read- assumptions, rather than assume that they are primary. ing, MA, 1968). Classical statistical mechanics derives shift and stretch 27 U. Grenander, Elements of pattern theory (Johns Hopkins Uni- invariance as consequences of the aggregate behavior of versity Press, Baltimore, 1996). many particles. Jaynesian maximum entropy derives shift and stretch invariance as consequences of the ten- APPENDIX A: TECHNICAL ISSUES AND EXTENSIONS dency for entropy to increase plus the assumptions that total probability is conserved and that the average value 1 Conserved total probability of some measurement is conserved. In my notation, the conservation of hλTzi is equivalent to the assumption of stretch invariance. Often, this kind of assumption is sim- The relations between shift invariance and the conser- ilar to various conservation assumptions, such as the con- vation of total probability in Section 3 3.1 form a core servation of energy. part of the article. Here, I clarify the particular goals, Another way to derive invariance is by the classic limit assumptions, and consequences. theorems of probability. Gnedenko and Kolmogorov 26 In Section 3 3.1, I assumed that the conservation of to- beautifully summarized a key aspect: tal probability and shift invariance hold. From those as- sumptions, eqn 4 follows, and thus also the exponential- In fact, all epistemologic value of the theory of Boltzmann form of eqn 5. probability is based on this: that large-scale I am not claiming that conservation of total probability random phenomena in their collective by itself leads to shift invariance. Instead, my goal is create strict, nonrandom regularity. to consider the consequences that follow from a primary assumption of shift invariance. The limit theorems typically derive from assumptions The justification for a primary assumption of invari- such as the summation of many independent random ance remains an open problem at the foundation of much components, or in more complicated studies, the aggre- of modern physics. The opening quote from Weinberg gation of partially correlated random components. From expresses the key role of invariances and also the uncer- those assumptions, certain invariances may arise as con- tainty about why invariances are fundamental. My only sequences. goal concerns the consequences that follow from the as- It may seem that the derivation of invariances from sumption of primary invariances. more concrete assumptions provides a better approach. But from a mathematical and perhaps ultimate point of view, invariance is often tautologically related to sup- 2 Conserved average values: eqn 7 posedly more concrete assumptions. For example, con- servation of energy typically arises as an assumption in many profound physical theories. In those theories, one Below eqn 7, I stated that the average value λ hTiT = 1 remains unchanged after stretch transformation, T 7→ could chose to say that stretch invariance arises from con- z servation of energy or, equivalently, that conservation of bTz. This section provides additional details. The prob- lem begins with eqn 7, repeated here energy arises from stretch invariance. It is not at all clear how we can know which is primary, because mathemati- Z 2 −λTz cally they are often effectively the same assumption. λ hTi = λ Tz e dTz = 1. T My point of departure is the opening quote from Wein-

Make the substitution Tz 7→ bTz, which yields berg, who based his statement on the overwhelming suc- Z cess of 20th century physics. That success has partly 2 2 −λbTz (mostly?) been driven by studying the consequences that λb hTiT = λ b Tz e dTz = 1, follow from assuming various primary invariances. The noting that Tz 7→ bTz implies dTz 7→ bdTz, which ex- ultimate basis for those primary invariances remains un- plains the origin of the b2 term on the right-hand side. clear, but the profoundly successful consequences of pro- Thus, eqn 7 remains one under stretch transformation, ceeding in this way are very clear. These issues are very implying that hTiT = 1/λb. important. However, a proper discussion would require probing the basis of modern physics as well as many deep recent developments in mathematics, which is beyond my 3 Primacy of invariance scope. I simply wanted to analyze what would follow from the assumption of a few simple primary invariances. This article assumes the primacy of shift and stretch invariance. The article then develops the consequences

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 13

4 Measurement theory APPENDIX B: INVARIANCE AND THE COMMON CANONICAL SCALES Classical measurement theory develops a ratio- nal approach to derive and understand measurement The variety of canonical scales may be understood by scales24,25. Roughly speaking, a measurement scale is the variety of invariances that hold under different cir- defined by the transformations that leave invariant the cumstances. I introduced the affine invariance of the relevant relations of the measurement process. Different canonical scale in eqn 9. This section briefly summarizes approaches develop that general notion of invariance in further aspects of invariance and the common canonical different ways or expand into broader aspects of pattern scales. Prior publications provide more detail7–9. (e.g, Grenander 27 ). Invariance can be studied by partition of the trans- This article concerns probability patterns in relation to formation, z 7→ Tz, into two steps, z 7→ w 7→ Tz. The scale. The key is that probability patterns remain invari- first transformation expresses intrinsic invariances by the ant to affine transformation, that is, to shift and stretch transformation z 7→ w(z), in which w defines the new transformations. Thus different measurement scales lead base scale consistent with the intrinsic invariances. to the same invariant probability pattern if they are affine The second transformation evaluates only the canon- similar. I discussed the role of affine similarity in sev- ical shift and stretch invariances in relation to the base eral recent articles7–9. Here, I briefly highlight the main scale, w 7→ a+bw. This affine transformation of the base points. scale can be written as T(w) = a + bw. We can define Start with some notation. Let T(z) ≡ T be a transfor- T(w) ≡ Tz, noting that w is a function of z. mation of underlying observations z that define a scale, T. Each scale T has the property of being invariant to certain alterations of the underlying observations. Let 5 Rotational invariance of the base scale a candidate alteration of the underlying observation be the generator, G(z) ≡ G. Invariance of the scale T to the Rotational invariance is perhaps the most common generator G means that base scale symmetry. In the simplest case, w(z) = z2. If we write x = z cos θ and y = z sin θ, then x2 + y2 = z2, T [G(z)] = T(z), and the points (x, y) trace a circle with a radius z that is rotationally invariant to the angle θ. Many probability which we can write in simpler notation as distributions arise from rotationally invariant base scales, which is why squared values are so common in probabil- T ◦ G = T. 2 ity patterns. For example, if w = z and Tz ≡ w, then the canonical exponential form that follows from shift Sometimes we do not require exact invariance, but only and stretch invariance of the rotationally invariant base a kind of similarity. In the case of probability patterns, scale is shift and stretch invariance mean that any two scales −λw −λz2 related by affine transformation T = a + bT yield the qz = ke = ke , same probability pattern. In other words, probability patterns are invariant to affine transformations of scale. which is the Gaussian distribution, as discussed in the Thus, with regard to the generator G, we only require text. that T ◦ G fall within a family of affine transformation Note that the word rotation captures an invariance of T. Thus, we write the conditions for two probability that transcends a purely angular interpretation. Instead, patterns to be invariant to the generator G as we have component processes or measurements that sat- isfy an additive invariance constraint. For each final T ◦ G = a + bT ∼ T, value, z, there exist a variety of underlying processes or P 2 2 outcomes that satisfy the invariance xi = z . and thus the key invariance relation for probability pat- The word rotation simply refers to the diversity of un- terns is affine similarity expressed as derlying Pythagorean partitions that sum to an invariant Euclidean distance. The set of invariant partitions falls T ◦ G ∼ T, on the surface of a sphere. That spherical property leads to the expression of invariant additive partitions in terms which was presented in the text as eqn 9. My prior pub- of rotation. lications fully developed this relation of affine similarity and its consequences for the variety of scales that de- 7–9 fine the commonly observed probability patterns . Ap- 6 General form of base scale invariance pendix B briefly presents a few examples, including the linear-log scale. The earlier sections established that the canonical scale of probability patterns is invariant to shift and stretch. Thus we may consider as equivalent any affine transfor- mation of the base scale w 7→ a + bw.

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 14

We may describe additional invariances of w, such as to an exponential-Boltzmann distribution for small z and rotational invariance, in the general form a power law distribution in the upper tail for large z. We can combine base scales. For example, if we start w ◦ G ∼ w, (18) 2 with w1, a rotationally invariant scale, z 7→ z , and then transform those rotationally invariant values to a linear- in which w ◦ G ≡ w [G(z)]. We read eqn 18 as: the base log scale, w , we obtain w [w (z)] = α log(1+βz2). This scale w is invariant to transformation by G, such that 2 2 1 scale corresponds to the generalized Student’s distribu- w ◦ G = a + bw for some constants a and b. The symbol tion “∼” abbreviates the affine invariance of w. For example, we may express the rotational invariance 2 −γ qz = k(1 + βz ) . of the prior section as

2 2 2 2 For small magnitudes of z, this distribution is linear in w(z, θ) = z (cos θ + sin θ) = z , scale and Gaussian in shape. For large magnitudes of z, because cos2 θ + sin2 θ = 1 for any value of θ. We can this distribution has power law tails. Thus, a rotationally describe rotation by the transformation invariant linear-log scale grades from Gaussian to power law as magnitude increases. G(z, θ) = (z, θ + ), so that the invariance expression is 8 The family of canonical scales

w ◦ G = w [G(z, θ)] = w(z, θ + ) = z2. The canonical scale, Tz, determines the associated −λTz Thus, the base scale w is affine invariant to the rotational probability pattern, qz = ke . What determines the transformation generator, G, as in eqn 18. Although this canonical scale? The answer has two parts. form of rotational invariance seems trivial in this context, First, each problem begins with a base scale, w(z) ≡ w. it turns out to be the basis for many classical results in The base scale arises from the invariances that define the probability, dynamics, and statistical mechanics. particular problem. Those invariances may come from observation or by assumption. The prior sections gave the examples of rotational invariance, associated with 7 Example: linear-log invariance of the base scale squared-value scaling, and linear to power-law invari- ance, associated with linear to log scaling. When the base scale lacks intrinsic invariance, we may write w ≡ z. The invariance expression of eqn 18 sets the conditions Earlier publications provided examples of common base for base scale invariances. Although there are many pos- scales7–9. sible base scales, a few dominate the commonly observed Second, the canonical scale arises by transformation of patterns7–9. In this article, I emphasize the principles the base scale, T = T(w). The canonical scale must sat- of invariance rather than a full discussion of the various z isfy both the shift and stretch invariance requirements. common scales. If the base scale itself satisfies both invariances, then the Earlier, I discussed the log-linear scale associated with base scale is the canonical scale, T = w. In particular, if the gamma distribution. This section presents the inverse z the probability pattern remains invariant to affine trans- linear-log scale, which is formations of the base scale w 7→ δ + γw, then the shift w(z) = α log(1 + βz). and stretch invariant distribution has the form −λw When βz is small, w is approximately αβz, which is linear qz = ke . (19) in z. When βz is large, w is approximately α log(βz), Alternatively, w may satisfy the shift invariance require- which is logarithmic in z. This linear-log scale is affine 8,9 invariant to transformations ment, but fail the stretch invariance requirement . We therefore need to find a canonical transformation T(w) (1 + βz)α − 1 that achieves affine invariance with respect to the under- G(z) = , β lying shift, G(w) = δ + w. The transformation

βw because w ◦ G = αw ∼ w. The transformation, G, is Tz = T(w) = e (20) linear for small magnitudes of z and power law for large magnitudes of z. changes a shift invariance of w into a stretch invariance The linear-log base scale, w, yields the probability dis- of Tz, because tribution T(δ + w) = eβ(δ+w) = eβδeβw = bT ∼ T −λw −γ qz = ke = k(1 + βz) , for b = eβδ. We can write T(δ + w) = T ◦ G, thus for γ = λα. This expression is the commonly observed this expression shows that we have satisfied the affine Lomax or Pareto type II distribution, which is equivalent invariance T ◦ G ∼ T of eqn 9.

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank 15

Thus, shift invariance with respect to w generates a This important distribution arises in various contexts9, family of scaling relations described by the parameter β. including the stretched exponential distribution and the The one parameter family of canonical scales in eqn 20 Fourier domain spectral distribution that associates with expands the canonical exponential form for probability the basic L´evydistributions12. distributions to In this case, the probability pattern is not shift and

βw stretch invariant to changes in the value of z, because −λTz −λe qz = ke = ke . (21) z 7→ δ + γz changes the pattern. By contrast, if we start with the base scale w = log z, then the probability The simpler form of eqn 19 arises as a limiting case for pattern is shift and stretch invariant with respect to the β → 0. That limiting form corresponds to the case in canonical scale which the base scale, w, is itself both shift and stretch 8,9 invariant . Thus, we may consider the more familiar βw β exponential form as falling within the expanded one pa- Tz = e = z , rameter symmetry group of scaling relations in eqn 20. The expanded canonical form for probability patterns because the affine transformation of the canonical scale, β β in eqn 21 and a few simple base scales, w, include essen- z 7→ δ + γz , does not alter the probability pattern in tially all of the commonly observed continuous probabil- eqn 22, given that we adjust k and λ to satisfy the con- ity patterns8,9. servation of probability and the conservation of average value. The way in which I presented these invariances may 9 Example: extreme values seem trivial. If we begin with eqn 22, then of course we have shift and stretch invariance with respect to β β In some cases, it is useful to consider the probability z 7→ δ + γz . However, in practical application, we may begin with an observed pattern and then try to in- pattern in terms of the canonical scale measure, dTz = 0 βw fer its structure. In that case, analysis of the observations |T |dz. Using Tz = e , distributions take on the form often found in the extreme value problems8,9 would lead to the conclusion of shift and stretch invari- ance with respect to the canonical power law scaling, zβ. 0 βw−λeβw Alternatively, we may begin with a theory that in- qzdz = kw e dz, cludes a complicated interaction of various dynamical in which w0 = |dw/dz|. For example, w = z yields the processes. We may then ask what invariance property Gumbel distribution, and w = log z yields the Fr´echet or matches the likely outcome of those processes. The con- Weibull form. clusion may be that, asymptotically, shift and stretch invariance hold with respect to zβ 7→ δ + γzβ, suggesting the power law form of the canonical scale. 10 Example: stretched exponential and L´evy In general, the particular invariant canonical scale de- rives from observations or from assumptions about pro- Suppose the base scale is logarithmic, w(z) = log z. cess. The theory here shows the ways in which ba- Then from eqn 21, a candidate form for probability pat- sic required invariances strongly constrain the candidate tern is canonical scales. Those generic constraints shape the commonly observed patterns independently of the spe- −λzβ qz = ke . (22) cial attributes of each problem.

git • master @ arXiv2-0-e7d119f-2016-04-01 (2016-04-05 00:57Z) • safrank