September 5, 2018

Information theory for fields

Torsten A. Enßlin∗

A physical field has an infinite number of degrees of freedom since it has a field value at each location of a continuous space. Therefore, it is impossible to know a field from finite measurements alone, and prior information on the field is essential for field inference. An information theory for fields is needed to join the measurement and prior information into probabilistic statements on field configurations. Such an information field theory (IFT) is built upon the language of mathematical physics, in particular on field theory and statistical mechanics. IFT permits the mathematical derivation of optimal imaging algorithms, data analysis methods, and even computer simulation schemes. The application of IFT algorithms to astronomical datasets provides high fidelity images of the Universe and facilitates the search for subtle statistical signals from the Big Bang. The concepts of IFT might even pave the road to novel computer simulations that are aware of their own uncertainties.

1 Information field theory

1.1 Aim

Information field theory (IFT) [1–4] is information theory for fields. It uses methods from quantum and statistical field theories for the inference of field properties from data. IFT permits the design of optimal methods to recover unknown or partially known signal fields from noisy, incomplete, and otherwise corrupted measurements, given that certain statistical knowledge on the physical nature of the field is available.

To this end, IFT uses Bayesian probability theory for fields. Bayesian probabilities quantify how much one should believe in a statement being true, a condition being met, or an event occurring. They are bookkeeping devices for evidence. They are not necessarily expressing physical frequencies of events. In IFT, logarithmic probabilities will turn out to be the currency to convert and combine different forms of information, both from measurements and also from prior knowledge and assumptions.

This article introduces IFT with the aim to put the reader into a position to follow the more specialized literature on IFT, to understand and use existing IFT methods, or to develop methods adapted to his or her own needs. Basic knowledge of field theory and thermodynamics is assumed, although most concepts used are briefly introduced.

∗ Corresponding author. E-mail: ensslin@mpa-garching.mpg.de, Max Planck Institute for Astrophysics, Karl-Schwarzschild-Str. 1, 85741 Garching, Germany

1.2 Structure of the work

The structure of the work is the following. In this Sect. 1, the problem of field inference is approached with Bayesian probability theory. In Sect. 2, the free theory of IFT is developed using an illustrative example of a diffusion process driven by white noise and observed at some point in time. In Sect. 3, this is then extended into the non-linear regime, leading to interacting IFT, in which the field estimate is not a linear function of the data any more. Sect. 4 discusses a number of existing and envisaged IFT algorithms and their numerical implementation. An outlook to future developments in IFT is given in the final Sect. 5.

1.3 Physical fields

Physical fields like the electromagnetic field, the gravitational field, or even macroscopic ones like the atmospheric temperature generally exhibit different values at different locations in space and time. Knowing the configuration of a field, i.e. knowing the field values at all locations, can be of tremendous scientific, technological, or economical value. Unfortunately, this is impossible in general, as there are more locations to be known (often infinitely many) than available measurement data


Figure 1 Usage of Feynman diagrams in IFT. An estimator for the level of non-Gaussianity of a field is displayed on top of temperature fluctuations of the CMB, to which this estimator can be applied to study the Early Universe. Dots at the end of lines represent the information source j provided by the data, lines the information propagator D, and internal vertices with three or four lines cubic and quartic terms of the information Hamiltonian in Eq. 103, respectively.

that provide mathematical constraints on the possible field configurations. Thus, inferring a field is an under-constrained problem in general. For a given dataset obtained by an instrument probing the field, there can be an infinite number of potential field configurations that are fully consistent with the data. How to choose among them?

Many of those configurations will be implausible. Imagine, e.g., a radio telescope observing the cosmic microwave background (CMB). The telescope is a measurement device that integrates the field over some sub-volume of space and provides this integral. Then the integrated field in this volume is known, but how this amount of integrated field is distributed among the different locations is not determined. Among all configurations that obey this measurement constraint, very rough ones (i.e. ones that exhibit completely different values at the different locations) are possible and largely outnumber the also possible smooth ones. There are just many more ways for a field to be jagged than ways to be smooth. Nevertheless, for most physical fields, the smooth field configurations would be regarded as far more plausible. Rapid spatial changes of a field are less likely, as they either cost energy (think of an electrical potential field) or are erased rapidly by the field dynamics (think of temperature jumps in a turbulent or diffusive atmosphere). Thus, among all configurations consistent with the data, smooth ones should be favored. But how smooth should they be? And how should this extra information on smoothness be inserted into the field inference?

1.4 Smoothness and correlations

The first question requires a specification of the concept of smoothness. A field can be regarded to be smooth if field values at nearby locations are similar. Thus, knowing the field at one location implies some constraints on the possible values at nearby positions. Such knowledge is best represented in terms of correlations between the different locations. Smoothness of a field ϕ is therefore well characterized by the two point correlation

Φ(x, y) = 〈ϕ(x) ϕ(y)〉_(ϕ)    (1)

of locations x and y, assuming for simplicity here that the field is fluctuating around zero such that its expectation value vanishes, 〈ϕ〉_(ϕ) = 0.

We denote with 〈f(a,b)〉_(a|b) = ∫ da f(a,b) P(a|b) the expectation of f(a,b) averaged over the conditional probability density P(a|b) for the variable a given b. We postpone the question how such integrals are calculated for a being a field. We denote probabilities with P ∈ [0,1], and probability densities with P ∈ [0,∞). In the following, we often drop the word density for briefness. Thus, the expectation value in Eq. 1 should be performed over all plausible field configurations ϕ, weighted with their appropriate probability densities P(ϕ). Such probabilities will be constructed later on.
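To make Eq. 1 concrete, the following sketch (not from the paper; all names and numbers are illustrative assumptions) estimates the two-point correlation of a pixelized 1D field from an ensemble of samples drawn from an assumed smooth Gaussian process:

```python
import numpy as np

# Illustrative sketch: estimate the two-point correlation of Eq. 1,
# <phi(x) phi(y)>_(phi), from samples of a pixelized 1D field.
# The assumed prior covariance is exponential with correlation length 10.
rng = np.random.default_rng(0)
npix, nsamples = 128, 5000
x = np.arange(npix)
Phi = np.exp(-np.abs(x[:, None] - x[None, :]) / 10.0)  # assumed covariance
samples = rng.multivariate_normal(np.zeros(npix), Phi, size=nsamples)
Phi_est = samples.T @ samples / nsamples  # empirical two-point correlation
print(np.abs(Phi_est - Phi).max())        # shrinks as nsamples grows
```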


For fields that have spatially homogeneous statistics, this correlation function is only a function of the difference of the locations; for statistically isotropic fields, it additionally depends only on the absolute distance r = |x − y|, i.e. Φ(x, y) = C_ϕ(|x − y|), with C_ϕ(r) the radial two-point correlation function of the field ϕ. For example, the density field of matter in the cosmos on large scales is assumed to be the result of a statistically isotropic and homogeneous random process. For this reason, the CMB shows temperature fluctuations with isotropic correlation statistics (see Fig. 1).

The correlation function C_ϕ(r) reflects the roughness of a statistically homogeneous and isotropic field. It is maximal at zero lag r = |x − y| = 0 and typically falls towards zero after a correlation length

λ = ∫₀^∞ dr C_ϕ(r) / C_ϕ(0),    (2)

meaning that field values at locations separated by r ≫ λ are largely uncorrelated. On scales smaller than λ, field values are strongly correlated and the field appears smooth on such distances. Knowledge of the value of a correlated field at some location therefore makes a statement about the field values at nearby locations, within the correlation length. They should be similar, in particular if they are closer by.

The knowledge and usage of such correlations is essential for IFT, as it provides an infinite number of statements on the field, connecting all pairs of points in a statistical manner. This helps to tame the infinite number of degrees of freedom of an unknown field. Traditional smoothness regularizations, like the suppression of strong gradients or curvature of the field, are naturally included in this language, as we will see in Sect. 2.2.

A great effort in an IFT analysis of a signal inference problem should therefore go into determining the correlation structure of the field. Ideally, it is derived from a physical line of reasoning, as presented for the example developed later on in Sect. 2.1. In case such prior information on the field correlation is not available, it might be extracted from the data itself, as discussed in Sect. 2.8.

Assuming for now that we have some usable knowledge on the field smoothness, the second question remains: How to incorporate this knowledge into the inference?

1.5 Bayes and statistical physics

Bayes' theorem conveniently answers this question. Assume we are interested in a field ϕ: Ω → R, which is defined on some u-dimensional space Ω ⊆ Rᵘ, and we have obtained some measurement data

d = R(ϕ) + n    (3)

on it, where R(ϕ) = 〈d〉_(d|ϕ) is the deterministic response of the instrument to the field, and n = d − R(ϕ) denotes any stochastic contribution to the outcome, called the measurement noise. For example, a linearly integrating instrument is described by

R_i(ϕ) = ∫_Ω dx r_i(x) ϕ(x) ≡ (R ϕ)_i,    (4)

where i indexes the different measurements and r_i(x) describes how those respond to the field values at different locations x.

The measurement equation, Eq. 3, in combination with the noise statistics P(n|ϕ) for a given field configuration ϕ, defines the likelihood to obtain the data d,

P(d|ϕ) = ∫ Dn δ(d − R(ϕ) − n) P(n|ϕ) = P(n = d − R(ϕ)|ϕ).    (5)

This likelihood, in combination with the data, contains the full information the measurement is providing. A likelihood function that is as accurate as possible is essential to extract the full information content of a data set. In case some properties of the measurement process are not precisely known, their determination has to be made part of the inference problem. This can turn an otherwise simple linear signal inference task into a complex instrumental self-calibration problem. For the moment, such complications are assumed to be absent.

Since the data provide only finite dimensional constraints on the many possible field configurations, whereas the signal field has an infinite number of degrees of freedom, they are not sufficient to fully specify the field. The information from the data has to be combined with prior knowledge, for example on the field correlations as encoded in C_ϕ(r), to provide a probability density on the field P(ϕ|d) (given the data and other information), from which field estimates could be obtained. One such estimate would be the posterior mean field

m = 〈ϕ〉_(ϕ|d) = ∫ Dϕ P(ϕ|d) ϕ.    (6)

Here, an integration over all field configurations is necessary, a technicality we postpone until Sect. 1.8.


Figure 2 The gamma-ray sky as seen by the Fermi satellite. Left half: original photon counts; intensity is encoded in the brightness, photon energy in the color (red ∼ 1 GeV, blue ∼ 100 GeV). Right half: diffuse emission component reconstructed by the D3PO algorithm, which takes out the shot noise and the point sources.

Bayes' theorem states that this posterior P(ϕ|d), which summarizes our knowledge on the field after the data is taken, is proportional to the product of the data likelihood P(d|ϕ) for a given field configuration and its prior probability P(ϕ), which summarizes our a-priori knowledge on the field (e.g. smooth configurations are more plausible than rough ones). Bayes' theorem reads

P(ϕ|d) = P(d|ϕ) P(ϕ) / P(d),    (7)

where the normalization of the posterior is provided by the so-called evidence

P(d) = ∫ Dϕ P(d|ϕ) P(ϕ).    (8)

In other words, the posterior is the joint probability of data and field, P(d,ϕ) = P(d|ϕ) P(ϕ), evaluated for the observed data d = d_obs and normalized such that it becomes a proper probability density again, ∫ Dϕ P(ϕ|d = d_obs) = 1.

Formally, Bayes' theorem can be brought into a form resembling the Boltzmann distribution of statistical mechanics,¹

P(ϕ|d) = exp(−H(d,ϕ)) / Z(d),    (9)

where we have introduced the information Hamiltonian

H(d,ϕ) = −ln P(d,ϕ),    (10)

the negative logarithm of the joint probability of data and field.² The partition function

Z(d) = ∫ Dϕ exp(−H(d,ϕ))    (11)

is just another name for the evidence P(d).

The statistical mechanics version of Bayes' theorem, Eq. 9, is a mere rewriting. The introduced information Hamiltonian (aka negative log-probability, surprise, or simply information) has the nice property that information appears as an additive quantity, since

H(d,ϕ) = H(d|ϕ) + H(ϕ)    (12)

is a direct consequence of the product rule of probabilities, P(d,ϕ) = P(d|ϕ) P(ϕ). In particular, the fusion of the information provided by two independent measurements, with the joint data vector d = (d₁, d₂), is simply performed by addition of their information Hamiltonians,

H(d|ϕ) = H(d₁|ϕ) + H(d₂|ϕ).    (13)

¹ The temperature usually appearing in thermodynamics has been set to T = 1 here.
² The dimensions of the probability density are ignored in this definition, as they have no influence on any result in IFT.
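As a minimal numerical illustration of Eqs. 9–11 (a hypothetical single-pixel example, not from the paper), one can discretize a one-dimensional "field", evaluate the information Hamiltonian, and normalize with the partition function:

```python
import numpy as np

# One-pixel toy example of Eqs. 9-11: prior phi ~ G(phi,1), data d = phi + n
# with n ~ G(n,1); all numbers are illustrative assumptions.
phi = np.linspace(-6, 6, 2001)            # discretized field value
dphi = phi[1] - phi[0]
d = 1.3                                   # observed datum
H = 0.5 * phi**2 + 0.5 * (d - phi)**2     # H(d,phi) = -ln P(d,phi) + const
Z = np.sum(np.exp(-H)) * dphi             # partition function / evidence, Eq. 11
posterior = np.exp(-H) / Z                # P(phi|d), Eq. 9
print(np.sum(phi * posterior) * dphi)     # posterior mean; here d/2 = 0.65
```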


The rewriting of Bayes' theorem for fields in the language of statistical mechanics does not solve the problem to extract useful statements from the field posterior P(ϕ|d). However, it makes available the many methods which have been developed to treat statistical and quantum field theories. Consequently, we call the treatment of field inference problems in this language information field theory.³

1.6 Expectation values

One of the strengths of the field theoretical language is that it provides tools to calculate expectation values. For example, the partition function can be turned into a moment generating functional by extending it to

Z(d, j) = ∫ Dϕ exp(−H(d,ϕ) + j†ϕ),    (14)

where j is a moment generating source term and

j†ϕ = ∫_Ω dx j*(x) ϕ(x)    (15)

is the usual scalar product for functions. Any posterior expectation value of moments of field values can be obtained from the partition function via

〈ϕ(x₁) ... ϕ(x_n)〉_(ϕ|d) = (1/Z(d,j)) ∂ⁿZ(d,j) / (∂j*(x₁) ... ∂j*(x_n)) |_(j=0).    (16)

The generation of the posterior mean field

m(x) = 〈ϕ(x)〉_(ϕ|d) = (1/Z(d,j)) ∂Z(d,j)/∂j*(x) |_(j=0) = ∂ln Z(d,j)/∂j*(x) |_(j=0)    (17)

suggests the introduction of the so-called cumulants or connected moments via

〈ϕ(x₁) ... ϕ(x_n)〉^(c)_(ϕ|d) = ∂ⁿ ln Z(d,j) / (∂j*(x₁) ... ∂j*(x_n)) |_(j=0).    (18)

The logarithmic partition function ln Z(d,j), which appears hereby, is known to be given by the sum of all singly connected Feynman diagrams without free vertices (see Fig. 1 for such diagrams). This is the reason for the term connected moments. The connected and unconnected moments are related; for example, the posterior field dispersion

D(x,y) = 〈ϕ(x)ϕ(y)〉^(c)_(ϕ|d) = ∂²ln Z(d,j) / (∂j*(x) ∂j*(y)) |_(j=0)
 = ∂/∂j*(x) [ (1/Z(d,j)) ∂Z(d,j)/∂j*(y) ] |_(j=0)
 = [ (1/Z) ∂²Z/(∂j*(x)∂j*(y)) − (1/Z²) ∂Z/∂j*(x) ∂Z/∂j*(y) ] |_(j=0)
 = 〈ϕ(x)ϕ(y)〉_(ϕ|d) − 〈ϕ(x)〉_(ϕ|d) 〈ϕ(y)〉_(ϕ|d)
 = 〈[ϕ(x) − m(x)][ϕ(y) − m(y)]〉_(ϕ|d),    (19)

is the second field moment corrected for the contribution of the first moment to it.

Of particular interest in practical applications are the first and second connected posterior moments of the field, as they represent the posterior mean field m and the uncertainty covariance D around this mean. The mean is the best guess for the field under the L²-error loss function L = 〈||ϕ − m||²〉_(ϕ|d) ≡ 〈(ϕ − m)†(ϕ − m)〉_(ϕ|d), as a short calculation verifies:

∂L/∂m = 2 〈m − ϕ〉_(ϕ|d) = 0  ⟹  m = 〈ϕ〉_(ϕ|d).    (20)

1.7 Maximum entropy

Let us assume for a moment that the mean m = 〈ϕ〉_(ϕ|d) and the covariance D = 〈ϕϕ†〉^(c)_(ϕ|d) ≡ 〈(ϕ−m)(ϕ−m)†〉_(ϕ|d) is all that is known about a field, i.e. the data are this mean and covariance, d′ = (m, D). Which probability distribution should then be used to represent the knowledge state?

According to the maximum entropy principle [6], the knowledge provided by this data is best summarized by a Gaussian probability distribution with this mean and covariance:

P(ϕ|d′) = G(ϕ − m, D) ≡ (1/√|2πD|) exp[ −½ (ϕ−m)† D⁻¹ (ϕ−m) ].    (21)

This is one of the lines of reasoning that is the basis of the frequent appearance of Gaussian processes⁴ in IFT.

³ The term Bayesian field theory was proposed originally by Lemm [5] for field inference. This term, however, does not follow the tradition to name field theories after subjects and not people. We do not talk about Maxwell, Einstein, or Feynman field theory, but about electromagnetic, gravitational, and quantum field theories.
⁴ A Gaussian process generates field configurations in such a way that the joint probability function of any set of field values evaluated at a set of distinct locations is a multivariate Gaussian.


Gaussian distributions can appear as minimally informative descriptions of field knowledge; only the first and second moments need to be known to specify them.

The usage of a Gaussian field prior therefore does not necessarily imply that the field is following Gaussian statistics. The knowledge on the field statistics might just be so limited that more elaborate priors cannot be justified by the requirement to encode only the least amount of information consistent with all constraints into probabilities, the principle of maximum entropy. We recall that in Bayesian probability theory probabilities express knowledge on a quantity, not necessarily the real, physical frequency of that quantity. If, however, the latter is known, it is also the correct probability to be used. We will encounter such a case in Sect. 2.1.

1.8 Infinities

So far we have operated with fields, linear operators on fields like R or Φ⁻¹, and integrals over field configurations as if they were finite dimensional vectors, matrices, and integrals. One can imagine performing such operations with pixelized versions of fields, operators, and the like, with the pixelization taken so fine that further refinements would not make any difference to calculated expectation values anymore. In order to approach the realm of fields, the limit of an infinite number of pixels has to be taken. Clearly, some quantities used above become infinite or vanish in this limit, like the determinant of a covariance D in Eq. 21. However, these exploding terms are usually auxiliary expressions, ensuring the normalization of a probability distribution (with respect to a given pixelization), and are not of fundamental interest. What needs to stay finite in this limit are observables, the expectation values of field properties 〈f(ϕ)〉_(ϕ|d). Here, f(ϕ) = r†ϕ might be the integration of the field over some probing kernel r(x), or the like. In case the field expectations of interest are robust with respect to further refinements of the pixelized field representation, the continuum limit of a field theory can be claimed to effectively be reached.

2 Free fields

2.1 Simplistic scenario

To have an instructive problem, let us discuss the simplest IFT scenario of a free field first. Free means that the information Hamiltonian will only be of quadratic order in the field, leading to a linear operation translating the data into field estimates, for which the superposition principle holds.

Imagine we observe at some time t = 0 relative temperature fluctuations

ϕ(x,t) = δT(x,t)/T̄ ≪ 1    (22)

of a temperature field T(x,t) that is defined over one spatial and one temporal dimension, Ω = R², with the spatial and temporal mean temperature T̄ = 〈T(x,t)〉_(x,t). These fluctuations are excited by some external Gaussian excitation ξ ← G(ξ, Ξ) and decay by diffusion:

∂_t ϕ(x,t) = κ ∂²_x ϕ(x,t) + ν^(1/2) ξ(x,t),    (23)

with κ the spatial diffusion coefficient and ν being a rate.⁵ Fourier transforming⁶ this equation with respect to space and time yields

ϕ(k,ω) = ν^(1/2) ξ(k,ω) / (iω + κk²).    (24)

For simplicity, we assume the excitation to be white and of unit variance, Ξ_(x,t)(x′,t′) = δ(x − x′) δ(t − t′), or briefly Ξ = 1. This implies that in Fourier space the excitation is white as well, Ξ_(k,ω)(k′,ω′) = (2π)² δ(k − k′) δ(ω − ω′). The field covariance in Fourier space is then

Φ_(k,ω)(k′,ω′) = 〈ϕ(k,ω) ϕ*(k′,ω′)〉_(ϕ) = (2π)² ν/(ω² + κ²k⁴) δ(k − k′) δ(ω − ω′).    (25)

⁵ The external excitation fluctuations should add up under time integration in quadrature, and if ξ is as dimensionless as ϕ, then [ν] = sec⁻¹.
⁶ We use the (one dimensional) Fourier convention ϕ_k = ∫ dx e^(ikx) ϕ_x, ϕ_x = ∫ dk/(2π) e^(−ikx) ϕ_k. Therefore, the identity operator in Fourier space reads 1_kk′ = 2πδ(k − k′). Furthermore, we use arguments and indices interchangeably to denote whether a quantity is in real space or Fourier space: ϕ_x = ϕ(x) and ϕ_k = ϕ(k) = ∫ dx e^(ikx) ϕ(x).
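A realization of this process can be generated numerically. The following sketch (an assumed toy discretization, not the paper's code; grid sizes, time step, and parameter values are all illustrative) integrates Eq. 23 on a periodic grid with an Euler–Maruyama step:

```python
import numpy as np

# Euler-Maruyama integration of Eq. 23 on a periodic 1D grid.
rng = np.random.default_rng(5)
npix, dx, dt, nsteps = 256, 1.0, 0.01, 20000
kappa, nu = 1.0, 1.0                      # stable since kappa*dt/dx^2 < 1/2
phi = np.zeros(npix)
for _ in range(nsteps):
    lap = (np.roll(phi, 1) - 2 * phi + np.roll(phi, -1)) / dx**2
    xi = rng.standard_normal(npix) / np.sqrt(dx * dt)  # white noise, Xi = 1
    phi += dt * (kappa * lap + np.sqrt(nu) * xi)
# phi now approximates a sample of the Gaussian field at a fixed time; the
# k = 0 mode performs an undamped random walk, mirroring the diverging
# large-scale power of the spectrum derived below.
```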


Assume at time t = 0 we obtain some data on the field according to a linear measurement as described by Eqs. 3 and 4, with Gaussian, field-independent measurement noise n ← G(n, N) that is white (N = σ_n² 1) and has variance σ_n². Thus, the likelihood is

P(d|ϕ) = G(d − Rϕ, N).    (26)

To simplify the discussion further, we assume that the measurement probes every location, such that r_i(x) = δ(i − x), or put differently, R = 1. This is an idealization, as now an infinite number of locations are probed. Note that this still does not provide perfect knowledge of the field, as the data vector d = ϕ + n is still corrupted by noise.

To infer the field configuration at t = 0 via Bayes' theorem, Eq. 7, we need the prior. As the dynamics is linear and the excitation is Gaussian, the field statistics will be Gaussian, P(ϕ) = G(ϕ, Φ). This requires that we work out the field covariance for a specific time. Fourier back-transforming Eq. 25 with respect to time gives

Φ_(k,t)(k′,t′) = πν/(κk²) δ(k − k′) exp(−|t − t′| κk²).    (27)

Temperature fluctuations decay in time, and they do this the faster the smaller their spatial scale. Here, we are only interested in the spatial correlation structure, since we restrict our analysis to a single time slice at t = 0 in order to keep the discussion simple. The equal time correlation is

Φ_(k,t)(k′,t) = πν/(κk²) δ(k − k′) = 2πδ(k − k′) P_ϕ(k),    (28)

with

P_ϕ(k) = ν/(2κk²)    (29)

the spatial power spectrum of the field. From now on we only consider the spatial domain, Ω = R. A field realization ϕ of a Gaussian process with this covariance Φ is shown in Fig. 3.

2.2 Prior regularization

The covariance of our temperature fluctuation field is actually equivalent to a regularization of the solutions by an L²-norm on the field gradient. The prior Hamiltonian,

H(ϕ) = −ln P(ϕ) = −ln G(ϕ, Φ)
 = ½ ϕ† Φ⁻¹ ϕ + ½ ln|2πΦ|
 = ½ ∫ dk/(2π) (2κ/ν) k² |ϕ_k|² + const
 = (κ/ν) ∫ dx |∇ϕ|² + const
 = (κ/ν) ||∇ϕ||² + const,    (30)

is regularizing the joint information Hamiltonian of data and field,

H(d,ϕ) = H(d|ϕ) + H(ϕ),    (31)

with H(d|ϕ) = −ln P(d|ϕ) the likelihood information. Since we started with a physical model, the strength of this regularization is specified by the ratio of the diffusion constant to the excitation rate, κ/ν, a physical quantity. This should be seen in contrast to the frequently used ad-hoc parameter put in front of an L²-gradient regularization term ||∇ϕ||² ≡ ∫ dx |∇ϕ|². Physics herself provides the necessary regularization for us to treat the otherwise ill-posed problem. No ad-hoceries are needed.

2.3 Wiener filter

Now, we can work out the information Hamiltonian

H(d,ϕ) = ½ ϕ† Φ⁻¹ ϕ + ½ (d − Rϕ)† N⁻¹ (d − Rϕ) + ½ ln|2πΦ| + ½ ln|2πN|    (32)

as well as the logarithmic partition function

ln Z(d, J) = ln ∫ Dϕ e^(−H(d,ϕ) + J†ϕ)
 = ½ (j + J)† D (j + J) − ½ d† N⁻¹ d + ½ ln|2πD| − ½ ln|2πΦ| − ½ ln|2πN|    (33)

with

D = (Φ⁻¹ + R† N⁻¹ R)⁻¹    (34)

and

j = R† N⁻¹ d.    (35)

It turns out that a separate moment generating variable J is not needed, as the variable j can take over this role. We just set J = 0 and take derivatives with respect to j, but do not set it to zero afterwards. The first and second connected moments of the posterior field are then

〈ϕ〉^(c)_(ϕ|d) = ∂ln Z(d,j)/∂j† = D j = m    (36)

〈ϕϕ†〉^(c)_(ϕ|d) = ∂²ln Z(d,j)/(∂j† ∂j) = D.    (37)

Higher order connected moments vanish. This means that the posterior is a Gaussian with this mean and covariance,

P(ϕ|d) = G(ϕ − m, D),    (38)


[Plot panels belonging to Figs. 3 and 4: left, "Wiener filter reconstruction", showing ϕ, m, and d; right, "Poisson log-normal reconstruction", showing e^(3ϕ), 〈e^(3ϕ)〉, e^(3ϕ_cl), and d; both over the unit interval. See the captions below.]

Figure 3 Wiener filter reconstruction m (Eq. 47) of the unknown field ϕ from noisy and incomplete data d as discussed in the text. The 1-σ posterior uncertainty is shown as well (Eqs. 53 and 49). Note the enlarged uncertainty in regions without data. The maximum likelihood field estimate is identical to the data points.

Figure 4 Classical field reconstruction ϕ_cl (Eq. 64) of the unknown field ϕ from Poisson counts d subject to a non-linear, exponential response. The 1-σ posterior uncertainty is shown as well (Eqs. 53 and 65). Note the enlarged uncertainty in regions without data. The field realization ϕ is the same as in Fig. 3. The ML field e^(3ϕ_ML) = d and the posterior mean field 〈e^(3ϕ)〉 ≈ e^(3ϕ_HMC) are shown as well. The HMC results were calculated by the HMCF code [7] and kindly provided by its author, Christoph Lienhard.

This is an insight that also a direct calculation, based on rewriting Eq. 32 in terms of these moments, verifies:

H(d,ϕ) ≐ ½ ϕ† D⁻¹ ϕ − ½ (ϕ† j + j† ϕ)
 = ½ ϕ† D⁻¹ ϕ − ½ (ϕ† D⁻¹ D j + j† D D⁻¹ ϕ)
 = ½ (ϕ − m)† D⁻¹ (ϕ − m)    (39)

⟹ P(d,ϕ) ∝ e^(−H(d,ϕ)) ∝ G(ϕ − m, D).    (40)

In Eq. 39, we collected the quadratic and linear terms in ϕ, introduced ≐ to express equality up to irrelevant constants, used that D is invertible and self-adjoint, and performed a quadratic completion. The proportionalities in Eq. 40 combine into an equality, since the probabilities on the left and right hand sides of the equations are both properly normalized. Hence, the posterior is a Gaussian, with mean m = D j and uncertainty covariance D.

For our specific problem we have R = 1, N = σ_n² 1, and

Φ_kk′ = 2πδ(k − k′) / (2κν⁻¹k²),    (41)

and therefore

D_kk′ = 2πδ(k − k′) [2κν⁻¹k² + σ_n⁻²]⁻¹,    (42)
j_k = σ_n⁻² d_k, and    (43)
m_k = ∫ dk′/(2π) D_kk′ j_k′ = d_k / (1 + λ²k²), with    (44)
λ² = (2κ/ν) σ_n².    (45)

This means that the optimal field reconstruction m follows the data on large spatial scales 1/k ≫ λ (as m_k ≈ d_k there) and is a strongly suppressed version of the (Fourier space) data for small spatial scales with 1/k ≪ λ (as m_k ≈ (λk)⁻² d_k ≪ d_k there). The optimal reconstruction is the result of a Fourier space filter operation applied to the data. This result was first found by Wiener [8] and the resulting filter is therefore called the Wiener filter.

It should be noted that in ad-hoc regularization the parameter λ, which controls the threshold wavelength of the low-pass filter, is chosen out of the blue, whereas here the knowledge about the statistics of the signal is determined by physical quantities like λ = √(2κ/ν) σ_n.
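In this special case the Wiener filter is a simple Fourier-space operation, which can be sketched in a few lines (an assumed toy setup; grid, parameters, and normalization conventions are not from the paper):

```python
import numpy as np

# Toy demonstration of Eqs. 44-45: synthesize a signal with spectrum
# P_phi(k) = nu/(2 kappa k^2), add white noise, and low-pass filter the
# data with 1/(1 + lambda^2 k^2); coloring of the white noise by sqrt(P)
# is correct up to discretization constants.
rng = np.random.default_rng(1)
npix = 1024
kappa, nu, sigma_n = 1e-4, 1.0, 0.3
lam2 = 2 * kappa / nu * sigma_n**2                  # lambda^2, Eq. 45
k = 2 * np.pi * np.fft.fftfreq(npix)
k_safe = np.where(k == 0, 1.0, k)                   # avoid division by zero
P = np.where(k == 0, 0.0, nu / (2 * kappa * k_safe**2))  # Eq. 29
phi = np.fft.ifft(np.sqrt(P) * np.fft.fft(rng.standard_normal(npix))).real
d = phi + sigma_n * rng.standard_normal(npix)       # data with R = 1, Eq. 3
m = np.fft.ifft(np.fft.fft(d) / (1 + lam2 * k**2)).real  # Wiener mean, Eq. 44
```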


Figure 5 Approximation error and uncertainty of the MAP field estimate ϕ_cl and its uncertainty σ_cl displayed in Fig. 4, with respect to their more accurate HMCF estimates ϕ and σ, respectively. Black: absolute error |ϕ_cl − ϕ| of the MAP estimate, revealing a small systematic overestimation. Red: uncertainty σ_cl via Laplace approximation around the MAP estimate. Pink: uncertainty σ of the HMC estimate. Note the enlarged uncertainties at locations without data or with fewer counts.

2.4 Generalized Wiener filter

We made a number of assumptions in this derivation, namely that the noise is white, the field has a specific covariance, and the instrument response is the identity. None of these were essential. Assuming colored noise with covariance N, a general field covariance Φ, and an arbitrary (but linear) response R would have provided us with the solution given by Eqs. 34–38. Solely the nice property of this generalized Wiener filter being a Fourier-space-only operation would have been lost.

The price for this generalized Wiener filter is that applying D to j to obtain the posterior mean m = D j can then often not be done analytically anymore. Instead, one needs to use a numerical scheme like the conjugate gradient method to solve

D⁻¹ m = j    (46)

for m. This is possible in case D⁻¹ = Φ⁻¹ + R† N⁻¹ R is available as an operator, a computer routine that takes and returns a vector in signal space. This requires only that its constituents Φ⁻¹, N⁻¹, R, and R† are available as computer routines as well. The implementation of the translation-invariant inverse prior field covariance operator Φ⁻¹ is done via a fast Fourier transformation of the field, a division by the power spectrum, followed by a back transformation. This way, by the usage of an operator representation and the conjugate gradient method, very complex measurement situations can be handled efficiently. One such situation is shown in Fig. 3.
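This operator recipe can be sketched as follows (a toy 1D setup, not the paper's NIFTy implementation; spectrum, mask, and noise level are assumptions), using an implicit linear operator and conjugate gradients:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Solve D^{-1} m = j (Eq. 46) with D^{-1} = Phi^{-1} + R^t N^{-1} R given
# only as a routine. Phi^{-1} acts in Fourier space (FFT, division by the
# power spectrum, inverse FFT); R masks out unobserved pixels.
npix, sigma_n = 256, 0.3
k = 2 * np.pi * np.fft.fftfreq(npix)
k_safe = np.where(k == 0, 1.0, k)
P = np.where(k == 0, 1e6, 1.0 / k_safe**2)   # assumed power spectrum
mask = np.zeros(npix)
mask[::4] = 1.0                              # response R: every 4th pixel seen

def Dinv(x):
    prior = np.fft.ifft(np.fft.fft(x) / P).real  # Phi^{-1} x
    return prior + mask * x / sigma_n**2         # + R^t N^{-1} R x

rng = np.random.default_rng(2)
d = mask * rng.standard_normal(npix)             # placeholder data
j = mask * d / sigma_n**2                        # information source, Eq. 48
m, info = cg(LinearOperator((npix, npix), matvec=Dinv, dtype=float), j)
```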


2.5 Information propagation

Before we treat more complicated measurement situations, let us try to develop some intuition for what the (generalized) Wiener filter does. It turns linear data d = Rϕ + n on a field into an optimal reconstruction m (see Fig. 3) with

m = D j,    (47)
j = R† N⁻¹ d, and    (48)
D = (Φ⁻¹ + R† N⁻¹ R)⁻¹.    (49)

First, we inspect j. This is the inversely noise weighted data N⁻¹ d, back projected⁷ by the adjoint response R† into the field domain Ω. Every piece of data is thrown onto all locations that have influenced it in the measurement process, with exactly the strength of this influence, as encoded in the adjoint (or transposed) response R†. Since j carries the essential information of the data and sources our posterior knowledge on the field, the name information source seems to be appropriate.

Now, let us see how D acts on j in m = D j. In components, this is

m(x) = ∫_Ω dy D(x, y) j(y).    (50)

Thus, D(x, y) transports (and weights) the information source j from all locations y to location x, where the contributions are added up to provide the posterior mean field. For this reason the term information propagator is natural for D. This nomenclature is very much in accordance with conventions in quantum field theory and also serves IFT well.

For our simplistic example, the information propagator in position space,

D(x, y) = σ_n²/(2λ) e^(−|x−y|/λ),    (51)

is a translation-invariant convolution kernel, since neither the prior nor the measurement singles out any location. The mean field m most strongly perceives the local information source j_x = d_x/σ_n² at every location x, but also more distant contributions with exponentially decaying weights:

m(x) = ∫ dy/(2λ) e^(−|x−y|/λ) d(y).    (52)

This operation averages down the noise in an optimal fashion.

The information propagator D is informed about how the measurement was performed, as it incorporates R and N. Furthermore, D knows about the correlations of field values at different locations through Φ. It uses the former to properly weight the information source and the latter to extrapolate it to locations not or only poorly probed by the instrument.

Furthermore, since D is also the posterior uncertainty covariance in P(ϕ|d) = G(ϕ − m, D), it also encodes how certain we can be about the reconstructed field, via

σ²_ϕx ≡ 〈(ϕ_x − m_x)²〉_(ϕ|d) = D_xx,    (53)

and how much the uncertainty is correlated between different locations (via D_xy). Eq. 51 illustrates that in case of larger noise (as controlled by σ_n) this uncertainty is larger, since σ²_ϕx = D_xx = σ_n²/(2λ) = σ_n/√(8κ/ν) ∝ σ_n. It is correlated over larger distances, since λ = √(2κ/ν) σ_n ∝ σ_n. Fig. 3 also shows the σ_ϕx uncertainty range around the generalized Wiener filter reconstruction m.

⁷ The technical term back projection is to be understood in the way light is projected from a source (here the data living in data space) onto a screen (here the signal space), and not in the way it is used in the mathematical literature, where projection operators are always endomorphic, staying within the same space.

2.6 Maximum likelihood

It is instructive to see how much the field reconstruction using the Wiener filter benefits from the knowledge of the correct correlation structure. We can remove this information by dropping the prior term H(ϕ) in Eqs. 30–32 and the subsequent calculation. The most convenient way to do so is to take the limit Φ⁻¹ → 0, which yields for the field estimate

m = D j = (Φ⁻¹ + R† N⁻¹ R)⁻¹ R† N⁻¹ d
 → (R† N⁻¹ R)⁻¹ R† N⁻¹ d
 = (1† (σ_n² 1)⁻¹ 1)⁻¹ 1† (σ_n² 1)⁻¹ d
 = d.    (54)

Thus, by ignoring the prior knowledge on the field correlation, one is forced in this case to take the data as the best field estimate. Fig. 3 shows that this can be a very poor guess.

For the posterior, the mean, the median, and the mode of the distribution coincide, as it is a Gaussian. Removing the prior information therefore has transformed our field estimate into a maximum likelihood (ML) estimate. ML is a popular statistical practice that is known to over-fit the data. In our case, the ML estimation absorbed the full noise into the solution, due to the lack of any redundancy in the data and the absence of the regularizing influence of the prior Hamiltonian.

2.7 Optimal estimate

As the ML field estimate performed so poorly, the question arises whether a better estimate than the Wiener filter solution exists. To answer this question, a definition of better or worse is required.

With the reconstruction error field ε(x) = ψ(x) − ϕ(x) of a reconstruction ψ of a field ϕ, a loss function l(ε) can be defined. Averaging the loss over the posterior provides the expected loss of a chosen ψ, which can be regarded as an objective function for the field estimation,

L(ψ|d) = 〈l(ψ − ϕ)〉_(ϕ|d) = ∫ Dϕ P(ϕ|d) l(ψ − ϕ).    (55)

Minimizing L(ψ|d) with respect to ψ then provides an optimal field estimate according to the chosen error metric.

Popular loss functions are the square error norm l(ε) = ||ε||² = ∫ dx ε²(x), the absolute error norm l(ε) = |ε|₁ = ∫ dx |ε(x)|, and the delta loss l(ε) = −δ(ε). These lead to the posterior mean, median, and mode as the optimal estimators, respectively. For the posterior mean being the optimal estimator under an L²-error norm, the proof had already been given in Eq. 20.

For the Gaussian posterior we encountered so far, mean, median, and mode coincide, as does any estimator based on a monotonically increasing error loss. Thus, we can state that the Wiener filter reconstruction m is the optimal estimator with respect to any sensible error metric. No better estimate exists in this case.

2.8 Critical filter

So far, the prior signal covariance Φ was assumed to be known. If it is unknown, it has to be inferred as well. Here, we will sketch a simple empirical Bayes approach for inferring both the signal and its spectrum: the critical filter.


In empirical Bayes, the joint estimate of two interrelated quantities a and b is performed by using a point estimate of one quantity (e.g. b = b_MAP) in the estimation of the other. The logic is captured in the following calculation:

〈a〉_(a,b|d) = ∫ da ∫ db a P(a,b|d)
 = ∫ db P(b|d) ∫ da a P(a|b,d)
 ≈ ∫ db δ[b − b_MAP(d)] ∫ da a P(a|b,d)
 = 〈a〉_(a|b_MAP(d),d).    (56)

Thus, the uncertainty in P(b|d), the a-marginalized posterior for b, is ignored. In our specific case, a = ϕ will be the signal field and b = Φ its covariance. We recall that P(ϕ|Φ) = G(ϕ, Φ).

A hyperprior, a prior on a prior quantity, is necessary for this. In case statistical homogeneity can be assumed, Φ is diagonal in Fourier space, Φ_kq = (2π)ᵘ δᵘ(k − q) P_ϕ(k), with P_ϕ(k) the spectral power density and k a Fourier space vector, denoted here in bold face to distinguish it from its length k = |k|. In case statistical isotropy can be assumed, the spectral power density is isotropic, P_ϕ(k) = P_ϕ(k), and all Fourier modes with the same length k = |k| have the same power.

It is then convenient to introduce the spectral band projector (P_k)_qq′ ≡ (2π)ᵘ δ(q − q′) δ(|q| − k). With this, the maximum a posteriori estimator for the power spectrum⁸ is

P_ϕ(k) = Tr[(m m† + D) P_k] / Tr[P_k],    (57)

where a flat prior on the spectral power density entries was assumed [9]. A compact derivation of this result can be found on Wikipedia [4].

Eqs. 57 and 36 have to be solved jointly, as m = D j now depends on the power spectrum P_ϕ(k) through D, and the power spectrum itself also depends on m. This so-called critical filter therefore iteratively estimates a field and the power spectrum of the process that has generated it from data. The field reconstruction is regularized by the covariance structure implied by the power spectrum, and the spectrum reconstruction is informed by the field estimate. Thus, the data is used in a non-linear way to reach the final critical filter field estimate, and we have an example of an interacting field theory, where the field solution depends in a non-linear way on the data.

The critical filter as presented above performs well in case there are sufficiently many Fourier modes informing P_ϕ(k). Otherwise, additional prior information on the spectrum should be used. For example, the usage of a spectral smoothness prior [10] has turned out to be very useful in cases where such an assumption is justified.

The critical filter, its variations and extensions⁹ play an important role for real world applications of IFT, since very often a sufficiently well understood theory that predicts the statistical properties of a field is not available. With the critical filter, a field and its power spectrum can be estimated simultaneously from the same data.

⁸ For this, P(P_ϕ|d) = ∫ Dϕ P(P_ϕ, ϕ|d) has to be maximized, the power spectrum posterior, which is marginalized over the field.
⁹ For informative spectral priors [9], spectral smoothness hyperpriors [10], unknown noise [11], non-linear measurements [10, 12], and others.
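A schematic implementation of this iteration for the simplistic R = 1, white-noise case reads as follows (discretization conventions and parameters are assumptions; the band average of Eq. 57 is approximated by a discrete average over modes of equal |k|):

```python
import numpy as np

# Critical filter sketch: alternate Wiener filtering (Eq. 36) with the
# power spectrum update (Eq. 57) until both are mutually consistent.
def critical_filter(d, sigma_n, n_iter=20):
    npix = d.size
    kbin = np.abs(np.fft.fftfreq(npix) * npix).astype(int)  # spectral bands
    P = np.full(npix, d.var())                   # initial spectrum guess
    for _ in range(n_iter):
        m_k = P / (P + sigma_n**2) * np.fft.fft(d)      # Wiener mean per mode
        D_k = P * sigma_n**2 / (P + sigma_n**2)         # posterior mode variance
        power = np.abs(m_k)**2 / npix + D_k             # (m m^t + D) probed
        P = np.bincount(kbin, weights=power)[kbin] \
            / np.bincount(kbin)[kbin]                   # band average, Eq. 57
    return np.fft.ifft(m_k).real, P
```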


3 Interacting fields

The generalized Wiener filter of the free IFT appeared under a number of assumptions. These were Gaussianity and mutual independence of signal and noise, linearity of the measurement, and the precise knowledge of all involved operators (Φ, R, and N).

In case these conditions are not given, the situation becomes more complicated. This is expressed by the information Hamiltonian H(d,ϕ) = −ln P(d,ϕ) getting terms of higher than second order in the field ϕ (after marginalization of all other unknowns), so-called interaction terms. These lead – in general – to non-linear relations between the data and the mean field, as well as to non-Gaussian posteriors. The theory becomes non-free or interacting.

Let us stay specific and turn our simplistic example from above into a non-linear problem. To do so we modify Eq. 22 such that it can cope with large temperature fluctuations without becoming unphysical (i.e. allowing negative temperatures). We achieve this by setting

ϕ(x) = ln(T(x)/T̄)    (58)

such that

T(x) = T̄ exp(ϕ(x))    (59)

is now a strictly positive temperature, irrespective of the value of ϕ(x), which we still assume to be drawn from a zero centered Gaussian, ϕ ← G(ϕ, Φ), with covariance given by Eq. 28.

Let us assume that we are able to observe the thermal photons emitted from our system, such that the expected number of photons received by a detector in an observational period is

μ_i(ϕ) = ∫ dx r′_i(x) T³(x) = ∫ dx r_i(x) exp[3ϕ(x)] = r_i† e^(3ϕ),    (60)

where with i we label the detector and the observational period, we have absorbed physical constants into the responses r′, have set r = r′ T̄³, and introduced the convention to apply functions point-wise to fields, (e^ϕ)_x = e^(ϕ_x). Observing photons is a Poisson process, and each data bin has an independent distribution function

P(d_i|μ_i) = (μ_i^(d_i) / d_i!) e^(−μ_i)    (61)

for the number of photons d_i received by the individual detectors' observations as labeled by i. For the full dataset we write μ = R exp(3ϕ), where we define the linear part of the response operator as R_ix = r_i(x).

Thus, all elements to construct the information Hamiltonian are available, and we get

H(d,ϕ) = H(ϕ) + Σ_i H(d_i|μ_i(ϕ))
 ≐ ½ ϕ† Φ⁻¹ ϕ + 1† R e^(3ϕ) − d† ln(R e^(3ϕ)),    (62)

where ≐ denotes that we dropped irrelevant (since ϕ-independent) constants and 1† is a row vector in data space with all entries being 1. This clearly is an information Hamiltonian with strong self-interaction terms, provided by the anharmonic orders of the exponential function and the logarithm.

To treat this, we will use two approximative approaches from theoretical physics, by calculating the classical and the mean field solution.

3.1 Classical field approximation

The classical field ϕ_cl is simply given by the minimum of the Hamiltonian:

0 = ∂H(d,ϕ)/∂ϕ = Φ⁻¹ ϕ + 3 e^(3ϕ) R† 1 − e^(3ϕ) R† (3d)/(R e^(3ϕ))
 = Φ⁻¹ ϕ + 3 e^(3ϕ) R† ( 1 − d/(R e^(3ϕ)) )    (63)

⟹ ϕ_cl = 3 Φ e^(3ϕ_cl) R† ( (d − R e^(3ϕ_cl)) / (R e^(3ϕ_cl)) ).    (64)

A solution to this non-linear integral equation should exhibit d ≈ R e^(3ϕ_cl), as this satisfies the likelihood, but will also be smoothed due to the smoothing action of the covariance kernel Φ.

The classical field minimizes the Hamiltonian and, according to Eq. 9, maximizes the posterior. Thus, the classical field is the maximum a posteriori (MAP) field estimate. In general the classical or MAP solution is not identical to the posterior mean. This requires special conditions, like the posterior being point symmetric around ϕ_cl, as is the case for the free theory discussed in Sect. 2.

An estimate of the uncertainty covariance is given by the inverse Hessian of the Hamiltonian at its minimum:¹⁰

(D⁻¹)_xy ≈ ∂²H(d,ϕ)/(∂ϕ ∂ϕ†) |_(ϕ_cl)
 = Φ⁻¹_xy + 9 δ_xy e^(3ϕ_cl,x) [ R† (1 − d/(R e^(3ϕ_cl))) ]_x + 9 e^(3ϕ_cl,x + 3ϕ_cl,y) Σ_i R_ix R_iy d_i / (R e^(3ϕ_cl))_i².    (65)

The uncertainty in this Laplace approximation (treating the posterior as a Gaussian by expanding it around its maximum) clearly depends on the classical field, and therefore on the data. For a different data realization, the uncertainty would differ. Both a field estimate and its uncertainty for this measurement situation can be found in Fig. 4.

¹⁰ The middle term of this expression is small at the minimum, as there R(ϕ) ≈ d. Since this term can produce negative eigenvalues of the Hessian, prohibiting its usage as a covariance, it is often (and also here) dropped in numerical applications of IFT.

3.2 Comparison to other estimators

3.2.1 Maximum likelihood

The MAP estimator provides the most probable field. A comparison to the ML field reveals how much it benefited from the prior information. Again, we remove the prior in Eqs. 62 and 63 by the limit Φ⁻¹ → 0, to get from Eq. 63

d = R e^(3ϕ_ML) = e^(3ϕ_ML).    (66)

Thus, the ML estimate of the photon emissivity ϱ_ML = e^(3ϕ_ML) is again given by the data. Fig. 4 shows how poorly the ML estimate compares to the MAP estimate.
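For a pixelized version of this model, the MAP estimate can also be found by direct numerical minimization of Eq. 62, using the gradient of Eq. 63 (a toy setup; prior, response, and data below are placeholder assumptions):

```python
import numpy as np
from scipy.optimize import minimize

# MAP estimation for the Poisson log-normal model: minimize Eq. 62,
# H = 0.5 phi^t Phi^{-1} phi + 1^t R e^{3 phi} - d^t ln(R e^{3 phi}).
npix = 64
Phi_inv = np.eye(npix)            # placeholder prior precision (assumption)
R = np.eye(npix)                  # trivial response, R = 1, for the sketch
rng = np.random.default_rng(3)
d = rng.poisson(np.exp(3 * 0.1 * rng.standard_normal(npix))).astype(float)

def H(phi):
    mu = R @ np.exp(3 * phi)                      # expected counts, Eq. 60
    return 0.5 * phi @ Phi_inv @ phi + mu.sum() - d @ np.log(mu)

def gradH(phi):                                   # gradient, cf. Eq. 63
    mu = R @ np.exp(3 * phi)
    return Phi_inv @ phi + 3 * np.exp(3 * phi) * (R.T @ (1.0 - d / mu))

phi_cl = minimize(H, np.zeros(npix), jac=gradH, method="L-BFGS-B").x
```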


3.2.2 Hamiltonian Monte Carlo for fields

If the field estimate with the minimal square error is requested, the posterior mean should be calculated. This is not possible analytically anymore. Numerical methods can help here. One powerful method is Hamiltonian Monte Carlo (HMC) sampling [13, 14], in which posterior samples are drawn, from which the requested field estimate ϕ_HMC can be calculated by simple averaging.

For the HMC the information Hamiltonian is extended to a full dynamical Hamiltonian by the introduction of an auxiliary momentum field p(x):

H(d,ϕ,p) = H(p|d,ϕ) + H(d,ϕ) with    (67)
H(p|d,ϕ) = ½ p† M⁻¹ p + ½ ln|2πM|,    (68)

where M is a positive definite mass matrix, which optimally is chosen close to the information propagator D. The HMC algorithm constructs a new sample of ϕ from an old one in three steps:
– Draw a new initial momentum from

P(p|d,ϕ) = G(p, M).    (69)

– Follow the Hamiltonian dynamics from the current location (ϕ, p) by integrating

dϕ/dt = ∂H(d,ϕ,p)/∂p    (70)
dp/dt = −∂H(d,ϕ,p)/∂ϕ    (71)

sufficiently long to generate an uncorrelated new location (ϕ′, p′).
– Accept this new sample with the Metropolis-Hastings acceptance probability

P(accept|ϕ′,p′,ϕ,p) = min(1, e^(−ΔH))    (72)

with ΔH = H(d,ϕ′,p′) − H(d,ϕ,p).

This results in samples drawn from

P(ϕ,p|d) = P(ϕ|d) G(p, M),    (73)

the direct product of the field posterior and a Gaussian in momentum. The momenta can just be ignored, as this distribution factorizes. The energy conserving property of the Hamiltonian dynamics integration ensures that only numerical errors lead to a small difference between the initial and final Hamiltonian, and therefore the proposed samples (ϕ′, p′) are accepted with a high probability. The phase space volume conservation of the symplectic Hamiltonian dynamics ensures that probability mass is treated correctly. An appropriate symplectic numerical integration scheme needs to be adopted for this to hold in the actual calculations as well. Thus, the HMC draws samples from the true posterior, and those are (largely) independent from each other.

An HMC implementation for fields (HMCF)¹¹ [7] was run on Eq. 62 and the resulting posterior mean field is displayed in Fig. 4 as well. It is apparent that here the MAP field and the posterior mean agree very well, despite a small systematic difference on the 0.1-σ level (see Fig. 5).

One could try to improve the MAP solution by replacing it with the so-called mean field approximation. In this particular case, this is not necessary, given the accuracy of the MAP solution here; however, in case of posterior distributions with long and asymmetric tails, MAP provides inferior estimates compared to the mean field approximation introduced next.

¹¹ https://gitlab.mpcdf.mpg.de/ift/HMCF
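A single HMC transition as described above can be sketched as follows for a pixelized field (H and gradH are routines for the information Hamiltonian and its gradient; step size, trajectory length, and the diagonal mass matrix are illustrative assumptions):

```python
import numpy as np

# One HMC transition (Eqs. 67-72) with a leapfrog (symplectic) integrator.
def hmc_step(phi, H, gradH, M_diag, eps=1e-2, n_leap=30,
             rng=np.random.default_rng()):
    p = rng.normal(0.0, np.sqrt(M_diag), size=phi.size)   # momentum, Eq. 69
    H_old = H(phi) + 0.5 * np.sum(p**2 / M_diag)
    q, p_new = phi.copy(), p - 0.5 * eps * gradH(phi)     # initial half kick
    for _ in range(n_leap - 1):
        q += eps * p_new / M_diag                         # drift, Eq. 70
        p_new -= eps * gradH(q)                           # kick, Eq. 71
    q += eps * p_new / M_diag
    p_new -= 0.5 * eps * gradH(q)                         # final half kick
    H_new = H(q) + 0.5 * np.sum(p_new**2 / M_diag)
    accept = rng.random() < np.exp(min(0.0, H_old - H_new))  # Eq. 72
    return q if accept else phi
```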


3.3 Mean field approximation

The mean field approximation m′ should not be confused with the a posteriori mean field m ≡ 〈ϕ〉_(ϕ|d), which it tries to approximate. Mean fields in statistical field theory are constructed by minimization of the Gibbs free energy,

G(m′, D′) = U(m′, D′) − T S(D′).    (74)

In IFT the internal energy U is defined as

U(m′, D′) = 〈H(d,ϕ)〉_G(ϕ−m′,D′),    (75)

the information Hamiltonian averaged over an approximately Gaussian knowledge state P′(ϕ|m′, D′) = G(ϕ − m′, D′) with mean m′ and variance D′. The inference temperature T = 1 was kept in the formula to highlight the formal similarity to thermodynamics. Finally,

S(D′) = −∫ Dϕ G(ϕ − m′, D′) ln( G(ϕ − m′, D′) ) = ½ ln|2πe D′|    (76)

is the Boltzmann-Shannon entropy of this Gaussian state. It measures the phase space volume of the uncertainty of the approximative knowledge state.

The Gibbs free energy can be rewritten as

G(m′, D′) = ∫ Dϕ G(ϕ − m′, D′) ln( G(ϕ − m′, D′) / exp(−H(d,ϕ)) )
 = ∫ Dϕ P′(ϕ|m′, D′) ln( P′(ϕ|m′, D′) / P(d,ϕ) )
 = ∫ Dϕ P′(ϕ|m′, D′) ln( P′(ϕ|m′, D′) / (P(ϕ|d) P(d)) )
 = KL[P′||P] − ln Z(d),    (77)

where KL[P′||P] denotes the Kullback-Leibler (KL) divergence of P′(ϕ|m′, D′) and P(ϕ|d). Thus, the Gibbs free energy is (up to an irrelevant constant) the information distance between the correct and approximate probability distributions.¹²

In case of our toy example, the Gibbs free energy reads

G(m′, D′) ≐ ½ tr[Φ⁻¹ (m′ m′† + D′)] + 1† R e^(3m′ + (9/2)D̂′) − d† ln( R e^(3m′ + (9/2)D̂′) ) − ½ ln|2πe D′|,    (78)

since 〈ϕ ϕ†〉_P′ = m′ m′† + D′ and 〈e^ϕ〉_P′ = e^(m′ + D̂′/2). Here D̂′_x = D′_xx denotes the diagonal of D′.

The mean field m′ is the minimum of the Gibbs free energy. A short calculation, analogous to Eq. 64, yields

m′ = 3 Φ e^(3m′′) R† ( (d − R e^(3m′′)) / (R e^(3m′′)) ),    (79)

with m′′ = m′ + (3/2) D̂′. The mean field m′ therefore depends on the uncertainty D′, which itself has to be determined, either by minimizing Eq. 78, or equivalently by using the thermodynamical relation [16]

D′ = ( ∂²G(m′, D′) / (∂m′ ∂m′†) )⁻¹.    (80)

The latter yields

(D′⁻¹)_xy = Φ⁻¹_xy + 9 δ_xy e^(3m′′_x) [ R† (1 − d/(R e^(3m′′))) ]_x + 9 e^(3m′′_x + 3m′′_y) Σ_i R_ix R_iy d_i / (R e^(3m′′))_i²,    (81)

a rather complex expression, which depends on m′ and D′ itself. Thus the mean field and its uncertainty covariance have to be solved for jointly, as they are interdependent.

A comparison of the mean field and Laplace approximations shows that the estimated uncertainty of the mean field is larger, and this estimate is therefore more conservative. The Laplace approximation usually underestimates uncertainties. This larger uncertainty also provides corrections to the (mean) field (with respect to the Laplace/MAP field).

¹² The Gibbs free energy is equivalent to KL(P′||P), the amount of information needed (as measured in nits) to change the correct posterior P to the approximate posterior P′. Information theoretically, the reverse would be more appropriate, as KL(P||P′) would measure how much information has to be added to the approximate posterior P′ to restore the actual P [15]. This, however, would involve the calculation of non-Gaussian path integrals and is therefore often not an option.

3.4 Operator formalism

The usage of the Gibbs free energy or KL divergence requires the calculation of Gaussian averages of the Hamiltonian. Such Gaussian averages can elegantly be performed using the operator representation of IFT. Similar to quantum field theory, a field functional F(ϕ) is averaged over P(ϕ|d) = G(ϕ − m, D) with d ≐ (m, D) by inserting the field operator [17],

O_m = m + D ∂_m,    (82)

into the functional,

〈F(ϕ)〉_(ϕ|d) = F(O_m).    (83)

This operator is set up to generate a field instance out of a Gaussian posterior knowledge state,

O_m G(ϕ − m, D) = (m + D ∂_m) (1/√|2πD|) e^(−½ (ϕ−m)† D⁻¹ (ϕ−m))
 = ( m + D D⁻¹ (ϕ − m) ) G(ϕ − m, D)
 = ϕ G(ϕ − m, D).    (84)

This permits to calculate Gaussian expectation values algebraically, since

〈F(ϕ)〉_(ϕ|d) = ∫ Dϕ F[O_m] G(ϕ − m, D) = F[O_m] 〈1〉_(ϕ|d) = F[O_m] 1 ≡ F[O_m].    (85)

For the usage of this trick, O_m = m + D ∂_m = c + a has to be split into a (mean) field creator c = m and a field annihilator a = D ∂_m in the expression F[O_m]. Since the field annihilators become zero if they act on the constant functional 1: m → 1, as a 1 = D ∂_m 1 = 0, they disappear if moved to the very right in the expression F[O_m] 1. This can be achieved with the help of the commutator relations

[a_x, c_y] = D_xy and [a_x, a_y] = [c_x, c_y] = 0.    (86)

The non-trivial commutation of annihilator and creator generates uncertainty corrections to averages over non-linear moments of the field. For example,

〈ϕ_x ϕ_y〉_(ϕ|d) = O_m,x O_m,y 1 = (c_x + a_x)(c_y + a_y) 1
 = c_x c_y 1 + c_x a_y 1 + a_x c_y 1 + a_x a_y 1
 = m_x m_y + 0 + (c_y a_x + [a_x, c_y]) 1 + 0
 = m_x m_y + D_xy.    (87)

3.5 Kullback-Leibler sampling

For numerical work, the expressions for the analytically calculated Gibbs free energy or KL divergence might become too complex. As Eq. 79, they often contain diagonals and traces of operators, like D̂ and Tr(D). These can often only be calculated via sampling techniques, since the operator D is only implicitly defined and only available as a computer routine. The sampling recipes are:

D̂ ≈ 〈ξ (D ξ)〉_{ξ} (no †),    (88)
Tr(D) ≈ 〈ξ† D ξ〉_{ξ}, with    (89)
ξ ← G(ξ, 1).    (90)

Instead of sampling such terms to calculate the expectation value of a Hamiltonian, one can average the Hamiltonian directly over appropriate samples:

〈H(d,ϕ)〉_G(ϕ−m,D) ≈ 〈H(d,ϕ)〉_{ϕ}, with    (91)
ϕ ← G(ϕ − m, D).    (92)

The drawing of samples from G(ϕ − m, D) can be done via a virtual signal and data generation in case D has the structure of an information propagator, D = (Φ⁻¹ + R† N⁻¹ R)⁻¹:

ϕ′ ← G(ϕ′, Φ)    (93)
n′ ← G(n′, N)    (94)
d′ = R ϕ′ + n′    (95)
m′ = D R† N⁻¹ d′    (96)
ε′ = ϕ′ − m′    (97)
ϕ = m + ε′.    (98)

The virtual reconstruction error ε′ = ϕ′ − m′ obeys the right statistics, as

ε′ = ϕ′ − D R† N⁻¹ (R ϕ′ + n′)
 = D (D⁻¹ − R† N⁻¹ R) ϕ′ − D R† N⁻¹ n′
 = D (Φ⁻¹ ϕ′ − R† N⁻¹ n′)    (99)

and therefore

〈ε′ ε′†〉_(n′,ϕ′) = D Φ⁻¹ 〈ϕ′ ϕ′†〉_(ϕ′) Φ⁻¹ D + D R† N⁻¹ 〈n′ n′†〉_(n′) N⁻¹ R D
 = D Φ⁻¹ D + D R† N⁻¹ R D
 = D D⁻¹ D = D.    (100)

It is important to note that also derivatives, e.g. with respect to the mean field, can be expressed via samples,

∂_m 〈H(d,ϕ)〉_G(ϕ−m,D) = ∫ Dϕ H(d,ϕ) ∂_m G(ϕ − m, D)
 = −∫ Dϕ H(d,ϕ) ∂_ϕ G(ϕ − m, D)
 = ∫ Dϕ G(ϕ − m, D) ∂_ϕ H(d,ϕ)
 ≐ 〈∂_ϕ H(d,ϕ)〉_G(ϕ−m,D) ≈ 〈∂_ϕ H(d,ϕ)〉_{ϕ}.    (101)

Such KL-sampling techniques enable the inference of interdependent fields. For example, the blind separation of several mixed signals, for which a simple MAP estimate performs very poorly, could be achieved this way [18].
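The recipes of Eqs. 88–90 amount to stochastic probing with white Gaussian fields (Hutchinson-type estimation); a sketch with a dense toy operator standing in for the implicitly defined D:

```python
import numpy as np

# Probe the diagonal and trace of an operator D that is only available
# as a routine (here a dense toy matrix), following Eqs. 88-90.
rng = np.random.default_rng(4)
npix, nsamples = 100, 2000
A = rng.standard_normal((npix, npix))
D = A @ A.T / npix                        # some positive definite toy D
D_apply = lambda x: D @ x                 # stands for the implicit routine

diag_est = np.zeros(npix)
trace_est = 0.0
for _ in range(nsamples):
    xi = rng.standard_normal(npix)        # xi <- G(xi, 1), Eq. 90
    Dxi = D_apply(xi)
    diag_est += xi * Dxi / nsamples       # Eq. 88: component-wise, no dagger
    trace_est += xi @ Dxi / nsamples      # Eq. 89
print(np.allclose(diag_est, np.diag(D), atol=0.5), trace_est, np.trace(D))
```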


NIFTY permits the abstract implementation of IFT formulas irrespective of the underlying domains the fields live over. The same algorithm implemented with NIFTY can reconstruct without change fields living over 1D, 2D, or 3D Cartesian spaces, the sphere, or even product spaces built out of those spaces. NIFTY takes care of many details of the space pixelization, the operator calculus, the set-up of correlation structures, harmonic transforms of fields, and numerical minimization schemes. This permits the user to concentrate on the essentials of her information field theoretical problem and less on implementation details. Most numerical operations are performed by standard libraries.

With NIFTY, different algorithmic choices for signal estimation are possible. Algorithms can be based on MAP or Gibbs free energy minimization, or use perturbation series, like expansions in Feynman diagrams. For the joint inference of a signal and its power spectrum, the mean field approach of [12] performs well.

In the following we highlight a number of applications of IFT to illustrate the spectrum of potential usages: photon count imaging, other available imaging and signal reconstruction methods, non-Gaussianity estimation, and simulation of field dynamics.

4.2 Photon imaging

Photon imaging is the reconstruction of the continuous spatial and/or spectral photon emission field from which a finite number of detected photons were emitted. The D3PO and D4PO algorithms [21, 22] address this task. The first one,15 D3PO, denoises, deconvolves, and decomposes photon observations (thus the acronym D3PO) into a diffuse log-emission field plus a point-source log-emission field. For the diffuse field a power spectrum is inferred as well. For the point-source field, point sources are assumed to reside in every image pixel, just most of them being insignificantly faint. The application of D3PO to data of the Fermi gamma-ray satellite is shown in Fig. 2 [23].

The successor algorithm D4PO does essentially the same, just with the difference that the fields can be defined on product spaces of spatial coordinates (the sky) and other domains (like the photon energy space). Furthermore, an arbitrary number of such fields can be reconstructed simultaneously, e.g. to allow also for background counts in data space. The correlation structure of the fields is assumed to be the direct product over (a priori unknown and therefore simultaneously reconstructed) correlation functions in the different directions. A diffuse log-emission field ϕ(x, y) over the product space of sky (x ∈ S²) and log-energy (y = ln(E/E₀) ∈ ℝ) therefore has the correlation structure Φ_(x,y)(x′,y′) = Φ^(S²)_xx′ Φ^(E)_yy′. A minimal numerical sketch of such a product-space correlation structure is given at the end of the following subsection.

4.3 Available IFT algorithms

A number of further IFT algorithms are already freely available as open source codes. Here is an incomplete list:

RESOLVE images the radio sky from interferometric data [24–27]. Interferometers measure individual Fourier components of the brightness field, and a reconstruction is indispensable to obtain an image. Despite the very different instrument response and noise statistics it deals with, RESOLVE is similar to D3PO and D4PO in that it describes diffuse emission as a log-normal field with unknown power spectrum.

charm reconstructs non-parametrically the expansion history of the Universe from supernova Ia data [28]. It has confirmed the concordance of this dataset with the ΛCDM model [29].

PySESA is an open-source project "dedicated to provide a generic Python framework for spatially explicit statistical analyses of point clouds and other geospatial data, in the spatial and frequency domains, for use in the geosciences."16

starblade separates point sources from diffuse emission while estimating and using the correlation structure of the latter (Fig. 6) [30]. It assumes the data to be noiseless and is therefore intended for the post-processing of high fidelity images or as an internal processing step in other imaging codes.
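To make the direct-product correlation structure of Sect. 4.2 concrete, the following sketch builds such a covariance as a Kronecker product and draws one field sample from it. It is a toy illustration only, not D4PO's actual implementation: a small periodic 1D grid stands in for the sphere, and all pixel counts and power-law slopes are arbitrary choices.

```python
import numpy as np

def power_covariance(n, slope):
    """Covariance of a stationary field on a periodic 1D grid of n cells,
    defined by the strictly positive power spectrum P(k) = (1 + k^2)^(slope/2)."""
    k = np.fft.fftfreq(n) * n
    power = (1.0 + k**2) ** (slope / 2.0)
    row = np.fft.ifft(power).real            # covariance as a function of distance
    return np.array([np.roll(row, i) for i in range(n)])

n_sky, n_energy = 32, 16                     # illustrative pixel counts
Phi_sky = power_covariance(n_sky, -3.0)      # correlations on the (toy 1D) 'sky'
Phi_energy = power_covariance(n_energy, -2.0)  # correlations along log-energy

# Direct product structure: Phi_{(x,y)(x',y')} = Phi^{sky}_{xx'} * Phi^{E}_{yy'}
Phi = np.kron(Phi_sky, Phi_energy)

# One prior sample of the log-emission field phi(x, y) over the product space
rng = np.random.default_rng(7)
phi = rng.multivariate_normal(np.zeros(n_sky * n_energy), Phi)
phi = phi.reshape(n_sky, n_energy)
```

The Kronecker structure keeps the inference problem small: only the one-dimensional correlation functions of the individual directions have to be reconstructed, not the full product-space covariance.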

15 http://ascl.net/1504.018
16 Quote from https://github.com/dbuscombe-usgs/pysesa.


Figure 6 Hubble Space Telescope image of the Andromeda galaxy (top left: original image) separated by starblade into stars (top right) and diffuse emission (bottom). Here, an early version of starblade was applied separately to the RGB channels of a JPEG image [31] for illustration purposes only. Figure kindly provided by Jakob Knollmüller, the author of starblade.

4.4 Non-Gaussianity

An important question of contemporary cosmology is how precisely the initial density fluctuations of the Universe followed Gaussian statistics. The different inflationary scenarios for the first fractions of a second of the cosmic history predict different amounts of deviation from Gaussianity. A simple parametrisation of non-Gaussianity is the so-called f_nl model, in which an initially Gaussian random field ϕ ← G(ϕ, Φ) is non-linearly processed into the gravitational potential in the Early Universe observed at a later epoch:

ψ(x) = ψ[ϕ, f_nl](x) = ϕ(x) + f_nl [ϕ²(x) − ⟨ϕ²(x)⟩_(ϕ)]   (102)

If the measurement process is linear and Gaussian, with d = Rψ + n and n ← G(n, N), the information Hamiltonian

H(d, ϕ | f_nl) ≙ ½ (d − Rψ[ϕ, f_nl])† N⁻¹ (d − Rψ[ϕ, f_nl]) + ½ ϕ† Φ⁻¹ ϕ   (103)

becomes fourth order in ϕ.

In order to decide which value of f_nl the data prefer, one needs to calculate the evidence P(d | f_nl). This is, however, also the partition function Z(d | f_nl). Since f_nl is a (comparably) small parameter, one can use that the logarithm of the partition function is given by the sum of all simply connected Feynman diagrams without external vertices. Sorting such diagrams by their order in f_nl up to second order, one can construct the MAP estimator for f_nl [1]. This is displayed in Fig. 1, superimposed on a map of the CMB, to which such a non-Gaussianity estimator can be applied. The Feynman diagrams containing f_nl up to linear order are in the numerator, and the ones up to quadratic order in the denominator, of the fraction that comprises the f_nl estimator (while f_nl was removed from these terms as it is to be estimated). The numerator is identical to the well-known Komatsu-Spergel-Wandelt (KSW) estimator [32] used in CMB studies; the denominator provides a Bayesian normalization, which depends on the particular data realization. Actually, the original KSW estimator comprised only the terms given by the first two diagrams of the numerator. The other terms were discovered later as correction terms for inhomogeneous noise [33]. The diagrammatic approach to IFT delivers all these terms naturally and simultaneously. Further details on this can be found in [1].
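The quadratic transformation of Eq. 102 is straightforward to play with numerically. The following sketch generates a Gaussian random field and applies the f_nl distortion; the grid size, the power spectrum, and the (greatly exaggerated) value of f_nl are illustrative assumptions, and the prior expectation ⟨ϕ²⟩ is replaced by its sample estimate.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
n = 256
k = np.fft.fftfreq(n) * n

# Gaussian random field phi with a falling power spectrum (zero-mean mode)
power = np.zeros(n)
power[k != 0] = np.abs(k[k != 0]) ** -2.0
phi = np.fft.ifft(np.sqrt(power) * np.fft.fft(rng.standard_normal(n))).real

# Eq. (102): psi = phi + f_nl * (phi^2 - <phi^2>), with <phi^2> sample-estimated
f_nl = 3.0   # hugely exaggerated relative to cosmological values, for visibility
psi = phi + f_nl * (phi**2 - np.mean(phi**2))

# The resulting skewness is the leading-order signature an fnl estimator probes
print(f"skewness of phi: {skew(phi):+.3f}  ->  of psi: {skew(psi):+.3f}")
```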


4.5 Information field dynamics

As a final application, we show how IFT may be used in the future to build computer simulation schemes for partial differential equations (PDEs). A field ϕ(x, t) over space and time may follow a PDE of the form

∂_t ϕ = F[ϕ].   (104)

For example, F[ϕ] = κ ∂²_x ϕ + ν^(1/2) ξ reproduces Eq. 23. On a computer, only a discretized version of the field can be stored in memory. In the framework of information field dynamics (IFD, [34–36]), the data vector in computer memory describing the field at an instant t is regarded as the result of a virtual measurement process of the form given by Eq. 3. A convenient linear response R might be given by the pixel window function, e.g. R_ix = P(x ∈ Ω_i | x, Ω_i) ∈ {0, 1}, such that R_ix = 1 if x is within the volume of the i-th pixel Ω_i, and R_ix = 0 otherwise. In this case the data contain the pixel-integrated field values. The measurement noise might be absent, n = 0, as the virtual measurements can be chosen to be noiseless.

However, even without measurement uncertainty, the field is not fully determined by the data. The remaining uncertainty is captured by the field posterior P(ϕ | d), which can be specified via Bayes' theorem (Eq. 9) in case a field prior P(ϕ) is available.

In essence, IFD evolves the knowledge on the field as parametrized by the posterior by evolving the data d in such a way that information on the field is conserved as much as possible. For this, the KL-divergence between the time-evolved field posterior and a posterior from later-time data d′ is minimized.

Figure 7 Sketch of the construction of simulation schemes via IFD. The data in computer memory (orange bars) at some time imply a posterior in field configuration space (orange contours). Each point of this PDF (e.g. orange sample curves shown with the data) is a continuous field that can be evolved mathematically to a future time according to the PDE to be simulated. The evolved state has to be represented by data again. This is chosen via entropic matching, such that the new data contain as much information as possible about the evolved field. The combination of these operations defines a data space-only operation, the simulation scheme, which can be performed by a computer. It incorporates the data, prior knowledge on the field, and the dynamical laws in an information optimal way.

Thus, the idea of IFD is to find an evolution equation for the data d in computer memory, which captures and incorporates the fully resolved field evolution and the prior knowledge as well as possible. As this data vector is finite dimensional, its evolution equation is then only an ordinary differential equation (ODE), which can be solved with standard numerical methods. The data, however, imply a PDF over the full infinite dimensional field configuration space, which can be questioned for any quantity of interest.

Let us regard the case in which the field prior is Gaussian, P(ϕ) = G(ϕ, Φ), the measurement is linear and noise-free, d = Rϕ, and thus the posterior is Gaussian as well, P(ϕ | d) = G(ϕ − m, D), with m = m(d) = ΦR†(RΦR†)⁻¹ d ≡ W d the noise-less Wiener filter reconstruction. In this case, the optimal temporal evolution of the data is given by [35]

∂_t d = R ⟨F(ϕ)⟩_(ϕ|d).   (105)

The time derivative of the data is given by the measurement response R applied to the time derivative of the field, averaged over all posterior field configurations. This yields an ordinary differential equation for the data, which can be solved with appropriate numerical methods. The evolution of the data implies an evolving field probability density P(ϕ | d(t)), which encodes the knowledge about the field at all instances, including its mean value and its uncertainty.
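As a toy illustration of such a virtual, noise-free measurement and its Wiener reconstruction m = ΦR†(RΦR†)⁻¹d, consider a periodic 1D grid whose fine cells are binned into coarse data pixels. All sizes and the prior spectrum here are illustrative assumptions.

```python
import numpy as np

n_fine, n_pix = 128, 16                      # 'continuum' cells and data pixels
rng = np.random.default_rng(0)

# Pixel window response: R[i, x] = 1 if fine cell x lies in coarse pixel i
R = np.zeros((n_pix, n_fine))
w = n_fine // n_pix
for i in range(n_pix):
    R[i, i * w:(i + 1) * w] = 1.0

# Toy prior covariance Phi from a strictly positive power spectrum
k = np.fft.fftfreq(n_fine) * n_fine
power = (1.0 + k**2) ** -1.0
row = np.fft.ifft(power).real
Phi = np.array([np.roll(row, i) for i in range(n_fine)])

# Virtual noiseless measurement of a prior field sample:  d = R phi
phi = np.fft.ifft(np.sqrt(power) * np.fft.fft(rng.standard_normal(n_fine))).real
d = R @ phi

# Noise-free Wiener reconstruction  m = Phi R^T (R Phi R^T)^{-1} d
W = Phi @ R.T @ np.linalg.inv(R @ Phi @ R.T)
m = W @ d
assert np.allclose(R @ m, d)                 # m reproduces the data exactly
```

The reconstruction m is defined on all fine cells, while the data only constrain its pixel averages; everything below the pixel scale is filled in by the prior correlations Φ.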


Since the measurement equation was chosen to be linear, and the field prior to be Gaussian, the posterior will be Gaussian as well. Nevertheless, the F(ϕ) term is non-linear for a non-linear PDE and averaging it might be non-trivial analytically. The field operator trick from Sect. 3.4 helps here. Since the average is over a Gaussian knowledge state, this expression can be brought into the compact form

∂_t d = R F[O_m],   (106)

where

O_m = m + D ∂_m   (107)

is the field operator.

Eq. 106 describes the data evolution that captures the (knowledge on the) field evolution best. For a linear PDE, like Eq. 23, the ∂_m terms make no difference, and the data equation becomes

∂_t d = R F[m] = R F ΦR†(RΦR†)⁻¹ d ≡ F̃ d,   (108)

a linear ODE. The data evolution operator F̃ contains a weighting of the action of the PDE operator F with the statistics of the expected field fluctuations. These are encoded in Φ and its projection into the data space via the measurement response R. This weight RΦR† is also the 'denominator' that ensures the proper units of the data space operator. This way, the knowledge about the field modes which are not resolved by the data enters the data dynamics. For example, an IFD algorithm for a thermally excited Klein-Gordon field performs slightly better than a spectral scheme for the same problem, because the former exploits its knowledge about the correct statistics of the unresolved field fluctuations [34].

For a non-linear PDE, quadratic terms in F generate quadratic terms in the field operator, O²_m = (m + D ∂_m)(m + D ∂_m) = m² + D + O(∂_m), that contain non-vanishing contributions from the uncertainty dispersion D. Thus, the data of a non-linearly evolving field not only obey a non-linear ODE, as m² = (W d)² is quadratic in the data, they also acquire corrections that capture the effect of the uncertainty processed through the non-linearity, expressed by terms that contain the field uncertainty D.
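A minimal numerical sketch of the linear data evolution, Eq. 108, continues the toy setup of the previous sketch: the PDE operator is the diffusion operator F = κ∂²_x of Eq. 104, discretized as a periodic second-difference matrix. The value of κ, the step size, and all grid sizes are illustrative assumptions.

```python
import numpy as np

n_fine, n_pix, kappa, dt = 128, 16, 0.5, 0.01

# PDE operator F = kappa d^2/dx^2 as a periodic second-difference matrix (dx = 1)
I = np.eye(n_fine)
F = kappa * (np.roll(I, 1, axis=1) - 2.0 * I + np.roll(I, -1, axis=1))

# Pixel response R and prior covariance Phi as in the previous sketch
R = np.zeros((n_pix, n_fine))
w = n_fine // n_pix
for i in range(n_pix):
    R[i, i * w:(i + 1) * w] = 1.0
k = np.fft.fftfreq(n_fine) * n_fine
row = np.fft.ifft((1.0 + k**2) ** -1.0).real
Phi = np.array([np.roll(row, i) for i in range(n_fine)])

# Eq. (108): data evolution operator  F~ = R F Phi R^T (R Phi R^T)^{-1}
F_tilde = R @ F @ Phi @ R.T @ np.linalg.inv(R @ Phi @ R.T)

# The resulting linear ODE, d' = F~ d, solved here with explicit Euler steps
d = np.exp(-0.5 * ((np.arange(n_pix) - n_pix / 2.0) / 2.0) ** 2)  # toy initial data
for _ in range(100):
    d = d + dt * (F_tilde @ d)
```

The prior enters through Φ: modes below the pixel scale, which the data cannot resolve, are nevertheless weighted with their expected statistics when the action of F is translated into data space.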
A naive discretization of a PDE, which just replaces the differential operators of the PDE with difference operators on the data, does not account for such non-linear sub-grid effects. For example, IFD schemes for the Burgers equation, which is a simplistic version of compressive hydrodynamics, handle the shock waves developing in the field dynamics better than (central) finite difference schemes [35], the latter representing the (most) naive discretization of the PDE. Despite these encouraging results, it should be noted that IFD requires further theoretical investigation and algorithmic work before it can seriously compete with existing state-of-the-art simulation schemes in terms of usability and performance. For example, how the field prior knowledge should evolve according to the PDE to be simulated is currently an open question.

Anyhow, IFD opens the door to advanced simulations. For example, as IFD is already formulated probabilistically, the assimilation of observational data into an IFD simulation may be relatively straightforward.

5 Outlook

IFT, the information theory for fields, has many potential scientific, technological, and economical applications. Two current development lines to bring IFT into broader usage are presented as a closing outlook:

Imaging, the transformation of astronomical or medical data into visually informative images, is a central application area of IFT. The recipe for an IFT imaging algorithm is the construction of the information Hamiltonian and/or its Gibbs free energy, and the minimization of those with respect to the unknown fields. The joint Hamiltonian is comprised of an instrument description H(d | ϕ, θ), a field prior H(ϕ | θ), and a hyper-prior H(θ) of all the remaining unknowns θ (field or noise power spectra, calibration parameters of the response, etc.); a schematic sketch follows below. Thus, the imaging of very different instruments can be brought into a unified description, in which even the data of different instruments can be imaged jointly. To this end, a Universal Bayesian Imaging toolKit (UBIK)17 is under development, which will permit imaging based on multi-instrument data of fields living over multi-dimensional spaces with spatial, spectral, and/or temporal dimensions.

17 The name derives from the novel UBIK by Philip K. Dick (1969), in which an agent called UBIK is essential to restore reality whenever it gets corrupted.

Besides the already described IFD development, the inference of dynamical fields including their unknown dynamics from data is a relevant research direction with promising first results [37]. Observations of an evolving field might be sufficient to determine the dynamical laws the field obeys, which then can be used to better estimate and predict the field from limited observational data.
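The modular composition of the joint Hamiltonian can be sketched schematically. The following toy code is not UBIK, whose interface is not described here; it merely assumes a linear response with Gaussian noise and a Gaussian field prior, omits the hyper-prior H(θ) for brevity, and finds the MAP estimate with a generic minimizer.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_field, n_data = 32, 24
R = rng.standard_normal((n_data, n_field)) / np.sqrt(n_field)  # stand-in response
N_inv = np.eye(n_data) / 0.1**2              # inverse noise covariance
Phi_inv = np.eye(n_field)                    # inverse field prior covariance
d = R @ rng.standard_normal(n_field) + 0.1 * rng.standard_normal(n_data)

def H_instrument(phi):                       # H(d|phi): instrument description
    r = d - R @ phi
    return 0.5 * r @ N_inv @ r

def H_prior(phi):                            # H(phi): field prior
    return 0.5 * phi @ Phi_inv @ phi

def H_joint(phi):                            # joint information Hamiltonian
    return H_instrument(phi) + H_prior(phi)

phi_map = minimize(H_joint, np.zeros(n_field)).x   # MAP image
```

Data from a second instrument would simply contribute a further likelihood term H(d′ | ϕ) to the sum, which is what permits the joint imaging of multi-instrument data.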


IFT does not always provide novel methods. In many cases it just reproduces well-established algorithms. The Wiener filter is such an example, which emerges in IFT as the simplest possible inference algorithm in the case of linear measurement and independent Gaussian signal and noise statistics. Thus, IFT can shed light on the conditions under which an existing method becomes optimal. Furthermore, IFT permits such methods to be extended consistently to more complex, non-linear, and non-Gaussian situations. These may perform superior to traditional approaches, in case the assumptions used in the IFT algorithm derivation are sufficiently met by the actual measurement situation at hand.

To summarize, IFT seems to be applicable within all areas in which fields have to be handled under uncertainty. IFT is the appropriate language for field inference from imperfect data. Currently it is mostly used in astrophysics and cosmology; however, its potential for geophysics, medical imaging, and other areas of remote and non-invasive sensing should be obvious.

This short introduction to IFT aimed to show that physicists have the very powerful language of field theory at their disposal to address field inference problems. Whether the resulting method performs better or worse than traditional approaches depends on whether correct assumptions were made or not. In any case, an IFT-based investigation of an inference problem will provide deeper insights into the nature of the available information on the field of interest.

Acknowledgments

The work described here would not have been possible without the support, inputs, ideas, criticism, discussion, and work of many collaborators, students, and friends. In particular among them I would like to thank Michael Bell, Martin Dupont, Philipp Frank, Mona Frommert, Maxim Greiner, Sebastian Hutschenreuter, Henrik Junklewitz, Francisco-Shu Kitaura, Jakob Knollmüller, Reimar Leike, Christoph Lienhard, Niels Oppermann, Natalia Porqueres, Daniel Pumpe, Martin Reinecke, Georg Robbers, Marco Selig, Theo Steininger, Valentina Vacca, and Cornelius Weig for their valuable contributions to the field, which I had the honor to proudly present here. I want to apologize to all the others, not mentioned here, for not including their work in this article. The presentation of this work benefited from detailed feedback by Jakob Knollmüller, Reimar Leike, Ancla Müller, Martin Reinecke, Kandaswamy Subramanian, and two anonymous referees.

Key words. Information theory, field theory, probability theory, Bayes' theorem, inverse problems, imaging.

References

[1] T. A. Enßlin, M. Frommert, and F. S. Kitaura, Phys. Rev. D 80(10), 105005 (2009).
[2] T. Enßlin, Information field theory, in: American Institute of Physics Conference Series, edited by U. von Toussaint, American Institute of Physics Conference Series Vol. 1553 (August 2013), pp. 184–191.
[3] T. Enßlin, Maximum Entropy Methods in Science and Engineering 1636(December), 49–54 (2014).
[4] Wikipedia contributors, Information field theory — Wikipedia, the free encyclopedia, 2018, [Online; accessed 3-February-2018].
[5] J. C. Lemm, Bayesian Field Theory (Johns Hopkins University Press, 2003).
[6] E. T. Jaynes, On the Rationale of Maximum Entropy Methods, in: Proc. IEEE, Vol. 70 (1982), pp. 939–952.
[7]
[8] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series (NY: Wiley, 1949).
[9] T. A. Enßlin and M. Frommert, Phys. Rev. D 83(10), 105014 (2011).
[10] N. Oppermann, M. Selig, M. R. Bell, and T. A. Enßlin, Phys. Rev. E 87(3), 032136 (2013).
[11] N. Oppermann, G. Robbers, and T. A. Enßlin, Phys. Rev. E 84(4), 041118 (2011).
[12] J. Knollmüller, T. Steininger, and T. A. Enßlin, ArXiv e-prints (November) (2017).
[13] S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, Physics Letters B 195(September), 216–222 (1987).
[14] M. Betancourt, ArXiv e-prints (January) (2017).
[15] R. Leike and T. Enßlin, Entropy 19(August), 402 (2017).
[16] T. A. Enßlin and C. Weig, Phys. Rev. E 82(5), 051112 (2010).
[17] R. H. Leike and T. A. Enßlin, Phys. Rev. E 94(5), 053306 (2016).
[18] J. Knollmüller and T. A. Enßlin, Phys. Rev. E 96(4), 042114 (2017).
[19] M. Selig, M. R. Bell, H. Junklewitz, N. Oppermann, M. Reinecke, M. Greiner, C. Pachajoa, and T. A. Enßlin, Astronomy & Astrophysics 554(June), A26 (2013).
[20] T. Steininger, J. Dixit, P. Frank, M. Greiner, S. Hutschenreuter, J. Knollmüller, R. Leike, N. Porqueres, D. Pumpe, M. Reinecke, M. Šraml, C. Varady, and T. Enßlin, ArXiv e-prints (August) (2017).
[21] M. Selig and T. A. Enßlin, Astronomy & Astrophysics 574(February), A74 (2015).
[22] D. Pumpe, M. Reinecke, and T. A. Enßlin, ArXiv e-prints (February) (2018).
[23] M. Selig, V. Vacca, N. Oppermann, and T. A. Enßlin, Astronomy & Astrophysics 581(September), A126 (2015).


[24] H. Junklewitz, M. R. Bell, and T. Enßlin, Astronomy & Astrophysics 581(September), A59 (2015).
[25] H. Junklewitz, M. R. Bell, M. Selig, and T. A. Enßlin, Astronomy & Astrophysics 586(February), A76 (2016).
[26] M. Greiner, V. Vacca, H. Junklewitz, and T. A. Enßlin, ArXiv e-prints (May) (2016).
[27] P. Arras, J. Knollmüller, H. Junklewitz, and T. A. Enßlin, ArXiv e-prints (March) (2018).
[28] N. Porqueres and T. A. Enßlin, Charm: Cosmic history agnostic reconstruction method, Astrophysics Source Code Library, March 2017.
[29] N. Porqueres, T. A. Enßlin, M. Greiner, V. Böhm, S. Dorn, P. Ruiz-Lapuente, and A. Manrique, Astronomy & Astrophysics 599(March), A92 (2017).
[30] J. Knollmüller, P. Frank, and T. A. Enßlin, ArXiv e-prints (April) (2018).
[31] Wikipedia contributors, Andromeda galaxy — Wikipedia, the free encyclopedia, https://en.wikipedia.org/w/index.php?title=Andromeda_Galaxy&oldid=855593236, 2018, [Online; accessed 28-August-2018].
[32] E. Komatsu, D. N. Spergel, and B. D. Wandelt, ApJ 634(November), 14–19 (2005).
[33] P. Creminelli, A. Nicolis, L. Senatore, M. Tegmark, and M. Zaldarriaga, JCAP 5(May), 004 (2006).
[34] T. A. Enßlin, Phys. Rev. E 87(1), 013308 (2013).
[35] R. H. Leike and T. A. Enßlin, ArXiv e-prints (September) (2017).
[36] M. Dupont and T. Enßlin, ArXiv e-prints (February) (2018).
[37] P. Frank, T. Steininger, and T. A. Enßlin, Phys. Rev. E 96(5), 052104 (2017).
