arXiv:0806.3474v3 [astro-ph] 29 Sep 2009

Information field theory for cosmological perturbation reconstruction and non-linear signal analysis

Torsten A. Enßlin, Mona Frommert, and Francisco S. Kitaura
Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Str. 1, 85741 Garching, Germany
(Dated: October 29, 2018)

We develop information field theory (IFT) as a means of Bayesian inference on spatially distributed signals, the information fields. We take a didactical approach.

Starting from general considerations on the nature of measurements, signals, noise, and their relation to a physical reality, we derive:
- the information Hamiltonian,
- the source field,
- the propagator,
- the interaction terms.

Free IFT reproduces the well-known Wiener-filter theory. Interacting IFT can be diagrammatically expanded. For this we provide the Feynman rules in position, Fourier, and spherical-harmonics space, and the Boltzmann-Shannon information measure.

The theory should be applicable in many fields. Here, however, two cosmological signal recovery problems are discussed in their IFT formulation:

1) Reconstruction of the cosmic large-scale structure matter distribution from discrete galaxy counts in incomplete galaxy surveys, within a simple model of galaxy formation. We show that a Gaussian signal can be reconstructed by virtue of a response-renormalization flow equation. This signal should resemble the initial density perturbations of the Universe. It is observed through a strongly non-linear, incomplete, and Poissonian-noise-affected response, as the processes of structure formation, galaxy formation, and observation provide.

2) A filter to detect local non-linearities in the cosmic microwave background. Such non-linearities are predicted by some Early-Universe inflationary scenarios and are also expected from measurement imperfections. This filter is the optimal Bayes estimator up to linear order in the non-linearity parameter. It can even be used to construct sky maps of non-linearities in the data.

I. INTRODUCTION

A. Motivation

The optimal extraction and restoration of information from data on spatially distributed quantities is essential for any quantitative, data-driven science. Examples include the cosmic microwave background (CMB) temperature fluctuations and the large-scale structure (LSS) in cosmology, as well as quantities in many other related fields. The problem of signal inference in physics poses many technical and conceptual difficulties. These have even led to a large number of recipes for how to design such methodologies.

Here, we address such problems from a strictly information theoretical point of view. We show, as others have done before, that this leads to a statistical field theory for spatially distributed quantities, which we name information field theory (IFT). Previous works mostly treat such problems on the level of classical field theory. In contrast, we take full advantage of the existing theoretical apparatus of field theory to treat interacting and non-classical fields, as will be shown in detail later. In particular, we show how to use diagrammatic perturbation theory to construct optimal signal-recovering flows and to calculate their moments and uncertainties. In IFT, non-classicality manifests itself as statistical uncertainty. This closely parallels quantum and statistical fluctuations in quantum and statistical field theory (QFT & SFT).

The information theoretical perspective on signal inference problems has technical advantages, since it permits one to design algorithms and experimental setups optimized for information yield. Moreover, it provides deeper insight than pure evaluations of ad-hoc algorithms alone could. It illuminates the underlying mechanisms of information flow and knowledge accumulation, and their dependence on prior knowledge and empirical assumptions.

We therefore hope that our work is of interest to two types of readers:
- Applied scientists, who are mainly interested in the practical aspects of IFT because they face a concrete inverse problem for a spatially distributed quantity, especially but not exclusively in cosmology.
- More philosophically or theoretically inclined scientists, for whom IFT may serve as a framework to understand and classify many existing methods of signal extraction and reception.

Since we expect that many interested readers are not very familiar with field theoretical formalisms, we introduce some of their basic mathematical concepts. With such a non-uniform readership, not everything in this article will be of interest to everyone. We therefore provide in the following a short overview of the structure and content of the article.

B. Overview of the work

The remainder of this introduction contains two parts:
- a detailed discussion of previous work on signal inference theory;
- a very brief introduction to relevant works on the cosmic LSS and the CMB.

The main part of this article falls into two categories: abstract IFT and its application.
The concepts of IFT are introduced in Sec. II. There we discuss:
- Bayesian methodology;
- the distinction between physical and information fields;
- the definition of signal response and noise;
- the design of signal spaces.

The basic IFT formalism, including the free theory, is introduced in Sec. III. In our judgement, this section summarizes and unifies the knowledge on IFT prior to this paper. An impatient reader who is only interested in applying IFT, and not in the concepts, may start reading in Sec. III.

From Sec. IV on, the new results of this work are presented. We start with interacting information fields, their Hamiltonians and Feynman rules, and the Boltzmann-Shannon information measure. We then show the normalisability of sensibly constructed IFTs and present the classical information field equation. A step-by-step recipe for deriving and implementing IFT algorithms is also provided. Details of the notation that are not defined in the main text can be found in Appendix A.

Applications of the theory are provided in the following two sections. A reader interested only in the general theoretical framework can skip them. Although these sections address specific inference problems, they should serve as a blueprint for tackling similar problems. In Sec. V, we analyze the reconstruction of the cosmic matter distribution from galaxy surveys in terms of a Poissonian data model. In Sec. VI, we derive an optimal estimator for non-Gaussianity in the CMB and show how it can be generalized to map potential non-Gaussianities on the CMB sky. Our summary and outlook can be found in Sec. VII.

C. Previous works

The work presented here tries to unify information theory and statistical field theory. The aim is a conceptual framework in which optimal tools can be derived for cosmological signal analysis and for inference problems in other disciplines. Below, we provide very brief introductions to each of the required fields¹ for the orientation of non-expert readers: information theory, image reconstruction, statistical field theory, cosmological large-scale structure, and the cosmic microwave background. An expert in any of these fields may skip the corresponding subsection.

¹ This work has benefitted tremendously, directly and indirectly, from a large number of previous publications in those fields. We apologize for being unable to give full credit to all relevant former works in those fields. We concentrate instead on a brief summary of the papers that more or less directly influenced this work. This collection is obviously highly biased towards the cosmological literature, reflecting our main scientific interests and expertise, and it is certainly incomplete.

1. Information theory

The foundations of information theory were laid by the work of Bayes [1] on probability theory, which presented the celebrated Bayes theorem. The theorem itself (see Eq. 7) is a simple rule for conditional probabilities. It only unfolds its power for inference problems when used with belief or knowledge states, described by conditional probabilities.

The advent of modern information theory is probably best dated by two contributions:
- the work of Shannon [2, 3] on the information measure, which is the negative Boltzmann entropy;
- the work of Jaynes, who combined the language of statistical mechanics with Bayesian probability theory and applied it to knowledge uncertainties [4, 5, 6, 7, 8, 9, 10].

The required numerical evaluation of Bayesian probability integrals often suffered from the curse of high dimensionality. The standard remedy, still in massive use today, is importance sampling via Markov-Chain Monte-Carlo (MCMC) methods. These follow the ideas of Metropolis et al. [11], Hastings [12], and Geman and Geman [13], the latter of whom already had image reconstruction applications in mind. Also relevant here are the Hamiltonian MCMC methods [14], in which the phase-space sampling partly follows Hamiltonian dynamics. There, the Hamiltonian is introduced as the negative logarithm of the probability, as we do in this work.

With such tools, higher-dimensional problems, such as those in signal restoration, could and can be tackled. The price is stochastic uncertainty in the computational results. For a recent review of MCMC techniques for image restoration, see [15]. The applications and extensions of these pioneering works are too numerous to list here. Good monographs exist, and the necessary references can be found there [16, 17, 18, 19, 20, 21].
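The following is a minimal sketch of the MCMC idea. It assumes a toy scalar problem with a Gaussian prior and additive Gaussian noise, chosen purely for illustration and not taken from the cited works. A random-walk Metropolis sampler draws from $P(s|d) \propto \exp[-H(d,s)]$, with the Hamiltonian $H = -\ln P(d,s)$ defined as the negative logarithm of the probability. The parameter names `var_s`, `var_n`, and `step` are hypothetical. Because this toy posterior is Gaussian, the sample moments can be checked against the exact values for $d = 1$:
- posterior mean $d\,\sigma_s^2/(\sigma_s^2+\sigma_n^2) = 0.8$;
- posterior variance $(\sigma_s^{-2}+\sigma_n^{-2})^{-1} = 0.2$.

The residual scatter is the stochastic uncertainty of the method.

```python
import numpy as np

def hamiltonian(s, d, var_s=1.0, var_n=0.25):
    """Information Hamiltonian H(d, s) = -ln P(d, s), dropping s-independent constants.

    Toy model (an assumption for illustration): prior s ~ N(0, var_s) and
    data d = s + n with Gaussian noise n ~ N(0, var_n).
    """
    return 0.5 * s ** 2 / var_s + 0.5 * (d - s) ** 2 / var_n

def metropolis(d, n_steps=20_000, step=0.5, seed=0):
    """Random-walk Metropolis sampling of P(s|d), proportional to exp(-H(d, s))."""
    rng = np.random.default_rng(seed)
    s, h = 0.0, hamiltonian(0.0, d)
    samples = np.empty(n_steps)
    for i in range(n_steps):
        s_new = s + step * rng.normal()          # symmetric proposal
        h_new = hamiltonian(s_new, d)
        # Accept with probability min(1, exp(H_old - H_new)).
        if rng.uniform() < np.exp(h - h_new):
            s, h = s_new, h_new
        samples[i] = s
    return samples

samples = metropolis(d=1.0)[2_000:]              # discard burn-in
print(samples.mean(), samples.var())             # close to 0.8 and 0.2
```

In realistic signal-restoration problems, $s$ is a high-dimensional field. The random-walk proposal then mixes slowly, which is what motivates the Hamiltonian MCMC variants mentioned above.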
2. Image reconstruction in astronomy and elsewhere

Image reconstruction from incomplete, noisy data is especially important in astronomy. There, the experimental conditions are largely set by the nature of distant objects, the weather, and similar factors, all mostly outside the observer's control. The problem is equally important in disciplines like medicine and geology, which face similar limitations in arranging the object of observation for an optimal measurement.

Some of the most prominent image reconstruction methods are based on a Bayesian implementation of an assumed data model:
- the Wiener filter [22];
- the Richardson-Lucy algorithm [23, 24];
- maximum-entropy image restoration [25] (see also [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]).

The Wiener filter can be regarded as a full Bayesian image inference method for Gaussian signal and noise statistics, as we will show in Sect. III B. It will be the workhorse of the IFT formalism. The reason is that, for an interaction-free information Hamiltonian, the Wiener filter is the algorithm that constructs the exact field theoretical expectation value given the data. The filter can be decomposed into two essential information processing steps:
1. build the information source by weighting the data with response over noise;
2. propagate this information through the signal space by applying the so-called Wiener variance.
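The two-step decomposition can be illustrated with a short sketch for a real-valued, discretized one-dimensional signal:
1. it builds the information source $j = R^\dagger N^{-1} d$;
2. it applies the Wiener variance $D = (S^{-1} + R^\dagger N^{-1} R)^{-1}$, giving the map $m = D\,j$.

These are the standard Wiener-filter expressions for $d = R s + n$ with Gaussian signal and noise covariances $S$ and $N$, stated here ahead of their derivation in Sect. III B. The following are illustrative assumptions, not taken from this paper:
- an exponential prior covariance;
- a masked response with a data gap (pixels 20 to 29);
- a white-noise level of 0.1.

Dense matrix inverses are used only for clarity. Realistic signal spaces call for iterative solvers instead.

```python
import numpy as np

def wiener_filter(d, R, S, N):
    """Wiener filter map m = D j for data d = R s + n, Gaussian s and n."""
    N_inv = np.linalg.inv(N)
    j = R.T @ N_inv @ d                                     # step 1: information source
    D = np.linalg.inv(np.linalg.inv(S) + R.T @ N_inv @ R)   # step 2: Wiener variance
    return D @ j, D

rng = np.random.default_rng(1)
n_pix = 50
x = np.arange(n_pix)
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 5.0)    # assumed prior covariance
observed = np.ones(n_pix, dtype=bool)
observed[20:30] = False                                # data gap: incomplete coverage
R = np.eye(n_pix)[observed]                            # response: select observed pixels
N = 0.1 * np.eye(R.shape[0])                           # assumed white-noise covariance
s = np.linalg.cholesky(S) @ rng.normal(size=n_pix)     # signal drawn from the prior
d = R @ s + np.sqrt(0.1) * rng.normal(size=R.shape[0])

m, D = wiener_filter(d, R, S, N)
# The map interpolates into the gap, where the diagonal of D (the pixel-wise
# posterior variance) is larger than for observed pixels.
print(np.diag(D)[20:30].min(), np.diag(D)[:10].max())
```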

The Richardson-Lucy algorithm is a maximum-likelihood method for reconstruction from Poissonian data, and is therefore also of Bayesian origin. It lacks a signal prior (or, implicitly, assumes a flat one), which makes it prone to an over-fitting instability. The method therefore usually has to be regularized by hand, by truncating the iterative calculation. A regularization based on a Gaussian prior was recently proposed by Kitaura and Enßlin [38], and the implementation of a variant of it is presented here in Sect. V D.
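The following is a minimal Richardson-Lucy sketch. Its illustrative choices are a one-dimensional Gaussian blur as the response and three point sources on a faint background. The multiplicative update is the standard expectation-maximization step for Poissonian data, which never decreases the likelihood. With a column-normalized response, it also conserves the total number of counts. Nothing in the iteration penalizes noise amplification. This is why the iteration count `n_iter` acts as the hand-tuned regularization discussed above.

```python
import numpy as np

def richardson_lucy(d, R, n_iter=20):
    """Richardson-Lucy iterations for Poissonian data d ~ Poisson(R u).

    Each multiplicative update is an expectation-maximization step, so the
    Poisson likelihood never decreases; stopping early regularizes.
    """
    norm = R.sum(axis=0)                              # R^T 1 for each signal pixel
    u = np.full(R.shape[1], d.sum() / norm.sum())     # flat, positive starting guess
    for _ in range(n_iter):
        u = u * (R.T @ (d / (R @ u))) / norm
    return u

def poisson_loglike(d, lam):
    """Poisson log-likelihood without the data-only term -sum(ln d!)."""
    return np.sum(d * np.log(lam) - lam)

rng = np.random.default_rng(2)
n = 64
x = np.arange(n)
R = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 2.0) ** 2)   # assumed Gaussian blur
R /= R.sum(axis=0, keepdims=True)                            # columns sum to one
u_true = np.full(n, 5.0)                                      # faint background
u_true[[15, 30, 45]] += [200.0, 400.0, 300.0]                 # three point sources
d = rng.poisson(R @ u_true).astype(float)

u_short = richardson_lucy(d, R, n_iter=1)
u_long = richardson_lucy(d, R, n_iter=30)
```

Running many more iterations keeps increasing the likelihood but progressively fits the Poisson noise. This is the over-fitting instability that a proper signal prior is meant to cure.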
Maximum-entropy algorithms will not be the topic here. Neither will a number of other existing methods, which lie partly within and partly outside the Bayesian framework. They can be found in existing reviews on this topic [e.g. 39, 40].

3. Statistical and Bayesian field theory

The relation between signal reconstruction problems and field theory was discovered independently by several authors. In cosmology, a prominent work in this direction is Bertschinger [41]. It proposed the path integral approach to sample primordial density perturbations with Gaussian statistics, under the constraint of existing information on the large-scale structure. The work presented here can be regarded as a non-linear, non-Gaussian extension of it. Many methods from statistics and statistical mechanics were of course used even earlier. For example, moment generating functions for cosmic density fields can already be found in Fry [42].

Simultaneously with Bertschinger's work, Bialek and Zee [43, 44] argued that visual perception can be modeled as a field theory for the true image. In their model, the image is distorted by noise and by other data transformations, which are summarized by a nuisance field. They used a probabilistic language but made no direct reference to information theory. Their aim was not optimal information reconstruction, but a model of the human visual system. This work nevertheless triggered our research.

Bialek et al. [45] applied a field theoretical approach to recover a probability distribution from data. The solution was obtained as the classical (or saddlepoint, or maximum a posteriori) solution of the problem. It was regularized by a Bayesian prior set up ad hoc to enforce smoothness of the reconstruction. However, an "optimal" value for the parameter controlling the smoothness was derived from the data itself. This topic was also addressed by Stoica et al. [46] and by a follow-up publication to ours [47]. Bialek et al. [45] also recognized, as we do, that an IFT can easily be non-local.

Finally, the work of Lemm and coworkers [48, 49, 50, 51, 52, 53, 54, 55] established a tight connection between statistical field theory and Bayesian inference. They proposed the term Bayesian field theory (BFT) for it. We prefer the term information field theory, since it emphasizes the relevant object, the information, whereas BFT refers to a method, Bayesian inference. The term information field is also rather self-explanatory, whereas the meaning of a Bayesian field is not obvious.

The applications considered by Lemm concentrate on two problems: reconstructing probability fields over parameter spaces, and reconstructing quantum mechanical potentials. Both use the maximum a posteriori equation. The extensive book summarizing the essential insights of these papers [48] clearly states that perturbative expansions of the field theory are possible. However, these authors do not follow this up, probably because of the computational complexity of the required algorithms. Many of the previous works on IFT deal with ad-hoc priors. The publication by Lemm [56] is remarkable in contrast, since it provides explicit recipes for implementing a priori information more rigorously in various circumstances.

The mathematical tools required to tackle IFT problems come from SFT and QFT, which have a vast literature. We have especially made use of the books by Binney et al. [57], Peskin and Schroeder [58], and Zee [59].

4. Cosmological large-scale structure

Our first IFT example, in Sec. V, is geared towards improving galaxy-survey-based cosmography, that is, the reconstruction of the large-scale matter distribution. Here we provide a short overview of the relevant background and works.

The LSS of the matter distribution of the Universe is traced by the spatial distribution of galaxies, and is therefore well observable. It is believed to have grown from tiny, mostly Gaussian initial density fluctuations with a relative amplitude of 10⁻⁵. The growth proceeded via self-gravitational instability, partly counteracted by the expansion of the Universe. The initial density fluctuations are believed to have been produced during an early inflationary epoch of the Universe. Their N-point correlation functions should carry valuable information about the inflaton, the field that drove inflation, which is to be extracted from the observational data.

The onset of structure formation is well described by linear perturbation theory, which conserves Gaussianity. However, non-linear descriptions are required for:
- the later evolution;
- the structures on smaller scales;
- especially galaxy formation.

The observational situation is complicated because the most important galaxy distance indicator, the redshift, is also sensitive to the galaxy's peculiar velocity. As a result, the observational data on the three-dimensional LSS are partially degenerate. There are analytical methods to describe these effects². There is also extensive work on N-body simulations of structure formation, which probably provide the most detailed and accurate statistical data on the properties of the matter density field [e.g. 75].

In recent years, it was recognized that the evolution of the cosmic density field and its statistical properties can be addressed with field theoretical methods, by means of renormalization flow equations. This has made detailed semi-analytical calculations possible for the time propagator of the density field and for its two- and three-point correlation functions. They are expected to play an important role in future approaches to reconstructing the initial fluctuations from observational data [76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90].

It was recognized early on that the primordial density fluctuations can in principle be reconstructed from galaxy observations [41]. This has led to a wide variety of numerical techniques for optimal reconstruction [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131]. Many of them are Bayesian, since they implement and extend the Wiener filter. Other principles are also used, for example the least action approach or Voronoi tessellation techniques [e.g. 132, 133, 134, 135, 136, 137, 138]. A discussion and classification of the various methods can be found in [38].

Wiener filter methods in particular were extensively applied to galaxy survey data³. In part, they permitted extrapolating the matter distribution into the zone of avoidance behind the galactic disk, closing the data gap there, cf. [157, 158, 159]. We also address this topic in Sect. V.

Another cosmologically relevant information field to be extracted from galaxy catalogues is the LSS power spectrum [e.g. 160, 161, 162, 163, 164]. This power is also measurable in the CMB, and for a long time the CMB provided the best normalization of the spectrum [165, 166].

5. Cosmic Microwave Background

Since our second example deals with the CMB, we give a brief overview of it and of related inference methods.

The CMB reveals the statistical properties of the matter field at a time when the Universe was about 1100 times smaller in linear size than it is today. At that epoch, the photon-baryon fluid decoupled into neutral hydrogen and free-streaming photons. Before decoupling, it had responded to the gravitational pull of the already forming dark matter structures. Since then, the photons from that epoch have cooled through the cosmic expansion into the CMB radiation we observe today. They carry information on the physical properties of the photon-baryon fluid at that time, such as its density, temperature, and velocity. To very high accuracy, the photon spectrum from any direction is that of a blackbody with a mean temperature of 2.7 Kelvin. Superposed are relative temperature fluctuations of order 10⁻⁵, imprinted by the primordial gravitational potentials at decoupling.

Mapping these temperature fluctuations therefore permits precise, simultaneous study of many cosmological parameters, such as:
- the amount of dark matter producing the gravitational potentials;
- the ratio of photons to baryons, which balances the pressure and weight of the fluid;
- geometrical and dynamical parameters of space-time itself.

The observations are technically challenging. They require sophisticated algorithms, both to extract the tiny temperature fluctuations from the instrument noise and to separate them as accurately as possible from other astrophysical foreground emission. A number of such algorithms were developed [e.g. 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184], in many cases implementing the Wiener filter. Thus, the numerical tools required for an IFT treatment of CMB data are essentially available.

The expected temperature fluctuation spectrum can be calculated from a linear perturbative treatment of the Boltzmann equations of all dynamically active particle species at this epoch. Fast computational implementations exist that predict it for a given set of cosmological parameters. Well-known codes for this task are publicly available⁴. Comparing their predictions with the measured CMB temperature fluctuation spectrum yields information on cosmological parameters. It was recognized early on that this should be done in an information theoretically optimal way. Bayesian methods were therefore adopted in that area well before other astrophysical disciplines [e.g. 188, 189, 190, 191].

The initial metric and density fluctuations, from which the CMB fluctuations and the LSS emerged, are believed to have been seeded by quantum fluctuations of a hypothetical inflaton field. This field should have driven an inflationary expansion phase in the very early Universe [192, 193, 194, 195, 196, 197]. The inflaton-induced fluctuations have a very Gaussian probability distribution. However, some non-Gaussian features seem unavoidable in most scenarios, and these can serve as a fingerprint to discriminate among them [e.g. 198, 199, 200, 201]. Observational tests for such non-Gaussianities have so far been based on the three-point correlation function of the CMB data [e.g. 202, 203, 204, 205, 206]. Their results were mostly negative, but not sensitive enough to seriously constrain the theoretical parameter space of inflationary scenarios, see e.g. [207, 208]. Recently, Yadav and Wandelt [209] claimed a detection of such non-Gaussianities. A confirmation with better data and improved algorithms is therefore highly desirable. In Sect. VI, we make a proposal for improving the algorithmic side of this challenge. A recent review of the current status of CMB Gaussianity can be found in [210].

² Of special interest in this context may be [60], which already applies path integrals; see also [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74] and the papers they refer to.

³ Survey-based reconstructions of the cosmic matter fields can be found in [139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156].

⁴ E.g. cmbfast (http://cmbfast.org, http://ascl.net/cmbfast.html, [185]), camb (http://camb.info/, [186]), and cmbeasy (http://www.cmbeasy.org/, [187]).
II. CONCEPTS OF INFORMATION THEORY

A. Information on physical fields

In our attempts to infer the properties of our Universe from astronomical observations, we face three tasks:
- interpreting incomplete, imperfect, and noisy data;
- drawing conclusions from them;
- quantifying the uncertainties of our results.

This is true for mapping the cosmic LSS with galaxy surveys and for interpreting the CMB. It also holds for many experiments in physical laboratories and for compilations of geological, economic, sociological, and biological data about our planet. Information theory is based on probability theory and on the Bayesian interpretation of missing knowledge as probabilistic uncertainty. It offers an ideal framework for such problems. Once a model for the Universe or the system under consideration is adopted, it permits one to describe all relevant processes of the measurement probabilistically.

The states of such a model, denoted by the state variable $\psi$, are identified with the possible physical realities. They can be assigned probabilities $P(\psi)$, the so-called prior information. This prior contains our knowledge about the Universe, as we model it, before any data are taken. For a given cosmological model, the prior may be the probability distribution of the different initial conditions of the Universe, which completely determine its subsequent evolution. Since our Universe is spatially extended, the state variable will in general contain one or several fields, which are functions over some coordinates $x$.

The measurement process is also described by a data model. It defines the so-called likelihood, the probability $P(d|\psi)$ of obtaining a specific dataset $d$ given the physical condition $\psi$. If the outcome $d$ of the measurement is deterministic, then $P(d|\psi) = \delta(d - d[\psi])$, where $d[\psi]$ is the functional dependence of the data on the state. In any case, the probability distribution function of the data,

$$P(d) = \int \mathcal{D}\psi \, P(d|\psi)\, P(\psi), \tag{1}$$

is given by a phase-space or path integral over all possible realizations of $\psi$. This integral will be defined more precisely later (Sect. II E 1).

A scientist is not actually interested in the total state of the Universe, but only in some specific aspects of it, which we call the signal $s = s[\psi]$. The signal is a much reduced description of the physical reality. It can be any function of the state $\psi$, chosen freely according to the needs and interests of the scientist, or to the ability and capacity of the measurement and computational devices. Since the signal does not contain the full physical state, any physical degree of freedom that is absent from the signal but influences the data will be perceived as probabilistic uncertainty, or in short, noise. The probability distribution function of the signal, its prior

$$P(s) = \int \mathcal{D}\psi \, \delta(s - s[\psi])\, P(\psi), \tag{2}$$

is related to that of the data via the joint probability

$$P(d, s) = \int \mathcal{D}\psi \, \delta(s - s[\psi])\, P(d|\psi)\, P(\psi). \tag{3}$$

From it, the conditional signal likelihood

$$P(d|s) = P(d, s)/P(s) \tag{4}$$

and the signal posterior

$$P(s|d) = P(d, s)/P(d) \tag{5}$$

can be derived.

Before the data are available, the phase space of interest is spanned by the direct product of all possible signals $s$ and data $d$. All regions with non-zero $P(d, s)$ are of potential relevance. Once the actual data $d_\mathrm{obs}$ have been taken, only the sub-manifold of this space fixed by the data remains relevant. The probability function over this sub-space is proportional to $P(d = d_\mathrm{obs}, s)$. It only needs to be renormalized by dividing by

$$\int \mathcal{D}s\, P(d_\mathrm{obs}, s) = \int \mathcal{D}s \int \mathcal{D}\psi\, \delta(s - s[\psi])\, P(d_\mathrm{obs}|\psi)\, P(\psi) = \int \mathcal{D}\psi\, P(d_\mathrm{obs}|\psi)\, P(\psi) = P(d_\mathrm{obs}), \tag{6}$$

which is the unconditioned probability (or evidence) of that data. The resulting information of the data is thus the posterior distribution $P(s|d_\mathrm{obs}) = P(d_\mathrm{obs}, s)/P(d_\mathrm{obs})$. This posterior is the fundamental mathematical object from which all our deductions have to be made. Bayes' theorem [1] relates it to the usually more accessible signal likelihood,

$$P(s|d) = \frac{P(d|s)\, P(s)}{P(d)}, \tag{7}$$

which follows from Eqs. 4 and 5. The normalization term in Bayes' theorem, the evidence $P(d)$, is now also fully expressed in terms of the joint probability of data and signal,

$$P(d) = \int \mathcal{D}s\, P(d, s). \tag{8}$$

At this stage, the underlying physical field $\psi$ essentially drops out of the formalism. The evidence plays a central role in Bayesian inference, since it is the likelihood of all the assumed model parameters. By combining this parameter likelihood with parameter priors, one can start Bayesian inference on the model classes.
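Eqs. (5) to (8) can be made concrete by brute-force evaluation for a scalar signal on a grid. The sketch below assumes a Gaussian prior $P(s)$ and an additive Gaussian noise likelihood; these are illustrative choices, not models from this paper. The evidence is the sum over the joint probability, Eq. (8), and the normalized posterior follows from Eq. (7). Grid evaluation is only feasible for very low-dimensional signals. For fields, the path integrals require the field-theoretical machinery developed in the following sections.

```python
import numpy as np

def gauss(x, mean, var):
    """Normal density N(x; mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

s = np.linspace(-6.0, 6.0, 2001)     # grid for a scalar signal
ds = s[1] - s[0]
var_s, var_n = 1.0, 0.25             # assumed prior and noise variances
d_obs = 1.2                          # an assumed observed datum

prior = gauss(s, 0.0, var_s)                 # P(s)
likelihood = gauss(d_obs, s, var_n)          # P(d_obs|s) for the data model d = s + n
joint = likelihood * prior                   # P(d_obs, s)
evidence = joint.sum() * ds                  # Eq. (8): P(d_obs) = int Ds P(d_obs, s)
posterior = joint / evidence                 # Eqs. (5) and (7)
posterior_mean = (s * posterior).sum() * ds
```

For this linear-Gaussian toy model, the results can be checked analytically. The evidence equals $\mathcal{N}(d_\mathrm{obs}; 0, \sigma_s^2 + \sigma_n^2)$, and the posterior mean equals $d_\mathrm{obs}\,\sigma_s^2/(\sigma_s^2+\sigma_n^2) = 0.96$.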

B. Signal response and noise

If signal and data depend on the same underlying physical properties, the two may be correlated. This can be expressed in terms of a signal response $R$ and a noise $n_s$ of the data:

$$d = R[s] + n_s. \tag{9}$$

We deliberately use two different notations for the dependence of response and noise on the signal $s$. This highlights that the response should capture most of the reaction of the data to the signal, whereas the noise should be as independent of it as possible. We ensure this by putting the entire linear correlation of the data with the signal into the response. The response is therefore the part of the data that correlates with the signal,

$$R[s] \equiv \langle d \rangle_{(d|s)} \equiv \int \mathcal{D}d\; d\, P(d|s), \tag{10}$$

and the noise is simply defined as the remaining part, which does not:

$$n_s \equiv d - R[s] = d - \langle d \rangle_{(d|s)}. \tag{11}$$

The noise may depend on the signal, as is well known for Poissonian processes, for example. By definition, however, it is linearly uncorrelated with it:

$$\langle n_s\, s^\dagger \rangle_{(d|s)} = \left(\langle d \rangle_{(d|s)} - R[s]\right) s^\dagger = 0\, s^\dagger = 0. \tag{12}$$

Higher-order correlations may well exist and can be further exploited for their information content. The dagger denotes complex conjugation and transposition of a vector or matrix.

These definitions were chosen to be close to the usual language of signal processing and data analysis. They permit one to define signal response and noise for an arbitrary choice of the signal $s[\psi]$. No direct causal connection between signal and data is needed for a non-trivial response. Both variables only need to couple to some common sub-aspect of $\psi$.
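Eqs. (10) to (12) can be checked quickly by Monte Carlo. The sketch assumes Poissonian data $d \sim \mathrm{Poisson}(\kappa s)$ with a lognormal signal prior; both are illustrative assumptions, and `kappa` is a hypothetical response amplitude. The response is then $R[s] = \kappa s$. The resulting noise is linearly uncorrelated with the signal, even though its variance, $\kappa s$, depends on it.

```python
import numpy as np

rng = np.random.default_rng(3)
kappa = 5.0                                            # assumed response amplitude
s = rng.lognormal(mean=0.0, sigma=0.5, size=200_000)   # assumed positive signal prior
d = rng.poisson(kappa * s)                             # Poissonian data model

R_s = kappa * s        # Eq. (10): R[s] = <d>_(d|s) = kappa * s for Poisson data
n_s = d - R_s          # Eq. (11): noise = data minus response

corr = np.corrcoef(n_s, s)[0, 1]                 # linear correlation, ~0 by Eq. (12)
low, high = s < np.median(s), s >= np.median(s)
var_ratio = n_s[high].var() / n_s[low].var()     # noise variance still depends on s
print(corr, var_ratio)
```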

The above definition of response and noise is, however, not unique, even for a fixed signal definition. Any data transformation $d' = T[d]$ can lead to different definitions, as seen from

$$R'[s] \equiv \langle d' \rangle_{(d|s)} = \langle T[d] \rangle_{(d|s)} \neq T\!\left[\langle d \rangle_{(d|s)}\right] = T[R[s]]. \tag{13}$$

Exceptions are unique relations between signal and state, $P(\psi|s) = \delta(\psi - \psi[s])$, and perhaps a few other very special cases. The concepts of signal response, and of the noise defined by it, therefore depend on the coordinate system adopted in data space. A data transformation $T$ changes this coordinate system, and the transformed data may respond better or worse to the signal. Information theory helps to design a suitable data transformation, one that maximizes the signal response and minimizes the signal noise, so that the signal is best recovered. We may thus aim for an optimal $T$, which yields

$$T[d] = \langle s \rangle_{(s|d)}. \tag{14}$$

We define the posterior average of the signal, $m_d = \langle s \rangle_{(s|d)}$, to be the map of the signal given the data $d$. We call $T$ a map-making algorithm if it fulfills Eq. 14 at least approximately. As a criterion for this, one may require that the signal response of a map-making algorithm,

$$R_T[s] \equiv \langle T[d] \rangle_{(d|s)}, \tag{15}$$

is positive definite with respect to signal variations:

$$\frac{\delta R_T[s]}{\delta s} \geq 0. \tag{16}$$

This ensures that, with respect to the noise ensemble, a map-making algorithm responds to any signal feature with a non-negative correlation of the map. In general, $T$ is a non-linear operation on the data. It has to be constructed from information theory if it is to be optimal in the sense of Eq. 14. In any case, the fidelity of a signal reconstruction can be characterized by the quadratic signal uncertainty,

$$\sigma^2_{T,d} = \left\langle (s - T[d])\,(s - T[d])^\dagger \right\rangle_{(s|d)}, \tag{17}$$

averaged over the signal realizations compatible with the data. Of special interest is its trace,

$$\mathrm{Tr}\left(\sigma^2_{T,d}\right) = \int dx\, \left\langle |s_x - T_x[d]|^2 \right\rangle_{(s|d)}, \tag{18}$$

since it is the expected squared Lebesgue-$L^2$-space distance between a signal reconstruction and the underlying signal. Requiring a map-making algorithm to be optimal with respect to Eq. 18 implies $T[d] = \langle s \rangle_{(s|d)}$. The algorithm is therefore also optimal in the information theoretical sense of Eq. 14.
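The following sketch illustrates why Eq. (18) singles out the posterior mean. It assumes a skewed posterior, represented by lognormal samples that stand in for, e.g., MCMC output. It compares the expected squared error of three candidate maps: the posterior mean, the median, and the mode. Since $\langle |s - T|^2\rangle_{(s|d)} = \mathrm{Var}(s|d) + (T - \langle s\rangle_{(s|d)})^2$, the mean must win, and its error equals the posterior variance.

```python
import numpy as np

rng = np.random.default_rng(6)
# Stand-in for posterior samples of a scalar signal (assumed lognormal shape,
# so that mean, median and mode of the posterior differ).
post = rng.lognormal(mean=0.0, sigma=0.6, size=200_000)

def expected_sq_error(T):
    """Eq. (18) for a scalar signal: <|s - T|^2>_(s|d), estimated from samples."""
    return np.mean((post - T) ** 2)

T_mean = post.mean()               # posterior mean <s>_(s|d)
T_median = np.median(post)
T_mode = np.exp(0.0 - 0.6 ** 2)    # mode of the assumed lognormal: exp(mu - sigma^2)

errors = {name: expected_sq_error(T)
          for name, T in [("mean", T_mean), ("median", T_median), ("mode", T_mode)]}
print(errors)
```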
inference one averages over the posterior, which is condi- The above definition of response and noise is however not tional to the data. The frequentist uncertainty estimate, unique, even for a fixed signal definition, since any data which is the expected uncertainty of any estimator before transformation d′ = T [d] can lead to different definitions, the data is obtained, is given by an average over the joint as seen from probability function:

′ ′ 2 † R [s] ≡ hd i(d|s) = hT [d]i(d|s) 6= T [hdi(d|s)]= T [R[s]]. σT = h(s − T [d]) (s − T [d]) i(d,s). (19) (13) Exceptions are some unique relations between signal and The latter is a good quantity to characterize the overall 2 state, P (ψ|s)= δ(ψ − ψ[s]), and maybe a few other very performance of an estimator, whereas Tr(σT,d) is a more 7 precise indicator of the actual estimator performance for of the individual galaxies also depend in a non-trivial a given dataset. As we will see in our IFT applications, way on the small-scale modes. Due to the discreteness of data dependence of the uncertainty is a common feature our observable, the galaxy positions, it may be impossi- of non-linear inference problems. ble to reconstruct these small scale modes. Therefore it An illustrative example should be in order. Suppose could be sensible to define a signal s[ψ] = F ψ, with F our data is an exact copy of a physical field, d = ψ, our being a linear low-pass filter, which suppresses all small- signal the square of the latter, s = ψ2, and the physical scale structures. This signal may be reconstructible with field obeys an even statistics, P (ψ) = P (−ψ). Then, high precision, whereas any attempt to reconstruct ψ di- the signal response is exactly zero, R[s] = 0, and the rectly would be plagued by a larger error budget, since data contains only noise with respect to the chosen signal, all the data-unconstrained small-scale modes represent d = ns. Thus, we have chosen a bad representation of uncertainties to a reconstruction of ψ, but not to one of our data to reveal the signal. If we, however, introduce s being defined as a low pass filtered version of ψ. the transformation d′ = T [d] = d2, we find a perfect ′ ′ response, R [s]= s, and zero noise, ns = 0. In this case, finding the optimal map-making algorithm D. Signal moment calculation was trivial, but in more complicated situations, it can not be guessed that easily. 
Since the response and noise definitions depend on the signal definition, some thoughts The information of some data d on a signal s defined over some set Ω, which in most applications will be a should be given to how to choose the signal in a way that n it can be well reconstructed. manifold like a sub-volume of the R , or the sphere in case of a CMB signal, is completely contained in the posterior P (s|d) of the signal given the data.5 The ex- C. Signal design pectation value of s at some location x ∈ Ω, and higher correlation functions of s can all be obtained from the posterior by taking the appropriate average: For practical reasons one will usually choose s accord- ing to a few guidelines, which should simplify the infor- mation induction process: hs(x1) ··· s(xn)id ≡ hs(x1) ··· s(xn)i(s|d)

1. The functional form of s[ψ] should best be simple, ≡ Dss(x1) ··· s(xn) P (s|d).(20) steady, analytic, and if possible linear in ψ, permit- Z ting to use the signal s to reason about the state of The problem is that often neither the expectation val- reality ψ. ues nor even the posterior are easily calculated analyt- 2. The degrees of freedom of s should be related to ically, even for fairly simple data models. Fortunately, the ones of the data d in the sense that cross cor- there is at least one class of data models for which the relations exist which permit to deduce properties posterior and all its moments can be calculated exactly, of s from d. Signal degrees of freedoms, which are namely in case the posterior turns out to be a multivari- insensitive to the data, will only be constrained by ate Gaussian in s. In this case analytical formulae for the prior and therefore just contain a large amount all moments of the signal are known and are in principle of uncertainty. This adds to the error budget, and computable. Technically, one is still often facing a huge, should be avoided as far as possible. but linear inverse problem. However, in the last decades a couple of computational high-performance map-making 3. The choice of s[ψ] should also be lead by math- techniques were developed to tackle such problems either ematical convenience and practicality. In the ex- on the sphere, for CMB research, or in flat spaces with amples presented in this work, simple signals are one, two or three dimensions, for example for the recon- chosen which permit to guess good approximations struction of the cosmic LSS (detailed references are given for signal likelihood P (d|s) and prior P (s) without in Sect. I C). The purpose of this work is to show how the need to develop the full physical theory starting to expand other posterior distributions around the Gaus- with P (ψ). 
sian ones in a perturbative manner, which then permits to use the existing map-making codes for the computa- To give a more specific example, we assume a cosmo- tion of the resulting diagrammatic perturbation series. logical model in which the reality is thought to be solely Since the diagrammatic perturbation series in Feynman- characterized by the primordial dark matter density dis- diagrams are well known and understood in QFT and tribution ψ(x), from which all observable cosmological phenomena like galaxies derive in a deterministic way. The coordinate x may refer to the comoving coordinates at some early epoch of the Universe. Although the LSS 5 We are mostly dealing with scalar fields, however, multi- of the matter distribution at a later time may predom- component, vector or tensor fields can be treated analogously, inantly depend on the initial large-scale modes, and is and many of the equations just have to be re-interpreted for reflected in the galaxy distribution, the actual positions such fields and stay valid. 8

SFT, the most economical way is to reformulate the in- The path integral of a functional F [f] ≡ formation theoretical problem in a language which is as F (f1,...,fNpix ) over all realizations of such a dis- close as possible to the former two theories. Thereby, cretized field f is then just a high-dimensional volume many of the results and concepts become directly avail- integral, with as many dimensions as pixels: able for signal inference problems. Moreover, it seems that expressing the optimal signal estimator in terms of Npix Feynman diagrams immediately provides computation- Df F [f] ≡ dfi F (f1,...,fN ). (23)   pix ally efficient algorithms, since the diagrams encode the i=1 Z Y Z skeleton of the minimal necessary computational infor-   mation flow. This definition of a finite-dimensional path integral is well normalized, since in case that we want to integrate over a probability distribution over f, which is separable for E. Signal and data spaces Npix all pixels, P (f) = i=1 Pi(fi), as e.g. for white and Poissonian noise, we find 1. Discretisation and continuous limit Q Npix h1i = Df P (f)= df P (f) =1. (24) Both, the signal and the data space may be continuous, (f) i i=1 however, in practice will most often be discrete since dig- Z Y Z =1 ital data processing only permits to chose a discretized representation of the distributed information. The space Although, in real data-analysis| applications,{z } it is prac- in which the data and signal discretisation happens can tically never required to perform the continuous limit be chosen freely, and of course can be as well a Fourier, Npix → ∞ with Vi → 0 for all i, we stress that this limit wavelet or spherical harmonics space. Even if we would can formally be taken and is well defined even for the like to analyze a continuous signal, the computationally path integral, as we argue in more detail in Sec. IV B. 
required discretisation will force an implicit redefinition The basic argument is that suitable signals could and of our actual signal to be the discretely sampled version of should be defined in such a way that path-integral di- that continuous signal, and this discretisation step should vergences, which plague sometimes QFT, can easily be also be part of the data model, if it has the potential to avoided by sensible signal design. Practically, the ex- significantly affect the analysis [e.g. see 211]. istence of a well-defined continuous limit of a well-posed Although discretisation implies some information loss IFT implies that two numerical implementations of a sig- it also has an advantage. We can just assume discretisa- nal reconstruction problem, which differ in their space tion and therefore read all scalar and tensor products as discretisation on scales smaller than the structures of the being the usual, component-wise ones, now just in high-, signal, can be expected to provide identical results up but finite-dimensional vector spaces. to a small discretisation difference, which vanishes with To be concrete, let {xi} ⊂ Ω be a discrete set of Npix higher discretisation-resolution. pixel positions, each of which has a volume-size Vi at- tributed to it, then the scalar product of two discretized function-vectors f = (fi), and g = (gi) sampled at these 2. Parameter spaces points via fi = f(xi), and gi = g(xi) could be defined by In many applications, the signal space is identified with Npix † ∗ the physical space or with the sphere of the sky. How- g f ≡ Vi gi fi. (21) ever, IFT can also be done over parameter spaces. In i=1 X Sec. VI, a field theory over the sphere will implicitly de- The asterix denotes complex conjugation. This scalar fine the knowledge state for an unknown parameter of product has the continuous limit that theory, which can be regarded again to define an information theory for that parameter. 
The latter is an † ∗ IFT in case that the parameter has spatial variations. g f −→ dx g (x) f(x). (22) However, there are also functions defined over a param- Z eter space, Ωparameter = {p} for some parameter p, which In many cases the actual volume normalization in Eq. one might want to obtain knowledge on from incomplete 21 does not matter for final results, since it usually can- data. A very import one is the probability distribution cels out, and therefore Vi is often dropped completely for of the parameter given the observational data, P (p|d), equidistant sampling of signal and data spaces. The vol- which defines our parameter-knowledge state. This func- ume terms also disappear for a scalar product involving tion may only be incompletely known and therefore re- a function which is discretized via volume integration, quire an IFT approach for its reconstruction and inter- f = dx f(x), e.g. the number of counts within the i Vi polation. Such incomplete knowledge on the function cell i. Anyhow, higher order tensor products are defined could be due to incomplete numerical sampling of its analogously.R function values because of large computational costs and 9 the huge volumes of multi-dimensional parameter spaces. Of special importance are the so-called connected corre- Or, there might be another unknown nuisance parame- lation functions or cumulants ter q in the problem, which induces an uncertainty in n δ log Zd[J] P(p|d) = P (p|d) and therefore an IFT over all possible hs(x ) ··· s(x )ic ≡ , (32) 1 n d δJ(x ) ··· δJ(x ) realizations of this knowledge state field function via 1 n J=0

which are corrected for the contribution of lower mo- P [P(p|d)]= DP(p|d) δ P(p|d) − dq P (p, q|d) . (25) ments to a correlator of order n. For example, the con- Z  Z  nected mean and dispersion are expressed in terms of In case that q is a field, the marginalisation integral in the delta functional also becomes a path-integral. Prob- their unconnected counterparts as: abilistic decision theory, based on knowledge state as ex- hs(x)ic = hs(x)i , pressed by probability functions on parameters, has to d d c deal with such complications. For inference directly on hs(x) s(y)id = hs(x) s(y)id − hs(x)id hs(y)id, (33) p, and not on the knowledge state P(p|d), the marginal- ized probability where the last term represents such a correction. For Gaussian random fields all higher order connected corre- P (p|d)= dq P (p, q|d) (26) lators vanish: Z c contains all relevant information, and that will be suffi- hs(x1) ··· s(xn)id = 0 (34) cient for most inference applications, and especially for for n > 2. For non-Gaussian random fields, they are the ones in this work. in general non-zero, and for later usage we provide the connected three- and four-point functions,

III. BASIC FORMALISM c hsxsyszid = h(sx − s¯x)(sy − s¯y)(sz − s¯z)id, c A. Information Hamiltonian hsxsyszsuid = h(sx − s¯x)(sy − s¯y)(sz − s¯z)(su − s¯u)id c c c c − hsxsyidhszsuid − hsx szidhsy suid c c We argued that the posterior P (s|d) contains all avail- − hsx suidhsy szid, (35) able information on the signal. Although the posterior might not be easily accessible mathematically, we assume where we used sx = s(x) and defineds ¯x = hs(x)id. in the following that the prior P (s) of the signal before The assumption that the Hamiltonian can be Taylor- the data is taken as well as the likelihood of the data Fr´echet expanded in the signal field permits to write given a signal P (d|s) are known or at least can be Taylor- ∞ Fr´echet-expanded around some reference field configura- 1 1 H[s]= s†D−1 s − j†s + H + Λ(n) s ··· s . tion t. Then Bayes’s theorem permits to express the pos- 2 0 n! x1...xn x1 xn n=3 terior as X (36) P (d, s) P (d|s) P (s) 1 Repeated coordinates are thought to be integrated over. P (s|d)= = ≡ e−H[s] . (27) P (d) P (d) Z The first three Taylor coefficients have special roles. The constant H0 is fixed by the normalization condition of the Here, the Hamiltonian ′ joint probability density of signal and data. If Hd[s] de- H[s] ≡ Hd[s] ≡− log [P (d, s)] = − log [P (d|s) P (s)] , notes some unnormalised Hamiltonian, its normalization (28) constant is given by the evidence of the data −H′ [s] H0 = log Ds Dde d . (37) P (d) ≡ Ds P (d|s) P (s)= Dse−H[s] ≡ Z, (29) Z Z Z Z Often H0 is irrelevant unless different models or hyper- and the partition function Z ≡ Zd were introduced. It is extremely convenient to include a moment generating parameters are to be compared. function into the definition of the partition function We call the linear coefficient j information source. This term is usually directly and linearly related to the data. −H[s]+J†s The quadratic coefficient, D−1, defines the information Zd[J] ≡ Dse . 
(30) propagator D(x, y), which propagates information on the Z This means P (d) = Z = Z[0], but also permits to signal at y to location x, and thereby permits, e.g., to par- tially reconstruct the signal at locations where no data calculate any moment of the signal field via Fr´echet- (n) differentiation of Eq. 30, was taken. Finally, the anharmonic tensors Λ create interactions between the modes of the free, harmonic the- n 1 δ Zd[J] ory. Since this free theory will be the basis for the full hs(x ) ··· s(x )i = . (31) 1 n d Z δJ(x ) ··· δJ(x ) interaction theory, we first investigate the case Λ(n) = 0. 1 n J=0
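The illustrative example of Sec. II B ($d = \psi$, $s = \psi^2$, even statistics) can be checked in a few lines. The sketch below is illustrative only, and the helper name `response_and_noise` is hypothetical. It relies on one property of the toy model: a given signal value $s$ is compatible with exactly two data values, $d = \pm\sqrt{s}$, each with probability 1/2. The conditional average $\langle T[d] \rangle_{(d|s)}$ entering Eqs. 10, 13 and 15 can therefore be taken exactly rather than by sampling.

```python
import numpy as np


def response_and_noise(s, transform):
    """Return R[s] = <T[d]>_(d|s) and both noise values n_s = T[d] - R[s].

    Toy model (illustrative assumption): d = psi, s = psi**2 and
    P(psi) = P(-psi), so given s the data are +sqrt(s) or -sqrt(s),
    each with probability 1/2.
    """
    d_plus, d_minus = np.sqrt(s), -np.sqrt(s)
    response = 0.5 * transform(d_plus) + 0.5 * transform(d_minus)
    noise = np.stack([transform(d_plus) - response,
                      transform(d_minus) - response])
    return response, noise


s = np.linspace(0.0, 4.0, 9)  # a few signal values s = psi**2 >= 0

# Raw data d' = d: zero response, so the data are pure noise (d = n_s).
R_raw, n_raw = response_and_noise(s, lambda d: d)

# Transformed data d' = d**2: perfect response R'[s] = s and zero noise.
R_sq, n_sq = response_and_noise(s, lambda d: d**2)
```

Averaging `n_raw` over its first axis reproduces Eq. 12: the noise has zero conditional mean, although its magnitude $\sqrt{s}$ clearly depends on the signal.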

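The following is a minimal numerical illustration of Eqs. 21 and 22, added here and not part of the original analysis. It shows the volume-weighted scalar product of two sampled functions approaching the continuum integral as the pixelization is refined. The test functions are arbitrary choices: $f(x) = x$ and $g(x) = \sin x$ on $\Omega = [0, \pi]$, for which $\int_0^\pi dx\, x \sin x = \pi$. The pixels have equal volume $V_i = \pi/N_{\mathrm{pix}}$ and are sampled at their midpoints. The function names are hypothetical.

```python
import numpy as np


def discrete_dot(g, f, volumes):
    """Eq. 21: g^dagger f = sum_i V_i conj(g_i) f_i."""
    return np.sum(volumes * np.conj(g) * f)


def sampled_product(n_pix):
    """Eq. 21 for f(x) = x, g(x) = sin(x) on n_pix equal pixels of [0, pi]."""
    edges = np.linspace(0.0, np.pi, n_pix + 1)
    x = 0.5 * (edges[:-1] + edges[1:])   # pixel positions x_i (midpoints)
    volumes = np.diff(edges)             # pixel volumes V_i
    return discrete_dot(np.sin(x), x, volumes)


exact = np.pi  # continuum limit, Eq. 22: int_0^pi x sin(x) dx = pi
for n_pix in (10, 100, 1000):
    print(n_pix, abs(sampled_product(n_pix) - exact))
```

The discretisation error shrinks roughly quadratically with the pixel size. This illustrates, for a smooth test function, the statement that a finer resolution only changes results by a vanishing discretisation difference.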
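The connected four-point function of Eq. 35 can be estimated from samples by replacing the averages $\langle \cdot \rangle_d$ with sample means. The sketch below uses arbitrary choices of sample size and seed, and its helper names are hypothetical. It evaluates the function at a single location, $x = y = z = u$, where it reduces to the fourth cumulant. For Gaussian samples the result nearly vanishes, as Eq. 34 states. For a unit-variance uniform distribution it does not: there the fourth cumulant is $9/5 - 3 = -6/5$.

```python
import numpy as np


def connected_2pt(a, b):
    """Eq. 33: <ab>^c = <ab> - <a><b>, using sample means as averages."""
    return np.mean(a * b) - np.mean(a) * np.mean(b)


def connected_4pt(a, b, c, e):
    """Eq. 35: central four-point moment minus all pairings of 2-pt cumulants."""
    da, db, dc, de = (v - v.mean() for v in (a, b, c, e))
    return (np.mean(da * db * dc * de)
            - connected_2pt(a, b) * connected_2pt(c, e)
            - connected_2pt(a, c) * connected_2pt(b, e)
            - connected_2pt(a, e) * connected_2pt(b, c))


rng = np.random.default_rng(42)
n_samples = 200_000

gauss = rng.standard_normal(n_samples)
uniform = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n_samples)  # unit variance

# Coincident points x = y = z = u: the fourth cumulant of each sample set.
k4_gauss = connected_4pt(gauss, gauss, gauss, gauss)
k4_uniform = connected_4pt(uniform, uniform, uniform, uniform)
```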

B. Free theory

1. Gaussian data model

For our simplest data model we assume a Gaussian signal with prior
$$P(s) = \mathcal{G}(s,S) \equiv \frac{1}{|2\pi S|^{1/2}} \exp\!\left( -\frac{1}{2}\, s^\dagger S^{-1} s \right), \tag{38}$$
where $S = \langle s s^\dagger \rangle$ is the signal covariance. The signal is assumed here to be processed by nature and by our measurement device according to a linear data model
$$d = R\, s + n. \tag{39}$$
Here, the response $R[s] = Rs$ is linear in the signal $s$, and the noise $n_s = n$ is independent of it. The linear response matrix $R$ of our instrument can contain window and selection functions and blurring effects. If our instrument is an interferometer, it can even contain a Fourier transformation of the signal space. Typically, the data space is discrete, whereas the signal space may be continuous. In that case the $i$-th data point is given by
$$d_i = \int dx\, R_i(x)\, s(x) + n_i. \tag{40}$$
We assume, for the moment but not in general, the noise to be signal-independent and Gaussian, and therefore distributed as
$$P(n|s) = \mathcal{G}(n, N), \tag{41}$$
where $N = \langle n n^\dagger \rangle$ is the noise covariance matrix. Since the noise is just the difference between the data and the signal response, $n = d - Rs$, the likelihood of the data is given by
$$P(d|s) = P(n = d - Rs\,|\,s) = \mathcal{G}(d - Rs, N). \tag{42}$$
Thus the Hamiltonian of the Gaussian theory is
$$\begin{aligned}
H_G[s] &= -\log[P(d|s)\, P(s)] = -\log[\mathcal{G}(d - Rs, N)\, \mathcal{G}(s,S)] \\
&= \frac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s + H_0^G.
\end{aligned} \tag{43}$$
Here
$$D = \left( S^{-1} + R^\dagger N^{-1} R \right)^{-1} \tag{44}$$
is the propagator of the free theory. The information source,
$$j = R^\dagger N^{-1} d, \tag{45}$$
depends linearly on the data in a response-over-noise weighted fashion. For discrete data but continuous signal spaces, it reads
$$j(x) = \sum_{ij} R_i^*(x)\, N^{-1}_{ij}\, d_j. \tag{46}$$
Finally,
$$H_0^G = \frac{1}{2}\, d^\dagger N^{-1} d + \frac{1}{2} \log\left( |2\pi S|\, |2\pi N| \right) \tag{47}$$
has absorbed all $s$-independent normalization constants.

The partition function of the free field theory,
$$Z_G[J] = \int \mathcal{D}s\, e^{-H_G[s] + J^\dagger s} = \int \mathcal{D}s\, \exp\!\left( -\frac{1}{2}\, s^\dagger D^{-1} s + (J + j)^\dagger s - H_0^G \right), \tag{48}$$
is a Gaussian path integral, which can be calculated exactly, yielding
$$Z_G[J] = \sqrt{|2\pi D|}\, \exp\!\left( +\frac{1}{2}\, (J + j)^\dagger D\, (J + j) - H_0^G \right). \tag{49}$$
The explicit partition function permits us to calculate, via Eq. 32, the expectation of the signal given the data. In the following, we call it the map $m_d$ generated by the data $d$:
$$m_d = \langle s \rangle_d = \left. \frac{\delta \log Z_G}{\delta J} \right|_{J=0} = D\, j = \underbrace{\left( S^{-1} + R^\dagger N^{-1} R \right)^{-1} R^\dagger N^{-1}}_{F_{\mathrm{WF}}}\, d. \tag{50}$$
The last expression shows that the map is given by the data after applying a generalized Wiener filter, $m_d = F_{\mathrm{WF}}\, d$. The propagator $D(x,y)$ describes how the information on the density field contained in the data at location $x$ propagates to position $y$: $m(y) = \int dx\, D(y,x)\, j(x)$.

The connected autocorrelation of the signal given the data,
$$\langle s s^\dagger \rangle^c_d = D = \left( S^{-1} + R^\dagger N^{-1} R \right)^{-1}, \tag{51}$$
is the propagator itself. All higher connected correlation functions are zero. Therefore, the signal given the data is a Gaussian random field around the mean $m_d$. The variance of the residual error
$$r = s - m_d \tag{52}$$
is provided by the propagator itself, as a straightforward calculation shows:
$$\langle r r^\dagger \rangle_d = \langle s s^\dagger \rangle_d - \langle s \rangle_d \langle s^\dagger \rangle_d = \langle s s^\dagger \rangle^c_d = D. \tag{53}$$
Thus, the posterior should be simply a Gaussian given by
$$P(s|d) = \mathcal{G}(s - m_d, D). \tag{54}$$
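For a small, fully discretized problem, the free-theory quantities can be formed with dense linear algebra. The sketch below rests on arbitrary example choices: a 1D grid of 64 pixels with unit pixel volumes, a Gaussian-shaped signal covariance with a small diagonal jitter, a normalized Gaussian blurring response, and white noise. It builds $D$ (Eq. 44), $j$ (Eq. 45) and the map $m_d = Dj$ (Eq. 50). The map is checked against the algebraically equivalent data-space form $S R^\dagger (R S R^\dagger + N)^{-1} d$, and $D$ against its Woodbury form. Realistic applications avoid explicit inverses and use iterative solvers instead, since $D$ is usually far too large to store.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix = 64
x = np.arange(n_pix)
dist = np.abs(x[:, None] - x[None, :])

# Example signal covariance S: smooth Gaussian kernel plus a small jitter
S = np.exp(-0.5 * (dist / 3.0) ** 2) + 1e-3 * np.eye(n_pix)

# Example response R: each data point is a normalized Gaussian blur of s
R = np.exp(-0.5 * (dist / 1.5) ** 2)
R /= R.sum(axis=1, keepdims=True)

# Example white-noise covariance N
N = 0.1 * np.eye(n_pix)

# Draw a signal and data from the linear data model d = R s + n (Eq. 39)
s = rng.multivariate_normal(np.zeros(n_pix), S)
n = rng.multivariate_normal(np.zeros(n_pix), N)
d = R @ s + n

N_inv = np.linalg.inv(N)
D = np.linalg.inv(np.linalg.inv(S) + R.T @ N_inv @ R)  # Eq. 44 (real R: R^dagger = R.T)
j = R.T @ N_inv @ d                                     # Eq. 45, information source
m = D @ j                                               # Eq. 50, generalized Wiener filter

# Equivalent data-space form of the same filter
m_alt = S @ R.T @ np.linalg.solve(R @ S @ R.T + N, d)
# Posterior covariance (Eq. 51) via the Woodbury identity
D_alt = S - S @ R.T @ np.linalg.solve(R @ S @ R.T + N, R @ S)
```

The data-space form only requires solving systems in the data space, whereas Eq. 44 works in the signal space. Which one is cheaper depends on the relative sizes of the two spaces.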

As a test of the latter equation, we calculate the evidence of the free theory via
$$P(d) = \frac{P(d|s)\, P(s)}{P(s|d)} = \frac{\mathcal{G}(d - Rs, N)\, \mathcal{G}(s,S)}{\mathcal{G}(s - Dj, D)} = \left( \frac{|D|}{|S|} \right)^{1/2} \frac{1}{|2\pi N|^{1/2}}\, \exp\!\left( \frac{1}{2} \left( j^\dagger D j - d^\dagger N^{-1} d \right) \right). \tag{55}$$
This is indeed independent of $s$ and also identical to $Z_G[0]$, as it should be.

2. Free classical theory

The Hamiltonian permits us to ask for classical equations derived from an extremal principle. On the one hand, this is justified as being just the result of the saddle-point approximation of the exponential in the partition function. On the other hand, the extremal principle is equivalent to the maximum a posteriori (MAP) estimator, which is quite commonly used for the construction of signal filters. An exhaustive introduction to and discussion of the MAP approximation to Gaussian and non-Gaussian signal fields is provided by Lemm [48].

The classical theory is expected to capture essential features of the field theory. The field fluctuations may, however, probe phase-space regions away from the maximum, where the Hamiltonian (or posterior) has a more complex structure. In that case, deviations between classical and field theory should become apparent.

Extremizing the Hamiltonian of the free theory (Eq. 43),
$$\left. \frac{\delta H_G}{\delta s} \right|_{s=m} = D^{-1} m - j \equiv 0, \tag{56}$$
we get the classical mapping equation $m = Dj$. This is identical to the field-theoretical result (Eq. 50).

It is also possible to measure the sharpness of the maximum of the posterior by calculating the Hessian curvature matrix
$$\mathcal{H}_G[m] = \left. \frac{\delta^2 H[s]}{\delta s^2} \right|_{s=m} = D^{-1}. \tag{57}$$
In the Gaussian approximation of the maximum of the posterior, the inverse of the Hessian is identical to the covariance of the residual,
$$\langle r\, r^\dagger \rangle = \mathcal{H}^{-1}[m] = D. \tag{58}$$
For the pure Gaussian model, this is of course identical to the exact result given by the field theory (Eq. 53).

IV. INTERACTING INFORMATION FIELDS

A. Interaction Hamiltonian

1. General form

All results of the free theory presented so far are well known within the field of signal reconstruction. IFT reproduces them elegantly and is therefore of pedagogical value. However, the new results presented in the rest of this paper arise as soon as one leaves the free theory. Non-Gaussian signal or noise, a non-linear response, or a signal-dependent noise create anharmonic terms in the Hamiltonian. These describe interactions between the eigenmodes of the free Hamiltonian.

We assume the Hamiltonian can be Taylor expanded in the signal fields, which permits us to write
$$H[s] = \underbrace{\frac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s + H_0^G}_{H_G[s]} + \underbrace{\sum_{n=0}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, s_{x_1} \cdots s_{x_n}}_{H_{\mathrm{int}}[s]}. \tag{59}$$
Repeated coordinates are thought to be integrated over. Since we now sum from $n = 0$, we have included perturbations which are constant, linear and quadratic in the signal field, in contrast to Eq. 36. This permits us to treat certain non-ideal effects perturbatively. Suppose, for example, that a mostly position-independent propagator gets a small position-dependent contamination. It might then be more convenient to treat this contamination perturbatively rather than include it in the propagator used in the calculation. Note further that all coefficients can be assumed to be symmetric with respect to their coordinate indices.⁶

⁶ This means $D_{xy} = D_{yx}$ and $\Lambda^{(n)}_{x_{\pi(1)} \ldots x_{\pi(n)}} = \Lambda^{(n)}_{x_1 \ldots x_n}$ for any permutation $\pi$ of $\{1, \ldots, n\}$. Even non-symmetric coefficients would automatically be symmetrized by the integration over all repeated coordinates. We therefore assume in the following that such a symmetrization operation has already been done. Alternatively, we impose it by hand before continuing with any perturbative calculation, by applying
$$\Lambda^{(n)}_{x_1 \ldots x_n} \longmapsto \frac{1}{n!} \sum_{\pi \in \mathcal{P}_n} \Lambda^{(n)}_{x_{\pi(1)} \ldots x_{\pi(n)}}.$$
If $\mathcal{P}_n$ is the space of all permutations of $\{1, \ldots, n\}$, this clearly leaves any symmetric tensor invariant.

Often, it is more convenient to work with a shifted field $\phi = s - t$, where some (e.g. background) field $t$ is removed from $s$. The Hamiltonian of $\phi$ reads
$$\begin{aligned}
H'[\phi] &= \underbrace{\frac{1}{2}\, \phi^\dagger D^{-1} \phi - j'^\dagger \phi + H'_0}_{H'_G[\phi]} + \underbrace{\sum_{n=0}^{\infty} \frac{1}{n!}\, \Lambda'^{(n)}_{x_1 \ldots x_n}\, \phi_{x_1} \cdots \phi_{x_n}}_{H'_{\mathrm{int}}[\phi]}, \quad \text{with} \\
H'_0 &= H_0^G - j^\dagger t + \frac{1}{2}\, t^\dagger D^{-1} t, \\
j' &= j - D^{-1} t, \quad \text{and} \\
\Lambda'^{(m)}_{x_1 \ldots x_m} &= \sum_{n=0}^{\infty} \frac{1}{n!}\, \Lambda^{(m+n)}_{x_1 \ldots x_{m+n}}\, t_{x_{m+1}} \cdots t_{x_{m+n}}.
\end{aligned} \tag{60}$$

2. Feynman rules

All the information on any correlation function of the fields is contained in the partition sum and can be extracted from it. Only the latter therefore needs to be calculated:
$$\begin{aligned}
Z[J] &= \int \mathcal{D}s\, e^{-H[s] + J^\dagger s} \\
&= \int \mathcal{D}s\, \exp\!\left[ -\sum_{n=0}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, s_{x_1} \cdots s_{x_n} \right] e^{-H_G[s] + J^\dagger s} \\
&= \exp\!\left[ -\sum_{n=0}^{\infty} \frac{1}{n!}\, \Lambda^{(n)}_{x_1 \ldots x_n}\, \frac{\delta}{\delta J_{x_1}} \cdots \frac{\delta}{\delta J_{x_n}} \right] \int \mathcal{D}s\, e^{-H_G[s] + J^\dagger s} \\
&= \exp\!\left[ -H_{\mathrm{int}}\!\left[ \frac{\delta}{\delta J} \right] \right] Z_G[J].
\end{aligned} \tag{61}$$
There exist well-known diagrammatic expansion techniques for such expressions [e.g. 57]. Any connected moment can be calculated from the logarithm of the partition sum. The expansion terms of this logarithm are represented by all possible connected diagrams which are built out of lines and vertices (with any number of legs connecting to lines) and have no external line ends, i.e. every line ends in a vertex. These diagrams are interpreted according to the following Feynman rules:

1. Open ends of lines in diagrams correspond to external coordinates and are labeled by them. Since the partition sum in particular does not depend on any external coordinate, it is calculated only by summing up closed diagrams. The field expectation value $m(x) = \langle s(x) \rangle_{(s|d)} = d \log Z[J] / dJ(x)|_{J=0}$ and higher-order correlation functions, however, depend on coordinates. They are therefore calculated from diagrams with one or more open ends, respectively.
2. A line with coordinates $x'$ and $y'$ at its ends represents the propagator $D_{x'y'}$ connecting these locations.
3. Vertices with one leg get an individual internal, integrated coordinate $x'$ and represent the term $j_{x'} + J_{x'} - \Lambda^{(1)}_{x'}$.
4. Vertices with $n$ legs represent the term $-\Lambda^{(n)}_{x'_1 \ldots x'_n}$, where each individual leg is labeled by one of the internal coordinates $x'_1, \ldots, x'_n$. This vertex structure is more complex than in QFT, as a consequence of non-locality in IFT.
5. All internal (and therefore repeatedly occurring) coordinates are integrated over, whereas external coordinates are not.
6. Every diagram is divided by its symmetry factor, the number of permutations of vertex legs leaving the topology invariant, as described in any book on field theory [e.g. 57].

The $n$-th moment of $s$ is generated by taking the $n$-th derivative of $\log Z[J]$ with respect to $J$ and then setting $J = 0$. This corresponds to removing $n$ end vertices from all diagrams. For example, the first four diagrams contributing to a map ($m = \langle s \rangle_{(s|d)}$) correspond to the expressions
$$\begin{aligned}
&D\, j = D_{xy}\, j_y \equiv \int dy\, D(x,y)\, j(y), \\
&-\frac{1}{2}\, D\, \Lambda^{(3)}[\cdot, D] = -\frac{1}{2}\, D_{xy}\, \Lambda^{(3)}_{yzu}\, D_{zu} \equiv -\frac{1}{2} \int dy\, D_{xy} \int dz \int du\, \Lambda^{(3)}_{yzu}\, D_{zu}, \\
&-\frac{1}{2}\, D\, \Lambda^{(3)}[\cdot, Dj, Dj] = -\frac{1}{2}\, D_{xy}\, \Lambda^{(3)}_{yzu}\, D_{zz'}\, j_{z'}\, D_{uu'}\, j_{u'} \\
&\qquad \equiv -\frac{1}{2} \int dy\, D_{xy} \int dz \int du\, \Lambda^{(3)}_{yzu} \int dz'\, D_{zz'}\, j_{z'} \int du'\, D_{uu'}\, j_{u'}, \quad \text{and} \\
&-\frac{1}{2}\, D\, \Lambda^{(4)}[\cdot, D, Dj] = -\frac{1}{2}\, D_{xy}\, \Lambda^{(4)}_{yzuv}\, D_{zu}\, D_{vv'}\, j_{v'} \\
&\qquad \equiv -\frac{1}{2} \int dy\, D_{xy} \int dz \int du \int dv\, \Lambda^{(4)}_{yzuv}\, D_{zu} \int dv'\, D_{vv'}\, j_{v'}.
\end{aligned} \tag{62}$$
Here we have assumed that any first- and second-order perturbation was absorbed into the data source and the propagator, thus $\Lambda^{(1)} = \Lambda^{(2)} = 0$. Repeated indices are assumed to be integrated (or summed) over.

3. Local interactions and Fourier space rules

In case of purely local interactions,
$$\Lambda^{(n)}_{x_1 \ldots x_n} = \lambda_n(x_1)\, \delta(x_1 - x_2) \cdots \delta(x_1 - x_n), \tag{63}$$
the interaction Hamiltonian reads
$$H_{\mathrm{int}} = \sum_{m=0}^{\infty} \frac{1}{m!}\, \lambda_m^\dagger\, s^m, \tag{64}$$
and the expressions of the Feynman diagrams simplify considerably. The fourth Feynman rule can be replaced by

4. Vertices with $n$ lines connected to them are associated with a single internal coordinate $x'$ and represent the term $-\lambda_n(x')$.

For example, the last loop diagram in Eq. 62 becomes
$$-\frac{1}{2} \int dy\, D_{xy}\, \lambda_4(y)\, D_{yy} \int dz\, D_{yz}\, j_z. \tag{65}$$
In case of local interactions, it can be helpful to do the calculations in Fourier space. The corresponding Feynman rules can be obtained by inserting a real-space identity operator $1 = F^\dagger F$ in between any scalar product. The inverse Fourier transformation $F^\dagger$ is assigned to the left term and the forward transform $F$ to the right term, e.g.
$$D\, j = F^\dagger F\, D\, F^\dagger F\, j = F^\dagger\, \underbrace{(F D F^\dagger)}_{D'}\, \underbrace{(F j)}_{j'}.$$
This yields:

1. An open end of a line has an external momentum coordinate $k$ and gets a factor $\int dk\, e^{-ikx}/(2\pi)^n$ applied to it, if real-space functions are to be evaluated.
2. A line connecting momentum $k$ with momentum $k'$ corresponds to a directed propagator between these momenta: $D_{kk'} = D(k,k')$.
3. A data source vertex is $(j + J - \lambda_1)(k'')$, where $k''$ is the momentum at the data end of the line.
4. A vertex with $m > 1$ lines with momentum labels $k_1, \ldots, k_m$ is $-\lambda_m(k_0)\, (2\pi)^n\, \delta\big( \sum_{i=0}^{m} k_i \big)$.
5. An internal end of a line has an internal (integrated) momentum coordinate $k'$. Integration means a term $\int dk'/(2\pi)^n$ in front of the expression.
6. The expression gets divided by the symmetry factor of its diagram.

Here, $j(k) = (F j)(k) = \int dx\, j(x)\, e^{ikx}$ and $D(k,k') = (F D F^\dagger)(k,k') = \int dx \int dx'\, D(x,x')\, e^{i(kx - k'x')}$ are the Fourier-transformed information source and propagator, respectively, and analogously for the other quantities.

Note that momentum directions have to be taken into account. The momenta that go into a vertex, data source or open end get a positive sign in the delta function of momentum conservation, and the ones that go out of a vertex get a minus sign.

4. Simplistic interaction Hamiltonians

To obtain a toy case which permits analytic calculations, we introduce a simplistic Hamiltonian. We require the data model to be translationally invariant and all interaction terms to be local. This is the case whenever the signal and noise covariances are fully characterized by power spectra over the same spatial space,
$$S(k,q) = (2\pi)^n\, \delta(k - q)\, P_S(k), \tag{66}$$
$$N(k,q) = (2\pi)^n\, \delta(k - q)\, P_N(k), \tag{67}$$
with $P_S(k) = \langle |s(k)|^2 \rangle / V$ and $P_N(k) = \langle |n(k)|^2 \rangle / V$, where $V$ is the volume of the system. We assume further that the signal processing can be completely described by a convolution with an instrumental beam,
$$d(x) = \int dy\, R(x - y)\, s(y) + n(x), \tag{68}$$
where the response convolution kernel has a Fourier power spectrum $P_R(k) = |R(k)|^2$ (without a factor $1/V$). In this case $D$ can be fully described by a power spectrum:
$$D(k,q) = (2\pi)^n\, \delta(k - q)\, P_D(k), \tag{69}$$
with $P_D(k) = \left( P_S(k)^{-1} + P_R(k)\, P_N(k)^{-1} \right)^{-1}$.

Locality of the interaction terms combined with translational invariance requires $\lambda_m = \mathrm{const}$. The interaction Hamiltonian therefore reads
$$H_{\mathrm{int}}[s] = \sum_{m=1}^{\infty} \frac{\lambda_m}{m!} \int dx\, s^m(x) = \sum_{m=1}^{\infty} \frac{\lambda_m}{m!} \left( \prod_{i=1}^{m} \int \frac{dk_i}{(2\pi)^n}\, s_{k_i} \right) (2\pi)^n\, \delta\!\left( \sum_{j=1}^{m} k_j \right). \tag{70}$$
In that case, the Feynman rules simplify considerably. For the interaction Hamiltonian of Eq. 70, the Feynman rules are now:

1. unintegrated $x$-coordinate: $\exp(-i k x)$ (if real-space functions are to be evaluated),
2. propagator: $P_D(k)$,
3. data source vertex: $(j + J - \lambda_1)(k)$,
4. vertex with $m > 1$ lines: $-\lambda_m$,
5. impose momentum conservation at each vertex, $(2\pi)^n\, \delta\big( \sum_{i=1}^{m} k_i \big)$, and integrate over every internal momentum, $\int dk/(2\pi)^n$,
6. and divide by the symmetry factor.

5. Feynman rules on the sphere

For CMB reconstruction and analysis, but presumably also for terrestrial applications, the Feynman rules on the sphere $\Omega = S^2$ are needed. They are therefore provided in Appendix B.

B. Normalisability of the theory

In contrast to QFT, IFT should be properly normalized and should not necessarily require any renormalization procedure. The reason is that IFT is not a low-energy limit of some unknown high-energy theory, but can be set up as the full (high-energy) theory. The Hamiltonian is just the negative logarithm of the joint probability function of data and signal, $H_d[s] \equiv -\log[P(d,s)]$, and is therefore well defined and properly normalized if the latter is. Normalization should only become an issue if ad-hoc Hamiltonians are set up, or if approximations lead to ill-normalized theories. However, since we are attempting a perturbative expansion of the theory, there is no guarantee that all individual terms yield finite results. In QFT, for example, simple loop diagrams are known to be divergent and to require renormalization. In the following we investigate a simplistic, but representative case of IFT, which shows that such problems are generally not to be expected.

Let us adopt the simplistic situation described in Sect. IV A 4 and estimate a simple loop diagram, for which we assume for notational convenience $\lambda_3 = -2\,(2\pi)^n \lambda'$ (with $\lambda' > 0$):

$$\begin{aligned} [\text{one-loop tadpole diagram}] &= -\tfrac{1}{2}\, D\, \lambda_3\, \widehat{D} \\ &= \lambda' \int\! dk \int\! dk'\, \delta(k + k' - k')\, P_D(k)\, P_D(k')\, e^{ikx} \\ &\le \lambda' P_D(0) \int\! dk'\, P_S(k') = \lambda' V P_D(0)\, \langle s^2(x)\rangle, \end{aligned} \tag{71}$$

where $V$ is the volume of the system. Here and in the following, $\widehat{C}$ denotes the diagonal of the matrix $C$. Thus, as long as the signal field is of bounded variance, the loop diagram is convergent due to $P_D \le P_S$ for all $k$. Even a signal of unbounded variance would not lead to a divergent loop diagram if $\int dk\,(P_N/P_R)(k)$ is finite, since we also have $P_D \le P_N/P_R$. A bounded-variance signal is very natural, especially in a cosmological setting.⁷ Finally, since a signal as an information field can be chosen freely, we can define it to be the filtered version of the physical field (e.g. the dark matter distribution or the CMB fluctuations), so that only modes of sufficiently bounded variance are present in it. Since we have the freedom to choose information fields that are mathematically well behaved, we can therefore ensure the convergence of expressions.

Although this is not a general proof of the normalisability of the theory, which is beyond the scope of this paper, it should provide confidence in the well-behavedness of the formalism in sensible applications. The price to be paid for this well-behavedness is the more complex structure of the propagator, which, in comparison to QFT, can be non-analytical and require numerical evaluation even in simplistic cases.

⁷ The cosmological signal of primary interest, the initial density fluctuations as revealed by the large-scale structure and the CMB, is expected to exhibit a suppression of small-scale power due to the free-streaming of dark matter particles before they became non-relativistic. The CMB temperature fluctuations are also damped on small scales, due to the free-streaming of photons around the time of recombination.

C. Expansion around the classical solution

1. General case

The classical solution of the Hamiltonian in Eq. 59 is provided by its minimum,

$$\frac{\delta H}{\delta s_x} = D^{-1}_{xy}\, s_y - j_x + \sum_{m=1}^{\infty} \frac{1}{m!}\, \Lambda^{(m+1)}_{x x_1 \ldots x_m}\, s_{x_1}\cdots s_{x_m} = 0. \tag{72}$$

This leads to the equation for the classical field

$$s^{\rm cl}_y = D_{yx}\left( j_x - \sum_{m=1}^{\infty} \frac{1}{m!}\, \Lambda^{(m+1)}_{x x_1 \ldots x_m}\, s^{\rm cl}_{x_1}\cdots s^{\rm cl}_{x_m}\right), \tag{73}$$

which one can try to solve iteratively.

2. Local interactions

For simplicity, we concentrate for a moment on the case of purely local interactions, for which the equation for the classical field $s_{\rm cl}$ is

$$s_{\rm cl} = D\left( j - \sum_{m=1}^{\infty} \frac{\lambda_{m+1}}{m!}\, s_{\rm cl}^{m}\right), \tag{74}$$

where powers and products of fields are taken pointwise. Iterating this equation and rewriting the resulting terms as Feynman diagrams shows that the classical solution contains all tree diagrams. The loop diagrams can be added by investigating the non-classical uncertainty field $\phi = s - s_{\rm cl}$.

A non-classical expansion of the information field around the classical field is possible by inserting $s = s_{\rm cl} + \phi$ into the Hamiltonian (Eq. 64). Reordering terms according to the powers of the field $\phi$ leads to its Hamiltonian

$$H'[\phi] \equiv H[s_{\rm cl} + \phi] = H'_0 + \tfrac{1}{2}\, \phi^\dagger D'^{-1} \phi - j'^\dagger \phi + \sum_{m=3}^{\infty} \frac{1}{m!}\, \lambda'^\dagger_m\, \phi^m,$$

with

$$\begin{aligned} \lambda'_n &\equiv \sum_{m=0}^{\infty} \frac{\lambda_{n+m}}{m!}\, s_{\rm cl}^m, \\ H'_0 &\equiv H[s_{\rm cl}] = H_0 + \tfrac{1}{2}\, s_{\rm cl}^\dagger D^{-1} s_{\rm cl} + \lambda'_0, \\ j' &\equiv j - \lambda'_1 - D^{-1} s_{\rm cl}, \quad\text{and}\quad D' \equiv \left(D^{-1} + \widehat{\lambda'_2}\right)^{-1}. \end{aligned} \tag{75}$$
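A minimal single-pixel sketch of Eqs. 74 and 75 follows, assuming a hypothetical Hamiltonian with $D = 1$, $j = 0.8$ and only $\lambda_3 = 0.3$, $\lambda_4 = 0.1$ non-zero (illustrative numbers, not taken from the paper). It solves the classical-field equation by plain fixed-point iteration, which converges here only because the interaction is weak, and then builds the shifted coefficients $\lambda'_n$, $j'$ and $D'$. Because this toy $H$ is a quartic polynomial, $H'[\phi]$ must reproduce $H[s_{\rm cl} + \phi]$ exactly, and $j'$ must vanish at the classical solution. Note that $H'_0$ is evaluated directly as $H[s_{\rm cl}]$.

```python
import math

# Hypothetical single-pixel Hamiltonian with purely local interactions:
#   H(s) = s^2 / (2 D) - j s + sum_{n>=3} lambda_n s^n / n!
# D, j and lam are illustrative assumptions.
D, j = 1.0, 0.8
lam = {3: 0.3, 4: 0.1}  # lambda_n for n >= 3; all other lambda_n vanish


def H(s):
    return 0.5 * s * s / D - j * s + sum(c * s**n / math.factorial(n) for n, c in lam.items())


def lam_shifted(n, s_cl):
    """lambda'_n = sum_m lambda_{n+m} s_cl^m / m!  (Eq. 75)."""
    return sum(lam.get(n + m, 0.0) * s_cl**m / math.factorial(m) for m in range(max(lam) + 1))


def classical_solution(n_iter=200):
    """Fixed-point iteration of Eq. 74: s = D (j - sum_{m>=1} lambda_{m+1} s^m / m!)."""
    s = 0.0
    for _ in range(n_iter):
        s = D * (j - sum(lam.get(m + 1, 0.0) * s**m / math.factorial(m)
                         for m in range(1, max(lam))))
    return s


s_cl = classical_solution()
j_prime = j - lam_shifted(1, s_cl) - s_cl / D        # should vanish at the classical solution
D_prime = 1.0 / (1.0 / D + lam_shifted(2, s_cl))     # D' = (D^-1 + lambda'_2)^-1


def H_shifted(phi):
    """H'[phi] of Eq. 75 with H'_0 = H[s_cl]; exact here because H is a quartic polynomial."""
    return (H(s_cl) + 0.5 * phi * phi / D_prime - j_prime * phi
            + sum(lam_shifted(m, s_cl) * phi**m / math.factorial(m) for m in (3, 4)))


print(f"s_cl = {s_cl:.6f}, j' = {j_prime:.1e}, D' = {D_prime:.6f}")
```

Since $j' = 0$, the shifted theory has no source vertices, so only loop diagrams remain, as stated for Eq. 76.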


In case $s_{\rm cl}$ is exactly the classical solution, Eqs. 74 and 75 imply that $j' = 0$. Thus, there are no one-line internal vertices in any Feynman graph of the $\phi$-theory, and only loop diagrams contribute uncertainty-corrections⁸ to any information theoretical estimator. For example, the uncertainty-corrections to the classical map estimator are given by

$$\delta m = m_d - s_{\rm cl} = \langle \phi \rangle_d = [\text{connected loop diagrams with one open end}] + \ldots \tag{76}$$

However, in case $s_{\rm cl}$ is not (exactly) the classical solution, whether due to a truncation error of an iteration scheme used to solve for the classical field or because $s_{\rm cl}$ was chosen for a completely different purpose, Eq. 75 provides the correct field theory for $\phi = s - s_{\rm cl}$, independent of the nature of $s_{\rm cl}$. In case of a truncation error, incorporating diagrams with data-source terms $j'$ into any computation permits one to correct the inaccuracy of $s_{\rm cl}$ in a systematic way.

⁸ We propose the term uncertainty-corrections in order to describe the influence of the spread of the probability distribution function around its maximum. The uncertainty-corrections are the information field theoretical equivalent of quantum-corrections in quantum field theories.

D. Boltzmann-Shannon Information

1. Helmholtz free energy

Information fields carry information on distributed physical quantities. The amount of signal information should be measurable in information units like bits and bytes. This is possible by adopting the Boltzmann-Shannon information measure of negative entropy. The entropy of a signal probability function measures the phase-space volume available for signal uncertainties, and therefore the constrainedness of the remaining uncertainties. Thus we define

$$\begin{aligned} I_d &\equiv \int \mathcal{D}s\, P(s|d) \log P(s|d) = -\frac{1}{Z}\int \mathcal{D}s\, e^{-H[s]}\left(H[s] + \log Z\right) \\ &= -\langle H[s]\rangle_d - \log Z \end{aligned} \tag{77}$$

as the information measure. Introducing

$$Z_\beta[d,J] = \int \mathcal{D}s\, \exp\left(-\beta\left(H[s] - J^\dagger s\right)\right), \quad\text{and}\quad F_\beta[d,J] = -\frac{1}{\beta}\log Z_\beta[d,J], \tag{78}$$

of which the latter is the Helmholtz free energy as a function of the inverse temperature $\beta$, we can write

$$I_d = -\log Z_1[d,0] - \langle H[s]\rangle_d = -\left.\frac{\partial F_\beta[d,J]}{\partial\beta}\right|_{\beta=1,\,J=0}, \tag{79}$$

as can be verified by a direct calculation. The first expression for $I_d$ in Eq. 79 is equivalent to the well known thermodynamic relation $F = E - T S_B$ with the internal energy $E = \langle H[s]\rangle_d$, the Boltzmann entropy $S_B = -I_d$, and the temperature, which is set here to $T = 1$. The second expression holds even if the Hamiltonian is improperly normalized; e.g. $H_0$ can be chosen arbitrarily if $Z_\beta[d,J]$ is calculated consistently with this choice.

The Helmholtz free energy $F_\beta[J]$ is also the generator of all connected correlation functions of the signal,

$$\langle s_{x_1}\cdots s_{x_n}\rangle^{c}_{(s|d)} = -\left.\frac{\delta^n F_\beta[d,J]}{\delta J_{x_1}\cdots\delta J_{x_n}}\right|_{\beta=1,\,J=0}.$$

It can be calculated as follows:

$$\begin{aligned} F_\beta &= -\frac{1}{\beta}\log\left(\frac{Z^G_\beta[J]}{Z^G_\beta[J]}\int \mathcal{D}s\, e^{-\beta H_{\rm int}[s]}\, e^{-\beta\left(H_G[s] - J^\dagger s\right)}\right) \\ &= -\frac{1}{\beta}\log Z^G_\beta[J] - \frac{1}{\beta}\log\left\langle e^{-\beta H_{\rm int}[s]}\right\rangle_{(s|J+j,G)}, \end{aligned} \tag{80}$$

where the average in the last term is over the Gaussian probability function $P^G_{J,\beta}[s] = \exp\left(-\beta\left(H_G[s] - J^\dagger s\right)\right)/Z^G_\beta[J]$. This term can be calculated by using the well-known fact that the logarithm of the sum of all possible connected and unconnected diagrams with only internal coordinates (i.e. without free ends), as generated by the exponential function of the interaction terms, is given by the sum of all connected diagrams [57]. For example, a free theory perturbed by small, up-to-fourth-order interaction terms (all proportional to some small parameter $\gamma$) therefore has

$$F_\beta[J] = H_0 + \Lambda^{(0)} - \beta^{-1}\,\big[\text{connected diagrams without open ends, up to first order in } \gamma\big] + \mathcal{O}(\gamma^2), \tag{81}$$

where an information source vertex reads $\beta\,(J + j - \Lambda^{(1)})$, an internal vertex with $n$ lines $\beta\,\Lambda^{(n)}$, and the propagator $\beta^{-1} D$. Finally, we have defined the closed loop without vertices as $\frac{1}{2}\log|2\pi D\beta^{-1}| = \frac{1}{2}\mathrm{Tr}\left(\log(2\pi D\beta^{-1})\right)$. Thus, we have

$$\begin{aligned} F_\beta[J] ={}& H_0 - \frac{1}{2\beta}\mathrm{Tr}\left(\log(2\pi D\beta^{-1})\right) + \frac{1}{2\beta}\Lambda^{(2)}[D] \\ &- \frac{1}{2}\left(J + j - \Lambda^{(1)}\right)^\dagger \left(D^{-1} + \Lambda^{(2)}\right)^{-1}\left(J + j - \Lambda^{(1)}\right) \\ &+ \frac{1}{2\beta}\Lambda^{(3)}[D, m_J] + \frac{1}{3!}\Lambda^{(3)}[m_J, m_J, m_J] \\ &+ \frac{1}{8\beta^2}\Lambda^{(4)}[D,D] + \frac{1}{4\beta}\Lambda^{(4)}[D, m_J, m_J] + \frac{1}{4!}\Lambda^{(4)}[m_J,m_J,m_J,m_J] + \mathcal{O}(\gamma^2), \end{aligned} \tag{82}$$

where we introduced the zero-order map $m_J = D\,(J + j)$ for notational convenience. The power of $\beta$ associated with each diagram in Eq. 81 is given by the number of vertices minus the number of propagators minus one. Thus, all tree diagrams are of order $\beta^0$, the one-loop diagrams are of order $\beta^{-1}$, and the two-loop diagram is of order $\beta^{-2}$; only the latter two affect the information content:

$$\begin{aligned} I_d &= -\frac{\varrho}{2} - \big[\text{one- and two-loop diagrams without open ends}\big] + \mathcal{O}(\gamma^2) \\ &= \frac{1}{2}\Big[-\mathrm{Tr}\left(1 + \log(2\pi D)\right) + \Lambda^{(2)}[D] + \Lambda^{(3)}[D, m_0] + \frac{1}{2}\Lambda^{(4)}\big[D \otimes (D + m_0\, m_0^\dagger)\big]\Big] + \mathcal{O}(\gamma^2), \end{aligned} \tag{83}$$

where $\varrho = \mathrm{Tr}(1)$, $\beta = 1$, $J = 0$, and thus $m_0 = D\,j$.⁹

⁹ Here, we introduced the symmetrized tensor product $A \otimes B$ of an $n$-rank tensor $A$ and an $m$-rank tensor $B$, which has the property $(A \otimes B)_{x_1\ldots x_{n+m}} = \frac{1}{m!}\sum_{\pi\in P_{n+m}} A_{x_{\pi(1)}\ldots x_{\pi(n)}}\, B_{x_{\pi(n+1)}\ldots x_{\pi(n+m)}}$, with $P_l$ being the set of permutations of $\{1,\ldots,l\}$.

2. Free theory

To obtain the information content of the free theory, we can set $\gamma = 0$ in Eqs. 82 and 83, or use Eq. 49 with the replacements $J \to \beta J$, $j \to \beta j$, $D \to \beta^{-1} D$, and $H_0 \to \beta H_0$. In both cases we find identically

$$F_\beta[J] = H^G_0 - \frac{1}{2}(J + j)^\dagger D\,(J + j) - \frac{1}{2\beta}\mathrm{Tr}\log\left(\frac{2\pi}{\beta}D\right), \quad\text{and}\quad I_d = -\frac{1}{2}\mathrm{Tr}\left(1 + \log(2\pi D)\right). \tag{84}$$

Very similarly, one can calculate the information prior to the data, which turns out to be

$$I_0 = -\frac{1}{2}\mathrm{Tr}\left(1 + \log(2\pi S)\right). \tag{85}$$

Thus, the data-induced information gain is

$$\Delta I_d = I_d - I_0 = \frac{1}{2}\mathrm{Tr}\log\left(S D^{-1}\right) = \frac{1}{2}\mathrm{Tr}\log\left(1 + S R^\dagger N^{-1} R\right). \tag{86}$$

The information gain depends on the signal-response-to-noise ratio $Q \equiv R\, S\, R^\dagger N^{-1}$, also referred to as the measurement fidelity or quality. The information increases linearly with $Q$ as long as $Q \ll 1$, but levels off to a logarithmic increase for $Q \gg 1$. We note that only for the free theory is the information gain independent of the actual data realization.

E. IFT Recipe

A typical IFT application will aim at calculating a model evidence $P(d)$, the map $m(x) = \langle s(x)\rangle_{(s|d)}$, i.e. the expectation value of the signal given the data, or its variance $\sigma^2_s(x,y) = \langle (s(x) - m(x))\,(s(y) - m(y))\rangle_{(s|d)}$ as a measure of the signal uncertainty. The general recipe for such applications can be summarized as follows:

• Specify the signal $s$ and its prior probability distribution $P(s)$. If the signal is derived from a physical field $\psi$, of which a prior statistic is known, the distribution of $s = s[\psi]$ is induced according to Eq. 2.

• Specify the data model in terms of a likelihood $P(d|s)$ conditioned on $s$. Again, if the data are related to an underlying physical field $\psi$, the likelihood is given by Eq. 4.

• Calculate the Hamiltonian $H_d[s] = -\log(P(d,s))$, where $P(d,s) = P(d|s)\,P(s)$ is the joint probability, and expand it in a Taylor-Fréchet series for all degrees of freedom of $s$. Identify the coefficients of the constant, linear, quadratic, and $n^{\rm th}$-order terms with the normalization $H_0$, information source $j$, inverse propagator $D^{-1}$, and $n^{\rm th}$-order interaction term $\Lambda^{(n)}$, respectively, as shown in Eq. 36 or 59.

• Draw all diagrams that contribute to the quantity of interest, consisting of vertices, lines, and open ends, up to some order in complexity or in some small ordering parameter. The log-evidence is given by the sum of all connected diagrams without open ends, the expectation value of the signal by all connected diagrams with one open end, and the signal variance around this mean by all connected diagrams with two open ends.

• Read the diagrams as computational algorithms specified by the Feynman rules in Sect. IV, and implement them by using linear algebra packages or existing map-making codes for the information propagator and vertices. The required discretisation is outlined in Sect. II E 1. Information on how to implement the required matrix inversions efficiently can be found in the literature given in Secs. I C 2, I C 4, and I C 5, and especially in [38].

• If the resulting non-linear data transformation (or filter) has the required accuracy, e.g. to be verified via Monte-Carlo simulations using signal and data
realizations drawn from the prior and likelihood, respectively, an IFT algorithm is established.

• If the interaction terms in the Hamiltonian are too large for a finite number of diagrams to form a well-performing algorithm, a re-summation of high-order terms is required. This can be achieved by the saddle-point approximation (classical solution, maximum a posteriori estimator), or, even better, by a detailed renormalization-flow analysis along the lines outlined in Sect. V F.

V. COSMIC LARGE-SCALE STRUCTURE VIA GALAXY SURVEYS

A. Poissonian data model and Hamiltonian

Many datasets suffer from Poisson noise, which is non-Gaussian and signal dependent, and therefore well suited to test IFT in the non-linear regime. For example, the cosmological LSS is traced by galaxies, which may be assumed to be generated by a Poisson process. On large scales, the expectation value of the galaxy density follows that of the underlying (dark) matter distribution. The aim of cosmography is to recover the initial density field from the shot-noise contaminated galaxy data. Currently, large galaxy surveys are being conducted in order to chart the cosmic matter distribution in three dimensions. Improving galaxy-based LSS reconstruction techniques and understanding their uncertainties better is therefore an imminent and important goal. Optimal techniques to reconstruct signals affected by Poissonian noise are also crucial for other problems, since e.g. imaging with photon detectors plays an important role in astronomy and other fields. Here, we outline how such problems can be treated by discussing a specific data model motivated by the problem of large-scale-structure reconstruction from galaxies. For this problem we work out the optimal estimator and show its superiority numerically. A more general discussion of models of galaxy and structure formation and references to relevant works was given in Sect. I C 4.

In order to treat the Poissonian case in a convenient fashion, we subdivide the physical space into small cells with volumes $\Delta V$, and assume that a cell located at $x_i$ has an expected number of observed galaxies

$$\mu_i \approx \kappa\,\left(1 + b\, s(x_i)\right), \tag{87}$$

with $\kappa = \bar n_g \Delta V$ being the cosmic average number of galaxies per cell and $b$ being the bias of the galaxy overdensity with respect to the dark matter overdensity $s$, still assumed to be a Gaussian random field (Eq. 38).

However, this data model has two shortcomings. First, too negative fluctuations of the Gaussian random field, with $b\, s < -1$, lead to negative expectation values, for which Poissonian statistics are not defined. Second, the mean density of observable galaxies $\kappa$ and their bias parameter $b$ are constant everywhere, whereas in reality both exhibit spatial variations.¹⁰ Allowing $\kappa$ and $b$ to be spatially inhomogeneous, we assume for the moment that both are known and that they incorporate all the observational effects mentioned above.

¹⁰ Such variations are due to the geometry of the sky coverage of the observational survey, to a galaxy selection function which decreases with distance from the observer, and to a changing composition of the galaxy population. The latter distance effects are caused by the cosmic evolution of galaxies and by the changing observational detectability of the different types with distance. We note that an observed sample of galaxies, which was selected deterministically or stochastically from a complete sample, e.g. by their luminosity due to instrumental sensitivity, still possesses Poissonian statistics if the original distribution does.

To cure the above-mentioned shortcomings, we replace Eq. 87 by a non-linear and non-translationally invariant model:

$$\mu_i = \kappa(x_i)\, \exp\left(b(x_i)\, s(x_i)\right), \tag{88}$$

where $\kappa$ and $b$ may depend on position in a known way, and the unknown Gaussian field $s$, the log-matter density, may exhibit unrestricted negative fluctuations. Note that $\mu$ is the signal response, by our definition in Eq. 10, since $\mu[s] = \langle d\rangle_{(d|s)}$. We call $\kappa$ the zero-response, since $\mu[0] = \kappa$. It should be stressed that the data model in Eq. 88 is just a convenient choice for illustration and proof-of-concept purposes, and is easily exchangeable for more realistic, and even non-local, data models. However, this log-normal data model was originally proposed by Coles and Jones [212], investigated for constrained realizations by Sheth [107] and Vio et al. [213], and seems to reproduce the statistics of LSS simulations much better than the often used Gaussian model of the overdensity [214].

Having chosen a Poissonian process to populate the Universe and our observational data with galaxies according to the underlying log-density field $s$, the likelihood is

$$P(d|s) = \prod_i \frac{\mu_i^{d_i}}{d_i!}\, e^{-\mu_i} = \exp\left\{\sum_i \left[d_i \log\mu_i - \mu_i - \log(d_i!)\right]\right\}, \tag{89}$$

where $d_i$ is the actual number of galaxies observed in cell $i$. Since $P(s) = \mathcal{G}(s,S)$, the Hamiltonian is given by

$$\begin{aligned} H_d[s] &= -\log P(d,s) = -\log P(d|s) - \log P(s) \\ &= -d^\dagger \hat{b}\, s + \kappa^\dagger \exp(\hat{b}\, s) + H'_0 + \tfrac{1}{2}\, s^\dagger S^{-1} s \\ &= \tfrac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s + H_0 + \sum_{n=3}^{\infty} \frac{1}{n!}\, \lambda_n^\dagger s^n, \quad\text{with} \\ D^{-1} &= S^{-1} + \widehat{\kappa b^2}, \\ j &= \hat{b}\,(d - \kappa), \\ H_0 &= \tfrac{1}{2}\log(|2\pi S|) + \left(\kappa + \log(d!)\right)^\dagger 1 - d^\dagger \log \kappa, \quad\text{and} \\ \lambda_n &= \kappa\, b^n. \end{aligned} \tag{90}$$

The hat on a scalar field denotes that it should be read as a matrix which is diagonal in position space (see Appendix A). A few remarks are in order. Comparing the propagator to that of our Gaussian theory, one can read off an inverse noise term $M = R^\dagger N^{-1} R = \widehat{\kappa b^2}$. Thus the effective (inversely response-weighted) noise decreases with increasing mean galaxy number and bias, and seems to be infinite in regions without data ($\kappa = 0$), without causing any problem for the formalism.

The information source $j$ increases with increasing response (bias) of the data (galaxies) to the signal (density fluctuations). However, it vanishes for zero response ($b = 0$), or wherever the observed galaxy counts exactly match the expected mean. Finally, the interaction terms $\lambda_n$ are local in position space, and vanish with decreasing $b$ and $\kappa$. The latter parameter is under the control of the data analyst, since it is proportional to the volume of the individual pixels, and can therefore be made arbitrarily small by choosing a finer resolution in signal space. However, this would not change the convergence properties of the series, since any interaction vertex then has to be summed over a correspondingly larger number of pixels within a coherence patch of the signal, which exactly compensates for the smaller coefficient.¹¹ The bias, in contrast, is set by nature and can be regarded as a power-counting parameter, which naturally provides a numerical hierarchy among the higher-order vertices and diagrams for $b^2 S < 1$. Note that $j = \mathcal{O}(b)$.

¹¹ $\kappa$ seems to control the stiffness of the response renormalization flow equation introduced later, and its value is therefore numerically relevant. A lower $\kappa$, due to a finer spatial pixelisation, results in a less stiff and better behaved equation.

B. Galaxy types and bias variations

Real galaxies can be cast into different classes, which all differ in terms of their luminosities, bias factors, and the frequencies with which they are found in the Universe. Although we are not going to investigate this complication in the following, it should be explained here how all the formulae in this section can easily be reinterpreted in order to also incorporate the different classes of galaxies.

The galaxies can be characterized by a type variable $L \in \Omega_{\rm type}$, which may be the intrinsic luminosity, the morphological galaxy type, or a multi-dimensional combination of all properties which determine the galaxy type's spatial distribution via an $L$-dependent bias $b_L$, and its detectability as encoded in $\kappa_L$. The data space is now spanned by $\Omega_{\rm data} = \Omega_{\rm space} \times \Omega_{\rm type}$, and $\mu$, $\kappa$, and $b$ can also be regarded as functions over this space. Performing the same algebra as in the previous section, just taking the larger data space into account, we arrive at exactly the same Hamiltonian as in Eq. 90, if we interpret any term containing $d$, $\kappa$, and $b$ to be summed or integrated over the type variable $L$. Thus, we read

$$\begin{aligned} j(x) &= \left(\hat{b}\,(d - \kappa)\right)(x) \equiv \int\! dL\; b_L(x)\left(d_L(x) - \kappa_L(x)\right), \\ D^{-1}_{xy} &= \left(S^{-1} + \widehat{\kappa b^2}\right)_{xy} \equiv S^{-1}_{xy} + 1_{xy}\int\! dL\; \kappa_L(x)\, b_L^2(x), \\ \lambda_n(x) &= \left(\kappa b^n\right)(x) \equiv \int\! dL\; \kappa_L(x)\, b_L^n(x), \quad\text{and} \\ \mu[s](x) &= \left(\kappa\, e^{\hat{b} s}\right)(x) \equiv \int\! dL\; \kappa_L(x)\, e^{b_L(x)\, s(x)} = \int\! dL\; \mu_L[s](x), \end{aligned} \tag{91}$$

which all live solely in $\Omega_{\rm space}$, so that the computational complexity of the matter-distribution reconstruction problem is not affected at all, and only a bit more book-keeping is required in its setup.

A few observations are in order. In case all galaxies have the same bias factor, Eq. 91 is simply a marginalization over the type variable $L$, and any differentiation of the various galaxy types is unnecessary. Since all known galaxy types seem to have $b \sim \mathcal{O}(1)$, such a marginalization seems to be justified, and explains why LSS reconstructions which applied this simplification are relatively successful, although the different galaxy masses, luminosities, and frequencies vary by orders of magnitude. As our numerical experiments below reveal, the data, and therefore the reconstructability of the density field, exhibit a sensitive dependence on the bias for $s$-fluctuations with unit variance.¹² Such a variance is indeed observed on scales below $\sim 10$ Mpc in the galaxy distribution, and therefore the galaxy-type-dependent bias variation does indeed matter. Larger galaxies, which have larger biases, therefore provide per galaxy a slightly larger information source ($j \propto b$), less shot noise ($R^\dagger N^{-1} R \propto b^2$), and increasingly larger higher-order interaction terms ($\lambda_n \propto b^n$) in comparison to smaller galaxies. However, smaller galaxies are more numerous by orders of magnitude, and therefore provide the largest total contribution to the information source, the noise reduction, and most low-order interaction terms. Thus, the latter will dominate and therefore permit a reasonably accurate matter reconstruction from an inhomogeneous galaxy survey using a single bias value. Nevertheless, improvements of the bias treatment are possible by applying the recipes described here.

¹² This is found for our specific data model $\mu \propto \exp(b s)$; however, it should also apply to other models, which somehow have to keep $\mu \ge 0$ even for $b s < -1$.
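The bookkeeping of Eq. 90 can be checked numerically. The sketch below uses a hypothetical 16-cell grid (an exponential prior covariance with correlation length 0.1, $\kappa = 5$ and $b = 0.5$ in every cell, and an arbitrary random seed, all assumptions for illustration). It evaluates $-\log P(d|s) - \log P(s)$ directly from Eqs. 88 and 89 and compares the result with the expansion in terms of $D^{-1}$, $j$, $H_0$ and $\lambda_n = \kappa b^n$, truncated at a finite order.

```python
import math
import numpy as np

# Hypothetical small grid; S, kappa, b and the data are illustrative assumptions.
rng = np.random.default_rng(0)
n_pix = 16
x = np.arange(n_pix) / n_pix
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)   # prior covariance, P(s) = G(s, S)
S_inv = np.linalg.inv(S)
kappa = np.full(n_pix, 5.0)                          # zero response per cell
b = np.full(n_pix, 0.5)                              # bias
s_true = np.linalg.cholesky(S) @ rng.standard_normal(n_pix)
d = rng.poisson(kappa * np.exp(b * s_true))          # Poissonian counts, Eqs. 88-89

log_d_fact = np.array([math.lgamma(k + 1.0) for k in d])   # log(d_i!)
_, logdet_2piS = np.linalg.slogdet(2.0 * np.pi * S)

# Coefficients of Eq. 90.
j = b * (d - kappa)
D_inv = S_inv + np.diag(kappa * b**2)
H0 = 0.5 * logdet_2piS + np.sum(kappa + log_d_fact) - d @ np.log(kappa)


def lam(n):
    """Local interaction coefficients lambda_n = kappa b^n."""
    return kappa * b**n


def H_direct(s):
    """-log P(d|s) - log P(s), evaluated directly from Eqs. 88-89."""
    mu = kappa * np.exp(b * s)
    neg_log_like = -(d @ np.log(mu)) + mu.sum() + log_d_fact.sum()
    neg_log_prior = 0.5 * s @ S_inv @ s + 0.5 * logdet_2piS
    return neg_log_like + neg_log_prior


def H_series(s, n_max=40):
    """Eq. 90: s^T D^-1 s / 2 - j^T s + H0 + sum_{n=3}^{n_max} lambda_n^T s^n / n!."""
    out = 0.5 * s @ D_inv @ s - j @ s + H0
    for n in range(3, n_max + 1):
        out += lam(n) @ s**n / math.factorial(n)
    return out


print(H_direct(s_true), H_series(s_true))
```

The two functions should agree to rounding accuracy for moderate $|b\,s|$, because the truncated tail of the exponential series is negligible there.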

C. Non-linear map making

The map, i.e. the expectation of our information field $s$ given the data, is to lowest order in the interaction

$$\begin{aligned} m_1 &= [\text{the four diagrams whose values are listed term by term below}] + \mathcal{O}(b^6) \\ &= D_{xy}\, j_y - \tfrac{1}{2}\, D_{xy}\, b_y^3 \kappa_y\, D_{yy} - \tfrac{1}{2}\, D_{xy}\, b_y^3 \kappa_y \left(D_{yz}\, j_z\right)^2 - \tfrac{1}{2}\, D_{xy}\, b_y^4 \kappa_y\, D_{yy} D_{yz}\, j_z + \mathcal{O}(b^6), \end{aligned} \tag{92}$$

or in compact notation

$$m_1 = m_0 - \tfrac{1}{2}\, D\, \widehat{b^3 \kappa}\left(\widehat{D} + m_0^2 + \widehat{D}\, b\, m_0\right) + \mathcal{O}(b^6). \tag{93}$$

It is apparent that the non-linear map-making formula contains corrections to the linear map $m_0 = D\,j$. The first two correction terms are always negative, reflecting the fact that our non-linear data model has non-symmetric fluctuations in the data with respect to the mean. The last correction term is oppositely directed to the linear map, thereby correcting for the curvature in the signal response.

A one-dimensional numerical example is displayed in Fig. 1. There, the signal was realized to have a power spectrum $P_s(k) \propto (k^2 + q^2)^{-1}$, with a correlation length $q^{-1} = 0.04$. The normalization was chosen such that the auto-correlation function is $\langle s(x)\, s(x + r)\rangle_{(s)} = \exp(-|q\, r|)$, and therefore the signal dispersion is unity, $\langle s^2\rangle_{(s)} = 1$. The data are generated by a Poissonian process from $\kappa_s = \kappa \exp(b s)$ with $b = 0.5$. All three displayed reconstructions exhibit less power than the original signal, as expected, since the reconstruction is conservative and therefore biased towards zero.

The non-linear correction to the linear map $m_0$ should not be too large, otherwise higher-order diagrams have to be included. In the case displayed in Fig. 1, $b = 0.5$ ensured that the first-order corrections mostly went in the right direction. However, for $b \approx 1$ there is no obvious ordering of the importance of the different interaction vertices, and numerical experiments reveal that the first-order corrections strongly overcorrect the linear map $m_0 = D\,j$. In such a case, interaction re-summation techniques should be used to incorporate as many higher-order interaction terms as possible. One very powerful re-summation is provided by the classical solution, as developed below, which contains all tree diagrams simultaneously. This solution, also shown in Fig. 1, is very close to $m_1$ in this case.

[Figure 1: three panels (top, middle, bottom) plotted against position $x \in [0, 1]$; see caption.]

FIG. 1: Poissonian reconstruction of a signal with unit variance and correlation length $q^{-1} = 0.04$, observed with a slightly non-linear response ($b = 0.5$, resolution: 513 pixels per unit length, zero-signal galaxy density: 1000 galaxies per unit length). Top: data $d$, signal response $\mu$, and zero response $\kappa$. Middle: signal $s$, linear Wiener-filter reconstruction $m_0 = D\,j$, its one-sigma error interval $m_0 \pm \widehat{D}^{1/2}$, next-order reconstruction $m_1$ according to Eq. 92, and classical solution $s_{\rm cl}$ according to Eq. 94. Although the linear Wiener filter reconstructs well at most locations, the non-linear response requires the perturbative corrections present in $m_1$, or the classical solution, in regions of high signal strength. Bottom: the residuals, i.e. the deviations of $m_0$, $m_1$, and $s_{\rm cl}$ from the signal, and the Wiener uncertainty $\pm\widehat{D}^{1/2}$.

D. Classical solution

The classical signal field, or MAP solution, is given by Eq. 74, which in this case reads

$$s_{\rm cl} = D\left(j - \sum_{m=2}^{\infty} \frac{b^{m+1}}{m!}\,\kappa\, s_{\rm cl}^m\right) = D\, \hat{b}\left(d - \kappa\left(e^{\hat{b} s_{\rm cl}} - \hat{b}\, s_{\rm cl}\right)\right) = S\, \hat{b}\Big(d - \underbrace{\kappa\, e^{\hat{b} s_{\rm cl}}}_{\kappa_{s_{\rm cl}}}\Big). \tag{94}$$

The last expression motivates the introduction of the expected number of galaxies given the signal $s$:

$$\kappa_s = \kappa\, e^{\hat{b} s}. \tag{95}$$

Alternative forms of the MAP equation can also be derived, for example one which is especially suitable for large $j$:

$$s_{\rm cl} = \frac{1}{b}\log\left(1 + \frac{j - S^{-1} s_{\rm cl}}{\kappa\, b}\right) = \frac{1}{b}\log\left(\frac{d}{\kappa} - \frac{S^{-1} s_{\rm cl}}{\kappa\, b}\right). \tag{96}$$

This may be solved iteratively, while ensuring that $S^{-1} s^{(i)}_{\rm cl} \le \hat{b}\, d$ at all iterations $i$, with equality only where $\kappa = 0$, so that the logarithm remains defined wherever $\kappa > 0$. This form of the classical field equation has some similarities to the naive inversion of the response formula, $\langle d\rangle_{(d|s)} = \kappa \exp(b s)$, which yields

$$s_{\rm naive} = \frac{1}{b}\log\left(\frac{d}{\kappa}\right), \tag{97}$$

a formula one can only dare to use in regimes of large $d$. Since $s_{\rm naive}$ contains the full noise of the data, a suitable naive map may be given by $m_{\rm naive} = S\, s_{\rm naive}$, after some fix for the locations without galaxy counts. The classical solution, however, is more conservative than this naive data inversion, in that there is a damping term, $S^{-1} s_{\rm cl}/(\kappa b)$, which somewhat compensates the influence of too large data points.

These equations permit one to calculate the classical solution if suitable numerical regularization schemes are applied, since naive iterations can easily lead to numerical divergences in the non-linear case.

One way of doing this is to turn the classical equation (Eq. 94) into a dynamical system. Its initial conditions are given by a well-solvable linear, or even trivial, problem, to which non-linear complications are added successively during an interval of some pseudo-time. The endpoint of this dynamics is then the required solution. The meaning of the pseudo-time depends on the way it was set up. In any case, it can be regarded as a mathematical trick to generate a differential equation, which might be easier to solve numerically than the original problem.

For example, a pseudo-time $\tau$ can be introduced by setting $j(\tau) = \tau\, j$. Thus, the information source is successively injected into an initially trivial field state, $s_{\rm cl}(0) = 0$. This allows one to set up a differential equation for $s_{\rm cl}(\tau)$ by taking the time derivative of Eq. 94,

$$\dot{s}_{\rm cl} = D_{s_{\rm cl}}\, j \quad\text{with}\quad D_{s_{\rm cl}} = \left(S^{-1} + \widehat{\kappa_{s_{\rm cl}} b^2}\right)^{-1}, \tag{98}$$

which has to be solved for $s_{\rm cl}(1)$ starting from $s_{\rm cl}(0) = 0$. This equation is very appealing, since it looks like Wiener-filtering an incoming information stream $j$ and accumulating the filtered data, while simultaneously tuning the filter $D_{s_{\rm cl}(\tau)}$ to the accumulated knowledge on the signal $s_{\rm cl}(\tau)$ and the Poissonian-noise structure implied by it. Thus, it is a nice example system for continuous Bayesian learning, and it also illustrates how different datasets can successively be fused into a single knowledge base.

Map-making algorithms with a higher fidelity are possible by not only investigating the maximum of the posterior, but by averaging the signal $s$ over the full support of $P(s|d)$. Nevertheless, we can assume that a good approximation $t \approx s_{\rm cl}$ to the classical solution can be achieved. Figs. 1 and 2 display classical solutions for slightly and strongly non-linear Poissonian inference problems. The second example in particular shows that the classical solution can be improved in regions of large uncertainty (see the region between $x = 0.2$ and $0.5$ in Fig. 2, where apparently better estimators exist) by adding the missing uncertainty loop diagrams, which contain information about the non-Gaussian structure of the posterior $P(s|d)$ away from $s_{\rm cl}$.
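The three maps discussed above can be sketched on a hypothetical one-dimensional grid loosely modeled on Fig. 1 (64 pixels instead of 513, exponential correlation with $q^{-1} = 0.04$, about 1000 galaxies per unit length, $b = 0.5$; grid size and seed are assumptions). The code computes the Wiener map $m_0 = D\,j$, the first-order map $m_1$ of Eq. 93, and the classical solution obtained by integrating the pseudo-time equation (98). The fourth-order Runge-Kutta integrator is our own choice, not one prescribed here; the residual of Eq. 94 shows how well the endpoint satisfies the MAP equation.

```python
import numpy as np

# Hypothetical toy loosely modeled on Fig. 1 (fewer pixels; all numbers are assumptions).
rng = np.random.default_rng(42)
n_pix, corr_len, b = 64, 0.04, 0.5
x = np.arange(n_pix) / n_pix
S = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)  # <s(x) s(x+r)> = exp(-|q r|)
S_inv = np.linalg.inv(S)
kappa = np.full(n_pix, 1000.0 / n_pix)                   # ~1000 galaxies per unit length
s_true = np.linalg.cholesky(S) @ rng.standard_normal(n_pix)
d = rng.poisson(kappa * np.exp(b * s_true)).astype(float)

j = b * (d - kappa)                                      # information source (Eq. 90)


def D_inv(s):
    """Inverse propagator S^-1 + diag(kappa e^{b s} b^2); s = 0 gives D^-1 of Eq. 90."""
    return S_inv + np.diag(kappa * np.exp(b * s) * b**2)


# Linear Wiener map and its first-order correction (Eq. 93).
D = np.linalg.inv(D_inv(np.zeros(n_pix)))
D_hat = np.diag(D)
m0 = D @ j
m1 = m0 - 0.5 * D @ (b**3 * kappa * (D_hat + m0**2 + D_hat * b * m0))


def classical_solution(n_steps=100):
    """RK4 integration of ds/dtau = D_s j (Eq. 98) from s(0) = 0 to tau = 1."""
    def rhs(s):
        return np.linalg.solve(D_inv(s), j)

    s, h = np.zeros(n_pix), 1.0 / n_steps
    for _ in range(n_steps):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * h * k1)
        k3 = rhs(s + 0.5 * h * k2)
        k4 = rhs(s + h * k3)
        s = s + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return s


s_cl = classical_solution()
# Residual of the MAP equation in the form S^-1 s = b (d - kappa e^{b s}), i.e. Eq. 94.
residual = S_inv @ s_cl - b * (d - kappa * np.exp(b * s_cl))
print("max |Eq. 94 residual|:", np.abs(residual).max())
```

The first Runge-Kutta slope at $\tau = 0$ is exactly the Wiener map $m_0$, which makes explicit that the flow starts as a linear filter and only gradually adapts its propagator to the accumulated signal.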

[Figure 2: three panels (top, middle, bottom) plotted against position $x \in [0, 1]$, the top panel on a logarithmic scale; see caption.]

FIG. 2: Poissonian reconstruction of the same signal realization as in Fig. 1 (unit variance and correlation length $q^{-1} = 0.04$), observed now with a strongly non-linear response ($b = 2.5$, resolution: 512 pixels per unit length, zero-signal galaxy density: 100 galaxies per unit length where the mask is one) through a complicated mask. Top: data $d$, signal response $\mu$, and zero response $\kappa$. Middle: signal $s$, classical solution $s_{\rm cl} = m_{T=0}$, intermediate solution $m_{T=0.5}$, and renormalization-based reconstruction $m_{T=1}$ with uncertainty interval $m_{T=1} \pm \widehat{D}^{1/2}_{T=1}$, and mask $\kappa/(\bar n_g \Delta V)$. The linear Wiener-filter reconstruction $m_0$ as well as its next-order corrected version $m_1$ are not displayed, since they lie partly far outside the displayed area. Bottom: deviations of the three reconstructions from the signal, and the original and the renormalized uncertainty estimates $\pm\widehat{D}^{1/2}_0$ and $\pm\widehat{D}^{1/2}_{T=1}$, respectively. Note that in the regions with many observed galaxies, the high signal-to-noise ratio can be seen in the narrowness of $\widehat{D}^{1/2}_{T=1}$, which is significantly smaller than the data-unaffected $\widehat{D}^{1/2}_0$ at these locations.

E. Uncertainty-loop corrections

Now we see how the missing uncertainty loop corrections can be added to the classical solution. These corrections can be derived from the Hamiltonian of the uncertainty field $\phi = s - t$,

$$H_t[\phi] = \tfrac{1}{2}\, \phi^\dagger D_t^{-1} \phi - j_t^\dagger \phi + \kappa_t^\dagger\, g(\hat{b}\, \phi) + H_{0,t}, \quad\text{where}$$

$$\begin{aligned} D_t^{-1} &= S^{-1} + \widehat{b^2 \kappa_t}, \\ j_t &= \hat{b}\,(d - \kappa_t) - S^{-1} t, \\ g(x) &= e^x - 1 - x - \tfrac{1}{2}x^2 = \sum_{m=3}^{\infty} \frac{x^m}{m!}, \end{aligned} \tag{99}$$

and $H_{0,t}$ is a momentarily irrelevant normalization constant. Again, we have allowed for a non-zero $j_t$, since $t$ might not be exactly the classical solution.

It is interesting to note that the interaction coefficients in this Hamiltonian, $\lambda^{(m)}_t = \kappa_t\, b^m$, all reflect the expected

number of galaxies given the reference field $t$. Thus, the replacement $\kappa_0 \to \kappa_t$ would provide us with the shifted field Hamiltonian, as defined in Eq. 60, except for the term $-S^{-1} t$ in $j_t$. It turns out that this term is a sort of counter-term, which accumulates the effect of the non-linear interactions.

We see that effective interaction terms arise when relevant parts of the solution are absorbed into the background field $t$. A similar approach is desirable for the loop diagrams. Instead of drawing and calculating all possible loop diagrams, we want to absorb several of them simultaneously into effective coefficients. For each vertex of the Poissonian Hamiltonian with $m$ legs, there exist diagrams in any Feynman expansion in which a number $n$ of simple loops are added to this vertex. Such an $n$-loop-enhanced $m$-vertex is given by

$$[n\text{-loop-enhanced } m\text{-vertex}] = -\frac{1}{2^n\, n!}\, \lambda^{(m+2n)}_t\, \widehat{D}_t^{\,n} = -\frac{1}{2^n\, n!}\, \kappa_t\, b^{m+2n}\, \widehat{D}_t^{\,n}. \tag{100}$$

All these diagrams can be re-summed into an effective interaction vertex via

$$\begin{aligned} \lambda^{(m)}_t \to \lambda'^{(m)}_t &= \kappa_t\, b^m \sum_{n=0}^{\infty} \frac{b^{2n}\, \widehat{D}^{\,n}}{2^n\, n!} = \kappa_t \exp\left(\frac{b^2}{2}\widehat{D}\right) b^m \\ &= \kappa_{t + b\widehat{D}/2}\, b^m = \lambda^{(m)}_{t + b\widehat{D}/2}. \end{aligned} \tag{101}$$

Thus, this re-summation is effectively equivalent to the replacement

$$\kappa_t \to \kappa_{t + b\widehat{D}/2}, \tag{102}$$

which reflects the larger expected response to a reference field $t$ due to the uncertainty fluctuations around it. These fluctuations pick up the asymmetric shape of the exponential term in the Hamiltonian, where the larger response to positive fluctuations is not fully compensated by the lower response to negative fluctuations. One might wonder whether the simple replacement rule in Eq. 102 could supplement the classical solution with the missing uncertainty loop corrections. Thus we ask whether the modified classical equation

$$m = S\, \hat{b}\left(d - \kappa_{m + b\widehat{D}/2}\right), \tag{103}$$

together with a self-consistently determined propagator

$$D^{-1} = S^{-1} + \widehat{b^2\, \kappa_{m + b\widehat{D}/2}}, \tag{104}$$

could provide the mean field given the data. A more rigorous renormalization calculation will show that this is indeed the case, within some approximation.

The loop-corrected density and propagator permit one to construct estimators for the dark matter density itself,

$$\varrho = \varrho_0\, e^{c\, s}, \tag{105}$$

instead of its logarithm, $s$. Here $c$ fixes the relation between $s$ and $\varrho$, and $\varrho_0$ is the cosmic median dark matter density. Translating our log-density map into the density results in the naive density estimator

$$m^{\rm naive}_\varrho = \varrho_0\, e^{c\, m}, \tag{106}$$

which is not optimal in the sense of minimal rms deviations. The proper estimator would be

$$m_\varrho = \left\langle \varrho_0\, e^{c\, s}\right\rangle_{(s|d)} = \varrho_0\, e^{c\, m + c^2 \widehat{D}/2}, \tag{107}$$

which contains uncertainty loop corrections accounting for the shift of the mean under the non-linear transformation between log-density and density.

F. Response renormalization

Since we are dealing with a $\phi^\infty$-field theory, the zoo of loop diagrams is quite complex and forms something like a Feynman foam. In order not to get stuck in the multitude of this foam, we urgently require a trick to keep either the maximal order of the diagrams low, or to limit the number of vertices per diagram, or both. We have basically two handles on any interaction term $\lambda_n = \kappa\, b^n$: the bias $b$ and the zero-response $\kappa$. We concentrate on the response, since it enters the Hamiltonian in a linear way, and the data can also be regarded as proportional to $\kappa$. Thus, the full Hamiltonian

$$H[s] = \tfrac{1}{2}\, s^\dagger S^{-1} s - d^\dagger \hat{b}\, s + \kappa_0^\dagger\, e^{\hat{b} s} \tag{108}$$

can be regarded as proportional to the response, except for the prior term and the constant terms, which we immediately drop here and in the following.

Let us assume that prior to any data analysis we have an initial guess $m_0$ for the signal, with some Gaussian uncertainty characterized by the covariance $D_0$. This can be expressed via a Hamiltonian of the form

$$H_0[s] = \tfrac{1}{2}\,(s - m_0)^\dagger D_0^{-1}(s - m_0), \tag{109}$$

which defines a probability density via $P_0(s) \propto \exp(-H_0[s])$. In case the prior should be our initial guess, we have $m_0 = 0$ and $D_0 = S$, but we need not restrict ourselves to this case. Now, we want to absorb the information of the full problem step by step, and forget our initial guess at the same rate. This can be modeled by adopting an affine parameter $\tau$, which measures how much we have exposed ourselves to the full problem. For each $\tau$, which we regard as a pseudo-time, our knowledge state is described by a Hamiltonian $H_\tau$. Increasing $\tau$ by some small amount $\varepsilon$ should therefore lead to the next knowledge state, characterized by

$$H_{\tau+\varepsilon} = H_\tau[s] + \varepsilon\left(H[s] - H_\tau[s]\right). \tag{110}$$
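To make the pseudo-time flow of Eq. 110 concrete, the sketch below evaluates the Hamiltonians of a single hypothetical pixel on a grid of signal values ($S = 1$, $b = 0.5$, $d = 7$, $\kappa_0 = 5$ are assumed numbers), starts from the prior, and applies the update rule with a small step $\varepsilon$. The discrete flow approaches the continuum solution $H_\tau = H + e^{-\tau}(H_0 - H)$, in which the data-dependent terms $-d^\dagger \hat b s + \kappa_0^\dagger e^{\hat b s}$ are switched on with weight $1 - e^{-\tau}$, and it relaxes to the full Hamiltonian for large $\tau$.

```python
import numpy as np

# Hypothetical single pixel: Hamiltonians evaluated on a grid of signal values.
S, b, d, kappa0 = 1.0, 0.5, 7.0, 5.0          # assumed toy numbers
s = np.linspace(-3.0, 3.0, 121)

H_prior = 0.5 * s**2 / S                                 # Eq. 109 with m_0 = 0, D_0 = S
H_full = H_prior - b * d * s + kappa0 * np.exp(b * s)    # Eq. 108 (constants dropped)


def flow(eps, tau):
    """Apply H_{tau+eps} = H_tau + eps (H - H_tau) (Eq. 110) for tau/eps steps."""
    H_tau = H_prior.copy()
    for _ in range(int(round(tau / eps))):
        H_tau = H_tau + eps * (H_full - H_tau)
    return H_tau


tau = 3.0
H_discrete = flow(eps=1e-3, tau=tau)
# Continuum limit: the data-dependent terms carry the weight 1 - exp(-tau).
H_closed = H_prior + (1.0 - np.exp(-tau)) * (H_full - H_prior)
print("max deviation from closed form:", np.max(np.abs(H_discrete - H_closed)))
```

The deviation shrinks as $\varepsilon \to 0$, since $(1 - \varepsilon)^{\tau/\varepsilon} \to e^{-\tau}$.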

FIG. 3: The original propagator $D_0 = (S^{-1} + \hat\kappa_0\, b^2)^{-1}$ (left) and the final propagator of the renormalization flow, $D$ (Eq. 117, right), in logarithmic grey scaling for the data displayed in Fig. 2. The values on the diagonals show the local uncertainty variance (in Gaussian approximation) before ($D_0$) and after ($D$) the data are analyzed, respectively. The bottom left and top right corners exhibit non-vanishing propagator values due to the assumed periodic spatial coordinate, which puts these corners close to the two others on the matrix diagonal.
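The qualitative structure of $D_0$ described in the caption of Fig. 3 can be reproduced with a small toy model. The sketch below is our own illustration, not the setup used for the figure: it assumes a periodic one-dimensional grid, an exponential prior covariance $S$ of the periodic pixel distance, a uniform zero-response $\kappa_0$ outside a hypothetical masked interval and $\kappa_0 = 0$ inside it, and evaluates $D_0 = (S^{-1} + b^2 \hat\kappa_0)^{-1}$ with the Woodbury identity so that $S$ itself never has to be inverted. The function names `periodic_exponential_prior` and `wiener_propagator` and all parameter values are hypothetical.

```python
import numpy as np


def periodic_exponential_prior(n, corr_len, amplitude=1.0):
    """Prior covariance S_xy = amplitude * exp(-|x - y|_periodic / corr_len) on [0, 1)."""
    x = np.arange(n) / n
    dist = np.abs(x[:, None] - x[None, :])
    dist = np.minimum(dist, 1.0 - dist)  # distance on the periodic domain
    return amplitude * np.exp(-dist / corr_len)


def wiener_propagator(S, kappa0, b):
    """D_0 = (S^-1 + b^2 diag(kappa0))^-1, evaluated via the Woodbury identity."""
    w = b * np.sqrt(kappa0)  # W^{1/2} for W = b^2 diag(kappa0)
    inner = np.eye(len(w)) + w[:, None] * S * w[None, :]
    return S - (S * w[None, :]) @ np.linalg.solve(inner, w[:, None] * S)


n = 64
mask = np.zeros(n, dtype=bool)
mask[16:32] = True                     # hypothetical unobserved interval
kappa0 = np.where(mask, 0.0, 5.0)      # no expected galaxies behind the mask
S = periodic_exponential_prior(n, corr_len=0.1)
D0 = wiener_propagator(S, kappa0, b=1.0)
```

In this toy, `np.diag(D0)` should stay close to the prior variance inside the masked interval and drop well below it elsewhere, while `D0[0, n - 1]` couples the first and last pixels because they are neighbours on the periodic domain, which is the origin of the non-vanishing corner values in Fig. 3.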

Equation 110 just models an asymptotic approach to the correct Hamiltonian. If the initial guess was the prior, one sees that for infinitesimal steps $\varepsilon$ the knowledge flow corresponds to tuning up all terms proportional to $\kappa$,

$$H_\tau[s] = \frac{1}{2}\, s^\dagger S^{-1} s + \left(1 - e^{-\tau}\right)\left(-b\, d^\dagger s + \kappa_0^\dagger\, e^{b s}\right) \xrightarrow{\tau\to\infty} H[s].$$

This motivates the term response renormalization for this kind of continuous learning system, into which the information source as well as the interactions are fed at the same rate.

The trick of the renormalization procedure is to approximate the knowledge state at each moment $\tau$ as being of Gaussian shape, and therefore the Hamiltonian as being free (quadratic in the signal). Thus we set

$$H_\tau[s] = \frac{1}{2}\,(s - m_\tau)^\dagger D_\tau^{-1}(s - m_\tau), \qquad (111)$$

where $m_\tau$ and $D_\tau = (S^{-1} + M_\tau)^{-1}$ are the mean and dispersion of the field given the knowledge acquired by time $\tau$, respectively. These have to be updated when the next learning step is performed. The next Hamiltonian, before it is again replaced by a free one, is

$$\begin{aligned} H_{\tau+\varepsilon}[\phi] &= \frac{1}{2}\,\phi^\dagger D_\tau^{-1}\phi + \varepsilon\left[(S^{-1} m_\tau - b\, d)^\dagger \phi - \frac{1}{2}\,\phi^\dagger M_\tau\, \phi + \kappa_{m_\tau}^\dagger\, e^{b\phi}\right] \\ &= \frac{1}{2}\,\phi^\dagger D_\tau^{-1}\phi + \varepsilon \sum_{n=1}^{\infty} \frac{1}{n!}\,\lambda_n^\dagger\, \phi^n, \end{aligned} \qquad (112)$$

if expressed in terms of the momentary uncertainty field $\phi = s - m_\tau$. Here, the perturbative expansion coefficients are given by

$$\lambda_1 = \kappa_{m_\tau} b + S^{-1} m_\tau - b\, d, \qquad \lambda_2 = \kappa_{m_\tau} b^2 - M_\tau, \qquad \text{and} \qquad \lambda_n = \kappa_{m_\tau} b^n \ \ \text{for } n > 2,$$

assuming for simplicity that $M_\tau$ is diagonal. This is a safe restriction, since we will see that for $\tau \to \infty$ this is asymptotically the case, even for a non-diagonal initial $M_0$. Thus we can require that our initial guess was also of this form.

In order to approximate this Hamiltonian by a free one, we have to calculate the shifted mean field and its connected two-point correlation function, the full propagator. To first order in $\varepsilon$, only leaf diagrams with a single perturbative interaction vertex contribute to the perturbed expectation value of $\phi$:

$$\langle\phi\rangle^{(\tau+\varepsilon)}_{(s|d)} = \text{(diagrams)} = \varepsilon\, D_\tau\left(b\, d - S^{-1} m_\tau - b\, \kappa_{m_\tau} e^{b^2 \hat D_\tau/2}\right). \qquad (113)$$

Note that only odd interaction terms shift the expectation value, $m_{\tau+\varepsilon} = m_\tau + \langle\phi\rangle^{(\tau+\varepsilon)}_{(s|d)}$. The even ones do not exert any net force in the vicinity of $\phi = 0$, since they represent a potential which is mirror symmetric about this point.

The renormalized propagator $D_{\tau+\varepsilon}$ is given by the connected two-point correlation function $\langle\phi\phi^\dagger\rangle^{(\tau+\varepsilon)}_{(s|d)}$, which is, up to linear order in $\varepsilon$,

$$\langle\phi\phi^\dagger\rangle^{(\tau+\varepsilon)}_{(s|d)} = \text{(diagrams)} = D_\tau + \varepsilon\, D_\tau\left(M_\tau - b^2\, \hat\kappa_{m_\tau} e^{b^2 \hat D_\tau/2}\right) D_\tau. \qquad (114)$$

Rewriting this as an update of $M_\tau$, we find, up to linear order in $\varepsilon$,

$$M_{\tau+\varepsilon} = (1 - \varepsilon)\, M_\tau + \varepsilon\, b^2\, \hat\kappa_{m_\tau} e^{b^2 \hat D_\tau/2}. \qquad (115)$$

Taking the limit $\varepsilon \to 0$ yields the integro-differential system

$$\begin{aligned} \dot m &= D\left(b\, d - b\, \kappa_0\, e^{b m + b^2 \hat D/2} - S^{-1} m\right), \\ \dot M &= b^2\, \hat\kappa_0\, e^{b m + b^2 \hat D/2} - M, \quad \text{and} \\ D &= \left(S^{-1} + M\right)^{-1}. \end{aligned} \qquad (116)$$

This converges to a fixed point, which we previously guessed in Eqs. 103 and 104 for our uncertainty-loop-enhanced classical equation.

The classical and the renormalization-flow fixed-point equations can be unified:

$$m = b\, S\left(d - \kappa_{m + T b \hat D/2}\right), \qquad D = \left(S^{-1} + b^2\, \hat\kappa_{m + T b \hat D/2}\right)^{-1}, \qquad (117)$$

with $T = 0$ and $T = 1$ for the classical and the renormalization result, respectively.

The parameter $T$ is more than a mere convenience. Had we introduced a temperature $T$ at the beginning, via $P(d, s|T) = \exp(-H_d[s]/T)$, Eq. 117 would have been the result of the renormalization flow calculation. The classical limit naturally corresponds to the zero-temperature regime, in which the field expectation value is not affected by any uncertainty fluctuations, since the system is at its absolute energy minimum.

An example of such reconstructions can be seen in Fig. 2, and its uncertainty structure in Fig. 3. Here, the renormalization equation indeed seems to provide a better result than the classical one. However, a statistical comparison of the two reconstructions using 1000 realizations of the signal and data (Fig. 4) shows that there is at most a marginal difference. This may be surprising, since the classical and renormalization solutions are quite distinct, and the latter is always lower than the former. One might therefore ask whether the two bracket the correct solution. And indeed, intermediate solutions constructed using $T = 1/2$ perform better than the ones for $T = 0$ and $T = 1$, as can be seen in Fig. 4.

If neither $T = 0$ nor $T = 1$ provides the optimal reconstruction, what would be the right choice? We have to remember that at each step of the renormalization scheme we replaced the probability density function by a Gaussian with the correct mean and dispersion. However, the real probability is not Gaussian, and therefore our mean-field estimator is not optimal. Reconstructions with different $T$ probe the non-Gaussian probability structure with Gaussian kernels of different widths in phase space, and therefore result in slightly different signal means, due to the anharmonic nature of our Hamiltonian.

G. Uncertainty structure

The remaining uncertainties at the end of the renormalization flow can mainly be read off the renormalized propagator $D$, which we display in the right panel of Fig. 3 in comparison to the original, un-renormalized one, $D_0$ (left panel). The renormalized propagator is a much better approximation to the uncertainty dispersion of the signal posterior around the mean map than the original one. One can clearly see that the data imprinted a highly non-uniform structure into the uncertainty pattern visible in the renormalized propagator, with small uncertainties where there were many galaxy counts. The density estimator in Eq. 107 also benefits from the knowledge of the uncertainty structure contained in the renormalized propagator, as the lower panel of Fig. 4 shows.

The propagators also visualize the effect that additional data would have at different locations. The height and width of the propagator values define, respectively, the strength of the response to an information source and the distance over which its information propagates.

The structure of $D_0$ is imprinted by the prior and the mask. At $D_0$'s widest locations the mask blocks any information source, and the structure of the signal prior $S$ becomes visible. At locations where the mask is transparent, the reconstruction response per information source is lower, as plenty of information can be expected there. The propagator width is also smaller there, since the individual pieces of information do not need to be propagated as far, thanks to the higher information source density in such regions.

The structure of $D_m$ has, in addition, the expected information source density structure given the reconstruction $m$ imprinted on it. The strongly non-linear signal response has led to regions with very high galaxy count rates, which have larger information densities and therefore lower and narrower information propagators. This implies that any additional galaxy detection in regions with high galaxy counts will have little impact on the updated map, whereas additional galaxies detected in low-density regions will change it more strongly. However, the number of additional galaxies per invested observing time will be larger in high-density regions, which may compensate for the lower information-per-galaxy ratio there. It is therefore interesting to look at the observational information content and how it depends on the

δmT=1 0.6 δmT=0.5

0.4

0.2

PSfrag replacements 0.2 0.4 0.6 0.8 2.5 δ very naive m̺ δ naive m̺ 2 δm̺

1.5

1

0.5

0 κ 0 0.2 0.4 0.6 0.8 1

FIG. 4: Top: Statistical reconstruction error from 1000 signal and data realizations. Curves are, roughly in order from top (bad performance) to bottom (good performance): error $\delta_{m_\text{naive}} = \langle (s - m_\text{naive})^2 \rangle^{1/2}_{(d,s)}$ of the signal-covariance-convolved naive map $m_\text{naive} = S\, s_\text{naive}$ (see Eq. 97), expected Wiener uncertainty $\delta_{D_0} = \hat D_0^{1/2}$, averaged renormalized uncertainty $\delta_{D_{T=1}} = \langle \hat D_{T=1}\rangle^{1/2}_{(d,s)}$, error of the classical map $\delta_{m_{T=0}} = \langle (s - m_{T=0})^2\rangle^{1/2}_{(d,s)}$, error of the renormalized map $\delta_{m_{T=1}} = \langle (s - m_{T=1})^2\rangle^{1/2}_{(d,s)}$, and error of the intermediate map $\delta_{m_{T=0.5}} = \langle (s - m_{T=0.5})^2\rangle^{1/2}_{(d,s)}$. The lowest curve without label is $\kappa$. Bottom: Errors of estimators for the density, $\varrho = e^{s}$, namely $\delta_{m_\varrho^\text{very naive}} = \langle (\varrho - e^{m_\text{naive}})^2\rangle^{1/2}_{(d,s)}$, $\delta_{m_\varrho^\text{naive}} = \langle (\varrho - m_\varrho^\text{naive})^2\rangle^{1/2}_{(d,s)}$, and $\delta_{m_\varrho} = \langle (\varrho - m_\varrho)^2\rangle^{1/2}_{(d,s)}$ (see Eqs. 106 and 107).

actual data realization.

H. Information gain

In the case of a free theory, the amount of information depends on the experimental setup and on the prior, but is independent of the data obtained, as we have shown in Sect. IV D 2. This changes if one wants to harvest information in a situation described by a non-linear IFT. There, the amount of information can depend strongly on the actual data.

This is well illustrated by our LSS reconstruction problem. A perturbative calculation of the non-linear information gain is possible if either the bias factor or the signal amplitude, which both control the strength of the non-linear interactions, is small compared to unity.¹³ The information gain, as given by Eq. 83, expanded to the first few orders in $b$,

$$\Delta I_d = \underbrace{\frac{1}{2}\,\mathrm{Tr}\log\left(1 + S\,\hat\kappa\, b^2\right)}_{\Delta I_0} + \frac{1}{2}\,\kappa^\dagger\, b^3\, \hat D_0\left[m_0 + \frac{b}{2}\left(\hat D_0 + m_0^2\right)\right] + \mathcal{O}(b^5), \qquad (118)$$

clearly depends on the actual realization of the data. The fluctuations in the Wiener map $m_0 = D_0\, j$, with $D_0 = (S^{-1} + b^2\hat\kappa)^{-1}$ and $j = b\,(d - \kappa)$, imply positive and negative information density fluctuations.

To conveniently calculate the information gain of the observation in the case of a large bias factor, we use the Gaussian approximation of the joint probability function, as provided by the renormalization scheme. Due to the Gaussianity of this approximate solution, we can simply

¹³ The signal amplitude can, for example, be made small by defining the signal of interest to be the cosmic density field smoothed on a sufficiently large scale (> 10 Mpc), so that $\langle s^2\rangle_{(s)} < 1$.


FIG. 5: Information gain density (the integrands of Eqs. 118 and 119) for the two reconstruction examples presented: the only weakly non-linear one (top panel; cf. Fig. 1) and the strongly non-linear one (bottom panel; cf. Fig. 2). The renormalization result for $T = 1$ (Eq. 119) and the zeroth- and first-order perturbative results (Eq. 118) are shown. The information gain depends on the observational sensitivity as well as on the actual data. The latter influence is stronger in the non-linear regime, and disappears in linear inference problems.

use the formula for the information gain of a free theory, as given by Eq. 86. This yields

$$\Delta I_d = \frac{1}{2}\,\mathrm{Tr}\log\left(1 + S\, b^2\, \hat\kappa\, \hat\eta\right), \qquad (119)$$

with $\eta = e^{b m + \frac{1}{2} b^2 \hat D}\big|_{T=1}$ being proportional to the expected number density of galaxies in this region (see Eq. 107). Here, too, it is obvious that the information gain depends on the data. In regions with higher observed galaxy numbers $\eta$ is larger, and more information is expected to be harvested by further observations. This is illustrated in Fig. 5, where the information gain density, i.e. the individual contributions to the trace in Eq. 119, as well as the zeroth- and first-order approximations of Eq. 118 are shown for the cases displayed in Figs. 1 and 2. The approximate Eq. 118 seems to be adequate for $b \ll 1$, but not for our cases of $b = 0.5$ and $2.5$.

The expected benefit of additional observations at location $x$ can also be calculated by differentiating Eq. 119 with respect to $\kappa(x)$. Using Eqs. 117 and 88 we find

$$\left\langle \frac{\delta I_d}{\delta \kappa} \right\rangle_{(\text{new data}|d)} = \frac{1}{2}\, b^2\, \eta \left(1 + \frac{1}{2}\,\kappa\, b^2\, \eta\, \hat D^2\, b^2\right)^{-1} \hat D. \qquad (120)$$

The expected information gain is especially large for observations at locations where the uncertainty $D$ is large, where a large number density of galaxies ($\propto \eta$) can be expected, and where strong non-linearities are present ($\propto b^2$). The inverse term caps the maximally available information gain at some level. For the two reconstruction examples given in Figs. 1 and 2, we display the expected information gain as a function of the observing position in Fig. 6.

It is apparent from the top panel, showing the case of uniform observational coverage, that additional observations are more advantageous at locations where an increased matter density has already been identified. The bottom panel, showing the case of a very inhomogeneous observation of strongly non-linear data, demonstrates that filling observational gaps should have the highest priority. But there again, regions where the extrapolated galaxy

FIG. 6: Differential information gain density for the two reconstruction examples presented: the only weakly non-linear one (top panel; cf. Fig. 1) and the strongly non-linear one (bottom panel; cf. Fig. 2).

density seems to be larger should be preferred, as can be seen from the asymmetric shape of the expected information gain for observations in the gap around $x = 0.2$. In this example, the information harvest of high galaxy density regions can be so large that further observations of the already well-observed regions at the boundary of the domain seem to be more advantageous than improving the poorly observed regions around $x = 0.4$, where a low galaxy density is already apparent from the existing data.

Of course, in order to plan observations in a real case, the dependence of the observational costs on the location $x$ and on the zero-response already achieved there, $\kappa(x)$, has to be folded into the considerations.

VI. NON-GAUSSIAN CMB FLUCTUATIONS VIA $f_\text{nl}$-THEORY

A. Data model

As an IFT example on the sphere $\Omega = \mathcal{S}^2$, involving two interacting uncertainty fields, we investigate the so-called $f_\text{nl}$-theory of local non-Gaussianities in the CMB temperature fluctuations. This problem currently has high scientific relevance due to the strongly increasing availability of high-fidelity CMB measurements, which permit us to constrain the physical conditions at very early epochs of the Universe. The relevant references for this topic were provided in Sect. I C 5.

On top of the very uniform CMB sky with a mean temperature $T_\text{CMB}$, small temperature fluctuations at the level of $\delta T^{\{I,E,B\}}_\text{obs}/T_\text{CMB} \sim 10^{\{-5,-6,-7\}}$ are observed or expected in total intensity (Stokes $I$) and in polarization E- and B-modes, respectively. The weak B-modes are mainly due to lensing of E-modes and some unknown level of gravitational waves. We will disregard them in the following. These CMB temperature fluctuations are believed, and observed, to follow a mostly Gaussian distribution. However, inflation predicts some level of non-Gaussianity. Some of the secondary anisotropies imprinted by the LSS of the Universe via CMB lensing, the integrated Sachs-Wolfe effect, and the Rees-Sciama effect should also have imprinted non-Gaussian signatures [215, 216]. The primordial, as well as some of the secondary, CMB temperature fluctuations are a response to the gravitational potential initially seeded during inflation. Since we are interested in primordial fluctuations,
we write

$$d \equiv \delta T^{\{I,E\}}_\text{obs}/T_\text{CMB} = R\,\varphi + n, \qquad (121)$$

where $\varphi$ is the three-dimensional primordial gravitational potential, and $R$ is the response to it of a CMB instrument observing the induced CMB temperature fluctuations in intensity and E-mode polarization. These are imprinted by a number of effects, like gravitational redshifting, the Doppler effect, and anisotropic Thomson scattering. If the data of the instrument are foreground-cleaned and deconvolved all-sky maps (assuming the data processing to be part of the instrument), the response, which translates the 3-d gravitational field into temperature maps, is well known from CMB theory and can be calculated with publicly available codes like cmbfast, camb, and cmbeasy (see Sect. I C 5). The precise form of the response does not matter for developing the basic concept, and can be inserted later. Finally, the noise $n$ subsumes all deviations of the measurement from the signal response due to instrumental and physical effects which are not linearly correlated with the primordial gravitational potential, such as detector noise, remnants of foreground signals, but also primordial gravitational wave contributions to the CMB fluctuations.

The small level of non-Gaussianity expected in the CMB temperature fluctuations is a consequence of some non-Gaussianity in the primordial gravitational potential. Despite the lack of a generic non-Gaussian probability function, many of the inflationary non-Gaussianities seem to be well described by a local process, which taints an initially Gaussian random field, $\phi \hookleftarrow P(\phi) = \mathcal{G}(\phi, \Phi)$ (with the $\phi$-covariance $\Phi = \langle\phi\,\phi^\dagger\rangle_{(\phi)}$), with some level of non-Gaussianity. A well-controllable realization of such a tarnishing operation is provided by a slightly non-linear transformation of $\phi$ into the primordial gravitational potential $\varphi$ via

$$\varphi(x) = \phi(x) + f_\text{nl}\left(\phi^2(x) - \langle\phi^2(x)\rangle_{(\phi)}\right) \qquad (122)$$

for any $x$. The parameter $f_\text{nl}$ controls the level and nature of the non-Gaussianity via its absolute value and sign, respectively. This means that our data model reads

$$d = R\left(\phi + f\,(\phi^2 - \hat\Phi)\right) + n, \qquad (123)$$

where we dropped the subscript of $f_\text{nl}$. In the following we assume the noise $n$ to be Gaussian with covariance $N = \langle n\, n^\dagger\rangle_{(n)}$, and define as usual $M = R^\dagger N^{-1} R$ for notational convenience.¹⁴

B. Spectrum, bispectrum, and trispectrum

The non-linearity of the relation between the hidden Gaussian random field $\phi$ and the observable gravitational potential $\varphi$ (Eq. 122) imprints non-Gaussianity into the latter. In order to be able to extract the value of the non-Gaussianity parameter $f$ from any data containing information on $\varphi$, we need to know its statistics at least up to the four-point function, the trispectrum, which we briefly derive with IFT methods.

To that end, it is convenient to define a $\varphi$-moment generating function $Z[J]$ and its logarithm

$$\begin{aligned} \log Z[J] &= \log \int \mathcal{D}\phi\; P(\phi)\, e^{J^\dagger \varphi(\phi)} \\ &= \frac{1}{2}\, J^\dagger\left(\Phi^{-1} - 2\hat f \hat J\right)^{-1} J - (f J)^\dagger \hat\Phi - \frac{1}{2}\,\mathrm{Tr}\log\left(1 - 2\,\Phi\,\hat f \hat J\right). \end{aligned} \qquad (124)$$

This permits us to calculate via $J$-derivatives (see Eqs. 32-35) the mean

$$\bar\varphi = \langle\varphi\rangle_{(\phi)} = 0, \qquad (125)$$

the spectrum (or covariance)

$$C^{(\varphi)}_{xy} = \langle\varphi_x\,\varphi_y\rangle^c_{(\phi)} = \langle(\varphi - \bar\varphi)_x\,(\varphi - \bar\varphi)_y\rangle_{(\phi)} = \Phi_{xy} + 2 f_x\, \Phi_{xy}^2\, f_y, \qquad (126)$$

the bispectrum¹⁵

$$\begin{aligned} B^{(\varphi)}_{xyz} &= \langle(\varphi - \bar\varphi)_x(\varphi - \bar\varphi)_y(\varphi - \bar\varphi)_z\rangle_{(\phi)} = \langle\varphi_x\,\varphi_y\,\varphi_z\rangle^c_{(\phi)} \\ &= 2\left[\Phi_{xy} f_y \Phi_{yz} + \Phi_{yz} f_z \Phi_{zx} + \Phi_{zx} f_x \Phi_{xy}\right] + 8\,\Phi_{xy} f_y \Phi_{yz} f_z \Phi_{zx} f_x, \end{aligned} \qquad (127)$$

and the trispectrum

$$\begin{aligned} T^{(\varphi)}_{xyzu} &= \langle(\varphi - \bar\varphi)_x(\varphi - \bar\varphi)_y(\varphi - \bar\varphi)_z(\varphi - \bar\varphi)_u\rangle_{(\phi)} \\ &= C^{(\varphi)}_{xy} C^{(\varphi)}_{zu} + C^{(\varphi)}_{xz} C^{(\varphi)}_{yu} + C^{(\varphi)}_{xu} C^{(\varphi)}_{yz} + \langle\varphi_x\,\varphi_y\,\varphi_z\,\varphi_u\rangle^c_{(\phi)}, \\ \langle\varphi_x\,\varphi_y\,\varphi_z\,\varphi_u\rangle^c_{(\phi)} &= \left(2\,\Phi_{xy} f_y \Phi_{yz} f_z \Phi_{zu} + 2\,\Phi_{xy} f_y \Phi_{yz} f_z \Phi_{zu} f_u \Phi_{ux} f_x\right) + \text{23 permutations}. \end{aligned} \qquad (128)$$

¹⁴ Non-Gaussian noise components are in fact expected, and would need to be included in the construction of an optimal $f_\text{nl}$-reconstruction. However, we currently only aim at outlining the principles, and we are furthermore not aware of any traditional $f_\text{nl}$-estimator constructed while taking such noise into account. Finally, we show at the end how to identify some such non-Gaussian noise sources by producing $f_\text{nl}$-maps on the sphere, which can be morphologically compared to known foreground structures, like our Galaxy.

¹⁵ Since the bispectrum contains most of the non-Gaussianity signature, we also provide its Fourier-space version, which is well known for the $f_\text{nl}$-model [e.g. 217]. The bispectrum for $f = \text{const}$, expressed in terms of the $\varphi$-covariance, reads

$$B^{(\varphi)}_{xyz} = 2f\left[C^{(\varphi)}_{xy} C^{(\varphi)}_{yz} + C^{(\varphi)}_{xz} C^{(\varphi)}_{zy} + C^{(\varphi)}_{yx} C^{(\varphi)}_{xz}\right] + \mathcal{O}(f^3).$$

Fourier transforming this yields

$$B^{(\varphi)}_{k_1 k_2 k_3} = 2f\,(2\pi)^3\,\delta(k_1 + k_2 + k_3)\left[P(k_1)P(k_2) + P(k_2)P(k_3) + P(k_3)P(k_1)\right] + \mathcal{O}(f^3),$$

where $P(k)$ is the power spectrum of $\varphi$, which is identical to that of $\phi$ up to $\mathcal{O}(f^2)$.

… of the gravitational potential. Since we will investigate the possibility of a spatially varying non-Gaussianity parameter at the end of this section, we keep track of the spatial coordinate of $f$, but for the time being read $f_x = f$.

The spectrum, bispectrum, and trispectrum of our CMB measurement can easily be calculated from the corresponding quantities of the gravitational potential:

$$\begin{aligned} C^{(d)} &= R\, C^{(\varphi)} R^\dagger + N, \\ B^{(d)}_{\hat n_1 \hat n_2 \hat n_3} &= R_{\hat n_1 x} R_{\hat n_2 y} R_{\hat n_3 z}\, B^{(\varphi)}_{xyz}, \\ T^{(d)}_{\hat n_1 \hat n_2 \hat n_3 \hat n_4} &= R_{\hat n_1 x} R_{\hat n_2 y} R_{\hat n_3 z} R_{\hat n_4 u}\, T^{(\varphi)}_{xyzu} + \frac{1}{8}\left[\left(2 R\, C^{(\varphi)} R^\dagger + N\right)_{\hat n_1 \hat n_2} N_{\hat n_3 \hat n_4} + \text{23 permutations}\right], \end{aligned} \qquad (129)$$

where $\hat n$ denotes the unit vector on the sphere, and we have made use of the assumption that the noise is Gaussian and independent of the signal. In case the noise itself has a bi- or trispectrum, or there is signal-dependent noise, e.g. due to an incorrect instrument calibration, more terms have to be added to these expressions. The usually quoted formulae [e.g. 204, 217, 218, 219] can be obtained from Eq. 129 by applying spherical harmonic transformations.

C. CMB-Hamiltonian

Although we are not interested in the auxiliary field $\phi$, it is nevertheless very useful for its marginalization to define its Hamiltonian, which is

$$\begin{aligned} H_f[d, \phi] &= -\log\left(\mathcal{G}(\phi, \Phi)\; \mathcal{G}\big(d - R\,(\phi + f(\phi^2 - \hat\Phi)),\, N\big)\right) \\ &= \frac{1}{2}\,\phi^\dagger D^{-1}\phi + H_0 - j^\dagger\phi + \sum_{n=0}^{4} \frac{1}{n!}\,\Lambda^{(n)}[\phi, \ldots, \phi], \end{aligned}$$

with

$$\begin{aligned} D^{-1} &= \Phi^{-1} + R^\dagger N^{-1} R \equiv \Phi^{-1} + M, \qquad j = R^\dagger N^{-1} d, \\ \Lambda^{(0)} &= j^\dagger (f\hat\Phi) + \frac{1}{2}\,(f\hat\Phi)^\dagger M\,(f\hat\Phi), \\ \Lambda^{(1)} &= -(f\hat\Phi)^\dagger M \quad \text{and} \quad j' = j - \Lambda^{(1)\dagger}, \\ \Lambda^{(2)} &= -2\,\hat f\,\hat j', \\ \Lambda^{(3)}_{xyz} &= \left(M_{xy} f_y\, \delta_{yz} + \text{5 permutations}\right), \\ \Lambda^{(4)}_{xyzu} &= \frac{1}{2}\left(f_x\, \delta_{xy} M_{yz}\, \delta_{zu} f_u + \text{23 permutations}\right), \end{aligned} \qquad (130)$$

and $H_0$ collects all terms independent of $\phi$ and $f$. The last two tensors should be read without the Einstein sum convention, but with all possible index permutations. Note that this is a non-local theory for $\phi$ in case either the noise covariance or the response matrix is non-diagonal, yielding a non-local $M$ and therefore non-local interactions $\Lambda^{(3)}$ and $\Lambda^{(4)}$.

We should note that Babich [220] derived the now traditional $f_\text{nl}$-estimator from a very similar starting point, the log-probability for $\varphi$. The difference between the resulting estimators is not due to the slightly different approaches ($H_f[d, \varphi]$ versus $H_f[d, \phi]$), but due to the frequentist and Bayesian statistics he and we use, respectively.

In case the noise as well as the response is diagonal in position space, as is often assumed for the instrument response of properly cleaned CMB maps and is also approximately valid on large angular scales, where the Sachs-Wolfe effect dominates, we have $N_{xy} = \sigma_n^2(x)\,\delta(x - y)$, $R = -3$ [215] for the total intensity fluctuations, and thus $M_{xy} = 9\,\sigma_n^{-2}(x)\,\delta(x - y)$, if we restrict the signal space to the last-scattering surface, which we identify with $\mathcal{S}^2$. This permits us to simplify the Hamiltonian to

$$H_f[d, \phi] = \frac{1}{2}\,\phi^\dagger D^{-1}\phi + H_0 - j^\dagger\phi + \sum_{n=0}^{4} \frac{1}{n!}\,\lambda_n^\dagger\,\phi^n,$$

with

$$\begin{aligned} D^{-1} &= \Phi^{-1} + 9\,\sigma_n^{-2}, \qquad j' = j - \lambda_1 = 3\,(3\hat\Phi f - d)/\sigma_n^2, \\ \lambda_0 &= 3\,(\hat\Phi/\sigma_n^2)^\dagger\left(\tfrac{3}{2}\, f^2\hat\Phi - f d\right), \qquad \lambda_2 = -2 f j', \\ \lambda_3 &= 54\, f/\sigma_n^2, \quad \text{and} \quad \lambda_4 = 108\, f^2/\sigma_n^2. \end{aligned} \qquad (131)$$

The numerical coefficients of the last two terms may look large; however, these coefficients stand in front of terms of typically $\phi^3 \sim 10^{-15}$ and $\phi^4 \sim 10^{-20}$, which ensures their well-behavedness in any diagrammatic expansion series.

For later use, we define the Wiener-filter reconstruction of the gravitational potential as $m_0 = D\, j$.

D. $f_\text{nl}$-evidence and map making

Since we are momentarily not interested in reconstructing the primordial fluctuations, but in extracting knowledge on $f_\text{nl}$, we marginalize over the former by calculating the log-evidence $\log P(d|f)$ up to quadratic order in $f$:

$$\begin{aligned} \log Z_f[d] &= \log \int \mathcal{D}\phi\; P(d, \phi|f) = \log \int \mathcal{D}\phi\; e^{-H_f[d,\phi]} \\ &= -H_0 - \Lambda^{(0)} + \text{(connected diagrams up to order } f^2\text{)} + \mathcal{O}(f^3). \end{aligned} \qquad (132)$$

We have made use of the fact that the logarithm of the partition sum is given by all connected diagrams, and that $j'$ contains a term of order $\mathcal{O}(f^0)$, $\Lambda^{(2)}$ and $\Lambda^{(3)}$ contain terms of order $\mathcal{O}(f^1)$, and $\Lambda^{(4)}$ one of order $\mathcal{O}(f^2)$, so that they can appear an unrestricted number of times, twice, and once, respectively, in diagrams of order up to $\mathcal{O}(f^2)$. Since only interactions up to fourth order are involved, an implementation in spherical harmonics space may be feasible using only the fourth-order $C$-coefficients (Eq. B3), which can be calculated computer-algebraically. Finally, we recall

$$\text{(diagram)} = \frac{1}{2}\log|2\pi D| = \frac{1}{2}\,\mathrm{Tr}\left(\log(2\pi D)\right). \qquad (133)$$

Although $f$ is not known, the expressions in Eq. 132 proportional to $f$ and $f^2$ can be calculated separately, permitting us to write down the Hamiltonian of $f$ if a suitable prior $P(f)$ is chosen,

$$H_d[f] \equiv -\log\left(P(d|f)\, P(f)\right) = \tilde H_0 + \frac{1}{2}\, f^\dagger \tilde D^{-1} f - \tilde j^\dagger f + \mathcal{O}(f^3), \qquad (134)$$

where we collected the linear and quadratic coefficients into $\tilde j$ and $\tilde D^{-1}$. It is obvious that the optimal $f$-estimator to lowest order is therefore

$$m_f = \langle f\rangle_{(s,f|d)} = \tilde D\,\tilde j, \qquad (135)$$

and its uncertainty variance is just

$$\langle (f - m_f)(f - m_f)^\dagger\rangle_{(s,f|d)} = \tilde D. \qquad (136)$$

So far, we have assumed $f$ to have a single universal value. However, we can also permit $f$ to vary spatially, or on the sphere of the sky. In the latter case one would expand $f$ as

$$f(x) = \sum_{l=0}^{l_\text{max}} \sum_{m=-l}^{l} f_{lm}\, Y_{lm}(\hat x) \qquad (137)$$

up to some finite $l_\text{max}$. Then one would recalculate the partition sum, now separately for terms proportional to $f_{lm}$ and to $f_{lm} f_{l'm'}$, which are then sorted into the vector and matrix coefficients of $\tilde j$ and $\tilde D^{-1}$, respectively, according to

$$\tilde j_{(lm)} = -\left.\frac{d H_d[f]}{d f_{lm}}\right|_{f=0}, \qquad \text{and} \qquad \tilde D^{-1}_{(lm)(l'm')} = \left.\frac{d^2 H_d[f]}{d f_{lm}\, d f_{l'm'}}\right|_{f=0}. \qquad (138)$$

$f$-map making can then proceed as described above, in spherical harmonics space. By comparing the resulting map in angular space to known foreground sources, such as our Galaxy, the level of non-Gaussian contamination due to their imperfect removal from the data may be assessed.

E. Comparison to traditional estimator

We conclude this chapter with a short comparison to traditional $f_\text{nl}$-estimators. To our knowledge, the most developed estimator in the literature is based on the CMB bispectrum, which is the third-order correlation function of the data [e.g. 220, 221, and references in Sect. I C 5]. The IFT-based filter presented here contains terms up to fourth order in the data, and can therefore be expected to be of higher accuracy, since both methods are supposed to be optimal. Kogo and Komatsu [219] note that the CMB trispectrum should contain significant information on $f_\text{nl}$, and may be superior to the bispectrum for non-Gaussianity detection on small angular scales. However, since the trispectrum is insensitive to the sign of $f_\text{nl}$, its actual usage as a proxy is a bit more subtle. In the IFT estimator, any term proportional to $f_\text{nl}^2$ enters the inverse of the propagator $\tilde D$, and therefore the trispectrum seems to unfold its $f_\text{nl}$-estimation power mostly in combination with the bispectrum, which drives $\tilde j$.

Under which conditions does the traditional estimator emerge from the IFT one? There are three conceptual differences between the estimators: the IFT filter can handle inhomogeneous non-Gaussianity, corrects for chance coupling between the CMB sky and the exposure, and is unbiased with respect to the posterior. The traditional estimator is usually written as

$$\varepsilon = \frac{1}{\mathcal{N}} \int dx\; A(x)\, B^2(x) = \frac{1}{\mathcal{N}}\, m_0^\dagger \Phi^{-1} m_0^2, \qquad (139)$$

where $B = D\, j = m_0$ is the Wiener-filter reconstruction of the gravitational potential, $A = \Phi^{-1} B$ is the same, just additionally filtered by the inverse power spectrum, and $\mathcal{N}$ is a normalization constant [e.g. 202]. This is fixed by the condition that the estimator should be unbiased with respect to all signal and noise realizations,

$$\begin{aligned} \mathcal{N} &= \langle m_0^\dagger \Phi^{-1} m_0^2\rangle_{(d,s|f=1)} = B^{(\varphi)}_{xyz}\big|_{f=1}\,(MD)_{xu}\,\Phi^{-1}_{uv}\,(DM)_{vy}\,(DM)_{vz} \\ &= 2\left[\Phi_{xy}\Phi_{yz} + \Phi_{yz}\Phi_{zx} + \Phi_{zx}\Phi_{xy}\right](MD)_{xu}\,\Phi^{-1}_{uv}\,(DM)_{vy}\,(DM)_{vz}. \end{aligned} \qquad (140)$$

The first difference between the estimators is obvious, in that the IFT estimator can handle a spatially varying $f(x)$. Therefore, we will only regard spatially constant non-linearity parameters in the following. Since no CMB experiment is able to measure the monopole temperature fluctuation, the response to any spatially homogeneous signal is zero. In Fourier basis this means that $R_{\hat n, k=0} = 0$ and therefore $j_{k=0} = M_{k=0,k'} = 0$. Thus, we find for a Universe with homogeneous statistics ($\hat\Phi_{k\neq0} = 0$) that $\Lambda^{(0)} = \Lambda^{(1)} = 0$, $j' = j$, and $\Lambda^{(2)} = -2 f\hat j$, which reduces the number of diagrams we have to calculate.

The IFT estimator is driven by the $f$-information source $\tilde j$, which is given by all diagrams which contain terms linear in $f$.
There are four such diagrams, yielding

$$\tilde j = \frac{1}{f}\,\text{(diagrams)} = m_0^\dagger \Phi^{-1} m_0^2 + m_0^\dagger\left[\Phi^{-1}\hat D - 2\,\widehat{MD}\right], \qquad (141)$$

where we used $M = D^{-1} - \Phi^{-1}$ in order to combine the two tree and the two loop diagrams into the first and second term, respectively. The term resulting from the tree diagrams is actually identical to the unnormalized traditional estimator $\varepsilon$ (Eq. 139).

The terms resulting from the loop diagrams vanish for a homogeneous $M$, which a CMB experiment with uniform exposure and constant noise could produce. In the case of an inhomogeneous $M$, which is the more realistic case, the loop term does not vanish and corrects for chance correlations between the CMB realization (as seen through $j$) and the noise and response structure of the experiment (as encoded in $M$ and $D$). Creminelli et al. [222] already pointed out that such a linear correction term is necessary in the case of inhomogeneous sky coverage.

Thus, the second difference between the estimators is that the IFT-based one applies a correction for chance correlations of CMB sky and sky exposure, whereas the traditional one does not. This term is absent in the traditional estimator since the latter was constructed as the optimal estimator which is third order in the data. This excluded the loop term, which is linear in the data. An inclusion of this term into the traditional estimator is straightforward and is actually done in more recent $f_\text{nl}$ measurements [e.g. 223]. The normalization constant $\mathcal{N}$ is unaffected by this, since the expectation value of the loop term averaged over all possible signal realizations is zero.

This brings us to the third difference between the estimators, the different normalization. The traditional estimator is normalized by a data-independent constant $\mathcal{N}$, whereas the IFT estimator is normalized by a data-dependent term

$$\tilde D^{-1} = \frac{1}{\sigma_f^2} + \frac{2}{f^2}\,\text{(diagrams)}, \qquad (142)$$

where only the first three diagrams are data-independent, and $\sigma_f^2$ is the variance of the prior, which we assume to be $P(f) = \mathcal{G}(f, \sigma_f^2)$. The detailed expressions for the different diagrams can be found in Appendix C. For both estimators, the traditional and the IFT one, the normalization is supposed to guarantee unbiasedness, however with respect to different probability distributions.

The traditional estimator is unbiased in the frequentist sense, for an average over all signal and data realizations. The IFT estimator, however, is unbiased in the Bayesian sense, with respect to the posterior, the probability distribution of all signals given the data. Since the data are given, and not assumed to vary any more after the observation is performed, they can and should affect the normalization constant, which encodes the sensitivity of our non-Gaussianity estimation.

The reason for the IFT normalization constant (or $f$-propagator) to be data-dependent can be understood as follows. There are data realizations which are better suited to reveal the presence of non-Gaussianity than others, even if they have identical $\tilde j$. Such a dependence of the detectability of an effect on the concrete data realization is common in non-linear Bayesian inference, and was even more prominent in the example of the reconstruction of a log-normal density field in Sect. V.

VII. SUMMARY AND OUTLOOK

Starting with fundamental information-theoretical considerations about the nature of measurements, signals, noise, and their relation to a physical reality given a model of the Universe or of the system under consideration, we reformulated the inference problem in the language of information field theory (IFT). IFT is actually a statistical field theory. The information field is identified with a spatially distributed signal, which can be freely chosen by the scientist according to needs and technical constraints. The mathematical apparatus of field theory permits us to deal with the ensemble of all possible field configurations given the data and prior information in a consistent way.

With this conceptual framework, we derived the Hamiltonian of the theory, showed that the free theory reproduces the well-known results of Wiener-filter theory, and presented the Feynman rules for non-linear, interacting Hamiltonians in general and for particular cases. The latter are information fields over Fourier and spherical harmonics spaces for inference problems in $\mathbb{R}^n$ and $\mathcal{S}^2$, respectively. Our "philosophical" considerations permitted us to argue why the resulting IFTs are usually well normalized, but often non-local. Since the propagator of the theory is closely related to the Wiener filter, for which efficient numerical algorithms exist nowadays in the form of image reconstruction and map-making codes, and the information source term is usually a noise-weighted version of the data, the necessary computational tools are at hand to convert the diagrammatic expressions into well-performing algorithms.

Furthermore, we provided the Boltzmann-Shannon information measure of IFT based on the Helmholtz free energy, thereby highlighting the embedding of IFT in the framework of statistical mechanics.

As examples of the IFT recipe, two concrete IFT problems with cosmological motivation were discussed, which are also intended as blueprints for other inference problems. The first targeted the problem of reconstructing the spatially continuous cosmic LSS matter distribution from discrete galaxy counts in incomplete galaxy surveys. The resulting algorithm can also be used for image reconstruction with low-number photon statistics, e.g. in low-dose X-ray imaging.

The second example was the design of an optimal method to measure or constrain any possible local non-linearities in the CMB temperature fluctuations. This may serve as a blueprint for statistical monitoring of the linearity of a signal amplifier.

We conclude here with a short outlook on some problems that are accessible to the presented theory.

Many signal inference problems involve the reconstruction of fields without precisely known statistics. Some coefficients in the IFT Hamiltonians may only be phenomenological in nature, and therefore have to be derived from the same data used for the reconstruction itself. This more intricate interplay of parameters and information field can also be incorporated into the IFT framework, as we will show in a subsequent work.

For cosmological applications, along the lines started in this work, clearly more realistic data models need to be investigated.

…tion between IFT and QFT. We gratefully acknowledge helpful comments on the manuscript by Marcus Brüggen and Thomas Riller, and by three very constructive referees.

APPENDIX A: NOTATION

We briefly summarize our notation of functions in position and Fourier space.

A usually real, but in principle also complex, function $f(x)$ over the $n$-dimensional space is regarded as a vector $f$ in a discrete and finite-dimensional, or continuous and infinite-dimensional, Hilbert space. $f$ denotes this vector independently of the momentarily chosen function basis, be it the real-space basis, $f(x) = \langle x|f\rangle$, or the Fourier basis,

$$f(k) = \langle k|f\rangle = \int dx\; f(x)\, e^{i k\cdot x}. \qquad (A1)$$

Here, the volume integration is usually performed only over a finite domain with volume $V$. This leads to the
For example, to understand the response in convention for the origin of the delta function in k-space, galaxy formation to the underlying dark matter distribu- tion in terms of a realistic, statistical model, to be used V δ(0) = , (A2) in constructing the corresponding IFT Hamiltonian for a (2 π)n dark-matter information field, detailed higher-order cor- relation coefficients have to be distilled from numerical and also to a Fourier transformation operator F = |kihx|, simulations or semi-analytic descriptions. Also the CMB ikx † † with Fkx = e , and its inverse F = |xihk|, with Fxk = Hamiltonian may benefit from the inclusion of remnants e−ikx. The dagger is used to denote transposed and from the CMB foreground subtraction process, permit- † complex conjugated objects. We have (F F )xy =1xy as ting to gather more solid evidence on fundamental pa- † well as (F F )kk′ = 1kk′ for the following definition of rameters which are hidden in the CMB fluctuations, like the scalar product of two functions f and g in real and the amplitude of non-Gaussianities. Fourier space: Furthermore, there exist a number of more or less heuristic algorithms for inverse problems, which have dk f †g = hf|gi = dx f ∗(x) g(x)= f ∗(k) g(k), proven to serve well under certain circumstances. Re- (2 π)n verse engineering of their implicitly assumed priors and Z Z (A3) data models may permit to understand better for which where the asterix denotes complex conjugation. The conditions they are best suited, as well how to improve statistical power-spectrum of f is denoted by Pf (k) = them in case these conditions are not exactly met. 2 h|f(k)| i(f)/V . Finally, we are very curious to see whether and how We also introduce for convenience the position-space the presented framework may be suitable to inference component-wise product of two functions problems in other scientific fields. (fg)(x) ≡ f(x) g(x), (A4)

Acknowledgements which also permits compact notations like

It is a pleasure to thank the following people for help- (log f)(x) = log(f(x)), (f/g)(x)= f(x)/g(x), (A5) ful scientific discussions on various aspects of this work: Simon White on the dangers of perturbation theory, Ben- and alike. The component-wise product should not jamin Wandelt on the prospects of large-scale structure be confused with the tensor product of two vectors reconstruction, Jens Jasche on the pleasures and pains (fg†)(x, y)= f(x) g∗(y). of signal processing, J¨org Rachen on the philosophy of The diagonal components of a matrix M in position- science, and Andr´eWaelkens on the invariant, but ver- space representation form a vector which we denote by tiginous theory of isotropic tensors. We thank Cornelius Weig and Henrik Junklewitz for debates on the connec- M = diagxM, with Mx = Mxx. (A6)
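The Fourier and product conventions of Eqs. (A1) to (A6) can be checked on a discretized grid. The following is a minimal sketch, assuming a one-dimensional (n = 1) periodic grid; the grid size N, the volume V and the test functions f and g are arbitrary illustrative choices, not taken from this work. It checks that the scalar product (A3) agrees in real and Fourier space with the $e^{+ikx}$ sign of Eq. (A1), that a discretized δ(0) equals V/(2π) as in Eq. (A2), and contrasts the component-wise product (A4) with the tensor product; `np.diag` plays the role of $\mathrm{diag}_x$ in Eq. (A6).

```python
import numpy as np

# Assumed discretization (illustration only): n = 1, periodic box of volume V, N pixels.
N, V = 64, 10.0
dx = V / N
x = np.arange(N) * dx
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)  # grid wave numbers
dk = 2 * np.pi / V                        # spacing of the k grid

# Arbitrary real test functions.
f = np.exp(-((x - 5.0) ** 2))
g = np.sin(2 * np.pi * x / V) + 0.5


def fourier(h):
    """Discretized Eq. (A1): h(k) = sum_x dx h(x) exp(+i k x)."""
    return dx * (np.exp(1j * np.outer(k, x)) @ h)


# Eq. (A3): the scalar product agrees in real and Fourier space.
real_space = np.sum(np.conj(f) * g) * dx
fourier_space = np.sum(np.conj(fourier(f)) * fourier(g)) * dk / (2 * np.pi)

# Eq. (A2): the Dirac delta at k = 0, discretized as 1 / dk, equals V / (2 pi).
delta0 = 1.0 / dk

# Eqs. (A4)-(A6): component-wise product vs. tensor product, and diag_x of a matrix.
fg = f * g                           # (fg)(x) = f(x) g(x): a vector
fg_tensor = np.outer(f, np.conj(g))  # (f g^dagger)(x, y) = f(x) g*(y): a matrix
diag_of_tensor = np.diag(fg_tensor)  # its diagonal entries, as in Eq. (A6)

print(np.allclose(real_space, fourier_space))  # True
print(np.isclose(delta0, V / (2 * np.pi)))     # True
print(np.allclose(diag_of_tensor, fg))         # True, since g is real here
```

The explicit matrix exponential is used instead of `np.fft` so that the sign convention of Eq. (A1) is visible; for large grids an FFT would be the practical choice.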


Similarly, a diagonal matrix in position-space representation, whose diagonal components are given by a vector f, will be denoted by

$$\widehat{f} = \mathrm{diag}_x f, \quad \text{with} \quad \widehat{f}_{xy} = f_x\, 1_{xy}. \qquad (A7)$$

Thus, $\widehat{\widehat{M}} = M$ if and only if M is diagonal, and $\widehat{\widehat{f}} = f$ always.

In our notation a multivariate Gaussian reads

$$G(s, S) = \frac{1}{|2\pi S|^{1/2}} \exp\left(-\frac{1}{2}\, s^\dagger S^{-1} s\right). \qquad (A8)$$

Here, $S = \langle s s^\dagger \rangle_{(s)}$ denotes the covariance tensor of the Gaussian field s, which is drawn from $P(s) = G(s, S)$. If s is statistically homogeneous, S is fully described by the power spectrum $P_s(k)$:

$$S_{kk'} = (2\pi)^n\, \delta(k - k')\, P_s(k), \qquad S^{-1}_{kk'} = (2\pi)^n\, \delta(k - k')\, \left(P_s(k)\right)^{-1}. \qquad (A9)$$

The Fourier representation of the trace of a Fourier-diagonal operator,

$$\mathrm{Tr}(A) = \int dx\, A_{xx} = V \int \frac{dk}{(2\pi)^n}\, P_A(k), \qquad (A10)$$

is very useful in combination with the following expression for the determinant of a Hermitian matrix,

$$\log|A| = \mathrm{Tr}(\log A). \qquad (A11)$$

Furthermore, we usually suppress the dependency of probabilities on the underlying model I and its parameters θ in our notation, i.e. instead of $P(s|\theta, I)$ we just write $P(s)$ or $P(s|\theta)$, depending on our focus. Here $\theta = (S, N, R, \ldots)$ contains all the parameters of the model, which are assumed to be known within this work.

APPENDIX B: FEYNMAN RULES ON THE SPHERE

Here, we provide the Feynman rules on the sphere. The real-space rules are identical to those of flat spaces, with just the scalar product replaced by the integral over the sphere, etc. In case the problem at hand has an isotropic propagator, which only depends on the distance of two points on the sphere, but not on their location or orientation, the propagator is diagonal if expressed in spherical harmonics $Y_{lm}(x)$. Thanks to the orthogonality relation of spherical harmonics, we have for $x, y \in S^2$

$$(Y Y^\dagger)_{xy} = \sum_{lm} Y_{lm}(x)\, Y^*_{lm}(y) = \delta(x - y) = (1)_{xy} \qquad (B1)$$

and

$$(Y^\dagger Y)_{(l,m)(l',m')} = \int dx\, Y^*_{lm}(x)\, Y_{l'm'}(x) = \delta_{ll'}\, \delta_{mm'} = (1)_{(l,m)(l',m')}. \qquad (B2)$$

Therefore, we can just insert real-space identity matrices $1 = Y Y^\dagger$ between any two terms of a real-space diagrammatic expression and assign Y to the left and $Y^\dagger$ to the right term. This way we find the spherical-harmonics Feynman rules, which are very similar to the Fourier-space ones, in that they also require directed propagator lines for proper angular-momentum conservation. For a theory with only local interactions, these read:

1. An open end of a line has external (not summed) angular-momentum quantum numbers (l, m).
2. A line connecting momentum (l, m) with momentum (l′, m′) corresponds to a propagator between these momenta: $D_{(l,m)(l',m')} = C_D(l)\, \delta_{ll'}\, \delta_{mm'}$, where $C_D(l)$ is the angular power spectrum of the propagator.
3. A data source vertex is $(j + J - \lambda_1)_{(l,m)}$, where (l, m) is the angular momentum at the data-end of the line.
4. A vertex with quantum number $(l_0, m_0)$ with $n_\mathrm{in}$ incoming and $n_\mathrm{out}$ outgoing lines ($n_\mathrm{in} + n_\mathrm{out} > 1$) with momentum labels $(l_1, m_1) \ldots (l_{n_\mathrm{in}}, m_{n_\mathrm{in}})$ and $(l'_1, m'_1) \ldots (l'_{n_\mathrm{out}}, m'_{n_\mathrm{out}})$, respectively, is given by $-\lambda_m(l_0, m_0)\, C^{(l'_1,m'_1)\ldots(l'_{n_\mathrm{out}},m'_{n_\mathrm{out}})}_{(l_0,m_0)\ldots(l_{n_\mathrm{in}},m_{n_\mathrm{in}})}$, where C will be defined in Eq. B3.
5. An internal vertex has internal (summed) angular-momentum quantum numbers (l′, m′). Summation means a term $\sum_{l'=0}^{\infty} \sum_{m'=-l'}^{l'}$ in front of the expression.
6. The expression gets divided by the symmetry factor of its diagram.

The interaction structure in spherical-harmonics space is complicated due to the non-orthogonality of powers and products of the spherical harmonic functions, compared to the Fourier-space case, where any power or product of Fourier-basis functions is again a single Fourier-basis function.

The spherical structure is encapsulated in the coefficients

$$C^{(l'_1,m'_1)\ldots(l'_{n_\mathrm{out}},m'_{n_\mathrm{out}})}_{(l_0,m_0)\ldots(l_{n_\mathrm{in}},m_{n_\mathrm{in}})} \equiv \int dx \left(\prod_{i=0}^{n_\mathrm{in}} Y_{l_i m_i}(x)\right) \left(\prod_{i=1}^{n_\mathrm{out}} Y^*_{l'_i m'_i}(x)\right), \qquad (B3)$$

which can be expressed in terms of sums and products of Wigner coefficients, thanks to the relations $Y^*_{lm}(x) = (-1)^m\, Y_{l,-m}(x)$ and

$$Y_{l_1 m_1}(x)\, Y_{l_2 m_2}(x) = \sum_{lm} \sqrt{\frac{(2l_1+1)(2l_2+1)(2l+1)}{4\pi}} \begin{pmatrix} l_1 & l_2 & l \\ m_1 & m_2 & m \end{pmatrix} \begin{pmatrix} l_1 & l_2 & l \\ 0 & 0 & 0 \end{pmatrix} Y^*_{lm}(x), \qquad (B4)$$

and the orthogonality relation in Eq. B2, to be applied successively in this order. Due to this complication, it is probably most efficient to calculate propagation in spherical-harmonics space, but to change back to real space for any interaction vertex of high order.

APPENDIX C: $f_\mathrm{nl}$-PROPAGATOR

We provide in the following the individual terms of the $f_\mathrm{nl}$-propagator in Eq. 142. The individual diagrams are all $O(f^2)$ and are given here for the case f = 1:

$$(\text{diagram 1}) = -\mathrm{Tr}\left[D^2 M\right] - \frac{1}{2}\, \widehat{D}^\dagger M \widehat{D}, \qquad (C1)$$

$$(\text{diagram 2}) = \frac{1}{2} \left[2\, \widehat{MD} + M\widehat{D}\right]^\dagger D \left[2\, \widehat{MD} + M\widehat{D}\right], \qquad (C2)$$

$$(\text{diagram 3}) = \mathrm{Tr}\left[D^2 M D M\right] + 2 \sum_{x,y,x',y'} M_{xy}\, D_{yy'}\, M_{y'x'}\, D_{x'y}\, D_{xx'}, \qquad (C3)$$

$$(\text{diagram 4}) = j^\dagger D^2 j, \qquad (C4)$$

$$(\text{diagram 5}) = -2\, m^\dagger M D^2 j - 4\, \mathrm{Tr}\left[\widehat{m} D \widehat{j} D M\right], \qquad (C5)$$

$$(\text{diagram 6}) = m^\dagger M D^2 M m + 4\, \mathrm{Tr}\left[\widehat{m} D\, \widehat{Mm}\, D M\right] + 2\, \mathrm{Tr}\left[\widehat{m} D \left(\widehat{m} M + M \widehat{m}\right) D M\right], \qquad (C6)$$

$$(\text{diagram 7}) = -2 \left[2\, \widehat{MD} + M\widehat{D}\right]^\dagger D\, \widehat{j}\, m, \qquad (C7)$$

$$(\text{diagram 8}) = -(m^2)^\dagger M \widehat{D} - 2\, \mathrm{Tr}\left[\widehat{m} M \widehat{m} D\right], \qquad (C8)$$

$$(\text{diagram 9}) = \left[2\, \widehat{MD} + M\widehat{D}\right]^\dagger D \left[2\, \widehat{m} M m + M m^2\right], \qquad (C9)$$

$$(\text{diagram 10}) = \frac{1}{2} \left[2\, \widehat{m} M m + M m^2\right]^\dagger D \left[2\, \widehat{m} M m + M m^2\right], \qquad (C10)$$

$$(\text{diagram 11}) = -2\, (\widehat{m}\, j)^\dagger D \left(M m^2 + 2\, \widehat{m} M m\right), \qquad (C11)$$

$$(\text{diagram 12}) = -\frac{1}{2}\, (m^2)^\dagger M m^2, \qquad (C12)$$

$$(\text{diagram 13}) = 2\, (\widehat{m}\, j)^\dagger D\, (\widehat{j}\, m). \qquad (C13)$$

We used here the conventions $m = D j$ and $(D^2)_{xy} = (D_{xy})^2$, and remind the reader that $\Lambda^{(0)} = \Lambda^{(1)} = 0$, $j' = j$, $\Lambda^{(2)} = -2 f \widehat{j}$, $\Lambda^{(3)}_{xyz} = [M_{xy}\delta_{yz} + 5\ \text{perm.}]$, $\Lambda^{(4)}_{xyzu} = \frac{1}{2}[\delta_{xy} M_{yz} \delta_{zu} + 23\ \text{perm.}]$.
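The data-independent term (C1) can be spot-checked numerically. Assuming, as the conventions for $\Lambda^{(4)}$ above suggest, that (C1) arises from averaging the quartic interaction $\frac{1}{2}(\phi^2)^\dagger M \phi^2$ over Gaussian fluctuations δ with covariance D, Wick's theorem gives $\langle \delta_x^2 \delta_y^2 \rangle = D_{xx} D_{yy} + 2 D_{xy}^2$, so minus the right-hand side of (C1) should equal $\langle \frac{1}{2} (\delta^2)^\dagger M \delta^2 \rangle$. The following minimal Monte Carlo sketch compares both; D and M are arbitrary symmetric test matrices, not taken from this work, and $D^2$ is the component-wise square defined above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary symmetric test matrices (illustration only, not from the paper):
# D plays the role of the propagator, M of the response-weighted inverse noise.
D = np.array([[1.0, 0.3, 0.1],
              [0.3, 0.8, 0.2],
              [0.1, 0.2, 0.5]])
M = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])

D_hat = np.diag(D)  # vector of diagonal entries, Eq. (A6)
D_sq = D ** 2       # component-wise square, (D^2)_xy = (D_xy)^2

# Minus the right-hand side of Eq. (C1): Tr[D^2 M] + (1/2) D_hat^dagger M D_hat
analytic = np.trace(D_sq @ M) + 0.5 * D_hat @ M @ D_hat

# Monte Carlo average of (1/2) (delta^2)^dagger M delta^2 for delta ~ G(delta, D)
delta = rng.multivariate_normal(np.zeros(3), D, size=400_000)
delta_sq = delta ** 2  # component-wise square of every sample
quartic = 0.5 * np.einsum("sx,xy,sy->s", delta_sq, M, delta_sq)
monte_carlo = quartic.mean()

print(analytic, monte_carlo)  # agree to within the Monte Carlo error
```

Sampling only checks the Gaussian moment identity behind (C1); it says nothing about the data-dependent diagrams, which additionally involve $m = Dj$.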

[1] T. Bayes, Phil. Trans. Roy. Soc. 53, 370 (1763).
[2] C. E. Shannon, Bell System Technical Journal 27, 379 (1948).
[3] C. E. Shannon and W. Weaver, The mathematical theory of communication (University of Illinois Press, Urbana, 1949).
[4] E. T. Jaynes, Physical Review 106, 620 (1957).
[5] E. T. Jaynes, Physical Review 108, 171 (1957).
[6] E. T. Jaynes, in Statistical Physics 3 (1963), p. 181.
[7] E. T. Jaynes, American Journal of Physics 33, 391 (1965).
[8] E. T. Jaynes, IEEE Trans. on Systems Science and Cybernetics SSC-4, 227 (1968).
[9] E. T. Jaynes, in Proc. IEEE, Volume 70 (1982), pp. 939–952.
[10] E. T. Jaynes and R. Baierlein, Physics Today 57, 76 (2004).
[11] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Journal of Chemical Physics 21, 1087 (1953).
[12] W. K. Hastings, Biometrika 57, 97 (1970).
[13] S. Geman and D. Geman, IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721 (1984).
[14] S. Duane, A. Kennedy, B. Pendleton, and D. Roweth, Phys. Lett. B 195, 216 (1987).
[15] K. P. N. Murthy, M. Janani, and B. Shenbga Priya, arXiv:cs/0504037 (2005).
[16] M. A. Tanner, Tools for (Springer-Verlag, New York, 1996).
[17] R. M. Neal, in Technical Report CRG-TR-93-1 (Dept. of Computer Science, University of Toronto, 1993).
[18] C. P. Robert, The Bayesian choice (Springer-Verlag, New York, 2001).
[19] A. Gelman, J. B. Carlin, H. S. Stern, and D. Rubin, Bayesian data analysis (Chapman & Hall/CRC, Boca Raton, Florida, 2004).
[20] R. A. Aster, B. Borchers, and C. H. Thurber, Parameter estimation and inverse problems (Elsevier Academic Press, London, 2005).
[21] R. Trotta, arXiv:0803.4089 (2008).
[22] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series (Wiley, New York, 1949).
[23] W. H. Richardson, Journal of the Optical Society of America 62, 55 (1972).
[24] L. B. Lucy, AJ 79, 745 (1974).
[25] B. R. Frieden, Journal of the Optical Society of America 62, 511 (1972).
[26] S. F. Gull and G. J. Daniell, Nature (London) 272, 686 (1978).
[27] J. Skilling, A. W. Strong, and K. Bennett, MNRAS 187, 145 (1979).
[28] R. K. Bryan and J. Skilling, MNRAS 191, 69 (1980).
[29] S. F. Burch, S. F. Gull, and J. Skilling, Computer Vision Graphics and Image Processing 23, 113 (1983).
[30] S. F. Gull and J. Skilling, in Indirect Imaging. Measurement and Processing for Indirect Imaging. Proceedings of an International Symposium held in Sydney, Australia, August 30–September 2, 1983, edited by J. A. Roberts (Cambridge University Press, Cambridge, 1984), p. 267.
[31] S. Sibisi, J. Skilling, R. G. Brereton, E. D. Laue, and J. Staunton, Nature (London) 311, 446 (1984).
[32] D. M. Titterington and J. Skilling, Nature (London) 312, 381 (1984).
[33] J. Skilling and R. K. Bryan, MNRAS 211, 111 (1984).
[34] R. K. Bryan and J. Skilling, Journal of Modern Optics 33, 287 (1986).
[35] S. F. Gull, in Maximum Entropy and Bayesian Methods, edited by J. Skilling (Kluwer Academic Publishers, Dordrecht, 1989), pp. 53–71.
[36] S. F. Gull and J. Skilling, The MEMSYS5 User's Manual (Maximum Entropy Data Consultants Ltd, Royston, 1990).
[37] J. Skilling, in Maximum Entropy and Bayesian Methods, edited by G. J. Erickson, J. T. Rychert, and C. R. Smith (1998), p. 1.
[38] F. S. Kitaura and T. A. Enßlin, MNRAS 389, 497 (2008), arXiv:0705.0429.
[39] R. Narayan and R. Nityananda, ARAA 24, 127 (1986).
[40] R. Molina, J. Nunez, F. J. Cortijo, and J. Mateos, Signal Processing Magazine, IEEE 18, 11 (2001).
[41] E. Bertschinger, ApJL 323, L103 (1987).
[42] J. N. Fry, Astrophys. J. 289, 10 (1985).
[43] W. Bialek and A. Zee, Physical Review Letters 58, 741 (1987).
[44] W. Bialek and A. Zee, Physical Review Letters 61, 1512 (1988).
[45] W. Bialek, C. G. Callan, and S. P. Strong, Physical Review Letters 77, 4693 (1996), arXiv:cond-mat/9607180.
[46] P. Stoica, E. G. Larsson, and J. Li, AJ 120, 2163 (2000).
[47] T. Enßlin and M. Frommert, in preparation (2009).
[48] J. C. Lemm, arXiv:physics/9912005 (1999).
[49] J. C. Lemm and J. Uhlig, Few-Body Systems 29, 25 (2000), arXiv:quant-ph/0006027.
[50] J. C. Lemm, J. Uhlig, and A. Weiguny, Physical Review Letters 84, 2068 (2000), arXiv:cond-mat/9907013.
[51] J. C. Lemm and J. Uhlig, Physical Review Letters 84, 4517 (2000), arXiv:nucl-th/9908056.
[52] J. C. Lemm, Physics Letters A 276, 19 (2000).
[53] J. C. Lemm, in Bayesian Inference and Maximum Entropy Methods in Science and Engineering, edited by A. Mohammad-Djafari (2001), vol. 568 of American Institute of Physics Conference Series, pp. 425–436.
[54] J. C. Lemm, J. Uhlig, and A. Weiguny, European Physical Journal B 20, 349 (2001), arXiv:quant-ph/0005122.
[55] J. C. Lemm, J. Uhlig, and A. Weiguny, European Physical Journal B 46, 41 (2005).
[56] J. C. Lemm, arXiv:cond-mat/9808039 (1998).
[57] J. Binney, N. Dowrick, A. Fisher, and M. Newman, The theory of critical phenomena (Oxford University Press, Oxford, UK, 1992).
[58] M. E. Peskin and D. V. Schroeder, An Introduction to (Westview Press, Boulder, Colorado, 1995).
[59] A. Zee, Quantum field theory in a nutshell (Princeton University Press, Princeton, NJ, 2003).
[60] S. Matarrese, F. Lucchin, and S. A. Bonometto, ApJL 310, L21 (1986).
[61] Y. B. Zel'dovich, A&A 5, 84 (1970).
[62] J. M. Bardeen, J. R. Bond, N. Kaiser, and A. S. Szalay, Astrophys. J. 304, 15 (1986).
[63] P. J. E. Peebles, The large-scale structure of the universe (Princeton University Press, Princeton, N.J., 1980).
[64] N. Kaiser, MNRAS 227, 1 (1987).
[65] P. J. E. Peebles, Astrophys. J. 362, 1 (1990).
[66] F. Bernardeau, ApJL 390, L61 (1992).
[67] S. Zaroubi and Y. Hoffman, Astrophys. J. 462, 25 (1996).
[68] A. J. S. Hamilton, in The Evolving Universe, edited by D. Hamilton (Kluwer Academic Publishers, Dordrecht, 1998), vol. 231 of Astrophysics and Space Science Library, p. 185.
[69] F. Bernardeau, M. J. Chodorowski, E. L. Łokas, R. Stompor, and A. Kudlicki, MNRAS 309, 543 (1999), arXiv:astro-ph/9901057.
[70] E. Branchini, L. Teodoro, C. S. Frenk, I. Schmoldt, G. Efstathiou, S. D. M. White, W. Saunders, W. Sutherland, M. Rowan-Robinson, O. Keeble, et al., MNRAS 308, 1 (1999), arXiv:astro-ph/9901366.
[71] A. Dekel and O. Lahav, Astrophys. J. 520, 24 (1999), arXiv:astro-ph/9806193.
[72] S. Zaroubi, arXiv:astro-ph/0206052 (2002).
[73] R. E. Smith, J. A. Peacock, A. Jenkins, S. D. M. White, C. S. Frenk, F. R. Pearce, P. A. Thomas, G. Efstathiou, and H. M. P. Couchman, MNRAS 341, 1311 (2003), arXiv:astro-ph/0207664.
[74] R. Scoccimarro, Phys. Rev. D 70, 083007 (2004), arXiv:astro-ph/0407214.
[75] V. Springel, S. D. M. White, A. Jenkins, C. S. Frenk, N. Yoshida, L. Gao, J. Navarro, R. Thacker, D. Croton, J. Helly, et al., Nature (London) 435, 629 (2005), arXiv:astro-ph/0504097.
[76] P. Valageas, A&A 421, 23 (2004), arXiv:astro-ph/0307008.
[77] P. Valageas, A&A 476, 31 (2007), arXiv:0706.2593.
[78] P. Valageas, A&A 484, 79 (2008), arXiv:0711.3407.
[79] M. Crocce and R. Scoccimarro, Physical Review D 73, 063519 (2006), arXiv:astro-ph/0509418.
[80] M. Crocce and R. Scoccimarro, Physical Review D 73, 063520 (2006), arXiv:astro-ph/0509419.
[81] P. McDonald, Phys. Rev. D 74, 103512 (2006), arXiv:astro-ph/0609413.
[82] P. McDonald, Phys. Rev. D 74, 129901(E) (2006).
[83] P. McDonald, Phys. Rev. D 75, 043514 (2007), arXiv:astro-ph/0606028.
[84] D. Jeong and E. Komatsu, Astrophys. J. 651, 619 (2006), arXiv:astro-ph/0604075.
[85] D. Jeong and E. Komatsu, arXiv:0805.2632 (2008).
[86] S. Matarrese and M. Pietroni, Journal of Cosmology and Astro-Particle Physics 6, 26 (2007), arXiv:astro-ph/0703563.
[87] J. Gaite and A. Domínguez, Journal of Physics A Mathematical General 40, 6849 (2007), arXiv:astro-ph/0610886.
[88] S. Matarrese and M. Pietroni, Modern Physics Letters A 23, 25 (2008), arXiv:astro-ph/0702653.
[89] T. Matsubara, Phys. Rev. D 77, 063530 (2008), arXiv:0711.2521.
[90] M. Pietroni, arXiv:0806.0971 (2008).
[91] E. Bertschinger and A. Dekel, ApJL 336, L5 (1989).
[92] E. Bertschinger and A. Dekel, in ASP Conf. Ser. 15: Large-Scale Structures and Peculiar Motions in the Universe, edited by D. W. Latham and L. A. N. da Costa (1991), p. 67.
[93] P. J. E. Peebles, ApJL 344, L53 (1989).
[94] A. Dekel, E. Bertschinger, and S. M. Faber, Astrophys. J. 364, 349 (1990).
[95] N. Kaiser and A. Stebbins, in ASP Conf. Ser. 15: Large-Scale Structures and Peculiar Motions in the Universe, edited by D. W. Latham and L. A. N. da Costa (1991), p. 111.
[96] Y. Hoffman and E. Ribak, ApJL 380, L5 (1991).
[97] D. H. Weinberg, MNRAS 254, 315 (1992).
[98] A. Nusser and A. Dekel, Astrophys. J. 391, 443 (1992).
[99] G. B. Rybicki and W. H. Press, Astrophys. J. 398, 169 (1992).
[100] M. Gramann, Astrophys. J. 405, 449 (1993).
[101] G. Ganon and Y. Hoffman, ApJL 415, L5 (1993).
[102] F. Bernardeau, A&A 291, 697 (1994), arXiv:astro-ph/9403020.
[103] A. Nusser and M. Davis, ApJL 421, L1 (1994), arXiv:astro-ph/9309009.
[104] O. Lahav, in ASP Conf. Ser. 67: Unveiling Large-Scale Structures Behind the Milky Way, edited by C. Balkowski and R. C. Kraan-Korteweg (1994), p. 171.
[105] O. Lahav, K. B. Fisher, Y. Hoffman, C. A. Scharf, and S. Zaroubi, ApJL 423, L93 (1994), arXiv:astro-ph/9311059.
[106] K. B. Fisher, O. Lahav, Y. Hoffman, D. Lynden-Bell, and S. Zaroubi, MNRAS 272, 885 (1995), arXiv:astro-ph/9406009.
[107] R. K. Sheth, MNRAS 277, 933 (1995), arXiv:astro-ph/9511096.
[108] S. Zaroubi, Y. Hoffman, K. B. Fisher, and O. Lahav, Astrophys. J. 449, 446 (1995), arXiv:astro-ph/9410080.
[109] M. Tegmark and B. C. Bromley, Astrophys. J. 453, 533 (1995), arXiv:astro-ph/9409038.
[110] R. A. C. Croft and E. Gaztanaga, MNRAS 285, 793 (1997), arXiv:astro-ph/9602100.
[111] V. K. Narayanan and D. H. Weinberg, Astrophys. J. 508, 440 (1998), arXiv:astro-ph/9806238.
[112] U.-L. Pen, Astrophys. J. 504, 601 (1998), arXiv:astro-ph/9711180.
[113] U. Seljak, Astrophys. J. 503, 492 (1998), arXiv:astro-ph/9710269.
[114] U. Seljak, Astrophys. J. 506, 64 (1998), arXiv:astro-ph/9711124.
[115] V. Bistolas and Y. Hoffman, Astrophys. J. 492, 439 (1998), arXiv:astro-ph/9707243.
[116] A. Taylor and H. Valentine, MNRAS 306, 491 (1999), arXiv:astro-ph/9901171.
[117] V. K. Narayanan and R. A. C. Croft, Astrophys. J. 515, 471 (1999), arXiv:astro-ph/9806255.
[118] S. Zaroubi, Y. Hoffman, and A. Dekel, Astrophys. J. 520, 413 (1999), arXiv:astro-ph/9810279.
[119] D. M. Goldberg and D. N. Spergel, in ASP Conf. Ser. 201: Cosmic Flows Workshop, edited by S. Courteau and J. Willick (2000), p. 282.
[120] D. M. Goldberg and D. N. Spergel, Astrophys. J. 544, 21 (2000), arXiv:astro-ph/9912408.
[121] A. Kudlicki, M. Chodorowski, T. Plewa, and M. Różyczka, MNRAS 316, 464 (2000), arXiv:astro-ph/9910018.
[122] S. Basilakos and M. Plionis, Astrophys. J. 550, 522 (2001), arXiv:astro-ph/0011265.
[123] D. M. Goldberg, Astrophys. J. 552, 413 (2001), arXiv:astro-ph/0008266.
[124] U. Frisch, S. Matarrese, R. Mohayaee, and A. Sobolevski, Nature (London) 417, 260 (2002), arXiv:astro-ph/0109483.
[125] S. Zaroubi, MNRAS 331, 901 (2002), arXiv:astro-ph/0010561.
[126] Y. Brenier, U. Frisch, M. Hénon, G. Loeper, S. Matarrese, R. Mohayaee, and A. Sobolevskiĭ, MNRAS 346, 501 (2003), arXiv:astro-ph/0304214.
[127] R. Mohayaee, U. Frisch, S. Matarrese, and A. Sobolevskii, A&A 406, 393 (2003), arXiv:astro-ph/0301641.
[128] R. Mohayaee, B. Tully, and U. Frisch, arXiv:astro-ph/0410063 (2004).
[129] C. S. Botzler, J. Snigula, R. Bender, and U. Hopp, MNRAS 349, 425 (2004), arXiv:astro-ph/0312018.
[130] R. Mohayaee and R. B. Tully, ApJL 635, L113 (2005), arXiv:astro-ph/0509313.
[131] R. Mohayaee, H. Mathis, S. Colombi, and J. Silk, MNRAS 365, 939 (2006), arXiv:astro-ph/0501217.
[132] V. Icke and R. van de Weygaert, QJRAS 32, 85 (1991).
[133] S. Ikeuchi and E. L. Turner, MNRAS 250, 519 (1991).
[134] F. Bernardeau and R. van de Weygaert, MNRAS 279, 693 (1996).
[135] W. E. Schaap and R. van de Weygaert, A&A 363, L29 (2000), arXiv:astro-ph/0011007.
[136] R. van de Weygaert and W. Schaap, in Mining the Sky, edited by A. J. Banday, S. Zaroubi, and M. Bartelmann (2001), p. 268.
[137] M. Ramella, W. Boschin, D. Fadda, and M. Nonino, A&A 368, 776 (2001), arXiv:astro-ph/0101411.
[138] L. Zaninetti, Chinese Journal of Astronomy and Astrophysics 6, 387 (2006), arXiv:astro-ph/0602431.
[139] E. Bertschinger, A. Dekel, S. M. Faber, A. Dressler, and D. Burstein, Astrophys. J. 364, 370 (1990).
[140] A. Yahil, M. A. Strauss, M. Davis, and J. P. Huchra, Astrophys. J. 372, 380 (1991).
[141] K. B. Fisher, C. A. Scharf, and O. Lahav, MNRAS 266, 219 (1994), arXiv:astro-ph/9309027.
[142] E. J. Shaya, P. J. E. Peebles, and R. B. Tully, Astrophys. J. 454, 15 (1995), arXiv:astro-ph/9506144.
[143] E. Branchini, M. Plionis, and D. W. Sciama, ApJL 461, L17 (1996), arXiv:astro-ph/9512055.
[144] M. Webster, O. Lahav, and K. Fisher, MNRAS 287, 425 (1997), arXiv:astro-ph/9608021.
[145] C. Yess, S. F. Shandarin, and K. B. Fisher, Astrophys. J. 474, 553 (1997), arXiv:astro-ph/9605041.
[146] I. M. Schmoldt, V. Saar, P. Saha, E. Branchini, G. P. Efstathiou, C. S. Frenk, O. Keeble, S. Maddox, R. McMahon, S. Oliver, et al., AJ 118, 1146 (1999), arXiv:astro-ph/9906035.
[147] A. Nusser and M. Haehnelt, MNRAS 303, 179 (1999), arXiv:astro-ph/9806109.
[148] M. Tegmark and B. C. Bromley, The Astrophysical Journal 518, L69 (1999), URL http://www.citebase.org/abstract?id=oai:arXiv.org:astro-ph/9809324.
[149] Y. Hoffman and S. Zaroubi, ApJL 535, L5 (2000), arXiv:astro-ph/0003306.
[150] D. M. Goldberg, Astrophys. J. 550, 87 (2001), arXiv:astro-ph/0009046.
[151] H. Mathis, G. Lemson, V. Springel, G. Kauffmann, S. D. M. White, A. Eldar, and A. Dekel, MNRAS 333, 739 (2002), arXiv:astro-ph/0111099.
[152] P. Erdoğdu, O. Lahav, S. Zaroubi, et al., MNRAS 352, 939 (2004), arXiv:astro-ph/0312546.
[153] M. S. Vogeley, F. Hoyle, R. R. Rojas, and D. M. Goldberg, in IAU Colloq. 195: Outskirts of Galaxy Clusters: Intense Life in the Suburbs, edited by A. Diaferio (2004), pp. 5–11.
[154] J. Huchra, T. Jarrett, M. Skrutskie, R. Cutri, S. Schneider, L. Macri, R. Steining, J. Mader, N. Martimbeau, and T. George, in ASP Conf. Ser. 329: Nearby Large-Scale Structures and the Zone of Avoidance, edited by A. P. Fairall and P. A. Woudt (2005), p. 135.
[155] W. J. Percival, MNRAS 356, 1168 (2005), arXiv:astro-ph/0410631.
[156] P. Erdoğdu, O. Lahav, J. Huchra, et al., MNRAS 373, 45 (2006), arXiv:astro-ph/0610005.
[157] Y. Hoffman, in ASP Conf. Ser. 67: Unveiling Large-Scale Structures Behind the Milky Way, edited by C. Balkowski and R. C. Kraan-Korteweg (1994), p. 185.
[158] S. Zaroubi, in ASP Conf. Ser. 218: Mapping the Hidden Universe: The Universe behind the Milky Way - The Universe in HI, edited by R. C. Kraan-Korteweg, P. A. Henning, and H. Andernach (2000), p. 173.
[159] R. C. Kraan-Korteweg and O. Lahav, AAPR 10, 211 (2000), arXiv:astro-ph/0005501.
[160] J. A. Peacock and S. J. Dodds, MNRAS 267, 1020 (1994), arXiv:astro-ph/9311057.
[161] M. S. Vogeley and A. S. Szalay, Astrophys. J. 465, 34 (1996), arXiv:astro-ph/9601185.
[162] S. Zaroubi, I. Zehavi, A. Dekel, Y. Hoffman, and T. Kolatt, Astrophys. J. 486, 21 (1997), arXiv:astro-ph/9610226.
[163] M. Tegmark, Physical Review Letters 79, 3806 (1997), arXiv:astro-ph/9706198.
[164] D. J. Eisenstein and W. Hu, Astrophys. J. 511, 5 (1999), arXiv:astro-ph/9710252.
[165] G. Efstathiou, J. R. Bond, and S. D. M. White, MNRAS 258, 1P (1992).
[166] E. F. Bunn, D. Scott, and M. White, ApJL 441, L9 (1995), arXiv:astro-ph/9409003.
[167] M. A. Janssen and S. Gulkis, in NATO ASIC Proc. 359: The Infrared and Submillimetre Sky after COBE, edited by M. Signore and C. Dupraz (Kluwer Academic Publishers, Dordrecht, 1992), pp. 391–408.
[168] E. F. Bunn, K. B. Fisher, Y. Hoffman, O. Lahav, J. Silk, and S. Zaroubi, ApJL 432, L75 (1994), arXiv:astro-ph/9404007.
[169] K. Maisinger, M. P. Hobson, and A. N. Lasenby, MNRAS 290, 313 (1997).
[170] M. Tegmark, Phys. Rev. D 56, 4514 (1997), arXiv:astro-ph/9705188.
[171] M. Tegmark, ApJL 480, L87 (1997), arXiv:astro-ph/9611130.
[172] S. Dodelson, Astrophys. J. 482, 577 (1997), arXiv:astro-ph/9512021.
[173] M. P. Hobson, A. W. Jones, A. N. Lasenby, and F. R. Bouchet, MNRAS 300, 1 (1998), arXiv:astro-ph/9806387.
[174] P. Natoli, G. de Gasperis, C. Gheller, and N. Vittorio, A&A 372, 346 (2001), arXiv:astro-ph/0101252.
[175] O. Doré, R. Teyssier, F. R. Bouchet, D. Vibert, and S. Prunet, A&A 374, 358 (2001), arXiv:astro-ph/0101112.
[176] R. Stompor, A. Balbi, J. D. Borrill, P. G. Ferreira, S. Hanany, A. H. Jaffe, A. T. Lee, S. Oh, B. Rabii, P. L. Richards, et al., Phys. Rev. D 65, 022003 (2001), arXiv:astro-ph/0106451.
[177] B. D. Wandelt, D. L. Larson, and A. Lakshminarayanan, Phys. Rev. D 70, 083511 (2004), arXiv:astro-ph/0310080.
[178] H. K. Eriksen, I. J. O'Dwyer, J. B. Jewell, B. D. Wandelt, D. L. Larson, K. M. Górski, S. Levin, A. J. Banday, and P. B. Lilje, ApJS 155, 227 (2004), arXiv:astro-ph/0407028.
[179] J. Jewell, S. Levin, and C. H. Anderson, Astrophys. J. 609, 1 (2004), arXiv:astro-ph/0209560.
[180] D. Yvon and F. Mayet, A&A 436, 729 (2005), arXiv:astro-ph/0401505.
[181] E. Keihänen, H. Kurki-Suonio, and T. Poutanen, MNRAS 360, 390 (2005), arXiv:astro-ph/0412517.
[182] E. C. Sutton and B. D. Wandelt, ApJS 162, 401 (2006).
[183] D. L. Larson, H. K. Eriksen, B. D. Wandelt, K. M. Górski, G. Huey, J. B. Jewell, and I. J. O'Dwyer, Astrophys. J. 656, 653 (2007), arXiv:astro-ph/0608007.
[184] G. Hinshaw et al. (WMAP), arXiv:0803.0732 (2008).
[185] U. Seljak and M. Zaldarriaga, Astrophys. J. 469, 437 (1996), arXiv:astro-ph/9603033.
[186] A. Lewis, A. Challinor, and A. Lasenby, Astrophys. J. 538, 473 (2000), arXiv:astro-ph/9911177.
[187] M. Doran, Journal of Cosmology and Astro-Particle Physics 10, 11 (2005), arXiv:astro-ph/0302138.
[188] E. F. Bunn and N. Sugiyama, Astrophys. J. 446, 49 (1995), arXiv:astro-ph/9407069.
[189] M. Tegmark, A. N. Taylor, and A. F. Heavens, Astrophys. J. 480, 22 (1997), arXiv:astro-ph/9603021.
[190] M. Tegmark, Phys. Rev. D 55, 5895 (1997), arXiv:astro-ph/9611174.
[191] M. R. Nolta et al. (WMAP), arXiv:0803.0593 (2008).
[192] A. H. Guth, Phys. Rev. D 23, 347 (1981).
[193] A. D. Linde, Physics Letters B 108, 389 (1982).
[194] A. Albrecht and P. J. Steinhardt, Physical Review Letters 48, 1220 (1982).
[195] A. H. Guth and S.-Y. Pi, Physical Review Letters 49, 1110 (1982).
[196] A. A. Starobinsky, Physics Letters B 117, 175 (1982).
[197] J. M. Bardeen, P. J. Steinhardt, and M. S. Turner, Phys. Rev. D 28, 679 (1983).
[198] W. Hu, Phys. Rev. D 64, 083005 (2001), arXiv:astro-ph/0105117.
[199] F. Bernardeau and J.-P. Uzan, Phys. Rev. D 66, 103506 (2002), arXiv:hep-ph/0207295.
[200] N. Bartolo, E. Komatsu, S. Matarrese, and A. Riotto, Phys. Rep. 402, 103 (2004), arXiv:astro-ph/0406398.
[201] D. Babich, P. Creminelli, and M. Zaldarriaga, Journal of Cosmology and Astro-Particle Physics 8, 9 (2004), arXiv:astro-ph/0405356.
[202] E. Komatsu, B. D. Wandelt, D. N. Spergel, A. J. Banday, and K. M. Górski, Astrophys. J. 566, 19 (2002), arXiv:astro-ph/0107605.
[203] D. Babich and M. Zaldarriaga, Phys. Rev. D 70, 083005 (2004), arXiv:astro-ph/0408455.
[204] E. Komatsu, D. N. Spergel, and B. D. Wandelt, Astrophys. J. 634, 14 (2005), arXiv:astro-ph/0305189.
[205] A. P. S. Yadav, E. Komatsu, and B. D. Wandelt, Astrophys. J. 664, 680 (2007), arXiv:astro-ph/0701921.
[206] A. P. S. Yadav, E. Komatsu, B. D. Wandelt, M. Liguori, F. K. Hansen, and S. Matarrese, Astrophys. J. 678, 578 (2008), arXiv:0711.4933.
[207] E. Komatsu, A. Kogut, M. R. Nolta, C. L. Bennett, M. Halpern, G. Hinshaw, N. Jarosik, M. Limon, S. S. Meyer, L. Page, et al., ApJS 148, 119 (2003), arXiv:astro-ph/0302223.
[208] A. Curto, J. F. Macias-Perez, E. Martinez-Gonzalez, R. B. Barreiro, D. Santos, F. K. Hansen, M. Liguori, and S. Matarrese, arXiv:0804.0136 (2008).
[209] A. P. S. Yadav and B. D. Wandelt, Physical Review Letters 100, 181301 (2008).
[210] E. Martinez-Gonzalez, arXiv:0805.4157 (2008).
[211] J. Jasche, F. S. Kitaura, and T. A. Ensslin, arXiv:0901.3043 (2009).
[212] P. Coles and B. Jones, MNRAS 248, 1 (1991).
[213] R. Vio, P. Andreani, and W. Wamsteker, PASP 113, 1009 (2001), arXiv:astro-ph/0105107.
[214] M. C. Neyrinck, I. Szapudi, and A. S. Szalay, arXiv:0903.4693 (2009).
[215] R. K. Sachs and A. M. Wolfe, Astrophys. J. 147, 73 (1967).
[216] M. J. Rees and D. W. Sciama, Nature (London) 217, 511 (1968).
[217] J. R. Fergusson and E. P. S. Shellard, arXiv:0812.3413 (2008).
[218] E. Komatsu and D. N. Spergel, Phys. Rev. D 63, 063002 (2001), arXiv:astro-ph/0005036.
[219] N. Kogo and E. Komatsu, Phys. Rev. D 73, 083007 (2006), arXiv:astro-ph/0602099.
[220] D. Babich, Phys. Rev. D 72, 043003 (2005), arXiv:astro-ph/0503375.
[221] A. F. Heavens, MNRAS 299, 805 (1998), arXiv:astro-ph/9804222.
[222] P. Creminelli, A. Nicolis, L. Senatore, M. Tegmark, and M. Zaldarriaga, Journal of Cosmology and Astro-Particle Physics 5, 4 (2006), arXiv:astro-ph/0509029.
[223] A. P. Yadav and B. D. Wandelt, Phys. Rev. D 71, 123004 (2005), arXiv:astro-ph/0505386.