DISS. ETH NO. 20642
The information content of galaxy surveys
A dissertation submitted to
ETH ZURICH
for the degree of
Doctor of Sciences
presented by
Julien Carron
MSc ETH in Physics
born on Dec 26th, 1985
citizen of Sion
accepted on the recommendation of
Prof. Dr. Simon Lilly, examiner
Dr. Adam Amara, co-examiner
Prof. Dr. Luca Amendola, co-examiner

arXiv:1406.6095v1 [astro-ph.CO] 23 Jun 2014
2012
Contents
Contents vi
List of Figures viii
List of Tables ix
Abstract xi
Résumé xiii
1 Introduction 1
1.1 Stochasticity in cosmological observables ...... 2
1.2 Fisher information for cosmology : a first look ...... 3
1.3 N-point functions ...... 6
1.3.1 N-point functions from the density : characteristic functional ...... 6
1.3.2 The density from N-point functions : determinacy of the moment problem ...... 9
1.3.3 N-point functions from discrete populations, Poisson samples ...... 9
1.4 Gaussian and non-Gaussian matter density fields ...... 11
1.5 Structure of the thesis ...... 15
I Quantifying information 17
2 Shannon entropy and Fisher information 19
2.1 Fisher information and the information inequality ...... 20
2.1.1 The case of a single observable ...... 21
2.1.2 The general case ...... 23
2.1.3 Resolving the density function: Fisher information density ...... 24
2.2 Jaynes Maximal Entropy Principle ...... 26
2.2.1 Information in maximal entropy distributions ...... 29
2.2.2 Redundant observables ...... 31
2.2.3 The entropy and Fisher information content of Gaussian homogeneous fields ...... 32
2.3 Appendix ...... 36
2.3.1 Measures of information on a parameter ...... 36
2.3.2 Cramér-Rao bound and maximal entropy distributions ...... 39
3 Information within N-point moments 43
3.1 One variable ...... 44
3.1.1 The Christoffel-Darboux kernel as information on the density itself . 47
3.2 Several variables ...... 48
3.2.1 Independent variables and uncorrelated fiducial ...... 50
3.2.2 On the impact of noise ...... 51
3.2.3 Poissonian discreteness effects ...... 53
3.3 Some exactly solvable models for moment determinate densities ...... 53
3.3.1 Moment indeterminate densities ...... 60
3.4 Appendix ...... 66
3.4.1 Derivation for the beta family ...... 66
3.4.2 General hierarchical systems, recursion relations...... 70
II Applications for cosmology 73
4 On the combination of shears, magnification and flexion fields 75
4.1 Introduction ...... 75
4.2 Linear tracers ...... 76
4.2.1 Linear tracers at the two-point level ...... 79
4.2.2 Joint entropy and information content ...... 79
4.2.3 Weak lensing probes ...... 81
4.3 Results ...... 83
4.4 Appendix ...... 86
5 On the use of Gaussian likelihoods for power spectra estimators 91
5.1 Introduction ...... 91
5.2 One field, gamma distribution ...... 92
5.3 Several fields, Wishart distribution ...... 95
5.4 Summary and conclusions ...... 98
6 N-point functions in lognormal density fields 101
6.1 Introduction ...... 101
6.1.1 Notation and conventions ...... 103
6.1.2 Definition and basic properties of correlated lognormal variables . . 103
6.2 Fundamental limitations of the lognormal assumption ...... 104
6.2.1 Largest scales ...... 105
6.2.2 Smallest scales ...... 106
6.3 Explicit fields with the same hierarchy of N-point functions ...... 108
6.3.1 The problem with tailed fields ...... 108
6.3.2 Continuous family ...... 111
6.3.3 Discrete family ...... 113
6.3.4 Appendix ...... 114
6.4 Information at all orders for the one dimensional distribution ...... 115
6.4.1 Lack of information in the moments ...... 118
6.4.2 A q-analog of the logarithm ...... 119
6.4.3 Comparison to standard perturbation theory ...... 122
6.4.4 Appendix ...... 123
6.5 Connection to N-body simulations ...... 126
6.6 Information at all orders for uncorrelated fiducial ...... 130
6.6.1 Cumulative efficiencies ...... 132
6.6.2 Impact of discreteness effects ...... 133
6.6.3 The reason for a roughly parameter independent factor of improvement ...... 137
6.7 The general case : a difficult Lagrange interpolation problem ...... 139
7 Information escaping the correlation hierarchy of the convergence field 141
7.1 Introduction ...... 141
7.2 Fisher information coefficients ...... 143
7.3 Restoration of the information ...... 146
7.4 Conclusions ...... 147
Summary of main results and outlook 149
Bibliography 162
List of Figures
1.1 The shape of the one-point distribution function of the matter density fluctuation field according to second order perturbation theory. ...... 12
1.2 Evolution of the matter density field distribution with cosmic time in an N-body simulation. Image courtesy of A. Pillepich ...... 14
3.1 The cumulative efficiencies of the moments of the lognormal distribution to capture its information content ...... 62
3.2 The cumulative efficiencies of the moments of the Weibull distribution to capture its information content ...... 64
3.3 The cumulative efficiencies of the moments of the stretched exponential function to capture its information content ...... 65
4.1 The ratio of the effective noise when including flexion and magnification fields to the level of noise considering the shears only, as a function of the angular scale...... 84
4.2 The increase of the dark energy FOM including size information as well as flexion information, over the shear only analysis, as a function of the maximal angular multipole...... 87
6.1 The one dimensional chain of points used to test the positivity of the spectrum of the A field, and the apparition of modes with negative spectrum on small scales, indicating that the lognormal field is no longer a well defined assumption for a ΛCDM cosmology. ...... 106
6.2 Three density functions with the same moments as those of the lognormal distribution at all orders...... 112
6.3 The fraction of the total information content that is accessible through the extraction of the entire series of moments of the lognormal field, as a function of the square root of the variance of the fluctuations. ...... 120
6.4 Distribution of the information within the moments of the lognormal distribution ...... 120
6.5 The information density of the lognormal distribution and its approximation through the associated orthonormal polynomials, for fluctuations of unit variance ...... 121
6.6 Comparison of the information content of the first moments of the fluctuation field between standard perturbation theory and the lognormal model ...... 124
6.7 Comparison of various estimates of the error bar on the linear power spectrum amplitude, constrained using the power spectra of the overdensity field or of the log-density field in an N-body simulation. ...... 127
6.8 The fraction of the total information content that is accessible through extraction of the full series of N-point moments of the lognormal field for an uncorrelated fiducial model, for a parameter entering the two-point function at non zero lag, but not the variance...... 133
6.9 Discreteness effects destroying the information contained in the underdense regions of the lognormal distribution ...... 135
6.10 The information content of the first N-point moments in a Poisson sample of a lognormal field, as a function of the mean number of objects per cell, normalised to the information content of the lognormal field. ...... 136
6.11 Ratios measuring the sensitivity of the spectrum to that of the variance for different cosmological parameters ...... 138
7.1 The three parameters entering the generalized lognormal model, as a function of the variance of the effective matter density fluctuations ...... 144
7.2 The cumulative efficiencies of the moments of the convergence in capturing Fisher information as a function of the variance of the effective matter density fluctuations. ...... 146
7.3 The efficiency of the first 10 moments (N = 10) to capture Fisher information, for the convergence field and lognormal field, as well as for the logarithmic transform of the convergence field. ...... 147
7.4 The Fisher information density of the lognormal and the convergence field. 148
7.5 The cumulative information on the amplitude of the spectrum within either the spectrum or two-point function of the one dimensional lognormal field. . 151
List of Tables
4.1 Dispersion parameters for the shear, flexion and magnification fields used in our numerical analysis...... 83
4.2 Ratio of the marginalised constraints considering size information as well as flexion to that of shear only analysis...... 86
6.1 The factors of improvement in constraining power of the spectrum of the log-density field in the lognormal model and as seen in simulations ...... 130
Abstract
This research is a contribution to our understanding of the information content of the cosmological dark matter density field, and of the means to extract this information. These questions are of prime importance for progress on current fundamental issues in cosmology, such as the nature of dark matter and dark energy, at which future large galaxy surveys are aimed. The focus is on a traditional class of observables, the N-point functions, which we approach with known information theoretic tools: Fisher information and Shannon information entropy. The thesis is built out of two main parts, the first presenting in detail the mathematical methods we used and introduced, and the second the cosmological research that was performed with these tools.
A large fraction of this thesis is dedicated to the study of the information content of random fields with heavy tails, in particular the lognormal field, a model for the matter density fluctuation field. It is well known that in the nonlinear regime of structure formation, the matter fluctuation field develops such large tails. It has also been suggested that fields with large tails are not necessarily well described by the hierarchy of N-point functions. In this thesis, we make this last statement precise and, with the help of the lognormal model, quantify its implications for inference on cosmological parameters. We find as our main result that only a tiny fraction of the total Fisher information of the field is still contained in the hierarchy of N-point moments in the nonlinear regime, rendering parameter inference from such moments very inefficient. We show that the hierarchy fails to capture the information that is contained in the underdense regions, which are at the same time found to be the richest in information. We further find our results to be very consistent with numerical analyses using N-body simulations. We also discuss these issues with the help of explicit families of fields with the same hierarchy of N-point moments, defined in this work. A similar analysis is then applied to the convergence field, the weighted projection of the matter density fluctuation field along the line of sight, with similar conclusions. We also show how simple mappings can correct for this inadequacy, consistently with previous findings in the literature.
These results were made possible by an expansion of the Fisher information matrix in uncorrelated components associated with N-point moments of successive orders. An entire chapter is dedicated to this expansion, investigating its properties and making a connection to the moment problem in the field of mathematics. Some simple models exactly solvable at all orders are also presented.
Besides these investigations of the statistical power of the hierarchy of N-point moments, we also study the combination of various probes of the convergence field, including the magnification, shear and flexion fields, in particular at the two-point level. We use Shannon information entropy to discuss the simple structure of the information within these tracers of the lensing potential field. We then evaluate the prospects for such a combination according to current understanding of the relevant dispersion parameters. Finally, we revisit known derivations of the Fisher information matrix for Gaussian variables, commenting in this light on the use of Gaussian likelihoods for power spectra or two-point correlation function estimators in cosmology. We point out that, despite their motivation from the central limit theorem, care must be taken in the case of a large number of fields, as this assumption assigns too much information to the observables.
Résumé
Cette thèse est une contribution à notre compréhension de l'information utile à des fins cosmologiques contenue dans le champ de matière noire de notre Univers, ainsi que des procédés pour extraire cette information. Ces questions sont essentielles dans l'optique de s'approcher d'éléments de réponses à des questions fondamentales de la cosmologie moderne, telles que la nature de la matière noire et de l'énergie sombre. Dans ce travail, nous concentrons principalement nos efforts sur une classe traditionnelle d'observables, c'est-à-dire la hiérarchie des fonctions à N points, que nous approchons en utilisant deux outils empruntés à la théorie de l'information : l'information de Fisher et l'entropie de Shannon. Cette thèse est constituée de deux parties principales, la première présentant et développant ces méthodes mathématiques pour notre fin, la deuxième la recherche en cosmologie proprement dite qui a été effectuée avec ces outils.
Une portion importante de ce travail est dédiée à l'étude de l'information de Fisher contenue dans les champs aléatoires avec forte asymétrie droite, et en particulier dans les champs log-normaux. Il est bien connu que dans le régime non linéaire de la formation des structures, le champ de densité de matière noire développe une telle asymétrie, et il a été également suggéré que cette asymétrie pénalise les fonctions à N points dans leur capacité à décrire ces champs. Dans cette thèse, nous formulons ce dernier aspect plus précisément, et calculons dans plusieurs modèles son impact sur notre capacité à extraire les paramètres cosmologiques de ces observables. Nous trouvons qu'une petite fraction seulement d'information reste accessible à la hiérarchie des fonctions à N points dans le régime non linéaire. Nous montrons que ces observables sont inadéquates à capturer l'information du champ dans les régions sous-denses, qui elles-mêmes contiennent la plus grande part de l'information dans ce régime, et confrontons avec succès ces résultats à des simulations numériques à N corps. Nous discutons également plusieurs de ces aspects avec l'aide de quelques exemples explicites de champs aléatoires qui possèdent les mêmes fonctions à N points à tous les ordres. Nous effectuons une analyse similaire pour le champ de convergence avec des résultats inchangés. Nous montrons comment de simples
transformations non linéaires permettent de corriger ces problèmes, de manière comparable à d'autres résultats déjà présents dans la littérature.
Ces résultats sont rendus possibles par une expansion de la matrice d'information de Fisher en composantes associées de manière univoque aux membres successifs de la hiérarchie des moments à N points que nous introduisons dans ce travail. Un chapitre entier est consacré à cette expansion, où nous discutons ses propriétés principales, ainsi que les liens avec le problème des moments en mathématique. Nous résolvons aussi quelques modèles suffisamment simples pour permettre la dérivation analytique d'une solution exacte à tous les ordres.
Nous présentons également une étude sur la combinaison de différents traceurs du champ de convergence. Nous considérons les champs de grandissement, de cisaillement faible et de flexion, tous directement reliés à la convergence, notamment leurs fonctions à deux points, et utilisons l'entropie de Shannon pour discuter leur information jointe. Nous évaluons les avantages d'une analyse jointe de ces traceurs selon notre compréhension actuelle des paramètres de dispersion. Finalement, nous revoyons des dérivations connues de la matrice de Fisher pour des champs Gaussiens, ce qui nous permet de commenter l'usage fréquent de statistiques Gaussiennes pour des estimateurs de spectres de puissance ou de fonctions à deux points en cosmologie. Bien que la forme Gaussienne soit motivée par le théorème de la limite centrale, nous notons que cette hypothèse assigne trop d'information aux spectres dans le cas d'une analyse jointe de plusieurs champs aléatoires.
1 Introduction
After the cosmic microwave background radiation, the study of the formation and evolution of structures on the large scales of our Universe forms one of the pillars of modern cosmology. These structures can be mapped by galaxy surveys, and cosmological observables derived from these surveys, such as the galaxy two-point correlation function or its Fourier transform the power spectrum, are central to the field, and are used to contrast predictions of cosmological models with observations (Percival et al., 2010; Pope et al., 2004; Tegmark et al., 2006). Despite essential successes in the last two decades, with the emergence of an observationally very successful concordance cosmology, this description of our Universe, the ΛCDM model, is still very mysterious, with only about 5% of the energy budget of the Universe being made of the matter of which we have daily experience (Komatsu et al., 2011). The real nature of the two dominating components, dark energy and dark matter, remains unclear to this day and is at the heart of a large scientific effort. Both the dark matter density field and the impact of dark energy on the geometry of the Universe can now in principle be observed with the help of weak lensing (Bartelmann and Schneider, 2001; Schneider et al., 2006). It is thus believed that large galaxy surveys able to reach the lensing signal are going to play an increasingly important role in addressing these fundamental issues in cosmology (Albrecht et al., 2006).
Of course, in order to assess a set of observables as valuable for cosmology, and to design an experiment towards its extraction, it is essential to understand both our capability to extract it and the robustness and pertinence of the predictions of our model. These aspects, often of a statistical nature, are the very backbone of this thesis. Our main aim in the present research was to quantify these aspects in several situations relevant to cosmology, focussing on the dark matter density field, or its weighted projection along the line of sight, the weak lensing convergence field, contributing in this way to our understanding of the information content of galaxy surveys. We review in this introductory chapter the known tools that we have built upon as well as the class of observables we have focused on, putting our work in context.
1.1 Stochasticity in cosmological observables
All major predictions and measurements that are used to test our understanding of our cosmological model are meaningful only in a statistical sense. Indeed, our inability to observe initial conditions, which we might otherwise tentatively evolve, as well as the complexity of some of the physical processes involved, render a statistical description in general unavoidable. For this reason, a key element that determines, often to a decisive extent, which observable will be of interest for the purpose of the analyst is the probability density for the realisation of the fields from which the observables are derived: typically a CMB temperature map, or a galaxy density field, from which one measures for instance the two-point correlation function. This element of stochasticity is sometimes referred to as cosmic variance, a denomination that we adopt in the following. On top of the cosmic variance one must generically include other sorts of stochasticity, which we refer to as noise, for instance due to the specificities of the instrumentation, filling another gap between model predictions and actual data outputs.
We need to introduce some notation :
We always write a probability density as p, at times adding a subscript indicating for clarity which random variable it refers to. In the case of cosmological fields, these probability densities are generically high dimensional, describing the joint occurrence of field values at different points. Typically, when the random variables are the values (φ1, ··· , φd) of a field φ at points (x1, ··· , xd), then p is a function of d variables, a d-point probability density function. The position label x itself can have various meanings in diverse cosmologically relevant situations. It can have for instance dimension n = 1 (Lyman-α forest), n = 2 (weak lensing tomography, projected density fields, CMB), or n = 3 (redshift surveys).
The joint density for the realisation of the field φ at all points can be written conveniently as the functional p[φ]. Expectation values of observables f are given formally as
\[
\langle f \rangle = \int \mathcal{D}\phi \; p[\phi]\, f[\phi], \tag{1.1}
\]
an infinite dimensional integral. It should be kept in mind that such probability densities p[φ] are however not always very well defined and are intrinsically difficult to handle, except in some cases. Expectation values (1.1) can nevertheless be understood as the limit of a finite dimensional, well defined average
\[
\langle f \rangle = \int d^d\phi \; p(\phi_1, \cdots, \phi_d)\, f(\phi_1, \cdots, \phi_d) \tag{1.2}
\]
over a finite sample of the field, with large d. In a harmless abuse of terminology we may at times in this work identify such finite samples of the field with the field itself, especially when dealing with N-body simulations, which have of course only a finite number of spatial resolution elements.
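The limiting procedure of equation (1.2) is easy to mimic numerically. The following toy sketch (plain NumPy; the trivial independent Gaussian joint density and the observable f are illustrative assumptions, not from the text) estimates an expectation value by averaging over many finite samples of a field:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for equation (1.2): the field phi is sampled at d points, and
# <f> is estimated by averaging f over many realisations drawn from the
# (here trivially Gaussian) joint density p(phi_1, ..., phi_d).
d = 16           # number of sampled field values
n_real = 20000   # number of realisations of the field

def f(phi):
    """An example observable: the mean square field value of the sample."""
    return np.mean(phi**2)

# Joint density: independent unit-variance Gaussians
phi = rng.standard_normal((n_real, d))
estimate = np.mean([f(sample) for sample in phi])

# For this density, <f> = <phi^2> = 1 exactly, so `estimate` is close to 1
```

The point is only that the functional integral (1.1) is operationally a large but finite dimensional average, exactly as one evaluates it on the grid of an N-body simulation.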
Homogeneity, isotropy, ergodicity
Cosmic variance in the sense defined above is the stochasticity of the data due to the fact that we observe one particular realisation of a random field, namely that of our own Universe (or of the observed part of the Universe, in which case one can also refer to a component of sample variance). It is a fundamental limitation in the sense that this variability can never be beaten down, as this would ultimately require the observation of several universes governed by the same density functions, which is a mathematical con- struct useless to our purposes.
Within this framework, one relies on several assumptions, namely those of statistical homogeneity, isotropy and ergodicity. The first two express the absence of preferred locations and directions in the Universe. Mathematically speaking, all density functions are required to be invariant under spatial translations and rotations,
\[
p\left(\phi(x_1), \cdots, \phi(x_d)\right) = p\left(\phi(x_1 + r), \cdots, \phi(x_d + r)\right) = p\left(\phi(R \cdot x_1), \cdots, \phi(R \cdot x_d)\right), \tag{1.3}
\]
for any translation vector r and rotation matrix R. These two important assumptions can be tested and confronted with observations. Of course, homogeneity and isotropy do not apply to fields in redshift space coordinates. The third, ergodicity, states that we can reinterpret the ensemble averages in equation (1.1) as spatial averages. We expect this assumption to be correct as long as the spatial averages can be made over sufficiently large volumes, or using widely separated samples, assuming that correlations at large distances decay quickly enough to zero. Under these conditions, so-called ergodic theorems can indeed be proven. However, this assumption cannot be fundamentally tested and we have no choice but to take it for granted in order to obtain useful results out of this mathematical approach.
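The ergodic assumption can be illustrated on a toy stationary process. In the sketch below (short-range correlated noise from a smoothing kernel, an illustrative choice with no cosmological meaning), the spatial average over a single large realisation lands close to the ensemble mean, as the ergodic theorems guarantee for quickly decaying correlations:

```python
import numpy as np

rng = np.random.default_rng(4)

# A statistically homogeneous 1d "field": white noise smoothed over 8 cells,
# so correlations decay to zero beyond a short correlation length.
mean_true = 5.0
kernel = np.ones(8) / 8.0
field = mean_true + np.convolve(rng.standard_normal(2**20), kernel, mode="same")

# One realisation, one spatial average: close to the ensemble mean 5.0,
# since the volume is much larger than the correlation length.
spatial_avg = field.mean()
```

With 2^20 points and a correlation length of 8 cells there are effectively ~10^5 independent patches, so the spatial average fluctuates around the ensemble mean at the per-mille level.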
Very often of primary interest are the zero mean, dimensionless fluctuations δ of φ, defined as
\[
\delta(x) = \frac{\phi(x) - \bar\phi}{\bar\phi}, \tag{1.4}
\]
where \(\bar\phi = \langle \phi(x) \rangle\) is the mean of the field, independent of position x by homogeneity.
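In code, equation (1.4) is a one-liner. A minimal sketch, with a lognormal toy field standing in for a positive density field (the distribution and grid size are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# A strictly positive toy "density" field phi sampled on a grid
phi = rng.lognormal(mean=0.0, sigma=0.5, size=(64, 64))

# Equation (1.4): zero mean, dimensionless fluctuations.
# By homogeneity <phi(x)> is x-independent, estimated here by the sample mean.
phibar = phi.mean()
delta = (phi - phibar) / phibar

# delta has zero mean by construction (up to floating point round-off)
```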
1.2 Fisher information for cosmology : a first look
Inference on model parameters might appear extremely simple in principle. For a set of model parameters θ = (α, β, ··· ) of interest, and the observed field φ, probability theory tells us that we must update our knowledge of θ with the simple rule
\[
p(\theta|\phi) = \frac{p(\phi|\theta)\, p(\theta)}{\int d\theta \; p(\phi|\theta)\, p(\theta)}. \tag{1.5}
\]
In this equation, the density p(θ) describes our prior state of knowledge of θ, and p(φ|θ), viewed as a function of the parameters, is called the likelihood. On the left-hand side, p(θ|φ)
is called the posterior. Of course, the simplicity of this formula should not hide the very high complexity of its implementation for typical cosmological instances. In particular, the likelihoods p(φ|θ) are in general only poorly known, and the very high dimensionality of this object requires the compression of φ to some smaller subset of observables, whose statistics are set by the likelihood, each carrying some part of the information that the likelihood carried originally. It is thus clearly of the utmost interest to be able to quantify this information more precisely, both that of the original likelihood as well as that of the different observables. This is where Fisher information comes into play.
It seems fair to say that the use of Fisher information in cosmology begins, though indirectly, with Jungman et al. (1996a,b), two works in the context of CMB experiments aiming at measuring the temperature fluctuation spectrum C_l. In these works, it is argued that the posterior for the parameters will be approximately Gaussian. Let us consider for simplicity the case of a single parameter of interest, α, as the discussion for an arbitrary number of parameters holds essentially unchanged. With the true value, or best fit value, of the parameter defined as α_0, the posterior is assumed to have the form
\[
p(\alpha|\{C_l\}) \propto \exp\left( -\frac{1}{2}\left(\alpha - \alpha_0\right)^2 i_\alpha(\alpha_0) \right), \tag{1.6}
\]
with the number \(i_\alpha\) defined as
\[
i_\alpha(\alpha_0) = \sum_l \frac{1}{\sigma_l^2} \left( \frac{\partial C_l(\alpha_0)}{\partial \alpha} \right)^2. \tag{1.7}
\]
In this equation, \(\sigma_l^2\) is the variance of the estimates of C_l, including cosmic variance, incomplete sky coverage and detector noise.
Under the assumption (1.6), it is clear that \(1/i_\alpha\) is the variance \(\sigma_\alpha^2\) of the parameter. Very interestingly, from its definition (1.7) we see that this variance can be evaluated prior to obtaining data, if a reasonable fiducial point α_0 can be chosen and the model predictions of the spectrum are given. Provided the assumptions made there are correct, this makes the approach of Jungman et al. (1996a,b) quite powerful, providing us with a rather good understanding of the capabilities of the experiment.
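This forecasting logic is simple to reproduce numerically. The sketch below evaluates (1.7) for a toy amplitude parameter scaling a made-up power-law spectrum, with cosmic-variance-only error bars; the spectrum shape, multipole range and noise model are all illustrative assumptions, not taken from any experiment:

```python
import numpy as np

# Toy model: C_l(alpha) = alpha * l^-2, fiducial alpha0 = 1.
ells = np.arange(2, 1001)
alpha0 = 1.0

def C(alpha):
    return alpha * ells**-2.0

# Cosmic-variance-only errors on the C_l estimates: sigma_l^2 = 2/(2l+1) C_l^2
sigma2 = 2.0 / (2.0 * ells + 1.0) * C(alpha0)**2

# dC_l/dalpha by a symmetric finite difference (exact here, C is linear in alpha)
eps = 1e-4
dC = (C(alpha0 + eps) - C(alpha0 - eps)) / (2 * eps)

# Equation (1.7), and the forecast error under assumption (1.6)
i_alpha = np.sum(dC**2 / sigma2)
sigma_alpha = 1.0 / np.sqrt(i_alpha)

# For a pure amplitude parameter this reduces to sum_l (2l+1)/2 / alpha0^2,
# independent of the spectrum shape.
```

Note that the answer here depends only on the multipole range and α_0, not on the assumed l^-2 shape: for an amplitude parameter the derivative term cancels against the cosmic variance.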
It is worthwhile spending a bit more thought on \(i_\alpha\) defined in (1.7). It is a special case of an expression that weights the derivatives of some set of observables O_i according to their covariance matrix Σ_ij,
\[
i_\alpha(\alpha_0) = \sum_{i,j} \frac{\partial O_i(\alpha_0)}{\partial \alpha} \left( \Sigma^{-1} \right)_{ij} \frac{\partial O_j(\alpha_0)}{\partial \alpha}. \tag{1.8}
\]
We recover (1.7) by setting the observables to be the spectrum, and the covariance matrix to be diagonal, as required for a perfectly Gaussian CMB map. The number (1.8) has an array of fundamental properties, none of them difficult to show :
• It is a non-negative number, which becomes larger for a smaller covariance matrix or a larger impact of α on the observables, and vice versa.
• The iα corresponding to independent observables (i.e. with no covariance) is simply the sum of their respective iα.
• Adding an observable On+1 to a set (O1, ··· On) can only increase iα, and not de- crease it.
• iα is identical to the expected curvature of a least squares fit to the observables with the given covariance matrix¹, provided α0 is the parameter value that gives the least squared residuals.
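The third property (adding an observable can only increase i_α) is quickly checked numerically from (1.8); the derivatives and covariance below are made-up toy numbers:

```python
import numpy as np

# Two correlated toy observables: derivatives dO_i/dalpha at the fiducial,
# and a valid (positive definite) covariance matrix.
dO = np.array([1.0, 0.3])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

def fisher_number(dO, Sigma):
    """i_alpha = dO^T Sigma^{-1} dO, equation (1.8) for a single parameter."""
    return float(dO @ np.linalg.solve(Sigma, dO))

i_one = fisher_number(dO[:1], Sigma[:1, :1])   # first observable alone
i_two = fisher_number(dO, Sigma)               # both observables jointly

# i_two >= i_one always holds: information never decreases when an
# observable is added, whatever the (valid) covariance.
```

Repeating this with any positive definite Σ and any derivatives gives the same ordering, which is the content of the property.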
These properties are very consistent with what we would expect from a measure of information on the parameter α, and make the number (1.8) a promising candidate for such a measure. However, it is clearly not the end of the story, since it depends only on the chosen set of observables and their covariance matrix, and neglects all other aspects of the probability density p(φ|α). The link with Fisher information, a well known tool in statistics, was then exposed and extended to other areas of cosmology in works such as Tegmark (1997); Tegmark et al. (1997). It was noted that the number (1.7) is identical to
\[
I(\alpha) := -\left\langle \frac{\partial^2 \ln p}{\partial \alpha^2} \right\rangle, \tag{1.9}
\]
where p is a Gaussian likelihood for the noisy CMB temperature fluctuation field. The connection between equation (1.9) and covariances on parameters was also used earlier in an astrophysical context in Amendola (1996). Equation (1.9) is the Fisher information on α contained in p, a most sensible measure of information on parameters, whose properties and link to (1.8) we will have the occasion to discuss extensively. For several parameters, it becomes the Fisher information matrix
\[
F_{\alpha\beta} := -\left\langle \frac{\partial^2 \ln p}{\partial \alpha \, \partial \beta} \right\rangle. \tag{1.10}
\]
Since then, such Fisher information matrices for Gaussian variables and the assumption (1.6) have been used routinely in cosmology in order to assess the capabilities of future experiments.
Two comments are in order at this point :
First, the definition (1.10) of the Fisher information matrix is the most common in cosmology. However, in this thesis, we will rather use the alternative
\[
F_{\alpha\beta} := \left\langle \frac{\partial \ln p}{\partial \alpha} \frac{\partial \ln p}{\partial \beta} \right\rangle \tag{1.11}
\]
as the definition of the information matrix. These two definitions can be shown to be equivalent for any probability density function using the fact that probability densities are normalised to unity, \(0 = \partial_\alpha \langle 1 \rangle = \langle \partial_\alpha \ln p \rangle\). While (1.10) conveniently presents the information matrix as a curvature matrix, it will become clear in chapter 3 that (1.11), making a reference to the score function \(\partial_\alpha \ln p\), is in fact much more fundamental for our purposes. This form also generalises more easily to non-normalised density functions.

¹ It should be noted that this identification with a curvature in a least squares fitting procedure holds only if the covariance matrix is treated as parameter independent.
Second, as discussed above, Fisher information is often interpreted in cosmology as an approximation to the parameter posterior, approximated as a Gaussian with covariance matrix F⁻¹. The Gaussian approximation, as well as the identification of the Fisher information matrix with the inverse covariance matrix, are of course only assumptions that can fail, at times severely. This is especially true when marginalising within this approach over poorly constrained parameters, whose distribution often cannot be approximated by a Gaussian shape (see for example Wolz et al. (2012)), giving rise to results that are difficult to interpret. In this thesis the focus is on the more orthodox interpretation of the Fisher information matrix as a very meaningful and well defined measure of information, and not as an approximation to a posterior. In particular, we are not going to invert the Fisher matrix or marginalise over a set of parameters, except in some instances making connections to results in the literature.
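The equivalence of the two definitions (1.10) and (1.11) is easy to verify numerically in a simple case. Here is a Monte Carlo sketch (a toy example, not from the text) for a unit-variance Gaussian likelihood with the mean as the parameter, for which both definitions give F = 1 exactly:

```python
import numpy as np

rng = np.random.default_rng(2)

# p(x|alpha) Gaussian with mean alpha and unit variance:
#   ln p = -(x - alpha)^2/2 + const
#   score:      d ln p/d alpha  = x - alpha
#   curvature:  d^2 ln p/d alpha^2 = -1  (independent of x)
alpha0 = 0.7
x = rng.normal(loc=alpha0, scale=1.0, size=200000)

score = x - alpha0

F_score = np.mean(score**2)   # definition (1.11): <(d ln p/d alpha)^2>
F_curv = 1.0                  # definition (1.10): -<d^2 ln p/d alpha^2> = 1 exactly

# F_score fluctuates around F_curv = 1 at the 0.3% level for this sample size
```

The Monte Carlo average of the squared score converges to the exact curvature value, as the normalisation identity \(\langle \partial_\alpha \ln p \rangle = 0\) guarantees.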
1.3 N-point functions
A very common class of observables, at the heart of this thesis, are the N-point functions, which we review briefly in this section. They are very convenient for at least two reasons. First, for Gaussian fields the mean and two-point function contain the entire information in the field : in the language of orthodox statistics, they form a set of sufficient statistics. Second, the measurement of a (connected) three-point or higher order point function directly tests for non-Gaussianity of the field.
1.3.1 N-point functions from the density : characteristic functional
In cosmology, the N-point function ξN is defined as the connected part of the N-point moment of the fluctuation field. These are most easily defined using the generating function technology, ubiquitous in any field theory. Consider first an arbitrary N-point moment of the field,
\[
\langle \delta(x_1) \cdots \delta(x_N) \rangle. \tag{1.12}
\]
We can write it, at least formally, as a derivative of the generating functional Z, or characteristic functional, essentially the Fourier transform of the density :
$$\langle \delta(x_1) \cdots \delta(x_N) \rangle = \frac{1}{i^N} \frac{\partial^N Z[J]}{\partial J(x_1) \cdots \partial J(x_N)} \bigg|_{J=0}, \qquad (1.13)$$
with
$$Z[J] := \left\langle \exp\left( i \int d^n x\, J(x)\, \delta(x) \right) \right\rangle. \qquad (1.14)$$
In other words, the N-point moments can be considered as the successive terms in an expansion of the generating functional in a power series in J. Note that there are cases, such as for instance the lognormal field, where the generating functional cannot be written as a convergent power series, even for J arbitrarily close to zero. In such cases, the series should be considered as a formal power series, regardless of convergence. The connected N-point correlation functions are then defined as the successive terms in the formal expansion of ln Z:
$$\xi_N(x_1, \cdots, x_N) = \frac{1}{i^N} \frac{\partial^N \ln Z}{\partial J(x_1) \cdots \partial J(x_N)} \bigg|_{J=0}. \qquad (1.15)$$
If all arguments $x_1$ to $x_N$ are identical, these connected point functions become the familiar cumulants of the one dimensional density function $p(\phi(x))$.
The connected point functions are convenient since they are additive for independent fields. Indeed, if two fields are independent, then one finds directly from the very definition (1.14) that the characteristic functional of the joint density is the product of the characteristic functionals of each density. Taking the logarithm and using the definition (1.15) shows that the connected functions simply add up. From the relations (1.13) and (1.15) one can infer recursion relations for the connected point functions, as well as convenient diagrammatic representations, akin to Feynman diagrams, where connected point functions are represented by connected graphs (e.g. Bernardeau et al., 2002; Szapudi, 2005).
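The additivity of the connected functions can be checked numerically at a single point, where they reduce to the cumulants of the one-point distribution. The following minimal sketch (an illustration, not taken from the thesis) assumes NumPy and SciPy are available and uses `scipy.stats.kstat`, which returns unbiased sample estimates of the first four cumulants; the exponential and gamma distributions are arbitrary test cases with known third cumulants of 2 and 4.

```python
import numpy as np
from scipy.stats import kstat

# Additivity of connected (cumulant) functions for independent fields,
# illustrated at a single point: the third cumulant of a sum of two
# independent variables is the sum of their third cumulants.
rng = np.random.default_rng(0)
n = 400_000
x = rng.exponential(scale=1.0, size=n)        # third cumulant k3 = 2
y = rng.gamma(shape=2.0, scale=1.0, size=n)   # third cumulant k3 = 4

k3_x = kstat(x, 3)        # unbiased estimate of the 3rd cumulant of x
k3_y = kstat(y, 3)
k3_sum = kstat(x + y, 3)  # 3rd cumulant of the sum of independent variables

print(k3_x + k3_y, k3_sum)  # both close to 6
```

The same check fails for correlated variables, for which cross terms appear.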
It holds that the very first connected point functions are identical to the first moments of the δ field,
$$\xi_2(x_1, x_2) \equiv \xi(x_1, x_2) = \langle \delta(x_1)\, \delta(x_2) \rangle, \qquad \xi_3(x_1, x_2, x_3) = \langle \delta(x_1)\, \delta(x_2)\, \delta(x_3) \rangle, \qquad (1.16)$$
but this is not the case anymore for higher N.
From homogeneity and isotropy, these functions are invariant under translations and rotations. In particular, the two-point function is a function of a single argument,
$$\xi(x_1, x_2) = \xi\left( |x_1 - x_2| \right). \qquad (1.17)$$
Translation invariance conveniently allows the use of a description in terms of harmonics. In Cartesian space, with Fourier transform
$$\tilde\delta(k) = \int d^n x\ \delta(x)\, e^{-i k \cdot x}, \qquad (1.18)$$
we have for any statistically homogeneous field the simple relation
$$\langle \tilde\delta(k)\, \tilde\delta^*(k') \rangle = (2\pi)^n\, \delta^D(k - k')\, P(k), \qquad (1.19)$$
where $\delta^D$ is the Dirac δ function and $P(k)$, the power spectrum, is the Fourier transform of the two-point function,
$$P(k) = \int d^n x\ \xi(x)\, e^{-i k \cdot x}. \qquad (1.20)$$
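Relation (1.19) can be made concrete with a small numerical sketch (an illustration under chosen conventions, not part of the thesis): on a one-dimensional periodic grid, Gaussian fields are generated with a prescribed spectrum by colouring white noise, and the average of $|\tilde\delta(k)|^2$ over realisations recovers $P(k)$. The grid size, spectrum shape and discrete FFT normalisation below are arbitrary choices.

```python
import numpy as np

# Generate 1D periodic Gaussian fields with a prescribed power spectrum
# and check that the averaged |delta~(k)|^2 recovers P(k), cf. (1.19).
rng = np.random.default_rng(1)
N = 256                               # grid points, unit spacing
k = np.fft.fftfreq(N) * 2 * np.pi     # wavenumbers
P = 1.0 / (1.0 + (k / 0.5) ** 2)      # an illustrative spectrum
P[0] = 0.0                            # zero-mean field

n_real = 400
Phat = np.zeros(N)
for _ in range(n_real):
    w = rng.standard_normal(N)                    # unit-variance white noise
    delta = np.fft.ifft(np.sqrt(P) * np.fft.fft(w)).real
    Phat += np.abs(np.fft.fft(delta)) ** 2 / N    # E|FFT(delta)|^2 / N = P(k)
Phat /= n_real

# band-averaged comparison away from k = 0
band = slice(5, 40)
ratio = Phat[band].mean() / P[band].mean()
print(ratio)  # close to 1
```

A single realisation fluctuates strongly mode by mode; the agreement with $P(k)$ holds only on average, which is why the sketch averages over realisations and over a band of modes.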
If the field is further statistically isotropic, the spectrum is a function only of the modulus k of the wavenumber. Similarly, one can define higher order spectra, the polyspectra, prominently the bispectrum for N = 3 and the trispectrum for N = 4, through the Fourier transforms of the connected N-point functions, or equivalently through the expectation of products of the Fourier modes of the field.
In this thesis, the distinction between connected and disconnected point functions, or the use of polyspectra rather than the N-point functions, is of no fundamental relevance, as they provide equivalent descriptions of the same source of information. We will not distinguish between connected and disconnected point functions, and regard a generic N-point moment
$$\langle \phi(x_1) \cdots \phi(x_N) \rangle \qquad (1.21)$$
as an N-point function.
The prime example of a homogeneous isotropic random fluctuation field is of course the Gaussian field. Gaussian fields are very convenient for many reasons. They are stable under linear transformations, such as smoothing, and also under convolutions. The celebrated central limit theorem states that sums of a large number of independent variables tend towards a Gaussian distribution under fairly generic conditions. Besides, Gaussian fields also arise as the fields of maximal information entropy for a given two-point function. The Gaussian field is defined through
$$\ln p[\delta] = -\frac{1}{2} \int d^n x\, d^n y\ \delta(x)\, \xi^{-1}(x - y)\, \delta(y) + \mathrm{cst}, \qquad (1.22)$$
or, in Fourier space,
$$\ln p[\delta] = -\frac{1}{2} \int \frac{d^n k}{(2\pi)^n}\, \frac{|\tilde\delta(k)|^2}{P(k)} + \mathrm{cst}. \qquad (1.23)$$
The second representation shows that the Fourier modes of such a field are independent complex Gaussian variables with the correlations given in (1.19). All finite d-dimensional joint densities are d-dimensional multivariate Gaussian distributions.
The characteristic functional can be evaluated in closed form; it is a standard Gaussian integral:
$$Z[J] = \exp\left( -\frac{1}{2} \int d^n x\, d^n y\ J(x)\, \xi(x - y)\, J(y) \right) = \exp\left( -\frac{1}{2} \int \frac{d^n k}{(2\pi)^n}\, P(k)\, |\tilde J(k)|^2 \right). \qquad (1.24)$$
It follows that ln Z is a polynomial of second order in J. It is then immediate that the connected point functions of the Gaussian field vanish for N > 2, since the derivatives of that order vanish.
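The vanishing of the connected functions beyond N = 2 can be checked at a single point, where they are the cumulants of the one-point distribution. A minimal sketch (illustrative, assuming SciPy's `scipy.stats.kstat`, which gives unbiased estimates of the first four cumulants):

```python
import numpy as np
from scipy.stats import kstat

# The connected moments (cumulants) of a Gaussian vanish beyond N = 2:
# check the 3rd and 4th sample cumulants against the 2nd.
rng = np.random.default_rng(2)
z = rng.standard_normal(1_000_000)

k2, k3, k4 = kstat(z, 2), kstat(z, 3), kstat(z, 4)
print(k2, k3, k4)   # approximately 1, 0, 0
```

For a non-Gaussian variable (e.g. a lognormal), the same check yields clearly non-zero third and fourth cumulants.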
1.3.2 The density from N-point functions : determinacy of the moment problem
A key to several results of this thesis is the so-called moment problem and its determinacy. These are, respectively, the problem of finding a density given the hierarchy of N-point moments, and the question of whether the solution is unique. While not part of the usual cosmological literature, this topic is a well known area of research in mathematics, in particular for one dimensional densities (Akhiezer, 1965; Simon, 1997). For such one dimensional densities, examples of different distributions with the same moment series have been known for more than a century (Heyde, 1963; Stieltjes, 1894).
It is not uncommonly argued in the cosmological literature that the relation between the moments, the characteristic functional and the density function can be inverted, suggesting that the density is always uniquely set by the N-point moments (e.g. Fry, 1985; Mo et al., 2010). It is important to keep in mind that this holds only when the characteristic functional can be written as a convergent power series in the moments in a region around J = 0. As already mentioned, this is not always true, in which case the characteristic functional cannot be expressed in terms of the N-point moments. It remains true, however, that the mapping between the characteristic functional and the density is one to one. The indeterminacy of the moment problem was touched upon in a cosmological context in Coles and Jones (1991), though it has not attracted much attention in the cosmological literature since then.
Obviously, an indeterminate moment problem is relevant for our purposes, as it implies that the entire N-point function hierarchy contains less information than the density itself. In that case, the entire N-point hierarchy is an inefficient set of observables: it is impossible to reconstruct the density uniquely from the hierarchy. The implications for cosmological parameter inference are discussed in several chapters of this thesis, notably in chapter 6, where, to the best of our knowledge, the first explicit examples of densities of any dimensionality with identical N-point moments at all orders are presented.
The Gaussian field is an example of a density that can be uniquely recovered from the N-point moment hierarchy. The lognormal field, introduced in section 1.4, is on the other hand an example where this is not possible.
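The indeterminacy of the lognormal can be illustrated with the classic one-dimensional family of Heyde (1963): the densities $p_a(x) = p_{\rm LN}(x)\,[1 + a \sin(2\pi \ln x)]$, $|a| \le 1$, all share every moment of the standard lognormal $p_{\rm LN}$. The numerical sketch below (an illustration, not the constructions of chapter 6) verifies this with `scipy.integrate.quad`, working in the variable $t = \ln x$ where the lognormal becomes a standard Gaussian.

```python
import numpy as np
from scipy.integrate import quad

# Heyde's example: p_a(x) = p_LN(x) * (1 + a*sin(2*pi*ln x)) has the
# same moments as the standard lognormal for every |a| <= 1.
# In t = ln x, the n-th moment is E[e^{n t}] under N(0,1) times the
# perturbation, whose contribution vanishes exactly for integer n.

def moment(n, a):
    f = lambda t: np.exp(n * t) * np.exp(-t**2 / 2) / np.sqrt(2 * np.pi) \
        * (1 + a * np.sin(2 * np.pi * t))
    val, _ = quad(f, -20, 25, limit=200)
    return val

for n in range(5):
    print(n, moment(n, 0.0), moment(n, 0.5))  # pairs agree: e^{n^2/2}
```

Analytically, the perturbation contributes $e^{n^2/2} e^{-2\pi^2} \sin(2\pi n) = 0$ for every integer n, which is what the quadrature confirms.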
1.3.3 N-point functions from discrete populations, Poisson samples
In galaxy or weak lensing surveys, the observed fields are discrete rather than continuous: positions of galaxies are recorded, and additional information, such as the distortion of galaxy images, can effectively be measured only at these positions. Discreteness adds some complexity. We are not able to predict galaxy positions in the sky, and this discrete field may not trace in an obvious manner the underlying field of interest, typically the dark matter density field or its projection. Rather, the galaxies are only tracers that can be biased in several ways, and the measured N-point functions need not be representative of those of the underlying field.
There is no unique manner to create a point process from a continuous random field, but for density fields the infinitesimal Poisson model is rather natural and gives a direct interpretation of the point functions. Within this prescription, one divides the total volume into infinitesimal cells and simply sets the probability for a point to lie in a cell proportional to the value of the continuous field φ at that point, independently from cell to cell. In that case, given a total number of N objects in a total volume V, the probability density to find them at $x_1, \cdots, x_N$ conditional on φ is by definition
$$p(x_1, \cdots, x_N | \phi) = \frac{\phi(x_1) \cdots \phi(x_N)}{(\bar\phi V)^N}. \qquad (1.25)$$
Note that the normalisation $(\bar\phi V)^N$ requires ergodicity, in order to identify $\int_V d^n x_i\, \phi(x_i)$ with $\bar\phi V$. This condition implies that for any k the following link between N-point functions must hold:
$$\frac{1}{\bar\phi V} \int_V d^n x\ \langle \phi(x)\, \phi(x_1) \cdots \phi(x_k) \rangle = \langle \phi(x_1) \cdots \phi(x_k) \rangle, \qquad (1.26)$$
which is a very non trivial condition on a density function, requiring in fact the zeroth mode of the field, $\int_V d^n x\, \phi(x) = \tilde\phi(0)$, to be not a random variable but an ordinary number. For a Gaussian field, this is equivalent to requiring the condition $P(0) = 0$ on the spectrum, and thus presents no difficulty for any volume V. On the other hand, as we will discuss in chapter 4, a lognormal field never fulfils this property exactly in a finite volume, since its power at zero must be strictly positive.
We find the probability density $p(x_1, \cdots, x_N)$ to find these objects at these positions, unconditional on φ, by marginalising over the unseen underlying field φ,
$$p(x_1, \cdots, x_N) = \int \mathcal{D}\phi\ p[\phi]\, p(x_1, \cdots, x_N | \phi) = \frac{\langle \phi(x_1) \cdots \phi(x_N) \rangle}{(\bar\phi V)^N}. \qquad (1.27)$$
From (1.27) we find a direct interpretation of the two-point function. From the rules of probability theory, the probability density of observing an object at $x_2$, given that there is one at $x_1$, is
$$p(x_2 | x_1)\, d^n x_2 = \frac{1}{\bar\phi^2 V} \langle \phi(x_1)\, \phi(x_2) \rangle\, d^n x_2 = \left( 1 + \xi_2(x_2 - x_1) \right) \frac{d^n x_2}{V}. \qquad (1.28)$$
Thus, for such processes the connected two-point function directly describes the clustering of the points, by enhancing or reducing the conditional probability to find particles separated by some distance (e.g. Bernardeau et al., 2002; Peebles, 1980).
1.4 Gaussian and non-Gaussian matter density fields
In cosmology, the Gaussian field (1.22) is fundamental. It is used routinely to describe the statistics of the small fluctuations present in the early Universe that we observe in the CMB radiation or, more importantly for us, those of the density we observe on the largest scales. While it is possible to treat this working hypothesis as an ad-hoc assumption adopted for convenience or for lack of a better prescription, it now has some theoretical support as well, in that the simplest models of inflation predict initial conditions that are extremely close to Gaussian (Liddle and Lyth, 2000). It is fortunately possible to see where this comes about without entering into any details: in such models a nearly free scalar field is responsible for the rapid expansion of the Universe, and its fluctuations give rise to the primordial deviations from homogeneity. Since the action of a free field is quadratic in the field, and since the action plays the same role in a quantum field theory as ln p[φ] in a statistical field theory, we see that it corresponds to a Gaussian field.
Nevertheless, it should be noted that in the particular case of the matter density or fluctuation field relevant for galaxy surveys, the assumption of Gaussianity is in fact flawed from the very beginning. This is because the matter density is positive, while the Gaussian assumption assigns non-zero probability density to negative values. As long as the variance is small, however, this is not an essential shortcoming of the model.
Of course, there are many situations where non-Gaussian statistics play a major role, even in noise-free fields. For instance, signatures of non-Gaussianities in the primordial fluctuations can be used to try and constrain more sophisticated inflationary models; this is often dubbed primordial non-Gaussianity. In this thesis we are going to deal with the non-Gaussianity sourced by the nonlinear evolution of the density field. We already mentioned that Gaussian statistics cannot provide a perfect description of a density field, and this is all the more true as nonlinear evolution takes place. This is illustrated in figure 1.2, from A. Pillepich (Pillepich et al., 2010). The four panels on the left show the evolution of the matter density field in an N-body simulation, from redshift 50 to redshift 0 downwards. The inset inside each panel shows the one-point probability density p(1 + δ) of the matter fluctuation field as the dark line, together with that of the previous panel in grey. As the fluctuations grow, one observes the field developing large tails in the overdense regions and a cutoff in the underdense regions. The right panels show the same simulations with an amount of primordial non-Gaussianity of the local type, roughly 10 times larger than current observational constraints (Komatsu et al., 2011), added, represented as the blue line in the insets. It is obvious that the non-Gaussianity induced by the formation of structures is both very strong and completely dominant over the primordial non-Gaussianity in the late Universe.
More adequate in the case of the density field is the assumption of a lognormal field (Coles and Jones, 1991), where essentially the logarithm A = ln(1 + δ) is set to be a Gaussian field. Since ln(1 + δ) is very close to δ for small fluctuations, these two fields are indistinguishable for any practical purpose as long as the variance is small. The lognormal field is however always positive, correcting for the defect of the Gaussian prescription.
11 CHAPTER 1. Introduction
Fig. 1.1 — The shape of the one-point distribution function of the fluctuation field according to second order perturbation theory for δ, dotted, and for ln(1 + δ), dashed, calculated with methods exposed in Taylor and Watts (2000). The solid line is the lognormal distribution.
In the nonlinear regime, it also shows large tails and a cutoff in the underdense regions, reproducing the qualitative features of the one-point distribution of the fluctuation field remarkably well, given the simplicity of the prescription.
Let us illustrate these aspects in figure 1.1 with the help of perturbation theory. Shown as the dotted line is the prediction of second order perturbation theory for the distribution function, calculated with the same methods as Taylor and Watts (2000), assuming Gaussian initial conditions for δ, at a variance of $\sigma_\delta = 0.34$. We see that the assumption of Gaussian initial conditions gives rise to a nonsensical non-zero probability density for negative matter density. On the other hand, as expected, it predicts a large tail in the overdense regions. The dashed line shows the probability density obtained from the same type of perturbative calculation for A, and the solid line the lognormal distribution. It is rather remarkable how well both the tails and the underdense regions of the lognormal are reproduced by these two perturbative calculations. The agreement between the lognormal prescription and the matter fluctuations measured in N-body simulations is also very good, see Taylor and Watts (2000).
We can see from the appearance of a long tail and a sharp cutoff in the density fluctuations that the statistics of the nonlinear regime are not going to obey the same rules as those of the linear Gaussian field. For instance, it is worth mentioning at this point that the spectrum of the field, the traditional observable, which contains the entire information in the field in the Gaussian regime, seems to lose most of its advantages when entering the nonlinear regime. In particular, strong correlations appear between the Fourier modes, and little information can apparently be extracted from the spectrum on these scales (Lee and Pen, 2008; Neyrinck et al., 2006; Rimes and Hamilton, 2005). The study of the information within the lognormal field will form a large part of this thesis, chapter 6.
[Figure 1.2: eight panels comparing simulations with $f_{NL}^{\rm local} = 0$ (left column) and $f_{NL}^{\rm local} = 750$ (right column) at redshifts z = 50, 10, 2 and 0; each inset shows $\log_{10}(\mathrm{pdf})$ against $1 + \delta$.]

Fig. 1.2 — Evolution of the matter density field distribution with cosmic time in an N-body simulation. See text for more details. Image courtesy of A. Pillepich (Pillepich et al., 2010).
1.5 Structure of the thesis
The thesis is built out of two parts. The first part, comprising chapters 2 and 3, describes and builds upon the mathematical tools that are used in later chapters. The second part, chapters 4 to 7, contains the cosmological research proper. Each chapter begins with a more detailed description of its content, as well as references to our corresponding publications when appropriate.
In chapter 2, we discuss two measures of information, Fisher’s information matrix and Shannon’s information entropy, and the duality between them. We comment extensively on the information inequality, fundamental for much of this thesis. We discuss the infor- mation content of maximal entropy distributions, and identify these distributions as those for which the information inequality is an equality.
In chapter 3 we deal with the information content of the N-point moments of a given density function. We present how to decompose the Fisher information matrix into uncorrelated components, associated with the N-point moments of a given order, with the help of orthogonal polynomials. The properties of this expansion are discussed. In particular, it is shown that the hierarchy of N-point moments does not necessarily carry the entire Fisher information content of the distribution for indeterminate moment problems, while the entire information is recovered for determinate moment problems. We also present several models for which we could obtain this expansion explicitly at all orders.
Chapter 4 uses the tools introduced in chapter 2 in the context of weak lensing. We show in this chapter how the information from different probes of the weak lensing convergence field (the magnification, shear and flexion fields) combines in a very simple way. We then evaluate their information content using current values for the dispersion parameters and discuss the benefits of their combination.
Chapter 5 is a short note on the use of Gaussian distributions for the statistics of estimators of second order statistics in cosmology. We show that these information-theoretic concepts can be used to clarify some issues in the literature concerning the significance of the parameter dependence of the covariance matrices of two-point correlation functions or power spectra.
Chapter 6 discusses the information content of the N-point moments of the lognormal density field. It is the largest chapter of this thesis, and contains our main results as well. After discussing some fundamental limitations of the lognormal field in describing the matter density field of the ΛCDM universe, we present families of different fields all having the same hierarchy of N-point moments as the lognormal, and discuss the implications for parameter inference. The expansion of the Fisher information matrix introduced in chapter 3 is then performed exactly at all orders in two simplified but tractable situations. This then allows us to make successful connections with N-body simulation results on the extraction of information from power spectra.
Finally, in chapter 7 we evaluate the information content of the moment hierarchy of the one-point distribution of the weak lensing convergence field, demonstrating that the nonlinearities generically lead to distributions that are very poorly described by their moments. On the other hand, it is shown that simple mappings are able to correct for this deficiency.
Part I
Quantifying information
2 Shannon entropy and Fisher information
The primary aim of this chapter is to introduce in a rather detailed and comprehensive manner the tools that form the building blocks of this thesis: Fisher's matrix valued measure of information, as well as Shannon's measure of entropy. As such, unlike the subsequent chapters, it does not contain exclusively original material. Namely, sections 2.1 and 2.2, while important parts of our publication Carron et al. (2011), can be considered to some extent a review of our perspective on the properties of Fisher information and Shannon entropy that are then built upon in the later parts of this thesis.
This chapter is built as follows :
In section 2.1, we introduce and discuss the main properties of the Fisher information, with an emphasis on its information theoretic properties. Essential to most of this thesis is the information inequality, equation (2.21), and its consequences. In section 2.2, we discuss maximal entropy distributions associated with a prescribed set of observables, and show that these distributions are precisely those for which the information inequality is an equality. We find with equation (2.44) a measure of information content that depends only on the constraints put on the data and the physical model, written in terms of the curvature of Shannon's entropy surface. We recover the Fisher information matrices for Gaussian fields of common use in cosmology as the special case of fields with prescribed two-point functions.
The text in these two sections is based to a large extent on the first part of Carron et al. (2011), with the exception of appendix 2.3.1.
2.1 Fisher information and the information inequality
We first review here a few simple points of interest that justify the interpretation of the Fisher matrix as a measure of the information content of an experiment. Let us begin by considering the case of a single measurement X, with different possible outcomes, or realisations, x, and a model with a single parameter α. We also assume that we have knowledge, prior to the given experiment, of the probability density function $p_X(x, \alpha)$, depending on our parameter α, that gives the probability of observing particular realisations for each value of the model parameter. The Fisher information in X on α is a non-negative scalar in this one parameter case. It is defined in a fully general way as an average over all realisations of the data (Fisher, 1925):
$$I_X(\alpha) = \left\langle \left( \frac{\partial \ln p_X(x, \alpha)}{\partial \alpha} \right)^2 \right\rangle. \qquad (2.1)$$
Three simple but important properties of Fisher information are worth highlighting at this point.
• The first is that $I_X(\alpha)$ is non-negative, and it vanishes if and only if the parameter α does not impact the data, i.e. if the derivative of $p_X(x, \alpha)$ with respect to α is zero for every realisation x.

• The second is that it is invariant under invertible manipulations of the observed data. This can be seen by considering an invertible change of variable y = f(x), which, due to the rules of probability theory, can be expressed as
$$p_Y(y, \alpha) = p_X(x, \alpha) \left| \frac{dx}{dy} \right|. \qquad (2.2)$$
Thus
$$\frac{\partial \ln p_Y(y, \alpha)}{\partial \alpha} = \frac{\partial \ln p_X(x, \alpha)}{\partial \alpha}, \qquad (2.3)$$
leading to the simple equivalence $I_X(\alpha) = I_Y(\alpha)$. On the other hand, information may be lost when the transformation is not invertible, for instance if the data are combined to produce a new variable that could arise from different sets of data points. This is only the statement that manipulations of the data lead, at best, to conservation of the information.
• The third is that information from independent experiments adds up. Indeed, if two experiments with data X and Y are independent, then the joint probability density factorises,
$$p_{XY}(x, y) = p_X(x)\, p_Y(y), \qquad (2.4)$$
and it is easy to show that the joint information in the observations decouples,
$$I_{XY}(\alpha) = I_X(\alpha) + I_Y(\alpha). \qquad (2.5)$$
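The invariance and additivity properties can be checked numerically in a toy setting (an illustration, not from the thesis): for $X \sim N(\alpha, 1)$ the Fisher information is $I_X(\alpha) = 1$, the invertible map $Y = e^X$ leaves it unchanged, and two independent copies of X carry information 2. The sketch below estimates (2.1) as a Monte Carlo average of the squared score, computed by finite differences of the log-density.

```python
import numpy as np

# Monte Carlo check of two properties of Fisher information for
# X ~ N(alpha, 1): invariance under the invertible map Y = exp(X),
# and additivity over independent experiments. Here I_X(alpha) = 1.
rng = np.random.default_rng(3)
alpha, eps, n = 0.3, 1e-4, 500_000

def ln_pX(x, a):
    return -0.5 * (x - a) ** 2 - 0.5 * np.log(2 * np.pi)

def ln_pY(y, a):                     # p_Y(y) = p_X(ln y) / y
    return ln_pX(np.log(y), a) - np.log(y)

x = rng.normal(alpha, 1.0, size=n)
y = np.exp(x)

score_X = (ln_pX(x, alpha + eps) - ln_pX(x, alpha - eps)) / (2 * eps)
score_Y = (ln_pY(y, alpha + eps) - ln_pY(y, alpha - eps)) / (2 * eps)

I_X = np.mean(score_X ** 2)          # ~ 1
I_Y = np.mean(score_Y ** 2)          # ~ 1: no information lost in y = exp(x)
# additivity: two independent copies of X carry information ~ 2
x2 = rng.normal(alpha, 1.0, size=n)
score_2 = score_X + (ln_pX(x2, alpha + eps) - ln_pX(x2, alpha - eps)) / (2 * eps)
I_XX = np.mean(score_2 ** 2)
print(I_X, I_Y, I_XX)
```

Note that the Jacobian term in $\ln p_Y$ is independent of α and drops out of the score, which is precisely the mechanism behind (2.3).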
These three properties match what we might intuitively expect from a mathematical implementation of an abstract concept such as information. However, we can ask the reverse question and try to find an alternative measure of information that may be better suited, for a particular purpose, than the Fisher information. In appendix 2.3.1, we discuss to what extent the information measure in (2.1) is in fact uniquely set by these requirements.
These properties make the Fisher information a meaningful measure of information, independently of its interpretation as providing error bars on parameters. They further imply that once a physical model is specified with a given set of parameters, a given experiment has a definite information content, which can only decrease with data processing.
2.1.1 The case of a single observable
To quantify the last point above, and in order to get an understanding of the structure of the information in a data set, we first discuss a simple situation, common in cosmology, where the extraction of the model parameter α from the data goes through the intermediate step of estimating a particular observable, D, from the data x, with the help of which α will be inferred. A typical example could be, from the temperature map of the CMB (x), the measurement of the power spectrum of the fluctuations (D), from which a cosmological parameter (α) is extracted. The observable D is measured from x with the help of an estimator, which we call $\hat D$ and take to be unbiased. This means that its mean value, as would be obtained for instance if many realisations of the data were available, converges to the actual value that we want to compare with the model prediction,
$$\langle \hat D \rangle = D(\alpha). \qquad (2.6)$$
A measure of its deviations from sample to sample, or of the uncertainty in the actual measurement, is then given by the variance of $\hat D$, defined as
$$\mathrm{Var}(\hat D) = \langle \hat D^2 \rangle - \langle \hat D \rangle^2. \qquad (2.7)$$
In such a situation, a major role is played by the so-called Cramér-Rao inequality (Rao, 1973), which links the Fisher information content of the data to the variance of the estimator, stating that
$$\mathrm{Var}(\hat D)\, I_X(\alpha) \ge \left( \frac{\partial D(\alpha)}{\partial \alpha} \right)^2. \qquad (2.8)$$
This equation holds for any such estimator $\hat D$ and any model parameter α. Two different interpretations of this equation are possible:
The first bounds the variance of $\hat D$ by the inverse of the Fisher information. To see this, we consider the special case of the model parameter α being D itself. Although we generally make a conceptual distinction between the observable D and the model parameter α, nothing prevents us from identifying the two. Since α is now equal to D, the derivative on the right hand side becomes unity, and one obtains
$$\mathrm{Var}(\hat D) \ge \frac{1}{I_X(D)}. \qquad (2.9)$$
The variance of any unbiased estimator $\hat D$ of D is therefore bounded by the inverse of the amount of information $I_X(D)$ the data possess on D. If $I_X(D)$ is known, this gives a useful lower limit on the error bars that the analysis of the data can put on this observable. We emphasise at this point that this bound only holds for unbiased estimators. There are cosmologically relevant situations where biased estimators can beat this bound, and thus perform better according to the minimal squared error criterion than any unbiased one.
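As a minimal numerical sketch of the bound (2.9) (an illustration, not from the thesis): for n draws of $N(\mu, \sigma^2)$, the Fisher information on μ is $I = n/\sigma^2$, and the sample mean is an unbiased estimator that saturates the bound.

```python
import numpy as np

# Cramer-Rao in action: for n draws of N(mu, sigma^2), the Fisher
# information on mu is I = n/sigma^2, and the unbiased sample mean
# saturates the bound Var >= 1/I.
rng = np.random.default_rng(4)
mu, sigma, n, trials = 1.0, 2.0, 25, 200_000

samples = rng.normal(mu, sigma, size=(trials, n))
mean_hat = samples.mean(axis=1)

I = n / sigma ** 2                   # Fisher information on mu
var_hat = mean_hat.var()             # Monte Carlo variance of the estimator
print(var_hat, 1.0 / I)              # both approx sigma^2/n = 0.16
```

The sample mean is an efficient estimator here; a generic estimator would only satisfy the inequality, not the equality.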
The second reading of the Cramér-Rao inequality, closer in spirit to the present thesis, is to look at how information is lost by constructing the observable D and discarding the rest of the data set. For this, we rewrite equation (2.8) trivially as
$$I_X(\alpha) \ge \left( \frac{\partial D}{\partial \alpha} \right)^2 \frac{1}{\mathrm{Var}(\hat D)}. \qquad (2.10)$$
The expression on the right hand side is the ratio of the sensitivity of the observable to the model parameter, $(\partial D / \partial \alpha)^2$, to the accuracy with which the observable can be extracted from the data, $\mathrm{Var}(\hat D)$. One conceivable approach to estimating the true value of the parameter α is to perform a $\chi^2$ fit to the measured value of D. It is simple to show that this ratio, evaluated at the best fit value, is in fact proportional to the expected value of the curvature of $\chi^2(\alpha)$ at this value. Since the curvature of the $\chi^2$ surface describes how fast the value of the $\chi^2$ increases when moving away from the best fit value, its inverse may be interpreted as an approximation to the error estimate that the analysis with the help of $\hat D$ will put on α.
Thus, equation (2.10) shows that by only considering D and not the full data set, we may have lost information on α, a loss given by the difference between the left and right hand side of that equation. While the latter may be interpreted as the information on α contained in the part of the data represented by D, we may have lost trace of any other source of information.
It should be noted that while we have just chosen to interpret the right hand side of (2.10) as the information in $\hat D$, this is a slight abuse of terminology. More rigorously, the Fisher information in $\hat D$ is not the right hand side of (2.10) but the Fisher information of its density function,
$$I_{\hat D}(\alpha) = \left\langle \left( \frac{\partial \ln p_{\hat D}}{\partial \alpha} \right)^2 \right\rangle, \qquad p_{\hat D}(\hat D) = \int dx\ p_X(x)\, \delta^D\!\left( \hat D - \hat D(x) \right), \qquad (2.11)$$
but it will always be clear in this thesis from the context which one is meant. Anticipating the nomenclature of chapter 3, the right hand side of (2.10) is actually the information in the mean of $\hat D$. From equation (2.10) we infer
$$I_{\hat D}(\alpha) \ge \left( \frac{\partial D}{\partial \alpha} \right)^2 \frac{1}{\mathrm{Var}(\hat D)}. \qquad (2.12)$$
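An example where the inequality (2.10) is strict can be built explicitly (an illustration chosen for this sketch, not from the thesis): take n draws of $N(0, \alpha)$ with α the variance, so $I_X(\alpha) = n/(2\alpha^2)$, and reduce the data to the mean absolute value, an unbiased estimator of $D(\alpha) = \sqrt{2\alpha/\pi}$. Analytically, the right hand side retains only a fraction $1/(\pi - 2) \approx 0.876$ of the full information.

```python
import numpy as np

# Information loss through a non-sufficient statistic: for n draws of
# N(0, alpha) (alpha the variance), the full data carries
# I_X = n / (2 alpha^2). Reducing the data to D-hat = mean(|x|), an
# unbiased estimator of D(alpha) = sqrt(2 alpha / pi), retains only a
# fraction 1/(pi - 2) of it, cf. inequality (2.10).
rng = np.random.default_rng(5)
alpha, n, trials = 1.5, 40, 100_000

x = rng.normal(0.0, np.sqrt(alpha), size=(trials, n))
D_hat = np.abs(x).mean(axis=1)

I_full = n / (2 * alpha ** 2)
dD_dalpha = np.sqrt(2 / np.pi) / (2 * np.sqrt(alpha))
rhs = dD_dalpha ** 2 / D_hat.var()   # information in the mean of D-hat

print(rhs / I_full)   # approx 1/(pi - 2) ~ 0.876
```

Had we used the sample variance instead of the mean absolute deviation, the right hand side would saturate the bound, the sample variance being sufficient for α here.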
2.1.2 The general case
These considerations on the Cram´er-Rao bound can be easily generalised to the case of many parameters and many estimators of as many observables. Still dealing with a measurement X with outcomes x, we want to estimate a set of parameters
θ = (α, β, ··· ) (2.13) with the help of some vector of observables,
D = (D1, ··· ,Dn) (2.14) that are extracted from x with the help of an array of unbiased estimators,
$$\hat{\mathbf D} = \left( \hat D_1, \cdots, \hat D_n \right), \qquad \langle \hat{\mathbf D} \rangle = \mathbf D. \qquad (2.15)$$
In this multidimensional setting, all three scalar quantities that played a role in our discussion in section 2.1.1, namely the variance of the estimator, the derivative of the observable with respect to the parameter, and the Fisher information, are now matrices.
The Fisher information matrix of X on the parameters θ is defined as the square matrix
$$F^X_{\alpha\beta}(\theta) = \left\langle \frac{\partial \ln p_X}{\partial \alpha} \frac{\partial \ln p_X}{\partial \beta} \right\rangle. \qquad (2.16)$$
While the diagonal elements $F^X_{\alpha\alpha}$ are the information scalars $I_X(\alpha)$ of equation (2.1), the off diagonal ones describe correlated information. The Fisher information matrix still carries the three properties we discussed in section 2.1.
The variance of the estimator in equation (2.7) now becomes the covariance matrix $\mathrm{cov}(\hat{\mathbf D})$ of the estimators $\hat{\mathbf D}$, defined as
$$\mathrm{cov}(\hat{\mathbf D})_{ij} = \langle \hat D_i \hat D_j \rangle - D_i D_j. \qquad (2.17)$$
Finally, the derivative of the observable with respect to the parameter, on the right hand side of (2.8), becomes a matrix Δ, in general rectangular, defined as
$$\Delta_{\alpha i} = \frac{\partial D_i}{\partial \alpha}, \qquad (2.18)$$
where α runs over all elements of the set θ of model parameters. Again, the Cramér-Rao inequality provides a useful link between these three matrices, and again there are two approaches to that equation. First, as usually presented in the literature (Rao, 1973), it takes the form of a lower bound on the covariance matrix of the estimators,
$$\mathrm{cov}(\hat{\mathbf D}) \ge \Delta^T \left[ F^X(\theta) \right]^{-1} \Delta. \qquad (2.19)$$
The inequality between two symmetric matrices, A ≥ B, has the meaning that the matrix A − B is positive semi-definite.¹ If, as above, we consider the special case of identifying the parameters with the observables themselves, the matrix Δ is the identity matrix, and we obtain that the covariance of the vector of estimators is bounded by the inverse of the amount of Fisher information the data possess on the observables,
$$\mathrm{cov}(\hat{\mathbf D}) \ge \left[ F^X(\mathbf D) \right]^{-1}. \qquad (2.20)$$
Second, we can turn this lower bound on the covariance to a lower bound on the amount of information in the data set as well. By rearranging equation (2.19), we obtain the multidimensional analogue of equation (2.10), the information inequality, which describes the loss of information that occurs when the data is reduced to a set of estimators, h i−1 F X (θ) ≥ ∆ cov Dˆ ∆T . (2.21)
This information inequality is a central piece of much of this thesis. A proof can be found in the appendix. A perhaps simpler proof also follows, for instance, from the discussion in the appendix of chapter 3.
Rather than giving a lower bound to the covariance of the estimators, as the Cramér-Rao inequality (2.19) does, the information inequality makes clear how information is in general lost when reducing the data to any particular set of estimators. The right hand side may be seen, as before, as the expected curvature of a χ² fit to the estimates produced by the estimators D̂, evaluated at the best fit value, with all correlations fully and consistently taken into account. Note that, as before, the right hand side of the information inequality is not the Fisher information content of the joint probability density function of the estimators, but only that of their means.
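The loss described by the information inequality can be made concrete with a small numerical experiment of our own (not from the original text): N Gaussian draws of unknown mean θ carry Fisher information N/σ², which is saturated when the data is reduced to the sample mean (a sufficient statistic), but not when it is reduced to, say, the sample median:

```python
import numpy as np

# Toy check of the information inequality for x_1..x_N ~ N(theta, sigma^2).
# Full data: F = N / sigma^2. Reduction to a single unbiased estimator D with
# dD/dtheta = 1 gives the lower bound F >= 1 / var(D).
rng = np.random.default_rng(1)
N, sigma, theta, trials = 50, 1.0, 0.0, 40_000
x = rng.normal(theta, sigma, size=(trials, N))

F_full = N / sigma**2                           # = 50
rhs_mean = 1.0 / x.mean(axis=1).var()           # ~ 50: sufficient, no loss
rhs_median = 1.0 / np.median(x, axis=1).var()   # ~ 2N/pi ~ 32: information lost
```

Both right hand sides respect the bound, but only the sufficient statistic attains it.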
In section 2.2, we discuss how Jaynes' Maximal Entropy Principle allows us to understand the total information content of a data set, once a model is specified, in very similar terms.
2.1.3 Resolving the density function: Fisher information density
Due to its generality, the information inequality (2.21) is very powerful. We now have a deeper look at a special case that sheds some light on the definition of the Fisher information matrix (2.16), and that we will use in part II.

¹A symmetric matrix A is called positive definite when x^T A x ≥ 0 holds for any vector x. Further, we say for such matrices that A is larger than B, or A > B, whenever A − B > 0. A concrete implication for our purposes is e.g. that the diagonal entries of the left hand side of (2.19) or (2.20), which are the individual variances of each estimator D̂_i, are greater than those of the right hand side. For many more properties of positive definite matrices, see for instance Bhatia (2007).
Assuming that the variable x is continuous and one dimensional, pick a set of points x_i with separation dx covering some range A of the variable, such that in the limit of a large number of points we can write

\sum_i p_X(x_i)\,dx \rightarrow \int_A dx\, p_X(x).    (2.22)

The generalisation to discrete variables or to multidimensional cases will be obvious. Define then a set of estimators D̂_i as follows:

\hat{D}_i(x) := \begin{cases} 1, & x \in (x_i, x_i + dx) \\ 0, & x \notin (x_i, x_i + dx) \end{cases}.    (2.23)

These estimators simply build a histogram of the variable over A. In other words, our set of estimators is defined such that the entire density function is resolved over A.
We want to evaluate the information inequality for this set of estimators. We have

\left\langle \hat{D}_i \right\rangle = p_X(x_i)\,dx,    (2.24)

with covariance matrix

\mathrm{cov}\left(\hat{D}\right)_{ij} = \delta_{ij}\, p_X(x_i)\,dx - dx\, p_X(x_i)\, dx\, p_X(x_j).    (2.25)

Define as p_X^- the probability that a realisation of the variable does not belong to A:

p_X^- := \int_{\mathbb{R} \setminus A} dx\, p_X(x).    (2.26)

It is then easily seen that the inverse covariance matrix is

\left[\mathrm{cov}\left(\hat{D}\right)\right]^{-1}_{ij} = \frac{\delta_{ij}}{p_X(x_i)\,dx} + \frac{1}{p_X^-}.    (2.27)

It follows that the right hand side of the information inequality becomes

\sum_{i,j} (dx)^2\, \frac{\partial p_X(x_i)}{\partial \alpha} \left[\mathrm{cov}\left(\hat{D}\right)\right]^{-1}_{ij} \frac{\partial p_X(x_j)}{\partial \beta} = \int_A dx\, \frac{1}{p_X(x)} \frac{\partial p_X(x)}{\partial \alpha} \frac{\partial p_X(x)}{\partial \beta} + \frac{1}{p_X^-} \frac{\partial p_X^-}{\partial \alpha} \frac{\partial p_X^-}{\partial \beta}.    (2.28)

This is nothing else than

\int_A dx\, p_X(x)\, \frac{\partial \ln p_X(x)}{\partial \alpha} \frac{\partial \ln p_X(x)}{\partial \beta} + p_X^-\, \frac{\partial \ln p_X^-}{\partial \alpha} \frac{\partial \ln p_X^-}{\partial \beta}.    (2.29)

If A covers the full range of the variable, then the first term is precisely the total Fisher information matrix, and the second vanishes, since p_X^- = 0. It is clear from (2.29) that we can interpret

p_X(x)\, \frac{\partial \ln p_X(x)}{\partial \alpha} \frac{\partial \ln p_X(x)}{\partial \beta} = \frac{1}{p_X(x)} \frac{\partial p_X(x)}{\partial \alpha} \frac{\partial p_X(x)}{\partial \beta}    (2.30)

as a Fisher information density, representing information from observations of the variable around x. The additional term in (2.29) involving p_X^- originates from the fact that the density is normalised to unity: observations of the density over the range A provide some information on the density in the complement of A,

p_X^- = \int_{\mathbb{R} \setminus A} dx\, p_X(x) = 1 - \int_A dx\, p_X(x).    (2.31)

However, it is not possible to resolve the individual contributions of p_X(x) to p_X^- for each x in the complement of A, and thus the derivatives act in this case outside the integrals, unlike in the first term of (2.29).
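The decomposition (2.29) is easy to check numerically. In the following sketch of ours, x ~ N(α, 1), so the total Fisher information on α is 1; summing the density (2.30) over a fine histogram restricted to A = [−3, 3] recovers about 97% of it, the remainder sitting in the tails. The complement term vanishes here since ∂p⁻/∂α = 0 at α = 0 by symmetry:

```python
import numpy as np

# Histogram version of the Fisher information density (2.30) for x ~ N(alpha, 1).
alpha = 0.0
xs = np.linspace(-3.0, 3.0, 6001)     # fine grid over A = [-3, 3]
dx = xs[1] - xs[0]
p = np.exp(-(xs - alpha)**2 / 2) / np.sqrt(2 * np.pi)
dp = (xs - alpha) * p                  # dp/dalpha for a unit-variance Gaussian

# first term of (2.29); the complement term vanishes at alpha = 0 by symmetry
info_A = np.sum(dp**2 / p) * dx        # ~ 0.971, vs. total Fisher information 1
```

Widening A toward the full real line drives the sum to the total Fisher information, as stated below (2.29).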
2.2 Jaynes Maximal Entropy Principle
In cosmology, the knowledge of the probability distribution of the data as a function of the parameters, p_X(x, θ), which is required in order to evaluate its Fisher information content, is usually very limited. In a galaxy survey, a data outcome x would typically be the full set of angular positions of the galaxies, together with some redshift estimate if available, to which we may add any other kind of information, such as luminosities, shapes, etc. Our ignorance of both the initial conditions and of many relevant physical processes does not allow us to predict either the galaxy positions on the sky, or all the interconnections with this additional information. Our prediction of the shape of p_X is thus limited to some statistical properties that are sensitive to the model parameters θ, such as the mean density over some large volume, or certain types of correlation functions.
In fact, even if it were possible to devise some procedure in order to get the exact form of p_X, it may eventually turn out to be useless, or even undesirable, to do so. The incredibly large number of degrees of freedom of such a function would very likely overwhelm the analyst with a mass of irrelevant details, which may have no significance on their own, nor improve the analysis in any meaningful way.
These arguments call for a kind of thermodynamical approach, which would try to capture those aspects of the data that are relevant to our purposes, reducing the number of degrees of freedom in a drastic way. Such an approach already exists in the field of probability theory (Jaynes, 1957). It is based on Shannon's concept of the entropy of a probability distribution (Shannon, 1948) and shed new light on the connection between probability theory and statistical mechanics.
As we have just argued, our predictive knowledge of p_X(x, θ) is limited to some statistical properties. Let us formalise this mathematically, in a similar way as in section 2.1.2. Astrophysical theory gives us a set of constraints on the shape of p_X, in the form of averages of some functions o_i,

O_i(\theta) = \left\langle o_i(x) \right\rangle(\theta), \qquad i = 1, \cdots, n,    (2.32)
where pX enters through the angle brackets. As an example, suppose the data outcome x is a map of the matter density field as a function of position. In this case, one of these constraints Oi could be the mean of the field or its power spectrum, as given by some cosmological model.
The role of this array O = (O_1, ···, O_n) is to represent faithfully the physical understanding we have of p_X, according to the model, as a function of the model parameters θ. In the ideal case, some way can be devised to extract each one of these quantities O_i from the data and to confront them with theory. The set of observables D that we used in section 2.1.2 would be a subset of these predictions O, and we henceforth refer to O as the 'constraints'.
Although p_X must satisfy the constraints (2.32), there may still be a very large number of different distributions compatible with these. However, among these distributions, a very special status is held by the one which maximises the value of Shannon's entropy², defined as

S = -\int dx\, p_X(x, \theta) \ln p_X(x, \theta).    (2.33)
First introduced by Shannon (Shannon, 1948) as a measure of the uncertainty in a distribution about the actual outcome, Shannon's entropy is now the cornerstone of information theory. Jaynes' Maximal Entropy Principle states that the p_X for which this measure S is maximal is the one that best deals with our insufficient knowledge of the distribution, and should therefore be preferred. We refer the reader to Jaynes' work (Jaynes, 1983; Jaynes and Bretthorst, 2003) and to Caticha (2008) for detailed discussions of the role of entropy in probability theory and for the conceptual basis of maximal entropy methods. Astronomical applications related to some extent to Jaynes' ideas include image reconstruction from noisy data (see e.g. Maisinger et al. (2004); Skilling and Bryan (1984); Starck and Pantin (1996) and references therein), mass profile reconstruction from shear estimates (Bridle et al., 1998; Marshall et al., 2002), as well as model comparison when very little data is available (Zunckel and Trotta, 2007). We will see that for our purposes as well it provides a powerful tool, and that the Maximal Entropy Principle is the ideal complement to Fisher information, fitting very well within our discussion in section 2.1 of the information inequality.
Intuitively, the entropy S of p_X tells us how sharply constrained the possible outcomes x are, and Jaynes' Maximal Entropy Principle selects the p_X which is as wide as possible, while at the same time consistent with the constraints (2.32) that we put on it. The actual maximal value attained by the entropy S, among all the possible distributions which satisfy (2.32), is a function of the constraints O, which we denote by
S(O1, ··· ,On). (2.34)
²Formally, for continuous distributions the reference to another distribution is needed to render S invariant with respect to invertible transformations, leading to the concept of the entropy of p_X relative to another distribution q_X,

S = \int dx\, p_X(x) \ln \frac{p_X(x)}{q_X(x)},

also called the Kullback-Leibler divergence. The quantity defined in the text is more precisely the entropy of p_X relative to a uniform probability density function. For a recent account of this, close in spirit to this work, see Caticha (2008).
Of course, it is a function of the model parameters θ as well, since they enter the constraints. As we will see, the shape of that surface as a function of O, and thus implicitly as a function of θ, is the key to understanding the Fisher information content of the data. In the following, in order to keep the notation simple, we will omit the dependency on θ of most of our expressions, though it will always be implicit.
The problem of finding the distribution p_X that maximises the entropy (2.33), while satisfying the set of constraints (2.32), is an optimisation exercise. We can quote the end result (Jaynes, 1983, chap. 11; Caticha, 2008, chap. 4): the probability density function p_X, when it exists, has the following exponential form,

p_X(x) = \frac{1}{Z} \exp\left( -\sum_{i=1}^n \lambda_i\, o_i(x) \right),    (2.35)

in which to each constraint O_i is associated a conjugate quantity λ_i, which arises formally as a Lagrange multiplier in this optimisation problem with constraints. The conjugate variables λ are also called 'potentials', a terminology that we will adopt in the following. We will see below in equation (2.39) that the potentials have a clear interpretation, in the sense that each potential λ_i quantifies how sensitive the entropy function S in (2.34) is to its associated constraint O_i. The quantity Z, which plays the role of the normalisation factor, is called the partition function. Since equation (2.35) must integrate to unity, the explicit form of the partition function is

Z(\lambda_1, \cdots, \lambda_n) = \int dx\, \exp\left( -\sum_{i=1}^n \lambda_i\, o_i(x) \right).    (2.36)
The actual values of the potentials are set by the constraints (2.32). Namely, in terms of the partition function, these reduce to a system of equations to solve for the potentials,

O_i = -\frac{\partial \ln Z}{\partial \lambda_i}, \qquad i = 1, \cdots, n.    (2.37)
The partition function Z is closely related to the entropy S of p_X. It is simple to show that the following relation holds,

S = \ln Z + \sum_{i=1}^n \lambda_i O_i,    (2.38)

and the values of the potentials can be explicitly written as functions of the entropy, in a relation mirroring equation (2.37),

\lambda_i = \frac{\partial S}{\partial O_i}, \qquad i = 1, \cdots, n.    (2.39)

Given the nomenclature, it is no surprise that a deep analogy between this formalism and statistical physics exists. Just as the entropy, or partition function, of a physical system determines the physics of the system, the statistical properties of these maximal entropy distributions follow from the functional form of the Shannon entropy or its partition function as a function of the constraints. For instance, the covariance matrix of the constraints is given by

\left\langle (o_i(x) - O_i)(o_j(x) - O_j) \right\rangle = \frac{\partial^2 \ln Z}{\partial \lambda_i \partial \lambda_j}.    (2.40)

In statistical physics the constraints can be the mean energy, the volume or the mean particle number, with potentials being the temperature, the pressure and the chemical potential. We refer to Jaynes (1957) for the connection to the physical concept of entropy in thermodynamics and statistical physics.
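A minimal worked example of this machinery, of our own devising: with the single constraint ⟨x⟩ = O on x ≥ 0, the maximal entropy density is the exponential distribution, whose partition function is Z(λ) = 1/λ. Relations (2.37), (2.38) and (2.40) can then be checked by finite differences:

```python
import numpy as np

# Maximal entropy density on x >= 0 with the single constraint <x> = O:
# p(x) = exp(-lam x) / Z, the exponential distribution, with Z(lam) = 1/lam.
lam, eps = 0.5, 1e-5
lnZ = lambda l: -np.log(l)

# (2.37): O = -d lnZ / d lam = 1/lam = 2
O = -(lnZ(lam + eps) - lnZ(lam - eps)) / (2 * eps)
# (2.40): var = d^2 lnZ / d lam^2 = 1/lam^2 = 4
var = (lnZ(lam + eps) - 2 * lnZ(lam) + lnZ(lam - eps)) / eps**2
# (2.38): S = lnZ + lam * O = 1 + ln(1/lam)
S = lnZ(lam) + lam * O
```

The recovered variance 1/λ² = O² is indeed the variance of an exponential distribution of mean O.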
2.2.1 Information in maximal entropy distributions
With our choice of probabilities p_X given by equation (2.35), the amount of Fisher information on the parameters θ = (α, β, ···) of the model can be evaluated in a straightforward way. The dependence on the model goes through the constraints or, equivalently, through their associated potentials. It holds therefore that

\frac{\partial \ln p_X(x)}{\partial \alpha} = -\frac{\partial \ln Z}{\partial \alpha} - \sum_{i=1}^n \frac{\partial \lambda_i}{\partial \alpha}\, o_i(x) = \sum_{i=1}^n \frac{\partial \lambda_i}{\partial \alpha} \left[ O_i - o_i(x) \right],    (2.41)

where the second equality follows from the first after application of the chain rule and equation (2.37). Using the covariance matrix of the constraints given in (2.40), the Fisher information matrix, defined in (2.16), can then be written as a double sum over the potentials,

F^X_{\alpha\beta} = \sum_{i,j=1}^n \frac{\partial \lambda_i}{\partial \alpha}\, \frac{\partial^2 \ln Z}{\partial \lambda_i \partial \lambda_j}\, \frac{\partial \lambda_j}{\partial \beta}.    (2.42)

There are several ways to rewrite this expression as a function of the constraints and/or their potentials. First, it can be written as a single sum by using equation (2.37) as

F^X_{\alpha\beta} = -\sum_{i=1}^n \frac{\partial \lambda_i}{\partial \alpha}\, \frac{\partial O_i}{\partial \beta}.    (2.43)

Alternatively, since we will be more interested in using the constraints as the main variables, and not the potentials, we can show, using equation (2.39), that it also takes the form³

F^X_{\alpha\beta} = -\sum_{i,j=1}^n \frac{\partial O_i}{\partial \alpha}\, \frac{\partial^2 S}{\partial O_i \partial O_j}\, \frac{\partial O_j}{\partial \beta}.    (2.44)
³We note that this result is valid only for maximal entropy distributions and is not equivalent to the second derivative of the entropy with respect to the parameters themselves. However, it is formally identical to the corresponding expression for the information content of distributions within the exponential family (Jennrich and Moore, 1975; van den Bos, 2007, chapter 4), once the curvature of the entropy surface is identified with the generalised inverse of the covariance matrix.
We will use both of these last expressions in chapter 4 of this thesis.
Equation (2.44) presents the total amount of information on the model parameters θ in the data X, when the model predicts the set of constraints O_i. The amount of information takes the form of a sum of the information contained in each constraint, with correlations taken into account, as on the right hand side of equation (2.21). In particular, it is a property of the maximal entropy distributions that if the constraints O_i are not redundant, then the curvature matrix of the entropy surface, −∂²S, is invertible and is the inverse of the covariance matrix ∂² ln Z between the observables. To see this explicitly, consider the derivative of equation (2.37) with respect to the potentials,

-\frac{\partial O_i}{\partial \lambda_j} = \frac{\partial^2 \ln Z}{\partial \lambda_i \partial \lambda_j}.    (2.45)

The inverse of the matrix on the left hand side, if it can be inverted, is −∂λ_i/∂O_j, which can be obtained by taking the derivative of equation (2.39), with the result

-\frac{\partial \lambda_i}{\partial O_j} = -\frac{\partial^2 S}{\partial O_i \partial O_j}.    (2.46)

We have thus obtained in equation (2.44), combining Jaynes' Maximal Entropy Principle together with Fisher's information, the exact expression of the information inequality (2.21) for our full set of constraints, but with an equality sign.
We see that the choice of maximal entropy probabilities is fair, in the sense that all the Fisher information comes from what was forced upon the probability density function, i.e. the constraints. No additional Fisher information is added when these probabilities are chosen. In fact, as shown in the appendix, this requirement alone is enough to single out the maximal entropy distributions, as being precisely those for which the information inequality is an equality. This can be understood in terms of sufficient statistics and goes back to Pitman and Wishart (1936) and Koopman (1936). For a discussion in the language of the exponential family of distributions see Zografos and Ferentinos (1994).
In the special case that the model parameters are the constraints themselves, we have

F_{O_i O_j} = -\frac{\partial^2 S}{\partial O_i \partial O_j} = -\frac{\partial \lambda_i}{\partial O_j},    (2.47)

which means that the Fisher information on the model predictions contained in the expected future data is directly given by the sensitivity of their corresponding potentials. Also, the application of the Cramér-Rao inequality, in the form given in equation (2.20), to any set of unbiased estimators of O, shows that the best joint, unbiased reconstruction of O is given by the inverse of the curvature of the entropy surface, −∂²S, which is, as we have shown, ∂² ln Z.
We emphasise at this point that although the amount of information is seen to be identical to the Fisher information in a Gaussian distribution of the observables with the above correlations, nowhere in our approach do we assume Gaussian properties. The distribution of the constraint functions o_i(x) themselves is set by the maximal entropy distribution of the data.
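This fairness property can be illustrated numerically with a toy check of our own: for the exponential maximal entropy density with constraint ⟨x⟩ = O, the entropy surface is S(O) = 1 + ln O, so equation (2.47) predicts F = −∂²S/∂O² = 1/O², and a direct Monte Carlo evaluation of the Fisher information (2.16) agrees:

```python
import numpy as np

# Maxent density p(x) = exp(-x/O)/O for the single constraint <x> = O.
# Entropy route (2.47): F = -d^2 S/dO^2 = 1/O^2, with S(O) = 1 + ln O.
rng = np.random.default_rng(2)
O = 2.0
x = rng.exponential(O, size=500_000)

score = x / O**2 - 1.0 / O       # d ln p / dO for the exponential density
F_mc = np.mean(score**2)         # Monte Carlo Fisher information, ~ 1/O^2
F_entropy = 1.0 / O**2           # entropy curvature prediction
```

The two numbers agree to Monte Carlo precision, with no Gaussian assumption anywhere.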
2.2.2 Redundant observables
We have just seen that in the case of independent constraints, the entropy of p_X provides, through equation (2.44), both the joint information content of the data and the inverse covariance matrix between the observables. However, if the constraints put on the distribution are redundant, the covariance matrix is not invertible, and the curvature of the entropy surface cannot be inverted either. We show however that in these cases our equations for the Fisher information content (2.42, 2.43, 2.44) are still fully consistent, dealing automatically with redundant information to provide the correct answer.
An example of redundant information occurs trivially if one of the functions o_i(x) can be written in terms of the others. For instance, for galaxy survey data, one may specify as constraints the galaxy power spectrum, together with the mean number of galaxy pairs as a function of distance and/or the two-point correlation function, which are three equivalent descriptions of the same statistical property of the data. Although the number of observables O, and thus the number of potentials, describing the maximal entropy distribution greatly increases by doing so, it is clear that we should expect the Fisher matrix to be unchanged by adding such superfluous pieces of information. A small calculation shows that the potentials adjust themselves so that this is actually the case, meaning that this type of redundant information is automatically discarded within this approach. Therefore, we need not worry about the independence of the constraints when evaluating the information content of the data, which will prove convenient in some cases.
There is another, more relevant type of redundant information, that allows us to understand better the role of the potentials. Consider that we have some set of constraints {O_i}, i = 1, ···, n, and that we obtain the corresponding p_X that maximises the entropy. This p_X could then be used to predict the value O_{n+1} of the average of some other function o_{n+1}(x), that is not contained in our set of predictions,

\left\langle o_{n+1}(x) \right\rangle =: O_{n+1}.    (2.48)
For instance, the maximal entropy distribution built with constraints on the first n moments of p_X will predict some particular value for the (n+1)-th moment, O_{n+1}, that the model was unable to predict by itself. Suppose now that some new theoretical work provides the shape of O_{n+1} as a function of the model parameters. This new constraint can now be added to the previous set, and a new, updated p_X is obtained by maximising the entropy. There are two possibilities at this point:
• It may occur that the value of O_{n+1} as provided by the model is identical to the prediction of the maximal entropy distribution that was built without that constraint. Since the new constraint is automatically satisfied, the maximal entropy distribution satisfying the full set of n+1 constraints must be equal to the one satisfying the original set. From the equality of the two distributions, which are both of the form (2.35), it follows that the additional constraint must have vanishing associated potential,

\lambda_{n+1} = 0,    (2.49)

while the other potentials are pairwise identical. It follows immediately that the total information, as seen from equation (2.43), is unaffected, and no information on the model parameters was gained by this additional prediction. A cosmological example would be to enforce on the distribution of some field, together with the two-point correlation function, fully disconnected higher order correlation functions. It is well known that the maximal entropy distribution with constrained two-point correlation function has a Gaussian shape, and that Gaussian distributions have disconnected N-point functions at any order. No information is thus provided by these field moments of higher order in this case. This argument shows that, for a given set of original constraints and associated maximal entropy distribution, any function f(x) which was not contained in this set, with average F, can be seen as being set to zero potential. Such F's therefore do not contribute to the information.

• More interesting is, of course, the case where this additional constraint differs from the prediction obtained from the original set {O_i}, i = 1, ···, n. Suppose that there is a mismatch δO_{n+1} between the prediction of the maximal entropy distribution and the model. In this case, when updating p_X to include this constraint, the potentials are changed by this new information, a change given to first order by

\delta \lambda_i = \frac{\partial^2 S}{\partial O_i \partial O_{n+1}}\, \delta O_{n+1}, \qquad i = 1, \cdots, n+1,    (2.50)

and the amount of Fisher information changes accordingly. It is interesting to note that the entropy itself is invariant at this order. From equation (2.39) we have namely

\delta S = \sum_{i=1}^{n+1} \lambda_i\, \delta O_i = \lambda_{n+1}\, \delta O_{n+1} = 0,    (2.51)

since the new constraint was originally at zero potential. The entropy is, therefore, stationary not only with respect to changes in the probability distribution function, but also with respect to the predictions its associated maximal entropy distribution makes on any other quantities.
Of course, although the formulae of this section are valid for any model, numerical work is in general required in order to obtain the partition function and/or the entropy surface.
2.2.3 The entropy and Fisher information content of Gaussian homogeneous fields
We now obtain the Shannon entropy of a family of fields when only the two-point correlation function is the relevant constraint, a result that we will use later in this thesis. It is easily obtained by a straightforward generalisation of the finite dimensional multivariate case, where the means and covariance matrix of the variables are known. It is well known (Shannon, 1948) that the maximal entropy distribution is in this case the multivariate Gaussian distribution. Denoting the constraints on p_X with the matrix D and vector μ,
D_{ij} = \left\langle x_i x_j \right\rangle, \qquad \mu_i = \left\langle x_i \right\rangle, \qquad i, j = 1, \cdots, N,    (2.52)

the associated potentials are given explicitly by the relations

\lambda = \frac{1}{2} C^{-1}, \qquad \eta = -C^{-1} \mu,    (2.53)

where the matrix C is the covariance matrix

C := D - \mu \mu^T.    (2.54)

The Shannon entropy is given, up to some irrelevant additive constant, by

S(D, \mu) = \frac{1}{2} \ln \det\left( D - \mu \mu^T \right).    (2.55)

The fact that about half of the constraints are redundant, due to the symmetry of the D and C matrices, is reflected by the fact that the corresponding inverse correlation matrix in equation (2.44),

-\frac{\partial^2 S}{\partial D_{ij} \partial D_{kl}} = -\frac{\partial \lambda_{ij}}{\partial D_{kl}} = \frac{1}{2}\, C^{-1}_{ik} C^{-1}_{jl},    (2.56)

is not invertible as such if we consider all entries of the matrix D as constraints. Of course, this is no longer the case if only the independent entries of D form the constraints. Using the handy formalism of functional calculus, we can straightforwardly extend the above relations to systems with infinitely many degrees of freedom, i.e. fields, where the means as well as the two-point correlation functions are constrained. A realisation of the variable X is now a field, or a family of fields φ = (φ_1, ···, φ_N), taking values on some n-dimensional space. The expressions above in the multivariate case all stay valid, with the understanding that operations such as matrix multiplications have to be taken with respect to the discrete indices as well as the continuous ones.
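The conjugacy relation (2.39) between the potentials (2.53) and the entropy (2.55) can be verified directly with finite differences; the following sketch of ours uses an arbitrary positive definite C:

```python
import numpy as np

# Check dS/dD_ij = (1/2) C^{-1}_ij for S(D, mu) = (1/2) ln det(D - mu mu^T).
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
C = A @ A.T + 3 * np.eye(3)          # some positive definite covariance matrix
mu = rng.standard_normal(3)
D = C + np.outer(mu, mu)             # second-moment constraint matrix

S = lambda M: 0.5 * np.linalg.slogdet(M - np.outer(mu, mu))[1]

eps = 1e-6
lam = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        dD = np.zeros((3, 3))
        dD[i, j] = eps
        lam[i, j] = (S(D + dD) - S(D - dD)) / (2 * eps)
# lam agrees entrywise with the potential (2.53), lambda = C^{-1} / 2
```

Treating all entries of D as independent here is exactly the redundancy discussed above; the relation nevertheless holds entry by entry.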
With the two-point correlation functions and means

\rho_{ij}(x, y) = \left\langle \phi_i(x) \phi_j(y) \right\rangle, \qquad \bar\phi_i(x) = \left\langle \phi_i(x) \right\rangle,    (2.57)

we still have, up to an unimportant constant,

S = \frac{1}{2} \ln \det\left( \rho - \bar\phi \bar\phi^T \right).    (2.58)
In n-dimensional Euclidean space, within a box of volume V, for a family of homogeneous fields it is simplest to work with the spectral matrices. These are defined as

\frac{1}{V} \left\langle \tilde\phi_i(k)\, \tilde\phi_j^*(k') \right\rangle = P_{ij}(k)\, \delta_{kk'},    (2.59)

where the Fourier transforms of the fields are defined through

\tilde\phi_i(k) = \int_V d^n x\, \phi_i(x)\, e^{-i k \cdot x}.    (2.60)

It is well known that these matrices provide an equivalent description of the correlations, since they form Fourier pairs with the correlation functions,

\rho_{ij}(x, y) = \frac{1}{V} \sum_k P_{ij}(k)\, e^{i k \cdot (x - y)} = \rho_{ij}(x - y).    (2.61)
In this case, the entropy in equation (2.58) reduces, again discarding irrelevant constants, to an uncorrelated sum over the modes,

S = \frac{1}{2} \ln \det\left( \frac{P(0)}{V} - \bar\phi \bar\phi^T \right) + \frac{1}{2} \sum_{k \ne 0} \ln \det \frac{P(k)}{V},    (2.62)

which is the straightforward multidimensional version of (Taylor and Watts, 2001, eq. 39). Comparison with equation (2.55) shows the well-known fact that the modes can be seen as Gaussian, uncorrelated, complex variables with correlation matrices proportional to P(k). All modes have zero mean, except for the zero mode, which, as seen from its definition, is proportional to the mean of the field itself. Accordingly, taking the appropriate derivatives, the potentials λ(k) associated to P(k) read

\lambda(k) = \frac{V}{2}\, P(k)^{-1}, \quad k \ne 0, \qquad \lambda(0) = \frac{1}{2} \left( \frac{P(0)}{V} - \bar\phi \bar\phi^T \right)^{-1},    (2.63)

and those associated to the means φ̄,

\eta = -\left( \frac{P(0)}{V} - \bar\phi \bar\phi^T \right)^{-1} \bar\phi.    (2.64)
Note that although the spectral matrices are, in general, complex, they are hermitian, so that the determinants are real. The amount of Fisher information in the family of fields is easily obtained with the help of equation (2.43), with the familiar result

F_{\alpha\beta} = \frac{1}{2} \sum_k \mathrm{Tr}\left[ P_c^{-1}(k)\, \frac{\partial P_c(k)}{\partial \alpha}\, P_c^{-1}(k)\, \frac{\partial P_c(k)}{\partial \beta} \right] + \frac{\partial \bar\phi^T}{\partial \alpha} \left( \frac{P_c(0)}{V} \right)^{-1} \frac{\partial \bar\phi}{\partial \beta},    (2.65)

with P_c(k) being the connected part of the spectral matrices,

P_c(k) = P(k) - \delta_{k0}\, V \bar\phi \bar\phi^T.    (2.66)

These expressions are of course also valid for isotropic fields on the sphere. With a decomposition in spherical harmonics, the sum runs over the multipoles.
The Fisher matrices in common use in weak lensing or clustering studies can thus all be seen as special cases of this approach, namely of equation (2.65), when knowledge of the statistical properties of the future data does not go beyond the two-point statistics. Indeed, in the case that the model does not predict the means, and knowing that for discrete fields the spectral matrices, equation (2.59), carry a noise term due to the finite number of galaxies or, in the case of weak lensing, also due to the intrinsic ellipticities of the galaxies, the amount of information in (2.65) is essentially identical to the standard expressions used to predict the accuracy with which parameters will be extracted from power spectrum analyses.
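As a concrete toy instance of equation (2.65), consider a single homogeneous field with no mean constraint and a power-law spectrum P(k) = A kⁿ; the trace then collapses to a scalar and the Fisher matrix for (A, n) is a simple sum over modes. This is a sketch of ours, not a realistic survey forecast:

```python
import numpy as np

# Single-field version of (2.65), no mean constraint:
# F_ab = (1/2) sum_k (dP/da / P) (dP/db / P).
A, n = 2.0, -1.5
k = np.arange(1.0, 101.0)          # toy set of 100 modes
P = A * k**n

dP = [k**n, A * k**n * np.log(k)]  # dP/dA and dP/dn
F = 0.5 * np.array([[np.sum(da / P * db / P) for db in dP] for da in dP])
# e.g. F[0, 0] = N_modes / (2 A^2) = 12.5 here, since (dP/dA)/P = 1/A
```

In a realistic forecast P would additionally carry shot noise or shape noise terms, as discussed above, but the structure of the sum is unchanged.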
Of course, the maximal entropy approach, which tries to capture the relevant properties of p_X through a sophisticated guess, gives no guarantee that its predictions are actually correct. Nevertheless, as discussed in section 2.2.2, it provides a systematic approach with which to update the probability density function in the case of improved knowledge of the relevant physics.
2.3 Appendix
In 2.3.1, we look at which possible 'information densities' do in fact satisfy those conditions that we would like any measure of information about the true value of a model parameter to possess. We then provide in 2.3.2 a unified derivation of the Cramér-Rao and information inequalities in the multidimensional case (following a similar argument to that in Rao (1973)), and then show their relation to maximal entropy distributions.
2.3.1 Measures of information on a parameter
Denoting with p_X(x, α) the probability density function of some variable, we look for functionals i,

i[p_X(x, \alpha)],    (2.67)

such that our candidate measure of information on α is given by

I_X(\alpha) = \int dx\, i[p_X(x, \alpha)],    (2.68)

and has the following properties:
C1: We would want our functional to be a regular function of two arguments, one being the value of the probability function at α itself, and the second the value of its derivative with respect to the same α,
i[pX (x, α)] := i(pX (x, α), ∂αpX (x, α)). (2.69)
While the main reason for this very strong requirement is simplicity, it appears nonetheless reasonable to us. This condition simply reflects the fact that the values of p_X(x, α) in regions far away from the true value of α should not provide much information, or that the information is provided by the probabilities and the linear impact of α around the true value.
C2: The choice of coordinates must not carry information, i.e. we want
IX (α) = Ig(X)(α) (2.70) whenever g is an invertible function. While this is automatic for discrete probabilities, this is not the case for continuous distributions.
C3: The information on α from independent experiments should add
IXY (α) = IX (α) + IY (α). (2.71)
The last two are the key requirements for the interpretation of I as information. In combination with the first requirement, they lead to the following result: up to a multiplicative constant, there is a unique positive definite density satisfying these conditions. It reads

i(p_X, \partial_\alpha p_X) = \frac{(\partial_\alpha p_X)^2}{p_X},    (2.72)

which is easily seen to be precisely the Fisher information measure, as defined in equation (2.1).
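The additivity requirement C3 for the density (2.72) is easily checked numerically. In the sketch below (our own), X ~ N(α, 1) and Y ~ N(α, 4) give I_X = 1 and I_Y = 1/4; by C3, the information in the independent pair is then their sum, 1.25:

```python
import numpy as np

# Numerical integration of the Fisher density (2.72) for Gaussians of mean alpha.
alpha = 0.7
xs = np.linspace(alpha - 20, alpha + 20, 400_001)
dx = xs[1] - xs[0]

def fisher(sigma):
    p = np.exp(-(xs - alpha)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    # (d_alpha p)^2 / p, written as ((x - alpha)/sigma^2)^2 * p to avoid
    # dividing by tiny tail values of p
    return np.sum(((xs - alpha) / sigma**2)**2 * p) * dx

I_X, I_Y = fisher(1.0), fisher(2.0)  # 1 and 1/4 for sigma = 1 and sigma = 2
I_joint = I_X + I_Y                  # C3: information of independent variables adds
```

Note that the integral is invariant under a change of variables, in accordance with C2.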
From now on, in order to simplify the notation, we write the two arguments p_X and ∂_α p_X of i as p and β. If we drop the positive definiteness condition, the general solution is

i(p, \beta) = c_1 \frac{\beta^2}{p} + c_2 \frac{\beta^3}{p^2} + c_3 \beta,    (2.73)

for arbitrary constants c_{1-3}. This result follows from the claim that the most general smooth function satisfying C1 and C3 must be of the form

i(p, \beta) = c_1 \frac{\beta^2}{p} + c_2 \frac{\beta^3}{p^2} + c_3 \beta + c_4\, \beta \ln(p) + c_5\, p \ln(p),    (2.74)

for a set of arbitrary constants c_{1-5}. The last two terms do not meet C2 and are thus to be discarded, while the first is the only positive definite one among the remaining three. In order to see why this holds, we first note, as can be checked by direct calculation, that in this form i(p, β) fulfils the requirements C1 and C3. We then use C3 extensively for particular instances of variables with different distributions and derivatives, sketching the proof that these criteria do not allow other functional forms.
We first notice that, since any probability density function must be normalised to unity for any value of the model parameter, the following relations must hold for any variable X,

\int dx\, p_X(x, \alpha) = 1,    (2.75)

as well as

\int dx\, \partial_\alpha p_X(x, \alpha) = 0.    (2.76)
We now consider, beside an arbitrary distribution $p_X(x, \alpha)$, the distribution of some independent variable $Y$, given by $q_Y(y, \alpha)$, and denote the derivatives with respect to $\alpha$ associated to $p$ and $q$ as $f$ and $g$,
\[
  f(x) = \partial_\alpha p_X(x, \alpha), \qquad g(y) = \partial_\alpha q_Y(y, \alpha), \qquad (2.77)
\]
where the dependency on $\alpha$ is omitted. The joint probability density function of the independent variables $X$ and $Y$ is given by the product of $p_X$ with $q_Y$, with associated derivative
\[
  \frac{\partial\, p_X q_Y}{\partial \alpha} = f q_Y + p_X g. \qquad (2.78)
\]
Condition C3 states explicitly in these terms that for any such $p, q$ satisfying (2.75), and $f, g$ satisfying (2.76), the following relation, describing the additivity of information, must be true,
\[
  \int \mathrm{d}x \int \mathrm{d}y\; i\left(p_X(x) q_Y(y),\, f(x) q_Y(y) + p_X(x) g(y)\right)
  = \int \mathrm{d}x\; i\left(p_X(x), f(x)\right) + \int \mathrm{d}y\; i\left(q_Y(y), g(y)\right). \qquad (2.79)
\]
With the help of this relation we can constrain the functional form of i(p, β).
Let us first pick a uniform probability density for the variable Y ,
\[
  q_Y(y) \equiv 1, \qquad y \in [0, 1]. \qquad (2.80)
\]
It must hold, using C3, that
\[
  \int \mathrm{d}x \int \mathrm{d}y\; i\left(p_X(x),\, p_X(x) g(y) + f(x)\right)
  = \int \mathrm{d}x\; i(p_X(x), f(x)) + \int \mathrm{d}y\; i(1, g(y)), \qquad (2.81)
\]
for any allowed $g(y)$. It can be shown, for instance by performing variations with respect to $g(y)$, that any solution to this integral equation must obey the relation
\[
  \frac{\partial^2 i}{\partial \beta^2}(p, \beta) = \frac{1}{p}\, j\!\left(\frac{\beta}{p}\right) \qquad (2.82)
\]
for some function $j$ that satisfies
\[
  j(\beta) = \int \mathrm{d}x\; p_X(x)\, j\!\left(\beta + \frac{f(x)}{p_X(x)}\right). \qquad (2.83)
\]
This must still hold as above for any pX and associated derivative function f, satisfying (2.75) and (2.76) respectively. Taking a variation with respect to f shows that the only solutions for j(β) of this equation are
\[
  j(\beta) = c_1 \beta + c_2 \qquad (2.84)
\]
for some arbitrary constants $c_1$ and $c_2$. Therefore, we have constrained the full function $i(p, \beta)$ to be of the form
\[
  i(p, \beta) = c_1 \frac{\beta^2}{p} + c_2 \frac{\beta^3}{p^2} + \beta\, r(p) + s(p) \qquad (2.85)
\]
for some unknown functions $r(p)$ and $s(p)$. The first two terms are full solutions of C2. With very similar methods, the two other terms can be reduced to
\[
  r(p) = c_3 \ln(p) + c_4 \qquad (2.86)
\]
and
\[
  s(p) = c_5\, p \ln(p). \qquad (2.87)
\]
In this form, all terms in equation (2.85) are consistent with C3 and we have thus proved our claim (2.74).
2.3.2 Cramér-Rao bound and maximal entropy distributions
We denote the vector of model parameters, of dimension $n$, by
\[
  \alpha = \left(\alpha_1, \cdots, \alpha_n\right) \qquad (2.88)
\]
and the vector of estimators, of dimension $m$, by
\[
  \hat D = \left(\hat D_1, \cdots, \hat D_m\right), \qquad (2.89)
\]
with expectation values $D_i(\alpha) = \langle \hat D_i(x) \rangle$. In the following, we rely on Gram matrices, whose elements are defined by scalar products. Namely, for a set of vectors $y_i$, the Gram matrix $Y$ generated by this set of vectors is defined as
\[
  Y_{ij} = y_i \cdot y_j. \qquad (2.90)
\]
Gram matrices are positive definite and have the same rank as the set of vectors that generate them. In particular, if the vectors are linearly independent, the Gram matrix is strictly positive definite and invertible. We adopt a vectorial notation for functions, writing scalar products between vectors as
\[
  f \cdot g \equiv \int \mathrm{d}x\; p_X(x, \alpha)\, f(x) g(x), \qquad (2.91)
\]
with $p_X(x, \alpha)$ being the probability density function of the variable $X$ of interest. In this notation, both the Fisher information matrix and the covariance matrix are seen to be Gram matrices. Namely, the Fisher information matrix reads
\[
  F_{\alpha_i \alpha_j} = f_{\alpha_i} \cdot f_{\alpha_j}, \qquad f_{\alpha_i}(x) = \frac{\partial \ln p_X(x, \alpha)}{\partial \alpha_i}, \qquad (2.92)
\]
while the covariance matrix of the estimators is
\[
  C_{ij} = g_i \cdot g_j, \qquad g_i(x, \alpha) = \hat D_i(x) - D_i(\alpha). \qquad (2.93)
\]
For simplicity, and since it is sufficiently generic for our purpose, we will assume that both sets of vectors $f$ and $g$ are linearly independent, so that both matrices can be inverted. Note that we also have
\[
  \frac{\partial D_i}{\partial \alpha_j} = \int \mathrm{d}x\; p_X(x, \alpha)\, \hat D_i(x)\, \frac{\partial \ln p_X(x, \alpha)}{\partial \alpha_j} = g_i \cdot f_{\alpha_j}. \qquad (2.94)
\]
The Gram matrix $G$, of dimension $(m + n) \times (m + n)$, generated by the set of vectors $(g_1, \cdots, g_m, f_{\alpha_1}, \cdots, f_{\alpha_n})$ takes the form
\[
  G = \begin{pmatrix} C & \Delta \\ \Delta^T & F \end{pmatrix}, \qquad \Delta_{i \alpha_j} = g_i \cdot f_{\alpha_j}, \qquad (2.95)
\]
and is also positive definite due to its very definition. It is congruent to the matrix
\[
  Y G Y^T = \begin{pmatrix} C - \Delta F^{-1} \Delta^T & 0 \\ 0 & F \end{pmatrix}, \qquad (2.96)
\]
with
\[
  Y = \begin{pmatrix} 1_{m \times m} & -\Delta F^{-1} \\ 0 & 1_{n \times n} \end{pmatrix}. \qquad (2.97)
\]
Since two congruent matrices have the same number of positive, zero and negative eigenvalues respectively, and since both $F$ and $G$ are positive, we can conclude that
\[
  C \ge \Delta F^{-1} \Delta^T, \qquad (2.98)
\]
which is the Cramér-Rao inequality. The lower bound on the amount of information is seen from the fact that for any matrix written in block form,
\[
  \begin{pmatrix} C & \Delta \\ \Delta^T & F \end{pmatrix} \ge 0 \quad \Leftrightarrow \quad \begin{pmatrix} F & \Delta^T \\ \Delta & C \end{pmatrix} \ge 0, \qquad (2.99)
\]
and using the same congruence argument leads to the lower bound on information
\[
  F \ge \Delta^T C^{-1} \Delta. \qquad (2.100)
\]
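Both bounds (2.98) and (2.100) are easy to verify numerically. The sketch below (a toy example of ours, not from the text, using NumPy) takes a Poisson variable of mean $\mu$ with the deliberately inefficient single estimator $\hat D = 1\{N = 0\}$, whose mean is $e^{-\mu}$, and checks both inequalities by direct summation over the probabilities:

```python
import numpy as np
from math import lgamma

mu = 1.3
ns = np.arange(60)
logfact = np.array([lgamma(n + 1.0) for n in ns])
pmf = np.exp(-mu + ns * np.log(mu) - logfact)      # Poisson(mu), truncated

score = ns / mu - 1.0                              # f_mu(x), eq. (2.92)
F = np.sum(pmf * score ** 2)                       # Fisher information, = 1/mu

D = (ns == 0).astype(float)                        # estimator D = 1{N = 0}
Dmean = np.sum(pmf * D)                            # its mean, e^{-mu}
C = np.sum(pmf * (D - Dmean) ** 2)                 # covariance C, eq. (2.93)
Delta = np.sum(pmf * (D - Dmean) * score)          # Delta = g . f, eq. (2.95)

cramer_rao = C >= Delta ** 2 / F                   # eq. (2.98)
info_bound = F >= Delta ** 2 / C                   # eq. (2.100)
```

With this estimator both inequalities are strict: the indicator of $N = 0$ captures only part of the information $1/\mu$.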
Assume now that we have a probability density function such that this inequality is in fact an equality, i.e.
\[
  F = \Delta^T C^{-1} \Delta. \qquad (2.101)
\]
By the above argument, the Gram matrix generated by
\[
  \left(f_{\alpha_1}, \cdots, f_{\alpha_n}, g_1, \cdots, g_m\right) \qquad (2.102)
\]
is congruent to the matrix
\[
  \begin{pmatrix} 0_{n \times n} & 0 \\ 0 & C \end{pmatrix} \qquad (2.103)
\]
and has rank $m$. By assumption, the covariance matrix is invertible, such that the set $(g_1, \cdots, g_m)$ alone has rank $m$. It implies that each of the $f$ vectors can be written as a linear combination of the $g$ vectors,
\[
  f_{\alpha_i} = \sum_{j=1}^m A_{ij}\, g_j, \qquad (2.104)
\]
or, more explicitly,
\[
  \frac{\partial \ln p_X(x, \alpha)}{\partial \alpha_i} = \sum_{j=1}^m A_{ij}(\alpha) \left[\hat D_j(x) - D_j(\alpha)\right], \qquad (2.105)
\]
where the key point is that the coefficients $A_{ij}$ are independent of $x$. Integrating this equation, we obtain
\[
  \ln p_X(x, \alpha) = -\sum_{i=1}^m \lambda_i(\alpha) \hat D_i(x) - \ln Z(\alpha) + \ln q_X(x) \qquad (2.106)
\]
for some functions $\lambda$ and $Z$ of the model parameters only, and a function $q_X$ of $x$ only. We thus obtain
\[
  p_X(x, \alpha) = \frac{q_X(x)}{Z(\alpha)} \exp\left(-\sum_{i=1}^m \lambda_i(\alpha) \hat D_i(x)\right). \qquad (2.107)
\]
This is precisely the distribution that we obtain by maximising the entropy relative to $q_X(x)$, while satisfying the constraints
\[
  D_i(\alpha) = \langle \hat D_i(x) \rangle, \qquad i = 1, \cdots, m. \qquad (2.108)
\]
Taking $q_X$ as the uniform distribution makes it identical with the formula in equation (2.35).
3 Information within N-point moments
In this chapter we demonstrate how to decompose the Fisher information matrix into components unambiguously associated to independent information from N-point moments of each order. The general approach of decomposing the Fisher information matrix into uncorrelated components according to an orthogonal system was briefly discussed in a statistical journal, in theorem 3.1 of Jarrett (1984). It seems however that this procedure was not given further attention. In this chapter, similar ideas are taken further, dealing mostly with the system of moments, for which the associated orthogonal system consists of orthogonal polynomials.
We start in section 3.1 with one-dimensional variables. We will in a first step define, for a probability distribution $p(x, \theta)$, coefficients $s_n(\alpha)$ which unambiguously represent the independent information content of the moment of order $n$ on $\alpha$. These coefficients can then be used to reconstruct the Fisher information matrix order by order. The straightforward generalisation to any number of variables is performed in 3.2, and to any hierarchical system other than the N-point moments in the appendix. Properties of this expansion in the presence of noise are discussed in 3.2.2.
In section 3.3, we then present this exact decomposition for a few determinate probability density functions. We give these coefficients in closed form for several common classes of distributions, spanning a wide range of different situations. We solve these decompositions for the normal, the gamma and the beta families of distributions, as well as for an extended Poisson model.
We use this decomposition in 3.3.1 to approach the indeterminate moment problem. Namely, the Fisher information content of a moment series can be less than the information content of the probability density function it originates from, if that function cannot be uniquely recovered from the moment series. Therefore, this loss of information is also a useful measure of the indeterminacy of the moment problem. After dealing with the lognormal distribution, whose expansion we will present in greater detail in chapter 6, we treat numerically the Weibull distribution and the stretched exponential. For each of those we compare the information lost to the moment hierarchy with Stoyanov's dissimilarity index of the associated Stieltjes classes (Stoyanov, 2004).
3.1 One variable
Our starting point is the information inequality of chapter 2: for any set of unbiased estimators $\hat O$ aimed at extracting the vector of observables $O = \langle \hat O \rangle$, the following inequality between positive definite matrices holds,
\[
  F \ge \Delta \Sigma^{-1} \Delta^T, \qquad (3.1)
\]
where $\Sigma$ is the covariance matrix of the estimators,
\[
  \Sigma_{ij} = \langle \hat O_i \hat O_j \rangle - O_i O_j, \qquad (3.2)
\]
and $\Delta$ is the matrix of derivatives,
\[
  \Delta_{\alpha i} = \frac{\partial O_i}{\partial \alpha}. \qquad (3.3)
\]
When the vector $O$ consists of the moments themselves, $O_i = m_i = \langle x^i \rangle$ for $i = 1, \cdots, N$, it is possible to rewrite the information inequality in a more insightful form. Consider a set of polynomials $\{Q_n\}_{n=1}^\infty$, with $n$ the degree of the polynomial, orthogonal with respect to $p(x, \theta)$,
\[
  Q_n(x) = \sum_{k=0}^n C_{nk} x^k, \qquad \langle Q_n(x) Q_m(x) \rangle = h_n \delta_{mn}, \qquad n, m = 0, \cdots, N, \qquad (3.4)
\]
where $h_n$ is some strictly positive number, equal to unity if the polynomials are orthonormal. Since the normalisation of many common families of orthogonal polynomials is not unity, we will at the expense of extra notation keep track of the terms $h_n$ in the following. Of course, the polynomials
\[
  P_n(x) := Q_n(x) / \sqrt{h_n} \qquad (3.5)
\]
are then orthonormal. We refer to Freud (1971) or Szegő (2003) for the theory of orthogonal polynomials. The polynomial $P_n$ is always set by the first $2n$ moments, and is unique up to an overall sign, which we set by requiring the coefficient of $x^n$ in $P_n(x)$ (the leading coefficient) to be positive. A simple way to build the orthonormal polynomials with this sign convention is for instance to apply the Gram-Schmidt orthonormalisation process to the set $(1, x, \cdots, x^n)$ with respect to the scalar product $(f, g) := \langle f(x) g(x) \rangle$.
The key point for our purposes is to realise that the inverse covariance matrix between the moments can be written¹
\[
  \Sigma^{-1}_{ij} = \sum_{n=0}^N \frac{C_{in} C_{jn}}{h_n}, \qquad i, j = 1, \cdots, N. \qquad (3.6)
\]
This identity allows us to express the right hand side of the information inequality as a sum of uncorrelated pieces: using (3.6), it holds that
\[
  \sum_{i,j=1}^N \frac{\partial m_i}{\partial \alpha} \Sigma^{-1}_{ij} \frac{\partial m_j}{\partial \beta} = \sum_{n=1}^N s_n(\alpha) s_n(\beta), \qquad (3.7)
\]
where
\[
  s_n(\alpha) = \frac{1}{\sqrt{h_n}} \sum_{k=1}^n C_{nk} \frac{\partial m_k}{\partial \alpha}. \qquad (3.8)
\]
The matrix with elements $s_N(\alpha) s_N(\beta)$ is therefore the part of the Fisher information in the $N$th moment that was not contained in the moments of lower order. These coefficients have a straightforward interpretation. They can namely be written as
\[
  s_n(\alpha) := \langle s(x, \alpha) P_n(x) \rangle, \qquad s(x, \alpha) = \frac{\partial \ln p(x, \theta)}{\partial \alpha}, \qquad (3.9)
\]
which we will take as the definition of $s_n(\alpha)$. This can be seen by expanding $P_n(x)$ in terms of the matrix $C$, and noting that $s(x, \alpha)\, p(x, \theta) = \partial_\alpha p(x, \theta)$, recovering (3.8). In other words, $s_n(\alpha)$ is nothing else than the component of the score function $\partial_\alpha \ln p$ parallel to the orthonormal polynomial of order $n$.
It follows immediately from equation (3.9) that the Fisher information content of the moments depends on how well the score functions can be approximated by polynomials. With increasing $N$, one expects the score function to be better and better reproduced by the series
\[
  s_{\le N}(x, \alpha) := \sum_{n=1}^N s_n(\alpha) P_n(x). \qquad (3.10)
\]
We see that the following inequality between positive matrices holds,
\[
  0 \le \left\langle \left(s(x, \alpha) - s_{\le N}(x, \alpha)\right) \left(s(x, \beta) - s_{\le N}(x, \beta)\right) \right\rangle = F_{\alpha\beta} - \sum_{i,j=1}^N \frac{\partial m_i}{\partial \alpha} \Sigma^{-1}_{ij} \frac{\partial m_j}{\partial \beta}. \qquad (3.11)
\]
In particular, for any parameter $\alpha$, the residual of the best approximation of the score function with polynomials is given by
\[
  \left\langle \left(s(x, \alpha) - s_{\le N}(x, \alpha)\right)^2 \right\rangle = I(\alpha) - \sum_{i,j=1}^N \frac{\partial m_i}{\partial \alpha} \Sigma^{-1}_{ij} \frac{\partial m_j}{\partial \alpha}, \qquad (3.12)
\]
¹ A proof can be found, in a slightly more general setting, in section 3.4.2.
where $I(\alpha) = F_{\alpha\alpha}$ is the Fisher information on $\alpha$. The bits of Fisher information absent from the set of moments $m_1$ to $m_N$ are thus precisely the mean squared error of the fit of the score function by polynomials over the range of $p(x, \theta)$.
We define for further reference the matrices
\[
  [F_n]_{\alpha\beta} := s_n(\alpha) s_n(\beta) \qquad (3.13)
\]
as well as
\[
  [F_{\le N}]_{\alpha\beta} := \sum_{n=1}^N s_n(\alpha) s_n(\beta). \qquad (3.14)
\]
These are the matrices representing the independent information content of the $n$th moment and of the first $N$ moments respectively. By construction,
\[
  [F_{\le N}]_{\alpha\beta} = \sum_{i,j=1}^N \frac{\partial m_i}{\partial \alpha} \Sigma^{-1}_{ij} \frac{\partial m_j}{\partial \beta}. \qquad (3.15)
\]
We stress that these matrices are strictly speaking associated to a moment series rather than a density function. These matrices are namely identical for different densities having the same moment series.
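The identity (3.15) lends itself to a direct numerical check. In the sketch below (an illustrative toy model of ours: a five-point discrete distribution tilted by a hypothetical parameter $a$, using NumPy), the orthonormal polynomials are built by Gram-Schmidt, the score is projected onto them as in (3.9), and the result is compared with the moment-based right hand side:

```python
import numpy as np

xs = np.arange(5.0)

def p_of(a):                       # hypothetical one-parameter discrete family
    w = np.exp(-a * xs)
    return w / w.sum()

a0, eps = 0.7, 1e-6
p = p_of(a0)
score = (np.log(p_of(a0 + eps)) - np.log(p_of(a0 - eps))) / (2 * eps)

# orthonormal polynomials P_0..P_N with respect to p, via Gram-Schmidt
N = 3
polys = []
for k in range(N + 1):
    v = xs ** k
    for u in polys:
        v = v - np.sum(p * v * u) * u
    polys.append(v / np.sqrt(np.sum(p * v * v)))
polys = np.array(polys)

s = polys @ (p * score)            # s_n(a) = <score P_n>, eq. (3.9)
F_leq = np.sum(s[1:] ** 2)         # [F_{<=N}], eq. (3.14)

# right hand side of (3.15): derivatives and covariance of moments m_1..m_N
def moms(a):
    return np.array([np.sum(p_of(a) * xs ** k) for k in range(1, N + 1)])

dm = (moms(a0 + eps) - moms(a0 - eps)) / (2 * eps)
mv = moms(a0)
Sigma = np.array([[np.sum(p * xs ** (i + j)) for j in range(1, N + 1)]
                  for i in range(1, N + 1)]) - np.outer(mv, mv)
rhs = dm @ np.linalg.solve(Sigma, dm)
match = np.isclose(F_leq, rhs, rtol=1e-5)
```

The projection $s_0$ onto the constant polynomial vanishes, as it must, since the score has zero mean.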
We see that if the score function is itself a polynomial of degree $N$ for all values of the parameters, then only the first $N$ of these coefficients are possibly non-zero. A finite number of moments then captures all the Fisher information content of $p(x, \theta)$. In fact, the reverse statement is also true. This can be understood in the framework of orthodox statistics: in this case $p(x, \theta)$ is proportional to the exponential of a polynomial with parameter dependent coefficients, for which a finite number of sufficient statistics exist (Koopman, 1936; Pitman and Wishart, 1936). In the light of chapter 2, these distributions are precisely those that maximise Shannon entropy for fixed values of the first $N$ moments. We thus have
\[
  F_{\le N} = F \qquad \text{Max. entropy distributions with constrained first } N \text{ moments.} \qquad (3.16)
\]
The ubiquitous example of this family is of course the Gaussian distribution, for which $N = 2$.
More generally, a sufficient criterion for the moment hierarchy to possess the same amount of Fisher information as the density function itself is that the moment problem associated to the moment series is determinate, that is to say that the density can be uniquely recovered from the moment series. In this case, indeed, by a well known theorem due to M. Riesz (Riesz, 1923), the orthogonal polynomials associated to the density form a complete set of basis functions for square integrable functions. Therefore, the squared residual in equation (3.12) must go to zero for $N \to \infty$. We have in this case
\[
  \lim_{N \to \infty} F_{\le N} = F \qquad \text{Moment determinate density functions.} \qquad (3.17)
\]
Finally, for moment indeterminate density functions, for which different density functions exist with the same moment series, we have in general an inequality (again, an inequality between positive matrices, not matrix element by matrix element),
\[
  \lim_{N \to \infty} F_{\le N} \le F \qquad \text{Moment indeterminate density functions.} \qquad (3.18)
\]
The amplitude of the mismatch can vary very substantially from case to case. In chapter 6, we study extensively the lognormal field, for which the inequality is always strict, but is a very strong function of the variance of the field.
3.1.1 The Christoffel-Darboux kernel as information on the density itself
To conclude this section, it is interesting to understand the interpretation of the orthogonal polynomials themselves in this information theoretic framework. To this aim, consider that the model parameters of interest are the values of the density function themselves, allowing thus complete freedom. For convenience, we do not require the density to be normalised to unity in the following. Take for simplicity discrete values $x_i$, $i = 0, 1, \cdots$, with associated probability density $p(x_i)$, the set of model parameters being precisely the density, i.e. $\alpha_i = p(x_i)$. A straight calculation leads to the following expression for the Fisher information matrix elements on the parameters $p(x_i)$, $p(x_j)$,
\[
  F_{ij} = \frac{\delta_{ij}}{p(x_i)}. \qquad (3.19)
\]
On the other hand, we have $\partial m_k / \partial p(x_i) = x_i^k$, leading to
\[
  s_n(p(x_i)) = P_n(x_i). \qquad (3.20)
\]
Therefore, the $n$th orthogonal polynomial $P_n(x_i)$ is the information content of the $n$th moment on the density function $p(x_i)$ itself. The matrices $F_{\le N}$ become the celebrated Christoffel-Darboux kernel $K_N$ (Simon, 2008),
\[
  K_N(x_i, x_j) = \sum_{n=0}^N P_n(x_i) P_n(x_j) = [F_{\le N}]_{ij}. \qquad (3.21)
\]
We have from these relations that the information escaping the first $N$ moments is given by
\[
  F_{ij} - [F_{\le N}]_{ij} = \frac{\delta_{ij}}{p(x_i)} - K_N(x_i, x_j). \qquad (3.22)
\]
In accordance with our discussions above, the right hand side of (3.22) is known to tend to zero as $N \to \infty$ precisely for determinate moment problems. This can be seen from the fact that in this case, from Riesz's theorem, the reproducing property holds,
" N ! #2 X X X N→∞ p(xi) f(xi) − p(xk)f(xk)Pn(xk) Pn(xi) → 0, (3.23) i n=0 k
for any function $f(x_i)$ that is square summable with respect to $p(x_i)$. This implies
\[
  \sum_k p(x_k) f(x_k) K_N(x_k, x_i) \; \xrightarrow{N \to \infty} \; f(x_i) \qquad (3.24)
\]
for any such function, and therefore
\[
  K_N(x_i, x_j) \; \xrightarrow{N \to \infty} \; \frac{\delta_{ij}}{p(x_j)}. \qquad (3.25)
\]
3.2 Several variables
The general theory on the statistical power of moments exposed in section 3.1 extends in a straightforward way to density functions of any number of variables and to N-point moments. We first need a little bit of notation. For a $d$-dimensional variable $X$ taking values $x = (x_1, \cdots, x_d)$, a multiindex
\[
  n = (n_1, \cdots, n_d), \qquad n_i = 0, 1, 2, \cdots, \qquad (3.26)
\]
is a $d$-dimensional vector of non negative integers. The order of the multiindex is defined as
\[
  |n| := \sum_{i=1}^d n_i. \qquad (3.27)
\]
For a given order $|n| = n$, there are $\binom{n + d - 1}{n}$ different such multiindices $n$. We define further the notation $x^n$ as
\[
  x^n = x_1^{n_1} \cdots x_d^{n_d}. \qquad (3.28)
\]
With this notation in place, a moment of order $n$ is given by
\[
  m_n := \langle x^n \rangle = \left\langle x_1^{n_1} \cdots x_d^{n_d} \right\rangle, \qquad |n| = n, \qquad (3.29)
\]
and the covariance matrix between the moments is
\[
  \Sigma_{nm} := m_{n+m} - m_n m_m. \qquad (3.30)
\]
In this notation, the decomposition of the information into independent bits of order $n$ proceeds by strict analogy with the one-dimensional case. We refer to Dunkl and Xu (2001) for the general theory of orthogonal polynomials in several variables. A main difference is that at a fixed order $N$ there are not one but $\binom{N + d - 1}{N}$ independent orthogonal polynomials. This number is the same as the number of the above multiindices of that order $N$: each multiindex defines namely an independent monomial $x^n$ of that order. These polynomials are not defined in a unique way. The orthogonality of the polynomials of the same order is not essential for our purposes, and requiring the following condition is enough,
\[
  \langle P_n(x) P_m(x) \rangle = 0, \qquad |n| \ne |m|,
\]
\[
  \langle P_n(x) P_m(x) \rangle = [H_n]_{nm}, \qquad |n| = |m| = n, \qquad (3.31)
\]
for some positive matrices $H_n$, which replace the normalisation $h_n$ of section 3.1. The component of the score function $\partial_\alpha \ln p$ parallel to the polynomial $P_n$ is
\[
  s_n(\alpha) := \left\langle \frac{\partial \ln p(x, \theta)}{\partial \alpha} P_n(x) \right\rangle, \qquad (3.32)
\]
and the expansion of the score function in terms of these polynomials reads
\[
  s_{\le N}(x, \alpha) := \sum_{n=0}^N \sum_{|n|, |m| = n} s_n(\alpha) \left[H_n^{-1}\right]_{nm} P_m(x). \qquad (3.33)
\]
It will converge to the score function for $N \to \infty$ if the set of polynomials is complete, and may fail to do so otherwise. We note that there is some freedom in the definition (3.31): that of the choice of a basis in the vector space of polynomials of order $n$ orthogonal to all polynomials of lower order. For this reason, $s_n(\alpha)$ depends on the particular basis. However, the expansion (3.33) does not, and neither will the information matrices at fixed order.
Writing now the orthogonal polynomials in terms of a triangular transition matrix,
\[
  P_n(x) = \sum_{|m| \le |n|} C_{nm} x^m, \qquad (3.34)
\]
we see that the information matrix of order $n$ (3.13) becomes
\[
  [F_n]_{\alpha\beta} = \sum_{|n|, |m| = n} s_n(\alpha) \left[H_n^{-1}\right]_{nm} s_m(\beta)
  = \sum_{|i|, |j| \le n} \left[C^T H_n^{-1} C\right]_{ij} \frac{\partial m_i}{\partial \alpha} \frac{\partial m_j}{\partial \beta}. \qquad (3.35)
\]
The strict analog of equation (3.14) holds for each $N$,
\[
  [F_{\le N}]_{\alpha\beta} = \sum_{n=1}^N [F_n]_{\alpha\beta} = \sum_{|i|, |j| = 1}^N \frac{\partial m_i}{\partial \alpha} \Sigma^{-1}_{ij} \frac{\partial m_j}{\partial \beta}, \qquad (3.36)
\]
recovering the right hand side of the information inequality for all moments of order up to $N$. Just as before, the missing piece between $F$ and $F_{\le N}$ is the least-squares residual of the approximation of the score function by polynomials of order up to $N$. The matrices $F_n$ and $F_{\le N}$ are easily seen to be invariant under mappings $y = Ax + b$, where $A$ is an invertible square matrix of size $d \times d$ and $b$ a $d$-dimensional vector, provided both are parameter independent.
As a simple illustration, for the multivariate Gaussian distribution with mean vector $\mu$ and covariance matrix $C$ we have
\[
  [F_1]_{\alpha\beta} = \frac{\partial \mu^T}{\partial \alpha} C^{-1} \frac{\partial \mu}{\partial \beta} \qquad (3.37)
\]
and
\[
  [F_2]_{\alpha\beta} = \frac{1}{2} \mathrm{Tr} \left[ C^{-1} \frac{\partial C}{\partial \alpha} C^{-1} \frac{\partial C}{\partial \beta} \right], \qquad (3.38)
\]
summing up to the total information; $F_n = 0$ for $n > 2$.
3.2.1 Independent variables and uncorrelated fiducial
In general, it is a rather difficult problem to obtain explicit expressions for the orthogonal polynomials or the matrices $F_N$ and $F_{\le N}$, especially in the case of several variables. Using exact though formal expressions as a starting point requires the evaluation of determinants of moment matrices, and cases where this is tractable are rare. For independent variables $(x_1, \cdots, x_d)$, however, a canonical choice of $P_n$ can be written down in terms of those associated to the one-dimensional problems. More specifically, if the variables are independent, then
\[
  p(x, \theta) = \prod_{i=1}^d p_i(x_i, \theta), \qquad (3.39)
\]
where $p_i$ denotes the one-dimensional probability density function of the $i$th variable. Define then the polynomial in $d$ variables of order $|n|$ through
\[
  P_n(x) := \prod_{i=1}^d P_{n_i}(x_i), \qquad (3.40)
\]
where $P_{n_i}(x_i)$ is the orthonormal polynomial of order $n_i$ in one variable with respect to $p_i$. It is not difficult to see that for any multiindices $n$ and $m$ the average over $x$ factorizes into averages with respect to each variable $x_i$. We have namely
\[
  \langle P_n(x) P_m(x) \rangle = \prod_{i=1}^d \langle P_{n_i}(x_i) P_{m_i}(x_i) \rangle = \prod_{i=1}^d \delta_{n_i m_i} = \delta_{nm}. \qquad (3.41)
\]
The polynomials so defined are therefore orthogonal, with the matrices $H_n$ being unit matrices. From equation (3.39) follows that the function $\partial_\alpha \ln p$ is the sum of the functions $\partial_\alpha \ln p_i$ of the individual variables. Therefore, using the polynomials defined in (3.40), it is not difficult to see that all the coefficients $s_n(\alpha)$ that couple different variables vanish, i.e. $s_n(\alpha) = 0$ if $n$ has two or more non-zero indices. The information content of order $N$ then becomes simply the sum of the information of order $N$ within each variable, as expected. This is a manifestation of the additivity of Fisher information for independent variables, which is thus seen to hold order by order.
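This vanishing of the mixed coefficients is easy to confirm numerically. In the toy sketch below (two independent four-point variables whose marginals both depend on a hypothetical parameter $a$; all choices are ours for illustration), every coefficient $s_n(\alpha)$ with two non-zero indices comes out consistent with zero:

```python
import numpy as np

xs = np.arange(4.0)

def marg(a, shift):                # hypothetical marginal family; "shift"
    w = np.exp(-(a + shift) * xs)  # just makes the two marginals different
    return w / w.sum()

a0, eps = 0.5, 1e-6
p1, p2 = marg(a0, 0.0), marg(a0, 0.3)
sc1 = (np.log(marg(a0 + eps, 0.0)) - np.log(marg(a0 - eps, 0.0))) / (2 * eps)
sc2 = (np.log(marg(a0 + eps, 0.3)) - np.log(marg(a0 - eps, 0.3))) / (2 * eps)

def onpolys(p, deg):               # Gram-Schmidt orthonormal polynomials
    out = []
    for k in range(deg + 1):
        v = xs ** k
        for u in out:
            v = v - np.sum(p * v * u) * u
        out.append(v / np.sqrt(np.sum(p * v * v)))
    return np.array(out)

P1, P2 = onpolys(p1, 2), onpolys(p2, 2)
joint = np.outer(p1, p2)
score = sc1[:, None] + sc2[None, :]   # score of independent variables

# coefficients s_n with two non-zero indices should vanish (sec. 3.2.1)
max_mixed = max(abs(np.sum(joint * score * np.outer(P1[i], P2[j])))
                for i in (1, 2) for j in (1, 2))
```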
Note that the polynomials also take the product form (3.40) if, as above, the probability density function factorizes at the fiducial values of the model parameters, but with the difference that the derivative functions $\partial_\alpha \ln p$ no longer split into such a sum. In this case, the fiducial model is uncorrelated, but the model parameters create correlations away from their fiducial values. N-point moments at non-zero lag may carry genuine information in this case. In mathematical terms, all $s_n(\alpha)$ can possibly be non-zero, depending on how the score functions couple the different variables.
3.2.2 On the impact of noise
Very often the observed random variable is the sum of two random variables,
\[
  z = x + y, \qquad (3.42)
\]
where $x$ is a $d$-dimensional variable carrying the information on the parameters $\theta$, and $y$ is some additive noise. We set
\[
  p_X(x, \theta) \qquad (3.43)
\]
to be the probability density function of $x$, and
\[
  p_Y(y) \qquad (3.44)
\]
that of $y$, which is parameter independent. From the rules of probability theory, we have
\[
  p_Z(z, \theta) = \int \mathrm{d}^d x\; p_X(x, \theta)\, p_Y(z - x). \qquad (3.45)
\]
In this section, we prove the following, maybe rather intuitive fact, valid for any functional form of $p_X$ and of $p_Y$, any model parameters and any dimensionality $d$: for any $N$, the following relation between positive matrices holds,
\[
  F^{X+Y}_{\le N} \le F^X_{\le N}. \qquad (3.46)
\]
On the other hand, this relation does not hold for the matrices $F_n$ individually. For instance, non-Gaussian noise $Y$ on a Gaussian signal $X$ will in general create third and higher order terms in the score functions. Their amplitude is however constrained by (3.46).
We prove this inequality here only for one-dimensional variables, $d = 1$. The general proof consists in replacing indices such as $n$ with multiindices $n$, and so on.
Our proof is based on the following basic fact concerning blockwise positive matrices (for a proof of this particular fact, see e.g. Bhatia, 2007): whenever $A$ and $D$ are strictly positive matrices, then
\[
  \begin{pmatrix} A & C \\ C^T & D \end{pmatrix} \ge 0 \quad \Leftrightarrow \quad A \ge C D^{-1} C^T. \qquad (3.47)
\]
We proceed as follows: in general we can write the moments of $Z$ as a linear combination of those of $X$,
\[
  m^Z_n = \langle (x + y)^n \rangle = \sum_{k=0}^n \binom{n}{k} m^X_k m^Y_{n-k} =: \sum_{k=0}^n B_{nk} m^X_k. \qquad (3.48)
\]
Note that $B$ is a triangular matrix with non-vanishing diagonal elements, and is therefore invertible. Writing $m = (m_0, \cdots, m_N)^T$, it holds therefore that
\[
  \left[F^Z_{\le N}\right]_{\alpha\beta} = \frac{\partial m_Z^T}{\partial \alpha} M_Z^{-1} \frac{\partial m_Z}{\partial \beta} = \frac{\partial m_X^T}{\partial \alpha} B^T M_Z^{-1} B \frac{\partial m_X}{\partial \beta}, \qquad (3.49)
\]
where $M_X$, respectively $M_Z$, is the moment matrix $M_{ij} = m_{i+j}$ of $X$, respectively $Z$. The claim (3.46) is proven provided
\[
  \frac{\partial m_X^T}{\partial \alpha} B^T M_Z^{-1} B \frac{\partial m_X}{\partial \beta} \le \frac{\partial m_X^T}{\partial \alpha} M_X^{-1} \frac{\partial m_X}{\partial \beta}, \qquad (3.50)
\]
which is equivalent to
\[
  B^T M_Z^{-1} B \le M_X^{-1}. \qquad (3.51)
\]
We now prove this last relation.
Consider the following vector
\[
  v := \left(z^0, \cdots, z^N, x^0, \cdots, x^N\right), \qquad (3.52)
\]
and the matrix $G$ defined with the help of this vector,
\[
  G_{ij} := \langle v_i v_j \rangle := \int \mathrm{d}x \int \mathrm{d}y\; p_X(x) p_Y(y)\, v_i v_j. \qquad (3.53)
\]
Such a matrix of scalar products is called a Gram matrix. $G$ has by definition the block form
\[
  G = \begin{pmatrix} M_Z & B M_X \\ M_X B^T & M_X \end{pmatrix}. \qquad (3.54)
\]
It is well known that any Gram matrix is positive: from the definition of $G$, we have that $u^T G u = \langle (u \cdot v)^2 \rangle \ge 0$ for any vector $u$. Using the above fact (3.47), we have
\[
  M_Z \ge B M_X B^T. \qquad (3.55)
\]
Equivalently,
\[
  M_X \le B^{-1} M_Z B^{-T}. \qquad (3.56)
\]
Taking the inverse gives
\[
  M_X^{-1} \ge B^T M_Z^{-1} B, \qquad (3.57)
\]
which concludes the proof.
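The key matrix inequality (3.51) can also be checked numerically for a concrete pair of distributions, say $X$ exponential (moments $n!$) and $Y$ standard normal; both choices are toy assumptions of ours, using NumPy:

```python
import numpy as np
from math import comb, factorial

# Toy check of (3.51): X ~ Exp(1) (moments n!), noise Y ~ N(0,1).
N = 3
mX = np.array([float(factorial(n)) for n in range(2 * N + 1)])
mY = np.array([1., 0., 1., 0., 3., 0., 15.])       # standard normal moments

B = np.array([[comb(n, k) * mY[n - k] if k <= n else 0.0
               for k in range(N + 1)] for n in range(N + 1)])  # eq. (3.48)
mZ = np.array([sum(comb(n, k) * mX[k] * mY[n - k] for k in range(n + 1))
               for n in range(2 * N + 1)])

MX = np.array([[mX[i + j] for j in range(N + 1)] for i in range(N + 1)])
MZ = np.array([[mZ[i + j] for j in range(N + 1)] for i in range(N + 1)])

# M_X^{-1} - B^T M_Z^{-1} B should be positive semidefinite
diff = np.linalg.inv(MX) - B.T @ np.linalg.inv(MZ) @ B
min_eig = np.linalg.eigvalsh(diff).min()
```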
3.2.3 Poissonian discreteness effects
The same relations hold for another source of noise relevant in cosmological surveys, namely discreteness effects due to the finite number of tracers of the underlying fields. A common parametrisation is the Poisson model, where the observed number of tracers in a cell is a Poisson variable whose intensity is the value of the underlying continuous field $x$ at that point. Explicitly, with $N = (N_1, \cdots, N_d)$ the numbers of tracers in $d$ cells, we set their joint probability to be
\[
  p_N(N_1, \cdots, N_d) = \int \mathrm{d}^d x\; p_X(x, \theta) \prod_{i=1}^d e^{-x_i} \frac{x_i^{N_i}}{N_i!}. \qquad (3.58)
\]
This model has the peculiar property of transforming the moments of $X$ into factorial moments: the falling factorial in $d$ variables² becomes
\[
  (N)_m = \prod_{i=1}^d (N_i)_{m_i} \qquad (3.59)
\]
and we have indeed the known curious relation
\[
  \left\langle (N)_m \right\rangle = \langle x^m \rangle = m_m. \qquad (3.60)
\]
Using this relation, it is easy to prove with the same methods as above that
\[
  F^N_{\le M} \le F^X_{\le M} \qquad (3.61)
\]
for any $M$. This does not appear to be necessarily the case for more generic functional forms of $p_N(N|x)$, since there are no obvious relations between the first $M$ moments of $N$ and those of $X$.
3.3 Some exactly solvable models for moment determinate densities
We derive in this section exact analytical expressions for the information coefficients at all orders for well known families of moment determinate probability density functions. We will deal with the normal distribution, the beta and gamma families, as well as an extended Poisson model. For all these instances, the matrices $F_{\le N}$ converge to $F$ as $N \to \infty$.
Normal distribution. The normal distribution provides us with a very simple illustration of the approach. Its probability density function is
\[
  p(x, \mu, \sigma^2) := \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2}\right). \qquad (3.62)
\]
² See (3.95) for the falling factorial in one dimension.
Its Fisher information matrix is well known,
\[
  F = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}. \qquad (3.63)
\]
The information coefficients take the form
\[
  s_1(\mu) = \frac{1}{\sigma}, \qquad s_n(\mu) = 0, \quad n \ne 1, \qquad (3.64)
\]
and
\[
  s_2(\sigma^2) = \frac{1}{\sqrt{2}\,\sigma^2}, \qquad s_n(\sigma^2) = 0, \quad n \ne 2. \qquad (3.65)
\]
We recover the full matrix already with the first two moments, $F_{\le 2} = F$. The matrix is diagonal because there is no order $n$ for which both coefficients $s_n(\mu)$ and $s_n(\sigma^2)$ are non-zero.
Derivation: It is easily seen that the score function $\partial_\mu \ln p$ is a polynomial of first order in $x$. This implies immediately that only $s_1(\mu)$ is non-zero. For any probability density function and parameter $\alpha$, it holds from (3.8) that
\[
  s_1(\alpha) = \frac{1}{\sigma} \frac{\partial m_1}{\partial \alpha}, \qquad (3.66)
\]
which proves (3.64). Very similarly, the score function associated to $\sigma^2$ is a polynomial of second order, such that only the first two coefficients can possibly be non-zero. However, $s_1(\sigma^2)$ vanishes since $\sigma^2$ does not impact the mean. Finally, $s_2(\sigma^2)$ can be obtained by noting that the polynomials orthogonal with respect to the normal distribution are the Hermite polynomials (Szegő, 2003),
\[
  P_n(x) = H_n\left(\frac{x - \mu}{\sigma}\right), \qquad (3.67)
\]
with normalisation $h_n = n!$. From $H_2(x) = x^2 - 1$ and equation (3.8) follow $C_{22} = 1/\sigma^2$ and $s_2(\sigma^2) = \frac{1}{\sqrt{2}\,\sigma^2}$.
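The Gaussian coefficients can be cross-checked numerically with Gauss-Hermite quadrature: projecting the two score functions onto the orthonormal Hermite polynomials recovers (3.64) and (3.65). A sketch of ours for $\mu = 0$, $\sigma = 1$ (values chosen for convenience), using NumPy:

```python
import numpy as np
from math import factorial, sqrt

# Gauss-Hermite quadrature (probabilists' weight) for standard normal averages
x, w = np.polynomial.hermite_e.hermegauss(40)
w = w / np.sqrt(2 * np.pi)          # sum(w * f(x)) approximates E[f(X)], X~N(0,1)

score_mu = x                        # d ln p / d mu       at mu = 0, sigma = 1
score_s2 = (x ** 2 - 1) / 2         # d ln p / d sigma^2

s_mu, s_s2 = [], []
for n in range(1, 6):
    He = np.polynomial.hermite_e.hermeval(x, [0] * n + [1])
    Pn = He / sqrt(factorial(n))    # orthonormal Hermite polynomial, h_n = n!
    s_mu.append(np.sum(w * score_mu * Pn))
    s_s2.append(np.sum(w * score_s2 * Pn))
```

Only $s_1(\mu) = 1$ and $s_2(\sigma^2) = 1/\sqrt 2$ survive; all other projections vanish to quadrature accuracy.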
Beta distribution. The beta distribution is defined as
\[
  p(x, \alpha, \beta) := \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \qquad 0 < x < 1, \quad \alpha, \beta > 0, \qquad (3.68)
\]
where $B(\alpha, \beta)$ is the beta integral, which will also enter the following section on the gamma distribution. It has the well known representation in terms of the gamma function $\Gamma$, $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$. It plays a fundamental role in order statistics.
The full Fisher information matrix can be conveniently expressed in terms of the second derivative of the logarithm of the gamma function, the trigamma function $\psi_1$ (Abramowitz and Stegun, 1970, p. 258-260),
\[
  F = \begin{pmatrix} \psi_1(\alpha) & 0 \\ 0 & \psi_1(\beta) \end{pmatrix} - \psi_1(\alpha + \beta) \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \qquad (3.69)
\]
Our derivation of the information coefficients associated to α and β, based on the explicit expressions of the orthogonal polynomials, in this case the Jacobi polynomials, is rather lengthy. We defer it to the appendix.
The result is
\[
  s_n(\alpha) = \frac{(-1)^{n-1}}{n} \left(\frac{(\beta)_{(n)}\, n!}{(\alpha)_{(n)} (\alpha + \beta)_{(n)}}\right)^{1/2} \left(\frac{2n + \alpha + \beta - 1}{n + \alpha + \beta - 1}\right)^{1/2} \qquad (3.70)
\]
and
\[
  s_n(\beta) = -\frac{1}{n} \left(\frac{(\alpha)_{(n)}\, n!}{(\beta)_{(n)} (\alpha + \beta)_{(n)}}\right)^{1/2} \left(\frac{2n + \alpha + \beta - 1}{n + \alpha + \beta - 1}\right)^{1/2}, \qquad (3.71)
\]
where $(x)_{(n)} := x(x + 1) \cdots (x + n - 1)$ is the rising factorial. For the symmetric case $\alpha = \beta = 1/2$, these expressions simplify considerably. We have namely
\[
  s_n(\alpha) = (-1)^{n-1} \frac{\sqrt{2}}{n}, \qquad s_n(\beta) = -\frac{\sqrt{2}}{n}. \qquad (3.72)
\]
Since $\psi_1(1/2) = \pi^2/2$ and $\psi_1(1) = \pi^2/6$, a short calculation shows that, summing these coefficients, we indeed recover the full matrix given in (3.69),
\[
  F|_{\alpha = \beta = 1/2} = \frac{\pi^2}{6} \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}. \qquad (3.73)
\]
Gamma distribution. The gamma distribution is a two parameter family,
\[
  p(x, \ln\theta, k) := \frac{x^{k-1} e^{-x/\theta}}{\theta^k \Gamma(k)}, \qquad k, \theta > 0, \quad x > 0, \qquad (3.74)
\]
where $\Gamma(k)$ is the gamma function. Special cases include the exponential distribution ($k = 1$) and the chi squared distribution with $n$ degrees of freedom ($k = n/2$, $\theta = 2$). The calculation of the Fisher information matrix associated to $\ln\theta$ and $k$ is not difficult. Again, the trigamma function shows up,
\[
  F = \begin{pmatrix} k & 1 \\ 1 & \psi_1(k) \end{pmatrix}. \qquad (3.75)
\]
The information coefficients are evaluated below, with the result
\[
  s_1(\ln\theta) = \sqrt{k}, \qquad s_n(\ln\theta) = 0, \quad n \ne 1, \qquad (3.76)
\]
and
\[
  s_n(k) = (-1)^{n-1} \sqrt{\frac{B(k, n)}{n}}. \qquad (3.77)
\]
As a consistency check, we see that since $s_1(k) = 1/\sqrt{k}$, we recover trivially the $\ln\theta\ln\theta$ and $\ln\theta\, k$ elements of the matrix (3.75). Its $kk$ element implies that the following identity must hold,
\[
  \sum_{n=1}^\infty \frac{B(k, n)}{n} = \psi_1(k), \qquad k > 0, \qquad (3.78)
\]
which reduces for $k = 1$ to Euler's famous formula $\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}$.
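The identity (3.78) can be checked numerically for an arbitrary value of $k$, evaluating the trigamma function through its series $\psi_1(k) = \sum_{j \ge 0} (k + j)^{-2}$ (a sketch of ours, using NumPy and the standard library `lgamma`):

```python
import numpy as np
from math import lgamma, exp

# Numerical check of the identity (3.78), sum_n B(k,n)/n = psi_1(k).
k = 2.7
B = lambda a, b: exp(lgamma(a) + lgamma(b) - lgamma(a + b))  # beta function

lhs = sum(B(k, n) / n for n in range(1, 400))
psi1 = np.sum(1.0 / (k + np.arange(200000)) ** 2)            # trigamma series
gap = abs(lhs - psi1)
```

Since $B(k, n)/n \sim \Gamma(k)\, n^{-k-1}$ at large $n$, a few hundred terms suffice on the left hand side for $k = 2.7$.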
Derivation: The polynomials associated to this distribution are the generalized Laguerre polynomials $L_n^{(k-1)}$ (Abramowitz and Stegun, 1970, p. 775). More precisely, we have
\[
  \int_0^\infty \mathrm{d}t\; t^{k-1} e^{-t} L_n^{(k-1)}(t) L_m^{(k-1)}(t) = \frac{\Gamma(n + k)}{n!} \delta_{mn}. \qquad (3.79)
\]
For this reason, the polynomials orthogonal with respect to the gamma distribution are
\[
  P_n(x) := L_n^{(k-1)}\left(\frac{x}{\theta}\right), \qquad (3.80)
\]
with normalisation
\[
  h_n = \frac{\Gamma(n + k)}{\Gamma(k)\, n!} = \frac{1}{n\, B(k, n)}. \qquad (3.81)
\]
They have the explicit matrix elements³ $C_{ni}$,
\[
  C_{ni} = (-1)^{n-i} \binom{n + k - 1}{n - i} \frac{\theta^{-i}}{i!}. \qquad (3.82)
\]
The moments of the gamma distribution are given by
\[
  m_n = \theta^n \frac{\Gamma(n + k)}{\Gamma(k)}. \qquad (3.83)
\]
The Fisher information on $\theta$ is the simplest: the score function associated to $\theta$ is a polynomial of first order in $x$, and therefore only $s_1(\ln\theta)$ is non-zero. It follows that
\[
  s_1(\ln\theta) = \sqrt{k}, \qquad s_n(\ln\theta) = 0, \quad n \ne 1. \qquad (3.84)
\]
The calculation of $s_n(k)$ requires a little more work, but we can make use of results derived earlier when dealing with the beta family. The derivatives of the moments with respect to $k$ are given by
\[
  \frac{\partial \ln m_n}{\partial k} = \psi_0(k + n) - \psi_0(k). \qquad (3.85)
\]
Using the representation (3.8), together with
\binom{n+k-1}{n-i}\,\frac{1}{i!} = \frac{\Gamma(n+k)}{\Gamma(k+i)\,\Gamma(n+1)}\,\binom{n}{i}, \qquad (3.86)
³We added the factor (-1)^n to the conventions of Abramowitz and Stegun (1970), in accordance with our own convention of having a positive leading coefficient C_{nn}.
56 3.3. Some exactly solvable models for moment determinate densities
one obtains the following expression
s_n(k) = \frac{1}{\sqrt{n\,B(k, n)}} \sum_{i=0}^n (-1)^{n-i}\,\binom{n}{i}\left(\psi_0(k+i) - \psi_0(k)\right). \qquad (3.87)
The sum is precisely the function f_q(t) defined in (3.142), evaluated at t = 1 and q = k, which we proved in (3.150) to be a representation of the beta function,
f_k(1) = (-1)^{n-1}\,B(k, n). \qquad (3.88)
We conclude therefore

s_n(k) = (-1)^{n-1}\,\sqrt{\frac{B(k, n)}{n}}. \qquad (3.89)
Poisson model Consider the probability density function
p(x, \mu, \ln A) = e^{-\mu} \sum_{n=0}^\infty \frac{\mu^n}{n!}\,\delta^D(x - An), \qquad \mu, A > 0, \qquad (3.90)

with \delta^D the Dirac delta function. This is the usual Poisson law, together with some amplitude A that is left free. If A is set to unity, we recover the Poisson law. This is a peculiar situation in terms of Fisher information. Unlike in the previous situations, the parameter A indeed impacts the range of x. The Fisher information on A is not well defined, formally infinite.
Even though the Fisher information matrix is not well defined, the information coefficients as given in equations (3.8) or (3.9) are still meaningful, and so is their interpretation (3.7) in terms of the expected curvature of a \chi^2 fit. The result is

s_1(\ln\mu) = \sqrt{\mu}, \qquad s_n(\ln\mu) = 0, \quad n \neq 1, \qquad (3.91)

and

s_1(\ln A) = \sqrt{\mu}, \qquad s_n(\ln A) = (-1)^n\,\frac{\sqrt{n!}}{n(n-1)}\,\mu^{1-n/2}, \quad n \geq 2. \qquad (3.92)
Note that this is compatible with the fact that the Fisher information on A is infinite : for any \mu the sum

\sum_{n=1}^\infty s_n^2(\ln A) \qquad (3.93)

is divergent. The turnover of the coefficients occurs around n \approx \mu, where they start to increase. Atypically, the higher the order of the moment, the more interesting it becomes for inference on A in this model.
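This turnover is easy to exhibit numerically using only the closed form (3.92); the helper name below is ours:

```python
import math

def s_lnA(n, mu):
    """Magnitude |s_n(ln A)| from eq. (3.92) (s_1 = sqrt(mu))."""
    if n == 1:
        return math.sqrt(mu)
    return math.sqrt(math.factorial(n)) / (n * (n - 1)) * mu ** (1 - n / 2)

mu = 10.0
mags = [s_lnA(n, mu) for n in range(1, 31)]
turnover = mags.index(min(mags)) + 1   # order of the smallest coefficient
print(turnover)                        # close to mu, here n = 13
```

The squared ratio |s_{n+1}/s_n|^2 \approx n/\mu shows directly why the minimum sits near n \approx \mu before the coefficients start growing without bound.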
Derivation : Consider the Charlier polynomials (Abramowitz and Stegun, 1970, p. 788), defined by

c_n(x) := \sum_{k=0}^n (-1)^{n-k}\,\binom{n}{k}\,\mu^{-k}\,(x)_k, \qquad (3.94)

where

x(x-1)\cdots(x-m+1) =: (x)_m \qquad (3.95)

is the falling factorial. These are the polynomials orthogonal with respect to the Poisson distribution with intensity \mu. Clearly, the polynomials

P_n(x) := c_n\left(\frac{x}{A}\right) \qquad (3.96)

are then the polynomials orthogonal to p(x, \mu, \ln A). The normalization is

h_n = \frac{n!}{\mu^n}. \qquad (3.97)
On the other hand, the moments are given by m_n = A^n m_n^P, where m_n^P is the nth moment of the Poisson distribution. For this reason, it holds that \partial m_n / \partial \ln A = n\, m_n. This equation, together with (3.96), implies that the information coefficients on \ln A are independent of the actual value of A, since C_{nk} \propto A^{-k} and m_k \propto A^k. So are the coefficients s_n(\ln\mu). We can therefore from now on safely choose A = 1 as the fiducial value, and work with the usual Poisson distribution and associated Charlier polynomials.
The coefficients associated to \ln\mu are the simplest to obtain. It is not difficult to see that the score function associated to \ln\mu of the Poisson distribution is a polynomial of first order. We obtain s_1(\ln\mu) = \sqrt{\mu} and s_n(\ln\mu) = 0, n \neq 1. Turning to \ln A, we need to evaluate

s_n(\ln A) = \frac{\mu^{n/2}}{\sqrt{n!}} \sum_{k=0}^n C_{nk}\, k\, m_k. \qquad (3.98)

The derivation we propose uses extensively the technique of generating functions (Wilf, 1994), and is a variation on the following theme : consider the known expression for the factorial moments of the Poisson distribution,
\langle n(n-1)\cdots(n-m+1) \rangle = \mu^m. \qquad (3.99)
Even though it may appear mysterious at first sight, a simple way to prove this identity is to notice

\langle n(n-1)\cdots(n-m+1) \rangle = \frac{d^m}{dt^m} G(t)\Big|_{t=1}, \qquad G(t) := \sum_{n=0}^\infty p(n)\, t^n, \qquad (3.100)

where G(t) is called the probability generating function, and to try and evaluate the right hand side. This is indeed very convenient, since the probability generating function of the Poisson law takes a very simple form,
G(t) = e^{-\mu} \sum_{n=0}^\infty \frac{\mu^n}{n!}\, t^n = e^{\mu(t-1)}. \qquad (3.101)
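The factorial moments (3.99) are easy to confirm by direct summation against the Poisson weights; a hedged sketch (helper name is ours):

```python
import math

def falling_factorial_moment(mu, m, nmax=60):
    """<n(n-1)...(n-m+1)> under a Poisson law of intensity mu; equals mu^m."""
    total = 0.0
    for n in range(m, nmax):
        p_n = math.exp(-mu) * mu ** n / math.factorial(n)
        total += p_n * math.prod(range(n - m + 1, n + 1))
    return total

print([falling_factorial_moment(3.0, m) for m in (1, 2, 3)])  # ~ [3, 9, 27]
```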
The derivatives are trivial to evaluate, proving (3.99). We will need the following result, which comes out directly from an essentially identical argument,

\langle (an)_m \rangle = \frac{d^m}{dt^m}\, e^{\mu(t^a - 1)}\Big|_{t=1}, \qquad (3.102)

where a is some number.
We now turn to the evaluation of

\sum_{k=0}^n C_{nk}\, k\, m_k. \qquad (3.103)

This sum can be written as the first derivative with respect to a, evaluated at a = 1, of the following expression,

\langle c_n(ax) \rangle = \sum_{k=0}^\infty \frac{e^{-\mu}\mu^k}{k!}\, c_n(ak), \qquad (3.104)

as can be seen directly from an expansion of the Charlier polynomials c_n in terms of the coefficients C_{nk}. Using equation (3.94) and our result (3.102), we obtain
\sum_{k=0}^n C_{nk}\, k\, m_k = e^{-\mu} \sum_{k=0}^n (-1)^{n-k}\,\binom{n}{k}\,\mu^{-k}\, \frac{d}{da}\frac{d^k}{dt^k} \exp(\mu t^a)\Big|_{t,a=1}. \qquad (3.105)

The derivative with respect to a is easily performed :
\frac{d}{da} \exp(\mu t^a)\Big|_{a=1} = \mu\, e^{\mu t}\, t \ln t. \qquad (3.106)

Also, using the Leibniz rule for derivatives of products, we can write
\frac{d^k}{dt^k}\left[\exp(\mu t)\, t \ln t\right]\Big|_{t=1} = e^{\mu} \sum_{i=0}^k \binom{k}{i}\,\mu^{k-i}\, \frac{d^i}{dt^i}\left(t \ln t\right)\Big|_{t=1}. \qquad (3.107)
The factor of \mu^k cancels in equation (3.105), so that

\sum_{k=0}^n C_{nk}\, k\, m_k = \mu \sum_{k=0}^n \sum_{i=0}^k (-1)^{n-k}\,\binom{n}{k}\binom{k}{i}\,\mu^{-i}\, \frac{d^i}{dt^i}\left(t \ln t\right)\Big|_{t=1}. \qquad (3.108)

From the properties of the binomial coefficients follows that the sum over k reduces to \delta_{ni}. Therefore,

s_n(\ln A) = \frac{\mu^{1-n/2}}{\sqrt{n!}}\, \frac{d^n}{dt^n}\left(t \ln t\right)\Big|_{t=1}. \qquad (3.109)

The derivatives are easily computed :

\frac{d^n}{dt^n}\left(t \ln t\right)\Big|_{t=1} = \begin{cases} 0 & n = 0 \\ 1 & n = 1 \\ (-1)^n (n-2)! & n \geq 2, \end{cases} \qquad (3.110)

with the final result given in equation (3.92).
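The collapse of the sum over k can be verified exactly in integer arithmetic:

```python
from math import comb

def ksum(n, i):
    """sum_{k=i}^{n} (-1)^(n-k) C(n,k) C(k,i); equals delta_{ni}."""
    return sum((-1) ** (n - k) * comb(n, k) * comb(k, i)
               for k in range(i, n + 1))

print([[ksum(n, i) for i in range(n + 1)] for n in range(4)])
```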
3.3.1 Moment indeterminate densities
We now turn to indeterminate distributions, i.e. those that cannot be uniquely recovered from their moments. In this case, the Fisher information content of the moments will generally not converge to the total amount. In this respect, the limit
\epsilon(\alpha) := \lim_{N\to\infty} \frac{\sum_{n=1}^N s_n^2(\alpha)}{I(\alpha)}, \qquad (3.111)

a number in (0, 1], may be thought of as an indirect measure of the amount of indeterminacy of the moment problem. A value of unity means that the moments do carry the same information on the parameter \alpha as the probability density function, while for values close to zero a large amount of information on \alpha is not present in the hierarchy. The quantity 1/\sqrt{\epsilon(\alpha)} has, via the information inequality, the interpretation of the ratio of the expected constraints on \alpha by extraction of the full set of moments to the best constraints achievable with the help of unbiased estimators of \alpha. When more than one parameter is of interest, this definition is not really satisfactory anymore, since it is tied to \alpha exclusively. One way to get around this is by defining a ratio of determinants,
\epsilon := \lim_{N\to\infty} \epsilon_N = \lim_{N\to\infty} \det F_{\leq N} / \det F, \qquad (3.112)

where F_{\leq N} was defined in equation (3.14). Clearly, \epsilon_N is the cumulative efficiency to catch the information. Since the matrices in the numerator and denominator transform in the same way under any smooth reparametrisation of the parameters, or coordinates, of the family, this ratio is left unchanged under such transformations. Thus, \epsilon and \epsilon_N are well defined quantities associated to that family of distributions, independently of the chosen coordinates.
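As an illustration of this definition on a determinate case, \epsilon_N for the gamma family of the previous section can be assembled directly from the coefficients (3.75)-(3.77), and indeed tends to unity; a sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.special import beta, polygamma

def eps_N(k, N):
    """det F_{<=N} / det F for the gamma family, eqs. (3.75)-(3.77)."""
    s_lnth = np.zeros(N); s_lnth[0] = np.sqrt(k)            # eq. (3.76)
    s_k = np.array([(-1.0) ** (n - 1) * np.sqrt(beta(k, n) / n)
                    for n in range(1, N + 1)])              # eq. (3.77)
    S = np.vstack([s_lnth, s_k])
    F_leq = S @ S.T                                         # sum of s_n outer products
    F = np.array([[k, 1.0], [1.0, polygamma(1, k)]])        # eq. (3.75)
    return np.linalg.det(F_leq) / np.linalg.det(F)

print([round(eps_N(2.0, N), 4) for N in (2, 10, 100)])
```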
Naively, one might expect this lost information to reflect in some way the freedom there is in the choice of a distribution with the very same series of moments. We will therefore compare our results to a measure of this freedom, D_S, that was proposed recently (Stoyanov, 2004) as the index of dissimilarity of a Stieltjes class. A Stieltjes class is a family of probability density functions of the form
p_\delta(x) = p(x)\left(1 + \delta\, h(x)\right), \qquad |h(x)| \leq 1, \quad |\delta| \leq 1, \qquad (3.113)

all having the same moment series as the central probability density function p. The index D_S, defined as

D_S := \int dx\, p(x)\, |h(x)|, \qquad (3.114)

in [0, 1], is the maximal distance between two members of the class, being zero for determinate moment problems, where h must identically vanish.
We will consider three types of probability density functions : the lognormal, which is indeterminate for all values of its parameters, as well as the Weibull distribution and the stretched exponential, which are indeterminate solely for parts of their parameter range. For these last two distributions we could not find in all cases exact analytical
expressions for the coefficients s_n. We therefore generated them numerically, using a fine discretization procedure to obtain the orthogonal polynomials, following the exposition in Gautschi (2004).
Lognormal distribution The lognormal distribution has the following functional form,
p(x, \mu_Y, \sigma_Y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\,\frac{1}{x}\, \exp\left(-\frac{1}{2}\frac{(\ln x - \mu_Y)^2}{\sigma_Y^2}\right), \qquad x > 0. \qquad (3.115)
The parameters \mu_Y and \sigma_Y^2 are the mean and the variance of y = \ln x, which is normally distributed. They are related to the mean \mu and variance \sigma^2 of x by the relations
\mu_Y = \ln\mu - \frac{1}{2}\ln\left(1 + \frac{\sigma^2}{\mu^2}\right), \qquad \sigma_Y^2 = \ln\left(1 + \frac{\sigma^2}{\mu^2}\right). \qquad (3.116)
The only relevant parameter for our purposes is the reduced variance \sigma^2/\mu^2, in terms of which our results can be expressed. The information coefficients associated to the parameters \ln\mu and \sigma^2/\mu^2 will be derived in detail in chapter 6 (see also Carron (2011)), with the help of q-series. They are given by
s_n(\ln\mu) = (-1)^{n-1}\left(\frac{q^n}{1 - q^n}\,(q:q)_{n-1}\right)^{1/2},
\qquad
s_n(\sigma^2/\mu^2) = q\,(-1)^n\left(\frac{q^n}{1 - q^n}\,(q:q)_{n-1}\right)^{1/2} \sum_{k=1}^{n-1}\frac{q^k}{1 - q^k}. \qquad (3.117)
In these equations,

q := \frac{1}{1 + \sigma^2/\mu^2} \qquad (3.118)

and

(q:q)_n := \prod_{k=1}^n \left(1 - q^k\right) \qquad (3.119)

is the q-Pochhammer symbol (Andrews et al., 1999). According to the chain rule of derivation, we obtain s_n(\mu_Y) = s_n(\ln\mu) and s_n(\sigma_Y^2) = \frac{1}{2}\left(s_n(\ln\mu) + s_n(\sigma^2/\mu^2)\right). On the other hand, since \ln x is normally distributed, the total Fisher information matrix F associated to \mu_Y and \sigma_Y^2 is diagonal with 1/\sigma_Y^2 and 1/(2\sigma_Y^4) as diagonal elements. It follows that in these coordinates \det F = 1/(2\sigma_Y^6).
The Stieltjes class we consider is set by
h(x) = \sin\left(\frac{2\pi}{\sigma_Y^2}\left(\ln x - \mu_Y\right)\right). \qquad (3.120)
Others are given in Heyde (1963) but are equivalent to an uninteresting rescaling of \sigma_Y. After an obvious variable substitution, the dissimilarity index becomes
D_S = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \exp\left(-\frac{x^2}{2}\right) \left|\sin\left(\frac{2\pi}{\sigma_Y}\, x\right)\right| dx. \qquad (3.121)
In figure 3.1, we show the cumulative efficiencies \epsilon_N, for N = 2, 5, 10, 25 and 1000, for
Fig. 3.1 — Solid lines : the cumulative efficiencies \epsilon_N, for N = 2, 5, 10, 25 and 1000, from bottom to top, as a function of the reduced variance of the lognormal distribution. Dashed : the dissimilarity index for the Stieltjes class given in equation (3.120).
the lognormal family, evaluated with the exact expressions given in (3.117), solid lines from bottom to top, together with the dissimilarity index D_S, shown as the dashed line, evaluated by numerical integration of equation (3.121). It is not difficult to see from the above expressions that the coefficients s_n decay exponentially at large n, such that the convergence is rather quick over the full range. We observe a very sharp transition in \epsilon as soon as the reduced variance approaches unity, from a regime where the entire information content is within the second moment, as for the normal distribution (and thus where the indeterminacy of the moment problem is irrelevant for parameter estimation), to the opposite regime where all moments completely fail to capture the information. This cutoff is discussed at length in chapter 6. On the other hand, the index D_S is seen to be remarkably constant over the range shown, equal to its limiting value for \sigma_Y \to 0, which can be evaluated from (3.121) to be 2/\pi.
Weibull distribution We consider now the Weibull distribution with shape parameter k and scale parameter λ,
p(x, \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \qquad x, \lambda, k > 0. \qquad (3.122)

The variable x can be seen as a power of an exponentially distributed variable with intensity unity, p(t) = e^{-t}, for t = (x/\lambda)^k. It is known that the moment problem associated to the moments of x is determinate for k \geq 1/2 and indeterminate for k < 1/2 (Stoyanov, 1987, section 11.3 e.g.). A Stieltjes class in the latter regime is provided in the same reference, which after some algebraic manipulations reduces to
h(x) = \sin\left(c_k (x/\lambda)^k - k\pi\right), \qquad c_k = \tan(k\pi). \qquad (3.123)
The index D_S becomes

D_S = \int_0^\infty e^{-t}\,\left|\sin\left(c_k t - k\pi\right)\right| dt. \qquad (3.124)

It is interesting to note that this integral can be performed analytically, for instance with the help of partial integration, with the result
D_S = \sin(2k\pi)\,\frac{e^{-k\pi/c_k}}{1 - e^{-\pi/c_k}}. \qquad (3.125)
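The closed form (3.125) can be checked against a direct numerical quadrature of (3.124); a sketch assuming SciPy, with our own helper names:

```python
import math
from scipy.integrate import quad

def DS_closed(k):
    """Eq. (3.125)."""
    c = math.tan(k * math.pi)
    return math.sin(2 * k * math.pi) * math.exp(-k * math.pi / c) \
        / (1 - math.exp(-math.pi / c))

def DS_quad(k):
    """Eq. (3.124), integrated numerically (tail beyond t = 60 is negligible)."""
    c = math.tan(k * math.pi)
    f = lambda t: math.exp(-t) * abs(math.sin(c * t - k * math.pi))
    return quad(f, 0.0, 60.0, limit=400)[0]

print(DS_closed(0.3), DS_quad(0.3))
```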
We evaluated \epsilon_N and the associated F_{\leq N} numerically, using the convenient coordinates \ln\lambda and \ln k. It is not difficult to see that \epsilon_N is independent of \lambda. The full information matrix for these parameters can be evaluated analytically, with the result \det F = k\pi^2/6. The results for \epsilon_N are shown in figure 3.2, which we present, as for the lognormal, as a function of the reduced variance

\frac{\sigma}{\mu} = \left(\frac{\Gamma(1 + 2/k)}{\Gamma^2(1 + 1/k)} - 1\right)^{1/2}. \qquad (3.126)

The dashed vertical line corresponding to k = 1/2 separates the two regimes where, on the left, \epsilon is unity and D_S zero, since the moment problem is determinate, and, on the right, where the moment problem is indeterminate. The scale \sigma/\mu = 1 corresponds to k = 1. While the decay of \epsilon_N is also very sharp in the indeterminate regime, very slow convergence of \epsilon_N is seen to occur in the large reduced variance regime, unlike for the lognormal distribution. This happens also in the region 1 < k < 2, which corresponds to the phase where the Weibull distribution goes from a unimodal to a monotonically decreasing distribution; but \epsilon still is unity there, since the moment problem still is determinate. For instance, for k = 1, the following exact result can be gained with the same methods as exposed in this section,
\epsilon_N\big|_{k=1} = \left(\sum_{n=1}^{N-1} \frac{1}{n^2}\right) \frac{6}{\pi^2}. \qquad (3.127)
Stretched exponential function Finally, we treat the case of the stretched exponential,

p(x, \lambda, k) = \frac{k}{\lambda\,\Gamma(1/k)}\, e^{-(x/\lambda)^k}, \qquad x \geq 0. \qquad (3.128)
Fig. 3.2 — Solid lines : the cumulative efficiencies \epsilon_N, for N = 2, 5, 10, 25, from bottom to top, as a function of \sigma/\mu for the Weibull distribution. Dashed : the dissimilarity index for the Stieltjes class given in equation (3.123). The vertical dashed line, corresponding to k = 1/2, separates the regimes of determinacy and indeterminacy of the associated moment problem.
Just as for the Weibull distribution, the moment problem associated to the moments of x is determinate for k \geq 1/2 and indeterminate for k < 1/2 (Stoyanov, 1987, section 11.4). A Stieltjes class is given by
h(x) = \sin\left(c_k (x/\lambda)^k\right), \qquad x \geq 0, \quad k > 0, \qquad (3.129)

where c_k is as above given by c_k = \tan(k\pi). Numerical evaluation of \epsilon_N and of the dissimilarity index is shown in fig. 3.3. As for the Weibull distribution, these results are independent of the scale parameter \lambda. We can conclude that in none of the situations we investigated is D_S a good tracer of the importance of the indeterminacy of the moment problem for parameter inference.
Fig. 3.3 — Solid lines : the cumulative efficiencies \epsilon_N, for N = 2, 5, 10, 25, from bottom to top, as a function of the inverse shape parameter 1/k of the stretched exponential. Dashed : the dissimilarity index for the Stieltjes class given in equation (3.129). The vertical dashed line at k = 1/2 separates the regimes of determinacy and indeterminacy of the associated moment problem.
3.4 Appendix
3.4.1 Derivation for the beta family
In the following we will need the first two derivatives of the logarithm of the gamma function. These are called the digamma ψ0 and trigamma ψ1 functions respectively (Abramowitz and Stegun, 1970, p. 258-260),
\psi_0(x) = \frac{d}{dx}\ln\Gamma(x), \qquad \psi_1(x) = \frac{d^2}{dx^2}\ln\Gamma(x). \qquad (3.130)

The Fisher information matrix can be gained by noting that, by differentiation under the integral sign, we have
F_{\alpha\beta} = \frac{\partial^2 \ln B(\alpha, \beta)}{\partial\alpha\,\partial\beta}. \qquad (3.131)
Using the representation of the beta integral in terms of the gamma function, we conclude that

F = \begin{pmatrix} \psi_1(\alpha) & 0 \\ 0 & \psi_1(\beta) \end{pmatrix} - \psi_1(\alpha + \beta) \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \qquad (3.132)
In order to obtain the information coefficients sn(α) and sn(β), it is more convenient to start with sn(β). sn(α) will then be gained effortlessly by looking at the symmetry of the problem.
From the definition of the beta integral, the moments of the beta distribution are given by

m_k = \frac{B(\alpha + k, \beta)}{B(\alpha, \beta)}. \qquad (3.133)

The derivatives of the moments with respect to \beta are given by
\partial_\beta \ln m_k = -\psi_0(\alpha + \beta + k) + \psi_0(\alpha + \beta). \qquad (3.134)
The orthogonal polynomials are the Jacobi polynomials G_n^{(\alpha,\beta)}(x) (Abramowitz and Stegun, 1970, page 774). Instead of the parameters p, q used in Abramowitz and Stegun (1970), we stick to \alpha and \beta, which are more appropriate for our purposes⁴. These polynomials are proportional to the Jacobi polynomials P_n^{(\alpha-1,\beta-1)}(2x - 1) (Szegő, 2003, chap. IV), orthogonal on the interval (-1, 1). The matrix elements of G_n are given explicitly by
C_{nk} = (-1)^{n-k}\,\frac{\Gamma(\alpha + n)}{\Gamma(\alpha + \beta + 2n - 1)}\,\binom{n}{k}\,\frac{\Gamma(\alpha + \beta + k + n - 1)}{\Gamma(\alpha + k)}. \qquad (3.135)
4We have in the notation of Abramowitz and Stegun (1970) q = α, p = α + β − 1
We first note the following relation,
C_{nk}\, m_k = (-1)^{n-k}\,\binom{n}{k}\,\frac{\Gamma(\alpha + n)\Gamma(\beta)}{\Gamma(2n + \alpha + \beta - 1)\,B(\alpha, \beta)}\,(\alpha + \beta + k)^{(n-1)}, \qquad (3.136)

where

(x)^{(n)} = x(x+1)\cdots(x+n-1) \qquad (3.137)

is the rising factorial. Since s_n(\beta) is given by
s_n(\beta) = \sum_{k=0}^n C_{nk}\, m_k\, \partial_\beta \ln m_k, \qquad (3.138)

the evaluation of the following sum is necessary,
A_n(q) := \sum_{k=0}^n (-1)^{n-k}\,\binom{n}{k}\,(q + k)^{(n-1)}\left(\psi_0(q+k) - \psi_0(q)\right), \qquad q := \alpha + \beta > 0. \qquad (3.139)
The following paragraphs are dedicated to the lengthy but straightforward proof of the following result,

A_n(q) = \frac{(n-1)!}{q + n - 1}. \qquad (3.140)

The proof consists of a number of steps.
Step 1 : Using an always useful trick, we turn the rising factorial in the sum into a power and differentiate : from
(q+k)^{(n-1)} = (q+k)\cdots(q+k+n-2) = \frac{d^{n-1}}{dt^{n-1}}\, t^{k+q+n-2}\Big|_{t=1}, \qquad (3.141)

follows
A_n(q) = \frac{d^{n-1}}{dt^{n-1}}\left[t^{q+n-2}\sum_{k=0}^n \binom{n}{k}(-1)^{n-k}\, t^k\left(\psi_0(q+k) - \psi_0(q)\right)\right]_{t=1} =: \frac{d^{n-1}}{dt^{n-1}}\left[t^{q+n-2} f_q(t)\right]_{t=1}. \qquad (3.142)

We then perform the derivatives using the Leibniz rule of derivation. Since
\frac{d^{n-1-m}}{dt^{n-1-m}}\, t^{q+n-2}\Big|_{t=1} = \frac{\Gamma(q+n-1)}{\Gamma(q+m)}, \qquad (3.143)

one obtains

A_n(q) = \Gamma(q+n-1) \sum_{m=0}^{n-1} \binom{n-1}{m}\,\frac{f_q^{(m)}(t=1)}{\Gamma(q+m)}. \qquad (3.144)
Step 2 : To obtain f_q^{(m)}(1), we first construct an integral representation of the function f_q(t). From the relation (Andrews et al., 1999, theorem 1.2.7)
\psi_0(q+1) = \frac{1}{q} + \psi_0(q) \qquad (3.145)

of the digamma function, we note that
\psi_0(q+k) - \psi_0(q) = \sum_{j=0}^{k-1} \frac{1}{q+j}. \qquad (3.146)
Writing

\frac{1}{q+j} = \int_0^1 dx\, x^{q+j-1}, \qquad (3.147)

the sum over j becomes a geometric series. We obtain therefore,
\psi_0(q+k) - \psi_0(q) = \int_0^1 dx\, x^{q-1}\,\frac{1 - x^k}{1 - x}. \qquad (3.148)
We can now perform the sum over k in fq(t). We have indeed
\sum_{k=1}^n \binom{n}{k}(-1)^{n-k}\, t^k\left(1 - x^k\right) = (t-1)^n - (-1)^n (1 - tx)^n. \qquad (3.149)
It follows immediately
f_q(t) = \int_0^1 dx\, x^{q-1}\,\frac{(t-1)^n - (-1)^n (1 - tx)^n}{1 - x}. \qquad (3.150)

We only need derivatives of that function evaluated at t = 1. The order of each derivative is < n. For this reason the term (t-1)^n in this expression actually plays no role. Each such derivative is thus a beta integral :
f_q^{(m)}(1) = (-1)^{n+m-1}\, n(n-1)\cdots(n-m+1)\, B(q+m, n-m), \qquad m < n. \qquad (3.151)
Step 3 : To go further, we use the following property of the beta integral,
B(x, y) = \frac{(x+y)^{(n)}}{(y)^{(n)}}\, B(x, y+n), \qquad (3.152)

which can be seen from its representation in terms of the gamma function, or from its integral representation (Andrews et al., 1999, page 5). Two applications of this rule lead to

B(q+m, n-m) = B(q, n)\,\frac{(q)^{(m)}}{m!}\,\binom{n-1}{m}^{-1}. \qquad (3.153)
By combining this relation with (3.151), and using

(q)^{(m)} = \frac{\Gamma(q+m)}{\Gamma(q)}, \qquad (3.154)

we have shown

f_q^{(m)}(1) = (-1)^{n+m-1}\, B(q, n)\,\frac{\Gamma(q+m)}{\Gamma(q)}\,\binom{n}{m}\binom{n-1}{m}^{-1}. \qquad (3.155)

This form is very convenient, since many terms now cancel in equation (3.144). We obtain
A_n(q) = -\frac{\Gamma(n+q-1)}{\Gamma(q)}\, B(q, n) \sum_{m=0}^{n-1} \binom{n}{m}(-1)^{n-m}, \qquad (3.156)

where the prefactor in front of the sum equals (n-1)!/(n+q-1). The sum over m would vanish were it to run up to n. Its value is therefore minus the m = n term, which is unity. It follows

A_n(q) = \frac{(n-1)!}{n+q-1}, \qquad (3.157)

which was to be proved.
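The identity (3.140) lends itself to a direct numerical spot check, assuming SciPy's `poch` (rising factorial) and `digamma`; the helper name is ours:

```python
from math import comb, factorial
from scipy.special import poch, digamma

def A(n, q):
    """A_n(q) of eq. (3.139); should equal (n-1)!/(q+n-1), eq. (3.140)."""
    return sum((-1) ** (n - k) * comb(n, k) * poch(q + k, n - 1)
               * (digamma(q + k) - digamma(q)) for k in range(n + 1))

q = 1.7
print([A(n, q) - factorial(n - 1) / (q + n - 1) for n in (1, 2, 3, 6)])  # ~ zeros
```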
Getting s_n(\beta) now requires only keeping track of the normalization. The normalization of the Jacobi polynomials G_n (Abramowitz and Stegun, 1970) is

h_n = \frac{n!}{B(\alpha, \beta)}\,\frac{\Gamma(\alpha+n)\,\Gamma(\alpha+\beta-1+n)\,\Gamma(\beta+n)}{(\alpha+\beta-1+2n)\,\Gamma^2(\alpha+\beta-1+2n)}. \qquad (3.158)

(Note that there is an additional factor of 1/B(\alpha, \beta) with respect to Abramowitz and Stegun (1970), since there the measure is not normalized to unit integral.) We have from equations (3.134), (3.136), (3.138) and (3.140), together with some algebra,

s_n(\beta) = -\frac{1}{n}\left(\frac{(\alpha)^{(n)}\, n!}{(\beta)^{(n)}\,(\alpha+\beta)^{(n)}}\right)^{1/2}\left(\frac{2n+\alpha+\beta-1}{n+\alpha+\beta-1}\right)^{1/2}. \qquad (3.159)
Tedious calculations are not needed to get s_n(\alpha) : symmetry considerations are enough. The Jacobi polynomials obey the symmetry relation,
P_n^{(\alpha-1,\beta-1)}(-x) = (-1)^n P_n^{(\beta-1,\alpha-1)}(x), \qquad (3.160)

and therefore

G_n^{(\alpha,\beta)}(1-x) = (-1)^n G_n^{(\beta,\alpha)}(x). \qquad (3.161)

It is then not difficult to see⁵ that s_n(\alpha) is proportional to s_n(\beta) with \alpha and \beta exchanged. We conclude

s_n(\alpha) = -\frac{(-1)^n}{n}\left(\frac{(\beta)^{(n)}\, n!}{(\alpha)^{(n)}\,(\alpha+\beta)^{(n)}}\right)^{1/2}\left(\frac{2n+\alpha+\beta-1}{n+\alpha+\beta-1}\right)^{1/2}. \qquad (3.162)
5for instance from the representation (3.9)
3.4.2 General hierarchical systems, recursion relations.
This chapter was focussed on the hierarchy of N-point moments, with associated orthogo- nal system the orthogonal polynomials. One of course expects the approach of this chapter to extend in some way to any system of observables. It is the aim of this section to discuss briefly the case of other hierarchical systems.
Given a density function p(x, θ), where x = (x1, ··· , xd), consider a set of functions, (a hierarchy [f] ),
[f] = \left(f_0(x), f_1(x), f_2(x), \cdots\right), \qquad (3.163)

which can be finite or infinite. The restrictions are f_0(x) = 1, that the functions be linearly independent with respect to the density p(x, \theta), and that \langle f_i f_j \rangle be well defined.
The density provides us with a scalar product,

f \cdot g := \langle f(x)\, g(x) \rangle = \int dx\, p(x, \theta)\, f(x)\, g(x). \qquad (3.164)
Consider now the orthonormal system built from [f], following the Gram-Schmidt orthog- onalisation procedure. Explicitly, they can be built recursively from
P_n(x) = \frac{f_n(x) - \sum_{k=0}^{n-1} \langle f_n P_k \rangle\, P_k(x)}{\sqrt{\left\langle\left(f_n(x) - \sum_{k=0}^{n-1} \langle f_n P_k \rangle\, P_k(x)\right)^2\right\rangle}}. \qquad (3.165)
By construction, we have indeed
\langle P_n(x)\, P_m(x) \rangle = \delta_{mn}. \qquad (3.166)
Pn is a polynomial in the hierarchy members, in the sense that
P_n(x) = \sum_{k=0}^n C_{nk}\, f_k(x), \qquad (3.167)

for some matrix elements C_{nk}, with C_{nn} > 0, and C_{nk} = 0 for k > n.
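The construction (3.165) can be sketched numerically. In the illustration below (helper names are ours), the measure p(x, \theta) dx is discretized with probabilists' Gauss-Hermite nodes, so that the monomial hierarchy under a unit normal reproduces the normalized Hermite polynomials:

```python
import numpy as np

def orthonormal_system(fs, x, w):
    """Gram-Schmidt of eq. (3.165) under the product (3.164), with the
    measure p(x) dx discretized by nodes x and weights w."""
    dot = lambda f, g: np.sum(w * f * g)
    Ps = []
    for f in fs:
        r = f.copy()
        for P in Ps:
            r = r - dot(f, P) * P          # subtract projections on earlier P_k
        Ps.append(r / np.sqrt(dot(r, r)))  # normalize
    return Ps

x, w = np.polynomial.hermite_e.hermegauss(60)   # weight exp(-x^2/2)
w = w / w.sum()                                 # normalize to a probability measure
Ps = orthonormal_system([x ** k for k in range(5)], x, w)
# Ps[2] equals (x^2 - 1)/sqrt(2), Ps[3] equals (x^3 - 3x)/sqrt(6)
```

Since 60-node Gauss-Hermite quadrature is exact for the polynomial degrees involved, the orthonormalization here is exact up to rounding.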
Very useful is the following identity between matrices of size (N+1) \times (N+1) :
C^T C = [M_N]^{-1}, \qquad [M_N]_{ij} = \langle f_i(x)\, f_j(x) \rangle, \qquad i, j = 0, \cdots, N. \qquad (3.168)
Derivation Consider the expansion of fk in terms of Pn :
f_k(x) = \sum_{n=0}^k B_{kn}\, P_n(x). \qquad (3.169)
Such an expansion is always possible, since the sets of P_n and f_n span the same space by construction. It follows that B_{kn} = \langle f_k P_n \rangle. Expanding in this relation P_n with the help of the matrix C, we obtain
f_k(x) = \sum_{n=0}^k \sum_{i,j=0}^n C_{ni} C_{nj}\, \langle f_i f_k \rangle\, f_j(x), \qquad k = 0, 1, \cdots, \quad \forall x. \qquad (3.170)

Relation (3.168) follows.
We now write the moments of the hierarchy as m_k := \langle f_k \rangle. The inverse of the covariance matrix [\Sigma_N]_{ij} = \langle f_i f_j \rangle - m_i m_j between the members of [f] is given by
\left[\Sigma_N^{-1}\right]_{ij} = \left[C^T C\right]_{ij} = \sum_{n=0}^N C_{ni}\, C_{nj}, \qquad i, j = 1, \cdots, N. \qquad (3.171)
This is indeed the N \times N lower right block of M_N^{-1}.
From these considerations the following results. Define the (positive) Fisher information matrix within the first N hierarchy members, [F_{\leq N}], as
\left[F_{\leq N}\right]_{\alpha\beta} := \sum_{i,j=0}^N \frac{\partial m_i}{\partial\alpha}\left[M_N^{-1}\right]_{ij}\frac{\partial m_j}{\partial\beta}. \qquad (3.172)
If the density is a probability density, then ∂αm0 = 0, and therefore
\left[F_{\leq N}\right]_{\alpha\beta} = \sum_{i,j=1}^N \frac{\partial m_i}{\partial\alpha}\left[\Sigma_N^{-1}\right]_{ij}\frac{\partial m_j}{\partial\beta}. \qquad (3.173)
Writing M_N^{-1} as C^T C, we can expand the matrix as

\left[F_{\leq N}\right]_{\alpha\beta} = \sum_{n=1}^N s_n(\alpha)\, s_n(\beta), \qquad \text{with} \quad s_n(\alpha) = \left\langle \frac{\partial \ln p}{\partial\alpha}\, P_n \right\rangle = \sum_{k=0}^n C_{nk}\,\frac{\partial m_k}{\partial\alpha}. \qquad (3.174)
The coefficient s_n(\alpha) is therefore the information on \alpha in f_n that was not contained already in the previous members of the hierarchy. The bits of information absent from the hierarchy are given by
\left\langle \left(\frac{\partial \ln p}{\partial\alpha} - \sum_{n=1}^N s_n(\alpha) P_n(x)\right)\left(\frac{\partial \ln p}{\partial\beta} - \sum_{n=1}^N s_n(\beta) P_n(x)\right) \right\rangle = F_{\alpha\beta} - \left[F_{\leq N}\right]_{\alpha\beta}. \qquad (3.175)

The matrix on the right hand side of that relation is positive (it is a Gram matrix), and thus this relation states that the information within the hierarchy is always less than the total information. From this inequality follows thus the information inequality for any set of probes. Convergence to the total information occurs when
s_{\leq N}(x, \alpha) := \sum_{n=0}^N s_n(\alpha)\, P_n(x) \;\xrightarrow{N\to\infty}\; \frac{\partial \ln p(x, \theta)}{\partial\alpha}, \qquad (3.176)
in the mean square error sense with respect to p(x, \theta). This implies that the hierarchy is efficient precisely when \partial_\alpha \ln p is sparse in [f]. In particular, maximal entropy distributions with the prescribed moments of [f] have a finite non-zero number of coefficients. Note that s_{\leq N}(x, \alpha) is the best approximation of \partial_\alpha \ln p with the given hierarchy up to N according to this mean square criterion.
Recursion relations Remember the triangular matrix B of size N defined above, B_{nk} = \langle f_n P_k \rangle. The following relations are easily seen to hold,
B B^T = M_N, \qquad B = C^{-1}. \qquad (3.177)
It follows that the matrix B is nothing else than the Cholesky decomposition of the moment matrix. Multiplying (3.165) with the score function and integrating, one obtains
s_n(\alpha) = \frac{1}{B_{nn}}\left(\frac{\partial m_n}{\partial\alpha} - \sum_{k=0}^{n-1} B_{nk}\, s_k(\alpha)\right). \qquad (3.178)

This provides a way to evaluate these coefficients numerically, after a Cholesky decomposition of the moment matrix. However, it is necessary to keep in mind that large moment matrices are infamously known for being generically badly conditioned (Tyrtyshnikov, 1994).
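A compact sketch of this numerical route (helper names are ours), for the moments of a unit normal with its mean as parameter, where all the information should sit in the first moment:

```python
import numpy as np

def info_coeffs(m, dm):
    """s_n via the recursion (3.178): B is the Cholesky factor of the
    moment matrix M, with m[k] = <x^k> and dm[k] = d m_k / d alpha."""
    N = (len(m) + 1) // 2
    M = np.array([[m[i + j] for j in range(N)] for i in range(N)], float)
    B = np.linalg.cholesky(M)                  # lower triangular, B B^T = M
    s = np.zeros(N)
    for n in range(N):
        s[n] = (dm[n] - B[n, :n] @ s[:n]) / B[n, n]
    return s

m = [1, 0, 1, 0, 3, 0, 15, 0, 105]             # moments of N(0,1)
dm = [0] + [k * m[k - 1] for k in range(1, len(m))]
s = info_coeffs(m, dm)                         # ~ [0, 1, 0, 0, 0]
```

The result s ~ (0, 1, 0, 0, ...) confirms that for a Gaussian location parameter the first moment exhausts the information, consistently with section 3.1.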
Part II
Applications for cosmology
4 On the combination of shears, magnification and flexion fields
At the heart of weak lensing as a cosmological probe lies the convergence field, the weighted projection along the line of sight of the density fluctuation field, with weights sensitive to the geometry of the Universe. In this chapter we use the duality of the Shannon entropy and Fisher information introduced in chapter 2 to discuss quantitatively the combination of different probes of the convergence field, both for mass reconstruction as well as for cosmological purposes. The probes we include are the galaxy shears, the magnification as well as the flexion fields, i.e. all modifications to the galaxy images up to second order. Section 4.2 describes the observables and the approach, and section 4.3 the results.
We find that flexion alone outperforms the well established shears on the arcsecond scale, making flexion well suited for mass reconstruction on small scales. At the same time, it complements powerfully the shears on the scale of the arcminute. We find size information to carry some modest, scale independent amount of information. Besides, the results of this chapter show how standard cosmological Fisher matrix analysis based on Gaussian statistics can incorporate these other probes in the most simple way. It follows namely from (4.24) that the inclusion of all the two-point correlations of these additional weak lensing probes can be accounted for by adapting the noise term.
The text in this section follows the second part of Carron et al. (2011).
4.1 Introduction
Gravitational lensing, which can be used to measure the distribution of mass along the line of sight, has been recognized as a powerful probe of the dark components of the Universe
(Bartelmann and Schneider, 2001; Munshi et al., 2006; Refregier, 2003; Schneider et al., 1992, 2006), since it is sensitive both to the geometry of the Universe and to the growth of structure. Weak lensing data is typically used in two ways. The first, which is deployed for cosmological parameter fitting, relies on measuring the correlated distortions in galaxy images (Albrecht et al., 2006). The second approach uses each galaxy to make a noisy measurement of the lensing signal at that position. These point estimates are then used to reconstruct the dark matter density distribution (e.g. Kaiser and Squires, 1993; Seitz and Schneider, 2001). Most of the measurements of weak lensing to date have focused on the shearing that galaxy images experience. However, gravitational lensing causes a number of other distortions of galaxy images. These include a change in size, which is related to the magnification, and higher order image distortions known as the flexion (Bacon et al., 2006). A number of techniques have been developed for measuring these higher order image distortions, such as HOLICS (Okura et al., 2007) and shapelets methods (Massey et al., 2007). Since all of the image distortions originate from the same source, the lensing potential field, the information content of any two lensing measurements must be degenerate. At the same time, since each method has different systematics and specific noise properties, combining multiple measurements may bring substantial benefits. Some recent works have looked at the impact of combining shear and flexion measurements for mass reconstruction (Er et al., 2010; Pires and Amara, 2010; Velander et al., 2010), as well as the benefits for breaking multiplicative bias of including galaxy size measurements (Vallinotto et al., 2010).
We will focus on tracers of galaxy image distortions up to second order in the distortions. These include, to first order, the change in apparent size of the galaxies, due to the magnification sourced by the convergence field, and the two components of the shear, and, to second order, the four components of the two flexion fields. We will limit ourselves to the case where the noise contaminating each probe can be effectively treated as independent of the model parameters. The common point of the probes cited above is that they all trace, in some noisy way, the same central field on a discrete set of points, represented by the positions of the galaxies on which the tracers are measured. The framework we presented in chapter 2 is ideal to deal with the special structure of the situation. We will first show how we can understand the total information content of such degenerate probes of a central field in a general situation, and then make quantitative evaluations at the two-point level.
4.2 Linear tracers
The situation that we consider here is one where a broad number of observables are linked in some way to a central field. We will limit ourselves to the case where the noise contaminating each probe can be effectively treated as independent of the model parameters. Imagine one plans to perform different measurements of observables that are all linked in some way or another to the same central field, which is the actually interesting quantity from the point of view of the analyst. The most straightforward example of such a situation occurs when the two-point correlations of the central field, say the power spectrum of the convergence in weak lensing, are predicted from theory, while we try to extract them by looking at the correlations of the two ellipticity components of galaxies. In this case, predictive power of the power spectrum of the convergence turns into predictive power of (parts of) the three correlation functions there are between the two ellipticity components.
The predictive power of some observable Oc of a central field (for instance its power spectrum at some mode) translates into an array of constraints Oi, i = 1, ··· , n in the noisy probes, that we could try and extract and confront to theory :
Oi(θ) = fi(Oc(θ)), i = 1, ··· , n (4.1) for some functions fi. For the purpose of this work, the case of functions linear with respect to Oc is generic enough, i.e. we will consider that
\frac{\partial^2 f_i}{\partial O_c^2} = 0, \qquad i = 1, \cdots, n. \qquad (4.2)

The entropy S of the data is a function of the n constraints O. It is however fundamentally a function of O_c, since O_c enters all of these observables. It is therefore very natural to associate a potential \lambda_c to O_c, although it is not itself a constraint on the probability density function. In analogy with

\lambda_i = \frac{\partial S}{\partial O_i}, \qquad i = 1, \cdots, n, \qquad (4.3)

we define

\lambda_c := \frac{dS}{dO_c}(O_1, \cdots, O_n), \qquad (4.4)

with the result, given by application of the chain rule, of

\lambda_c = \lambda \cdot \frac{\partial f}{\partial O_c}. \qquad (4.5)

On the other hand, the impact of a model parameter on each observable can be similarly written in terms of the central observable O_c,

\frac{\partial O}{\partial\alpha} = \frac{\partial O_c}{\partial\alpha}\,\frac{\partial f}{\partial O_c}. \qquad (4.6)
It follows directly from relations (4.5) and (4.6), and the linearity of the $f_i$, that the joint information in the full set of constraints $O$, given in equation (2.43) as a sum over all $n$ constraints, reduces to a formally identical expression in which only $O_c$ enters:
\[
F^X_{\alpha\beta} = \frac{\partial O}{\partial\alpha}\cdot\frac{\partial\lambda}{\partial\beta} = \frac{\partial\lambda_c}{\partial\alpha}\,\frac{\partial O_c}{\partial\beta}, \tag{4.7}
\]
which can also be written in a form analogous to (2.44),
\[
F^X_{\alpha\beta} = -\frac{\partial O_c}{\partial\alpha}\,\frac{d^2S}{dO_c^2}\,\frac{\partial O_c}{\partial\beta}. \tag{4.8}
\]
CHAPTER 4. On the combination of shears, magnification and flexion fields
This last equation shows that all the effects of combining this set of constraints have been absorbed into the second total derivative of the entropy. This second total derivative is the total amount of information there is on the central quantity $O_c$ in the data. Indeed, taking as a special case of model parameter the central quantity itself, i.e.
\[
\alpha = \beta = O_c, \tag{4.9}
\]
one obtains that the full amount of information in $X$ on $O_c$ is
\[
F^X_{O_cO_c} = -\frac{d^2S}{dO_c^2}\left(O_1,\cdots,O_n\right) \equiv \frac{1}{\sigma^2_{\rm eff}}. \tag{4.10}
\]
A simple application of the Cramér-Rao inequality presented in equation (2.10) shows that this effective variance $\sigma^2_{\rm eff}$ is the lower bound on the variance of any unbiased reconstruction of the central observable from the noisy probes.
These considerations on the effect of probe combination in the case of a single central field observable $O_c$ generalize easily to the case where there are many, $(O_c^1,\cdots,O_c^m)$. In this case, each central field quantity leads to an array of constraints of the form of equation (4.1), and it is simple to show that the amount of Fisher information can again be written in terms of the information associated to the central field, with an effective covariance matrix between the $O_c$'s. The result is
\[
F^X_{\alpha\beta} = -\sum_{i,j=1}^m \frac{\partial O_c^i}{\partial\alpha}\,\frac{d^2S}{dO_c^i\,dO_c^j}\,\frac{\partial O_c^j}{\partial\beta}. \tag{4.11}
\]
All the effects of probe combination are thus encompassed in an effective covariance matrix Σeff of the central field observables,
\[
-\frac{d^2S}{dO_c^i\,dO_c^j} \equiv \left(\Sigma^{-1}_{\rm eff}\right)_{ij}. \tag{4.12}
\]
Again, an application of the Cramér-Rao inequality, in its multi-dimensional version, shows that this effective covariance matrix is the lower bound on the covariance of any unbiased joint reconstruction of $(O_c^1,\cdots,O_c^m)$.
We now explore further the case of linear probes of homogeneous Gaussian fields, which is cosmologically relevant and can be solved analytically in full. We will focus on zero mean fields, for which, according to the previous section, the entropy can be written in terms of the spectral matrices, up to a constant,
\[
S = \frac{1}{2}\sum_k \ln\det P(k). \tag{4.13}
\]
4.2.1 Linear tracers at the two-point level
The standard instance of a linear tracer φi of some central field κ in weak lensing is provided by a relation in Fourier space of the form
\[
\tilde\phi_i(k) = v_i\,\tilde\kappa(k) + \tilde\epsilon_i(k), \tag{4.14}
\]
for some noise term $\tilde\epsilon_i$, uncorrelated with $\kappa$, and coefficient $v_i$. Typically, if one observes a tracer of the derivative of the field $\kappa$, then the vector $v$ would be proportional to $-ik$. We are ignoring here any observational effect, such as incomplete sky coverage, that would require corrections to this relation. It is clear from this relation that the spectral matrices of this family take a special form: defining the spectrum of the $\kappa$ field by $P^\kappa$, we obtain, by inserting relation (4.14) into (2.59), that the spectral matrices can be written at each mode as
\[
P = P^\kappa\, v v^\dagger + N, \tag{4.15}
\]
where $v^\dagger$ is the hermitian conjugate of $v = (v_1,\cdots,v_n)$. The matrix $N$ is the spectrum of the noise components $\epsilon$,
\[
N_{ij}(k) = \frac{1}{V}\left\langle\tilde\epsilon_i(k)\,\tilde\epsilon_j^*(k)\right\rangle. \tag{4.16}
\]
Our subsequent results hold for any family of tracers that obeys this relation. While the special case (4.14) enters this category, it need not be the only instance. All the weak lensing observables we deal with in this work satisfy equation (4.15).
Both the $n$-dimensional vector $v$ and the noise matrix $N$ can depend on the wave vector $k$, but they are independent of the model parameters. The matrix $N$ of dimension $n\times n$ is the noise component of the spectra of the fields, typically built from two parts: the first is due to the discrete nature of the fields, since such data consist of quantities measured where galaxies sit, and the second to the intrinsic dispersion of the measured values.
4.2.2 Joint entropy and information content
Information on the model parameters enters through $P^\kappa$ only. To evaluate the full information content, we need only evaluate eq. (4.13) with the spectral matrix given in (4.15), keeping in mind the result from the last section that we need only the total derivative with respect to $P^\kappa$. In other words, any additive terms in the expression of the entropy that are independent of $P^\kappa$ can be discarded.
This determinant can be evaluated immediately. Defining for each mode the real positive number $N_{\rm eff}$ through
\[
\frac{1}{N_{\rm eff}} \equiv v^\dagger N^{-1} v, \tag{4.17}
\]
which can be seen as an effective noise term, a simple¹ calculation shows that the joint entropy (4.13) is equivalent to the following, where the $n$-dimensional determinant has disappeared:
\[
S = \frac{1}{2}\sum_k \ln\left(P^\kappa(k) + N_{\rm eff}(k)\right). \tag{4.19}
\]
Comparison with equation (4.13) shows that we have with this equation (4.19) the entropy of the field $\kappa$ itself, where all the effects of the joint observation of these $n$ fields have been absorbed into the effective noise term $N_{\rm eff}$ that contaminates its spectrum. It means that the full combined information in the $n$ probes of the field $\kappa$ is equivalent to the information in $\kappa$, observed with spectral noise $N_{\rm eff}$.
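The reduction of the $n$-dimensional determinant to the single effective-noise expression can be checked numerically; the sketch below (an illustration, not code from the thesis) uses random matrices and illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                                     # number of tracer fields
v = rng.normal(size=n) + 1j * rng.normal(size=n)          # tracer coefficients v_i
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
N = A @ A.conj().T + n * np.eye(n)                        # Hermitian positive definite noise
Pk = 2.7                                                  # P^kappa at this mode (illustrative)

N_eff = 1.0 / np.real(v.conj() @ np.linalg.solve(N, v))   # eq. (4.17)
P = Pk * np.outer(v, v.conj()) + N                        # eq. (4.15)

# Matrix determinant lemma: det P = det N * (1 + Pk * v^dag N^-1 v) = det N * (1 + Pk / N_eff)
ratio = np.real(np.linalg.det(P) / np.linalg.det(N))
assert np.isclose(ratio, 1.0 + Pk / N_eff)
# Hence ln det P = ln(Pk + N_eff) + (ln det N - ln N_eff): the Pk-dependent part of the
# entropy is that of a single field contaminated by the noise N_eff, as in eq. (4.19).
```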
Our result (4.10) applied to (4.19) puts bounds on the reconstruction of the field $\kappa$ out of the observed samples: it can at best be reconstructed with a contaminating noise term of $N_{\rm eff}$ in its spectrum, the variance of the best unbiased reconstruction of the spectrum being
\[
2\left(P^\kappa(k) + N_{\rm eff}(k)\right)^2. \tag{4.20}
\]
Since the effect of combining these probes at a single mode is only to change the model independent noise term, the parameter correlations and degeneracies as approximated by the Fisher information matrix stay unchanged, whatever the number of such probes. We have namely from (4.19) that at a given mode $k$ the Fisher information matrix reads
\[
F^X_{\alpha\beta} = \frac{1}{2}\,\frac{\partial\ln\tilde P^\kappa(k)}{\partial\alpha}\,\frac{\partial\ln\tilde P^\kappa(k)}{\partial\beta}, \tag{4.21}
\]
with
\[
\tilde P^\kappa(k) = P^\kappa(k) + N_{\rm eff}(k). \tag{4.22}
\]
From the point of view of the Fisher information, it makes formally no difference whether one extracts the full set of $n(n+1)/2$ independent elements of each spectral matrix, or reconstructs the field $\kappa$ and extracts its spectrum: they carry the same amount of Fisher information.
These results still hold when other fields, correlated with the field $\kappa$, are present in the analysis. To make this statement rigorous, consider in the analysis, on top of our $n$ samples of $\kappa$ of the form (4.14), another homogeneous field $\theta$, with spectrum $P^\theta(k)$ and cross spectrum to $\kappa$ given by $P^{\theta\kappa}(k)$. The full spectral matrices are in this case
\[
P(k) = \begin{pmatrix} P^\kappa(k)\,vv^\dagger + N & P^{\kappa\theta}(k)\,v \\ P^{\theta\kappa}(k)\,v^\dagger & P^\theta(k) \end{pmatrix}. \tag{4.23}
\]
Again, the determinant of this matrix can be reduced to a determinant of lower dimension, leading to the equivalent entropy
\[
S = {\rm cst} + \frac{1}{2}\sum_k \ln\det\begin{pmatrix} P^\kappa(k) + N_{\rm eff} & P^{\kappa\theta}(k) \\ P^{\theta\kappa}(k) & P^\theta(k) \end{pmatrix}. \tag{4.24}
\]
¹We have namely, for any invertible matrix $A$ and vectors $u, v$, the matrix determinant lemma,
\[
\det\left(A + uv^\dagger\right) = \det(A)\left(1 + v^\dagger A^{-1}u\right). \tag{4.18}
\]
It shows that the full set of $n+1$ fields can be reduced without loss to two fields, $\kappa$ and $\theta$, with the effective noise $N_{\rm eff}$ contaminating the spectrum of $\kappa$.
Note that the derivation of our results does not refer to any hypothetical estimators, but follows naturally from the expression of the entropy.
4.2.3 Weak lensing probes
We now seek a quantitative evaluation of the full joint information content of the weak lensing probes in galaxy surveys, up to second order in the image distortions of galaxies. The data $X$ consist of a set of discrete point fields, which take values where galaxies sit. We work in the two-dimensional flat sky limit, using the more standard notation $l$ for the wave vector, and decompose it in modulus and polar angle as
\[
l = l\begin{pmatrix}\cos\varphi_l \\ \sin\varphi_l\end{pmatrix}. \tag{4.25}
\]
We will throughout assume that the intrinsic values of each probe are pairwise uncorrelated, as is commonly done. Also, we will assume that the set of points on which the relevant quantities are measured shows low enough clustering that corrections to the spectra due to intrinsic clustering can be neglected. This is however not a limitation of our approach, since corrections to the above assumptions, such as the introduction of some level of intrinsic alignment, can be accommodated by introducing appropriate terms in the noise matrices $N(k)$ in (4.17). As the central field to which all our point fields relate, we take for convenience the isotropic convergence field $\kappa$, with spectrum
\[
C^\kappa(\boldsymbol{l}) = C^\kappa(l). \tag{4.26}
\]
In the case of pairwise uncorrelated intrinsic values that we are assuming, we see easily from (4.17) that by combining any number of such probes the effective noise is reduced at a given mode according to
\[
\frac{1}{N^{\rm tot}_{\rm eff}} = \sum_i \frac{1}{N^i_{\rm eff}}. \tag{4.27}
\]
We therefore only need to evaluate the effective noise for each probe separately, while their combination follows (4.27). To this aim, the evaluation of the spectral matrices (4.15), giving us $N_{\rm eff}$, is necessary. The calculations are presented in appendix 4.4, and we use the final results in this section.
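The harmonic combination rule (4.27) is trivial to implement; the helper below is an illustration, not code from the thesis.

```python
def combine_effective_noise(noises):
    """Combine per-probe effective noises N_eff^i via eq. (4.27):
    1 / N_tot = sum_i 1 / N_i. Valid for uncorrelated intrinsic values."""
    return 1.0 / sum(1.0 / n for n in noises)

# Two probes of equal noise halve the effective noise:
assert abs(combine_effective_noise([0.1, 0.1]) - 0.05) < 1e-12
# A much noisier probe barely helps:
assert abs(combine_effective_noise([0.1, 10.0]) - 0.1) < 0.002
```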
First order, distortion matrix
To first order, the distortion induced by weak lensing on a galaxy image is described by the distortion matrix that contains the shear, $\gamma$, and convergence, $\kappa$, which come from the second derivatives of the lensing potential field $\psi$ (e.g. Schneider et al., 2006),
\[
\begin{pmatrix}\kappa + \gamma_1 & \gamma_2 \\ \gamma_2 & \kappa - \gamma_1\end{pmatrix} = \psi_{,ij}. \tag{4.28}
\]
The shear components read
\[
\gamma_1 = \frac{1}{2}\left(\psi_{,11} - \psi_{,22}\right), \qquad \gamma_2 = \psi_{,12}, \tag{4.29}
\]
and we assume they are measured from the apparent ellipticities of the galaxies, with identical intrinsic dispersion $\sigma^2_\gamma$. Denoting with $\bar n_\gamma$ the number density of galaxies for which ellipticity measurements are available, the effective noise is
\[
N^\gamma_{\rm eff} = \frac{\sigma^2_\gamma}{\bar n_\gamma}. \tag{4.30}
\]
The information content of the two observed ellipticity fields is thus exactly the same as that of the convergence field, with a mode independent noise term as above.
To reach the $\kappa$ component of the distortion matrix, we imagine we have measurements of the angular sizes $s_{\rm obs}$ of the galaxies, with intrinsic dispersion $\sigma^2_s$. The intrinsic size $s_{\rm int}$ of a galaxy gets transformed through weak lensing according to
\[
s_{\rm obs} = s_{\rm int}\left(1 + \alpha_s\kappa\right). \tag{4.31}
\]
The coefficient $\alpha_s$ is equal to unity in pure weak lensing theory, but we allow it to take other values, since in a realistic situation other effects, such as magnification bias, effectively enter this coefficient (see e.g. Vallinotto et al. (2010)). Under our assumption that the correlations of the fluctuations in intrinsic sizes can be neglected, the effective noise reduces to
\[
N^s_{\rm eff} = \frac{1}{\alpha_s^2}\left(\frac{\sigma_s}{\bar s_{\rm int}}\right)^2\frac{1}{\bar n_s}. \tag{4.32}
\]
This combination of $\alpha_s$ with the dispersion parameters $\bar s$ and $\sigma_s$ is the only relevant parameter in our case, rather than the value of each of them.
Second order, flexion
To second order, the distortions caused by lensing on galaxy images are given by third order derivatives of the lensing potential. These are conveniently described by the spin 1 and spin 3 flexion components $F$ and $G$, which in the notation of Schneider and Er (2008) read
\[
F = \frac{1}{2}\begin{pmatrix}\psi_{,111} + \psi_{,122} \\ \psi_{,112} + \psi_{,222}\end{pmatrix}, \qquad
G = \frac{1}{2}\begin{pmatrix}\psi_{,111} - 3\psi_{,122} \\ 3\psi_{,112} - \psi_{,222}\end{pmatrix}, \tag{4.33}
\]
Tab. 4.1 — Dispersion parameters used in figure 4.1.

  σ_γ     σ_F [asec⁻¹]    σ_G [asec⁻¹]    (1/α_s) σ_s/s̄
  0.25    0.04            0.04            0.9
and are extracted from measurements with intrinsic dispersions $\sigma^2_F$ and $\sigma^2_G$. The effective noise is this time mode-dependent,
\[
\frac{1}{N^{FG}_{\rm eff}(l)} = l^2\left(\frac{\bar n_F}{\sigma_F^2} + \frac{\bar n_G}{\sigma_G^2}\right). \tag{4.34}
\]
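The qualitative mode dependence of (4.34) relative to the shear noise (4.30) can be sketched for equal number densities (which then cancel). The snippet below is our illustration: it assumes the flexion dispersions, quoted per arcsecond in table 4.1, are converted to inverse radians to compare with the dimensionless multipole; this unit convention is our assumption.

```python
import numpy as np

ASEC_PER_RAD = 206264.8                    # arcseconds in one radian
sigma_gamma = 0.25                         # table 4.1
sigma_F = sigma_G = 0.04 * ASEC_PER_RAD    # asec^-1 -> rad^-1 (our convention)

def flexion_to_shear_noise_ratio(l):
    """N_eff^FG / N_eff^gamma at multipole l, equal number densities."""
    inv_N_FG = l**2 * (1.0 / sigma_F**2 + 1.0 / sigma_G**2)   # eq. (4.34), densities cancel
    inv_N_gamma = 1.0 / sigma_gamma**2                        # eq. (4.30), densities cancel
    return inv_N_gamma / inv_N_FG

# Flexion noise falls as 1/l^2 relative to the shear noise ...
assert np.isclose(flexion_to_shear_noise_ratio(1000), 4 * flexion_to_shear_noise_ratio(2000))
# ... but at l ~ 1000 it is still far larger than the shear noise:
assert flexion_to_shear_noise_ratio(1000) > 100
```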
4.3 Results
Figure 4.1 shows the ratio of the effective noise to the noise present when considering the shear fields only, assuming the same number densities of galaxies for each probe, and the values for the intrinsic dispersions stated in table 4.1. The conversion from multipole $l$ (upper x-axis) to angular scale $\theta$ (lower x-axis) follows $\theta = \pi/(l + 1/2)$. We have adopted for the size dispersion parameters the numbers from Vallinotto et al. (2010), who evaluated them for the DES survey conditions (The Dark Energy Survey Collaboration, 2005). We refer to the discussion in Pires and Amara (2010) for our choice of flexion dispersion parameters. The curves on this figure are ratios independent of the galaxy number density. They are redshift independent as well, to the extent that the dispersions in intrinsic values can be treated as such. We can draw two main conclusions from figure 4.1. First, flexion information begins to play a role only at the smallest scales, on the arcsecond scales, where it takes over and becomes the most interesting probe. On the scale of 1 amin, it can bring substantial improvement over a shear only analysis, but only in combination with the shears, not on its own. This is in good agreement with the comparative analysis of the power of the flexion $F$ field and the shear fields for mass reconstruction done in Pires and Amara (2010), restricted to direct inversion methods. Second, the inclusion of the sizes of galaxies in the analysis provides a density independent, scale independent, improvement factor of
\[
\frac{N^\gamma_{\rm eff}}{N^{\gamma+s}_{\rm eff}} = 1 + \left(\frac{\sigma_\gamma\,\alpha_s\,\bar s}{\sigma_s}\right)^2, \tag{4.35}
\]
which is close to a 10% improvement for the quoted numbers. Of course, the precise value depends on the dispersion parameters of the population considered.
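Plugging the table 4.1 numbers into eq. (4.35) is a one-line check (our illustration, not thesis code):

```python
sigma_gamma = 0.25     # intrinsic ellipticity dispersion (table 4.1)
size_term = 0.9        # (1/alpha_s) * sigma_s / s_bar (table 4.1)

# eq. (4.35): N_eff^gamma / N_eff^(gamma+s) = 1 + (sigma_gamma * alpha_s * s_bar / sigma_s)^2
ratio = 1.0 + (sigma_gamma / size_term) ** 2
assert abs(ratio - 1.077) < 0.001   # close to a 10% reduction in effective noise
```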
For the purpose of measuring cosmological parameters rather than mass reconstruction, more interesting are the actual values of the Fisher information matrices. Since with any combination of such probes, these matrices are proportional to each other at a single mode,
Fig. 4.1 — The ratio of the effective noise to the level of noise considering the shears only, as function of angular scale. The dashed line considers the flexion fields alone. The dotted line shows the combination of the flexion fields with the shear fields, and the solid line all these weak lensing probes combined. No correlations between the intrinsic values for each pair of probes have been considered.
it makes sense to define the efficiency parameter of the probe i through
\[
\epsilon_i(l) := \frac{C^\kappa(l)}{C^\kappa(l) + N^i_{\rm eff}(l)}, \tag{4.36}
\]
which is a measure of what fraction of the information contained in the convergence field is effectively captured by that probe. The information in the convergence field is, at a given mode $l$, counting the multiplicity of the mode,
\[
F^\kappa_{\alpha\beta}(l) = \frac{2l+1}{2}\,\frac{\partial\ln C^\kappa(l)}{\partial\alpha}\,\frac{\partial\ln C^\kappa(l)}{\partial\beta}, \tag{4.37}
\]
and we have indeed that the total Fisher information in the observed fields is
\[
F^X_{\alpha\beta} = \sum_l F^\kappa_{\alpha\beta}(l)\,\epsilon_i^2(l). \tag{4.38}
\]
Therefore, according to the interpretation of the Fisher matrix as approximating the expected constraints on the model parameters, the factor $\epsilon(l)$ is precisely equal to the factor of degradation in the constraints one would be able to put on any parameter, with respect to the case of perfect knowledge of the convergence field at this mode. It is not the purpose of this work to perform a very detailed study of the behavior of the efficiency parameter for some specific survey and the subsequent statistical gain, but its qualitative behavior is easy to see. This parameter is essentially unity in the high signal to noise regime, while it is proportional to the inverse effective noise whenever the intrinsic dispersion dominates the observed spectrum. Since information on cosmological parameters is beaten down by cosmic variance in the former case, the latter regime dominates the constraints. We can therefore expect from our above discussion the size information to tighten constraints on any cosmological parameter by a few percent. On the other hand, while flexion becomes ideal for mass reconstruction purposes on small scales, it will be able to help inference on cosmological parameters only if the challenge of very accurate theoretical predictions of the convergence power spectrum for multipoles substantially larger than 1000 is met.
To make these expectations more concrete, we evaluated the improvement in information on cosmological parameters by performing a lensing Fisher matrix calculation for a wide, EUCLID-like survey, in a tomographic setting. For a data vector consisting of $n$ probes of the convergence field $\kappa_i$ in each redshift bin $i$, $i = 1,\cdots,N$, it is simple to see, following our previous argument, that the Fisher information reduces to
\[
F_{\alpha\beta} = \frac{1}{2}\sum_l (2l+1)\,{\rm Tr}\left[C^{-1}\frac{\partial C}{\partial\alpha}\,C^{-1}\frac{\partial C}{\partial\beta}\right], \tag{4.39}
\]
where the $C$ matrix is given by
\[
C_{ij} = C^{\kappa_i\kappa_j}(l) + \delta_{ij}\,N^i_{\rm eff}(l), \qquad i,j = 1,\cdots,N, \tag{4.40}
\]
with $N^i_{\rm eff}$ given by (4.17). The only difference with standard implementations of Fisher matrices for lensing, such as the lensing part of Hu and Jain (2004), is thus the form of the noise component. We evaluated these matrices respectively for
\[
N^i_{\rm eff} = \frac{\sigma^2_\gamma}{\bar n_i} = N^{\gamma,i}_{\rm eff}, \tag{4.41}
\]
which is the precise form of the Fisher matrix for shear analysis, for
\[
\frac{1}{N^i_{\rm eff}} = \frac{1}{N^{\gamma,i}_{\rm eff}} + \frac{1}{N^{s,i}_{\rm eff}}, \tag{4.42}
\]
which accounts for size information, and for
\[
\frac{1}{N^i_{\rm eff}(l)} = \frac{1}{N^{\gamma,i}_{\rm eff}} + \frac{1}{N^{s,i}_{\rm eff}} + \frac{1}{N^{FG,i}_{\rm eff}(l)}, \tag{4.43}
\]
which accounts for the flexion fields as well. We note that in terms of observables, these small modifications incorporate the full set of all possible correlations between the fields considered. The values of the dispersion parameters involved in these formulae are the same as in table 4.1. Our fiducial model is a flat ΛCDM universe, with parameters $\Omega_\Lambda = 0.7$, $\Omega_b = 0.045$, $\Omega_m = 0.3$, $h = 0.7$, power spectrum parameters $\sigma_8 = 0.8$, $n = 1$, and a Chevallier-Polarski-Linder parametrisation (Chevallier and Polarski, 2001; Linder, 2003) of the dark energy equation of state implemented as $w_0 = -1$, $w_a = 0$. The distribution of galaxies as a function of redshift, needed both for the calculation of the
spectra and to obtain the galaxy densities in each bin, was generated using the cosmological package iCosmo (Refregier et al., 2008), in the way described in Amara and Réfrégier (2007). We adopted EUCLID-like parameters of 10 redshift bins, a median redshift of 1, a galaxy angular density of 40/amin², and photometric redshift errors of 0.03(1 + z).

Tab. 4.2 — Ratio of the marginalised constraints $\sigma^2/\sigma^2_{\rm shear\ only}$, for $l_{\rm max} = 10^4$. The first line considers the inclusion of the size information in the analysis, the second the size as well as the flexion fields $F$ and $G$.

            Ω_Λ     Ω_b     Ω_m     h       n       σ_8     w_0     w_a
  size      0.90    0.96    0.90    0.95    0.95    0.90    0.90    0.90
  size+FG   0.88    0.96    0.89    0.95    0.93    0.88    0.88    0.88
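The trace formula (4.39) with the noise choices (4.41)-(4.43) is straightforward to implement. Below is a minimal sketch for a toy two-bin model with a single amplitude parameter $A$ scaling the signal; all numbers are illustrative and are not the survey configuration of the text.

```python
import numpy as np

lmin, lmax = 10, 100
A_fid = 1.0                                       # toy amplitude parameter

def C_matrix(l, A):
    """Toy tomographic matrix, eq. (4.40): signal + diagonal effective noise."""
    S = np.array([[2.0, 1.0], [1.0, 3.0]]) / l**2  # illustrative C^{kappa_i kappa_j}(l)
    N = np.diag([0.05, 0.08])                      # illustrative N_eff^i
    return A * S + N

F = 0.0
for l in range(lmin, lmax + 1):
    C = C_matrix(l, A_fid)
    dC = C_matrix(l, A_fid + 1.0) - C_matrix(l, A_fid)   # dC/dA, exact since C is linear in A
    Cinv_dC = np.linalg.solve(C, dC)
    F += 0.5 * (2 * l + 1) * np.trace(Cinv_dC @ Cinv_dC)  # eq. (4.39)

sigma_A = 1.0 / np.sqrt(F)    # Cramer-Rao bound on the toy amplitude
assert F > 0
```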
In figure 4.2, we show the improvement in the dark energy Figure of Merit (FOM), defined as the inverse of the square root of the determinant of the $(w_0, w_a)$ submatrix of the Fisher matrix inverse $F^{-1}_{\alpha\beta}$ ($\alpha$ and $\beta$ running over the set of eight parameters described above), as a function of the maximal angular mode $l_{\rm max}$ considered, $l_{\rm min}$ always being taken to be 10. In perfect agreement with our discussion above, including size information (solid line) increases the FOM steadily until it saturates at a 10% improvement, when constraints on the dark energy parameters are dominated by the low signal to noise regime. Also, flexion becomes useful only in the deep non-linear regime, where however theoretical understanding of the shape of the spectra still leaves a lot to be desired.
These results are found to be very insensitive to the survey parameters, for fixed $\alpha_s$. They are also only weakly dependent on the model parameters, as illustrated in table 4.2, which shows the corresponding improvement in Fisher constraints,
\[
\frac{\sigma^2}{\sigma^2_{\rm shear\ only}} = \frac{F^{-1}_{\alpha\alpha}}{F^{-1}_{\alpha\alpha,\,\rm shear\ only}}, \tag{4.44}
\]
at the saturation scale $l_{\rm max} = 10^4$. These results are also essentially unchanged using either standard implementations of the halo model (Cooray and Sheth, 2002, for a review) or the HALOFIT (Smith et al., 2003) non linear power spectrum.
4.4 Appendix
The data consist of a set of numbers $\epsilon_i$, one at each position where a galaxy sits and a measurement was done. We use the handy notation in terms of the Dirac delta function,
\[
\phi(x) = \sum_i \epsilon_i\,\delta^D(x - x_i), \tag{4.45}
\]
Fig. 4.2 — The improvement of the dark energy FOM including size information (solid), as well as flexion F and G information (dotted), over the shear only analysis, as function of the maximal angular multipole included in the analysis.
where the sum runs over the positions $x_i$ for which $\epsilon$ is measured. To obtain the spectral matrices, we need the Fourier transform of the field, which reads in our case
\[
\tilde\phi(l) = \sum_i \epsilon_i \exp\left(-il\cdot x_i\right). \tag{4.46}
\]
In this work, we assume that the set of points shows negligible clustering, so that the probability density function for the joint occurrence of a particular set of galaxy positions is uniform. We decompose in the following the wave vector $l$ on the flat sky in terms of its modulus and polar angle as
\[
l = l\begin{pmatrix}\cos\phi_l \\ \sin\phi_l\end{pmatrix}. \tag{4.47}
\]
Ellipticities
When the two ellipticity components are measured, we have two such fields $\phi_1, \phi_2$ at our disposal. For instance, the field describing the first component becomes
\[
\tilde\phi_1(l) = \sum_i \epsilon^1_i \exp\left(-il\cdot x_i\right). \tag{4.48}
\]
We assume that the measured ellipticities trace the shear fields, in the sense that the measured components are built out of the shear at that position plus some value unrelated to it,
\[
\epsilon^1_i = \gamma_1(x_i) + \epsilon^1_{{\rm int},i}, \qquad \epsilon^2_i = \gamma_2(x_i) + \epsilon^2_{{\rm int},i}. \tag{4.49}
\]
The vector $v$ relating the spectral matrices of the ellipticities and the convergence is then obtained by plugging (4.48), with the above relations (4.49), into its definition (4.15), and using the relation between shears and convergence in equation (4.28). The result is
\[
v = \bar n_\gamma\begin{pmatrix}\cos 2\phi_l \\ \sin 2\phi_l\end{pmatrix}, \tag{4.50}
\]
where $\bar n_\gamma$ is the number density of galaxies for which ellipticity measurements are available. Under our assumptions of uncorrelated intrinsic ellipticities, with dispersions of equal magnitude $\sigma^2_\gamma$ for the two components, the noise matrix $N$ becomes
\[
N = \bar n_\gamma\begin{pmatrix}\sigma^2_\gamma & 0 \\ 0 & \sigma^2_\gamma\end{pmatrix}. \tag{4.51}
\]
The effective noise, given in equation (4.17), is readily computed:
\[
N^\gamma_{\rm eff} = \frac{\sigma^2_\gamma}{\bar n_\gamma}. \tag{4.52}
\]
Sizes
As noted in the main text, the apparent sizes of galaxies are modified by lensing in the following way,
\[
s^i_{\rm obs} = s^i_{\rm int}\left(1 + \alpha_s\kappa\right), \tag{4.53}
\]
for some coefficient $\alpha_s$ which is unity in pure weak lensing theory. Denoting the number density of galaxies for which size measurements are available by $\bar n_s$, and the mean intrinsic size of the sample by $\bar s_{\rm int}$, the spectrum of the size field reduces, under the assumption of uncorrelated intrinsic sizes, to
\[
C^s(l) = \bar n_s^2\,\bar s_{\rm int}^2\,\alpha_s^2\,C^\kappa(l) + \bar n_s\,\sigma_s^2. \tag{4.54}
\]
The vector $v$ and matrix $N$ are now numbers, read out from the above equation to be
\[
v = \bar n_s\,\bar s_{\rm int}\,\alpha_s, \qquad N = \bar n_s\,\sigma_s^2, \tag{4.55}
\]
leading to the effective noise
\[
N^s_{\rm eff} = \frac{1}{\alpha_s^2}\left(\frac{\sigma_s}{\bar s_{\rm int}}\right)^2\frac{1}{\bar n_s}. \tag{4.56}
\]
Second order, flexion
Denoting with $\bar n_F$ and $\bar n_G$ the number densities of galaxies for which $F$ and $G$ are measured, the vectors linking the flexion to the convergence are
\[
v^F = -il\,\bar n_F\begin{pmatrix}\cos\phi_l \\ \sin\phi_l\end{pmatrix} \tag{4.57}
\]
and
\[
v^G = -il\,\bar n_G\begin{pmatrix}\cos 3\phi_l \\ \sin 3\phi_l\end{pmatrix}. \tag{4.58}
\]
Using again the assumption of uncorrelated intrinsic components, we have the four dimensional diagonal noise matrix
\[
N = \begin{pmatrix}\bar n_F\,\sigma_F^2\,\mathbb{1}_{2\times 2} & 0 \\ 0 & \bar n_G\,\sigma_G^2\,\mathbb{1}_{2\times 2}\end{pmatrix}, \tag{4.59}
\]
leading to the effective noise, this time mode-dependent,
\[
\frac{1}{N^{FG}_{\rm eff}(l)} = l^2\left(\frac{\bar n_F}{\sigma_F^2} + \frac{\bar n_G}{\sigma_G^2}\right). \tag{4.60}
\]
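One can check numerically that the definitions (4.57)-(4.59), inserted into (4.17), reproduce (4.60); a small sketch with illustrative numbers:

```python
import numpy as np

l, phi = 1000.0, 0.3                  # illustrative multipole and polar angle
nF = nG = 10.0                        # illustrative number densities
sF, sG = 0.04, 0.05                   # illustrative flexion dispersions

vF = -1j * l * nF * np.array([np.cos(phi), np.sin(phi)])          # eq. (4.57)
vG = -1j * l * nG * np.array([np.cos(3 * phi), np.sin(3 * phi)])  # eq. (4.58)
v = np.concatenate([vF, vG])
N = np.diag([nF * sF**2] * 2 + [nG * sG**2] * 2)                  # eq. (4.59)

inv_Neff = np.real(v.conj() @ np.linalg.solve(N, v))              # eq. (4.17)
assert np.isclose(inv_Neff, l**2 * (nF / sF**2 + nG / sG**2))     # eq. (4.60)
```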
5 On the use of Gaussian likelihoods for power spectra estimators
The text of this chapter follows very closely that of Carron (2012a).
In this note we revisit the Fisher information content of cosmological power spectra of Gaussian fields, when based on the assumption of a multivariate Gaussian likelihood for estimators, in order to comment on that assumption. We discuss that despite the fact that the assumption of a Gaussian likelihood is motivated by the central limit theorem, it leads, if used consistently, to a Fisher information content that violates the Cramér-Rao inequality, due to the presence of independent information from the parameter dependent covariance matrix. At any fixed multipole, this artificial term is shown to become dominant in the limit of a large number of correlated fields. While the distribution of the estimators does indeed tend to a Gaussian with a large number of modes, it is shown, however, that its Fisher information content does not, in the sense that their covariance matrix never carries independent information content. The reason why the information content of the spectra is correctly described by the usual formula (i.e. without the covariance term) in this estimator perspective is precisely the fact that the estimators have a chi-squared like distribution, and not a Gaussian distribution. The assumption of a Gaussian estimator likelihood is thus, from the point of view of the information, neither necessary nor really adequate, and we warn against the use of Gaussian likelihoods with parameter dependent covariance matrices for parameter inference from such spectra.
5.1 Introduction
Starting from the second half of the nineties (Jungman et al., 1996a,b; Tegmark, 1997; Tegmark et al., 1997), the calculation of Fisher information matrices in order to understand quantitatively the constraining power of an experiment has become ubiquitous in cosmology, with its fundamental aspects now covered in cosmological textbooks (Dodelson, 2003; Durrer, 2008, sections 11 and 6 respectively), or (Heavens, 2009). This is especially true for experiments aimed at measuring power spectra of close to Gaussian fields, since in this case very handy analytical expressions can be obtained that apply in a variety of major cosmological subfields, such as the CMB, galaxy clustering, weak lensing, as well as their combination. Nevertheless, even applied to Gaussian variables, Fisher information matrices are not totally exempt from subtleties. In this note, we revisit the two different possible perspectives on the Fisher information content of such spectra. One starting point is often the assumption of Gaussian errors. We point out that this assumption of a multivariate Gaussian likelihood for power spectra estimators is not fully consistent for the purpose of understanding their information content, due to a term violating the Cramér-Rao inequality, which we show is not necessarily small. Too much information is therefore assigned to the spectra under this assumption. We show that we can understand why this term is artificial precisely from the non Gaussian properties of the estimators, and discuss the reasons why the usual formula, i.e. without this term, or setting the covariance matrix to be parameter independent, still gives the correct amount of information.
The note is organized as follows: in section 5.2 the two common approaches to the information content of spectra are discussed in detail in the case of a single field. We clarify to what extent and why one is actually flawed, which is the source of our comments on the use of Gaussian likelihoods for spectra. In section 5.3 we then turn to a correlated family of fields, where the violation of the Cramér-Rao inequality is shown to become substantial. We summarize and conclude in section 5.4.
We recall first the specific form of the Fisher information matrix, defined for a probability density function $p$ as $F_{\alpha\beta} = \langle\partial_\alpha\ln p\,\partial_\beta\ln p\rangle$, with $\alpha,\beta$ model parameters of interest, in the particular case of a multivariate Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$ (Tegmark et al., 1997; Vogeley and Szalay, 1996):
\[
F_{\alpha\beta} = \sum_{i,j}\frac{\partial\mu_i}{\partial\alpha}\left(\Sigma^{-1}\right)_{ij}\frac{\partial\mu_j}{\partial\beta} + \frac{1}{2}{\rm Tr}\left[\Sigma^{-1}\frac{\partial\Sigma}{\partial\alpha}\,\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta}\right]. \tag{5.1}
\]
Remember that the Fisher information matrix has all the properties a meaningful measure of information on parameters must have, most importantly for us here the fact that any transformation of the data can only decrease its Fisher information matrix.
5.2 One field, gamma distribution
Consider a zero mean isotropic homogeneous Gaussian random field, in Euclidean space or on the sphere. It is well known that the Gaussianity of the field is equivalent to the fact that the Fourier or spherical harmonic coefficients are independent complex Gaussian variables, only constrained by the reality condition. Equivalently, the real and imaginary parts of those coefficients form independent real Gaussian variables. Such fields are entirely described by their spectrum, and so the extraction of the spectrum from the data with the help of an estimator is a fairly natural way to proceed for inference on parameters of interest. We place ourselves on the sphere, adopting the spherical harmonic notation for convenience. With the set of $a_{lm}$ the harmonic coefficients, the model parameter dependent spectrum $C_l$ is defined as
\[
\left\langle a_{lm}\,a^*_{l'm'}\right\rangle = \delta_{ll'}\,\delta_{mm'}\,C_l. \tag{5.2}
\]
Standard, unbiased quadratic estimators can be written as a sum over the number of Gaussian modes available, as
\[
\hat C_l = \frac{1}{2l+1}\sum_{m=-l}^{l}\left|a_{lm}\right|^2. \tag{5.3}
\]
We do not consider any source of observational noise, incomplete coverage or any other such issue, which are irrelevant for the points of our discussion. At this point, there are two ways to approach the problem of evaluating its information content in the cosmological literature. The first, let us call it the 'field perspective', first calculates the information content of the field itself (equal to that of the set of $a_{lm}$'s), and then interprets this information as being that of the spectrum. In this case, the information in the field is given by formula (5.1), with zero mean vector and diagonal covariance matrix $C_l$:
\[
F_{\alpha\beta} = \frac{1}{2}\sum_{l=0}^{\infty}(2l+1)\,\frac{1}{C_l}\frac{\partial C_l}{\partial\alpha}\,\frac{1}{C_l}\frac{\partial C_l}{\partial\beta}, \tag{5.4}
\]
where the factor $2l+1$ accounts for the number of independent Gaussian variables at a given multipole $l$. The sum is in practice restricted to the multipole range that will actually be measured, to obtain the information in the spectrum to be extracted. A very small sample of works in this approach are Bernstein (2009); Hu and Jain (2004); Tegmark et al. (1997). This approach is arguably conceptually appealing, as it deals with the information content of the field itself, and does not require the definition of estimators nor the calculation of their covariance. However, for the same reasons, it is only indirectly connected to data analysis, as it does not specify precisely how this information content is to be extracted. In the second approach, which we call the 'estimator perspective', an estimator $\hat C_l$ is first defined for each $C_l$ to be extracted, within some $l_{\rm min}$ and $l_{\rm max}$ (maybe with some bandwidth that we ignore here), and its covariance matrix $\Sigma_{ll'} = \langle\hat C_l\hat C_{l'}\rangle - \langle\hat C_l\rangle\langle\hat C_{l'}\rangle$ is calculated. Then it is argued that, due to the central limit theorem, the distribution of the estimator will be approximately Gaussian. In the case of spectra of Gaussian fields, this is very well founded, at least for small scale modes, since (5.3) is a large sum of well behaved identically distributed independent variables. Then, under this assumption of Gaussianity, their information content is given by equation (5.1), with mean vector this time the set of $C_l$'s itself and (model parameter dependent) covariance matrix $\Sigma_{ll'}$,
\[
F_{\alpha\beta} = \sum_{l,l'=l_{\rm min}}^{l_{\rm max}}\frac{\partial C_l}{\partial\alpha}\left(\Sigma^{-1}\right)_{ll'}\frac{\partial C_{l'}}{\partial\beta} + \frac{1}{2}{\rm Tr}\left[\Sigma^{-1}\frac{\partial\Sigma}{\partial\alpha}\,\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta}\right]. \tag{5.5}
\]
It is well known that for the estimator (5.3) we have $\Sigma_{ll'} = \delta_{ll'}\,2C_l^2/(2l+1)$. The Fisher information matrix in the estimator perspective thus reduces to
\[
F_{\alpha\beta} = \frac{1}{2}\sum_{l=l_{\rm min}}^{l_{\rm max}}(2l+1)\,\frac{1}{C_l}\frac{\partial C_l}{\partial\alpha}\,\frac{1}{C_l}\frac{\partial C_l}{\partial\beta}
 + \frac{1}{2}\sum_{l=l_{\rm min}}^{l_{\rm max}}4\,\frac{1}{C_l}\frac{\partial C_l}{\partial\alpha}\,\frac{1}{C_l}\frac{\partial C_l}{\partial\beta}. \tag{5.6}
\]
Clearly, the first term in the estimator perspective corresponds to that of the field perspective. However, the second term, coming from the derivative of the covariance matrix, is new. That term is not enhanced by a $(2l+1)$ factor, and is therefore very subdominant at high $l$. Either it is usually neglected, or the covariance matrix of the estimators is inconsistently taken to be parameter independent; in these cases the two approaches give the same results. Some expositions using explicitly this perspective include (Seo and Eisenstein, 2003, 2007; Tegmark, 1997), where the additional term is neglected, or the approach in (Dodelson, 2003, section 11.4.3), where the covariance matrix is treated as parameter independent. A work where this term plays a direct role is (Eifler et al., 2009), where the authors specifically study the impact of parameter dependent covariance matrices for parameter estimation using such Gaussian likelihoods.
Beyond the question of the quantitative relevance of this additional term, its very appearance is disturbing. Under this arguably reasonable Gaussian assumption, our estimator (5.3) is found to carry more information than the full field, even on the smallest scales. This obviously violates the most fundamental property of Fisher information, namely that information can at best be conserved when transforming the data (in this case, reducing the field to its spectrum), a fact essentially equivalent to the celebrated Cramér-Rao inequality (Tegmark et al., 1997). Something must clearly have gone wrong in the assumption of a Gaussian likelihood for our spectra.
To understand what has happened, it is worth working out the exact distribution and information content of the estimator (5.3). Since the estimators are independent at different $l$, we can work at a fixed $l$; their total information content is then simply the sum over $l$ of the information at each $l$. Under our assumptions, the estimator is a sum of squares of $2l+1$ independent Gaussian variables, and its probability density function can be obtained with no difficulty. The exact distribution is the gamma probability density function with shape parameter $k$ and scale parameter $\theta$, as follows:
$$ p(\hat C_l|\alpha,\beta) = \frac{\hat C_l^{\,k-1}}{\theta^k\,\Gamma(k)}\exp\left(-\hat C_l/\theta\right), \qquad (5.7) $$
with
$$ k := \frac{2l+1}{2}, \qquad \theta(\alpha,\beta) := \frac{2C_l}{2l+1}, \qquad (5.8) $$
and where $\Gamma$ is the gamma function. It is well known that the gamma distribution does indeed tend towards the Gaussian distribution for large $k$, with mean $\mu = k\theta = C_l$ and variance $\sigma^2 = k\theta^2 = 2C_l^2/(2l+1)$, as expected. However, its Fisher information content
does not tend to that of the Gaussian. In our case, since only $\theta$ is parameter dependent, the Fisher information in the estimator density function (5.7) is
$$ F^l_{\alpha\beta} = \frac{\partial\theta}{\partial\alpha}\frac{\partial\theta}{\partial\beta}\left\langle\left(\frac{\partial\ln p(\hat C_l)}{\partial\theta}\right)^2\right\rangle. \qquad (5.9) $$
Since $\partial_\theta \ln p = (\hat C_l - k\theta)/\theta^2$ and $\partial_\alpha\theta = \theta\,\partial_\alpha C_l/C_l$, we obtain with straightforward algebra
$$ F^l_{\alpha\beta} = \frac{2l+1}{2}\,\frac{1}{C_l}\frac{\partial C_l}{\partial\alpha}\,\frac{1}{C_l}\frac{\partial C_l}{\partial\beta}. \qquad (5.10) $$
Summing over $l$, we recover the first term of (5.6), but not the second. We have recovered the field perspective result (5.4) at every $l$, not with the Gaussian assumption but with the exact distribution. It turns out that even though the variance of the gamma distribution is parameter dependent, it does not in fact contribute to the information. This can be seen as follows. Consider the information in the mean only of the estimator. By the Cramér-Rao inequality this must be less than the total information,
$$ \frac{1}{\sigma^2}\frac{\partial\mu}{\partial\alpha}\frac{\partial\mu}{\partial\beta} \le F^l_{\alpha\beta}. \qquad (5.11) $$
Plugging in the values of the mean and variance shows that the inequality is in fact an equality, so that the mean of the estimator captures all of its information.
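The gamma form (5.7) itself is straightforward to confirm by simulation; the sketch below (multipole and fiducial value chosen arbitrarily) compares draws of the estimator (5.3) to the gamma density via a Kolmogorov-Smirnov distance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
l, Cl = 10, 2.0                                     # multipole and fiducial C_l (arbitrary)
k, theta = (2 * l + 1) / 2.0, 2 * Cl / (2 * l + 1)  # shape and scale, Eq. (5.8)

# Realisations of the estimator (5.3): mean of squares of 2l+1 independent
# zero-mean Gaussians of variance Cl.
alm = rng.normal(0.0, np.sqrt(Cl), size=(100_000, 2 * l + 1))
Cl_hat = np.mean(alm ** 2, axis=1)

# Kolmogorov-Smirnov distance to the gamma density (5.7): compatible with zero.
D, p = stats.kstest(Cl_hat, stats.gamma(a=k, scale=theta).cdf)
print(Cl_hat.mean(), k * theta, D)
```

The sample mean reproduces $k\theta = C_l$ and the KS distance is at the level expected for samples actually drawn from the gamma law.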
In summary, the Gaussian approximation treats the mean and the variance of the estimator as uncorrelated, so that both contribute to the information, while for the exact gamma distribution they are degenerate in such a way that the variance carries no independent information. Another way to see this, which we will use below when the exact form of the distribution is less convenient, starts from the fact that $\partial_\theta \ln p(\hat C_l)$ is a first-order polynomial in $\hat C_l$. Indeed, it can be shown that the first $n$ moments capture all the information precisely when $\partial_\alpha \ln p$ is a polynomial of order $n$ (Carron, 2011).
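A Monte Carlo estimate of the variance of this score confirms that the exact information matches (5.10) and falls short of the Gaussian-assumption value (5.6). In this sketch the parameter is taken to be $\alpha = \ln C_l$, so that $\partial_\alpha\theta = \theta$ (an arbitrary but convenient choice):

```python
import numpy as np

rng = np.random.default_rng(1)
l, Cl = 10, 1.0                                  # arbitrary multipole and spectrum
k, theta = (2 * l + 1) / 2.0, 2 * Cl / (2 * l + 1)

# Score of the gamma density (5.7) for alpha = ln C_l (so d theta/d alpha = theta):
# d ln p / d alpha = (C_hat - k*theta)/theta, linear in the estimator.
samples = theta * rng.gamma(k, size=500_000)     # draws of the estimator
score = (samples - k * theta) / theta

F_exact = np.var(score)                          # Monte Carlo Fisher information
F_field = (2 * l + 1) / 2.0                      # Eq. (5.10) with dlnC/dalpha = 1
F_gauss = F_field + 2.0                          # Eq. (5.6) adds a spurious term

print(F_exact, F_field, F_gauss)
```

The Monte Carlo value agrees with the field-perspective result $(2l+1)/2$, not with the larger Gaussian-assumption value.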
5.3 Several fields, Wishart distribution
It is instructive to see how these considerations generalize to a family of $n$ jointly Gaussian, zero-mean, correlated fields, where the analysis proceeds through the extraction of the spectra and cross-spectra. In this case, the $C_l$ of the above discussion becomes an $n\times n$ (possibly complex) Hermitian matrix
$$ \left\langle a^i_{lm}\,a^{j*}_{l'm'}\right\rangle = \delta_{ll'}\delta_{mm'}\,C^{ij}_l, \qquad C^\dagger = C. \qquad (5.12) $$
From the hermiticity property there are only $n(n+1)/2$ non-redundant spectra. Adequate estimators are defined by a straightforward generalization of equation (5.3),
$$ \hat C^{ij}_l = \frac{1}{2l+1}\sum_{m=-l}^{l} a^i_{lm}\,a^{j*}_{lm}. \qquad (5.13) $$
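As a sanity check, a toy real-valued simulation of these estimators (ignoring the complex structure of the $a_{lm}$) confirms their unbiasedness, and that the variance of a diagonal component reproduces the single-field value $2C_l^2/(2l+1)$; all numbers below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
l, n = 5, 2                                      # multipole and number of fields
C = np.array([[2.0, 0.8], [0.8, 1.0]])           # fiducial C_l^{ij} (arbitrary)
L = np.linalg.cholesky(C)

# Correlated real Gaussian a_lm^i for m = -l..l, many realisations.
nreal = 200_000
a = np.einsum('ij,rjm->rim', L, rng.normal(size=(nreal, n, 2 * l + 1)))

# Estimators (5.13), here for real-valued a_lm.
C_hat = np.einsum('rim,rjm->rij', a, a) / (2 * l + 1)

print(C_hat.mean(axis=0))                        # unbiased: reproduces C
print(C_hat[:, 0, 0].var(), 2 * C[0, 0] ** 2 / (2 * l + 1))
```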
While the estimators are still independent at different $l$'s, the different components at a given $l$ are not. The information content of the set of $a^i_{lm}$ in the field perspective is still given by formula (5.1) for zero-mean Gaussian variables. Explicitly, at a given $l$,
$$ F^l_{\alpha\beta} = \frac{2l+1}{2}\,\mathrm{Tr}\left[C_l^{-1}\frac{\partial C_l}{\partial\alpha}\,C_l^{-1}\frac{\partial C_l}{\partial\beta}\right]. \qquad (5.14) $$
In the estimator perspective, assuming the estimators $\hat C^{ij}_l$, $i\le j$, are jointly Gaussian, we have instead
$$ \sum_{i\le j,\,k\le l} \frac{\partial C^{ij}_l}{\partial\alpha}\,\Sigma^{-1}_{ij,kl}\,\frac{\partial C^{kl}_l}{\partial\beta} + \frac{1}{2}\mathrm{Tr}\left[\Sigma^{-1}\frac{\partial\Sigma}{\partial\alpha}\,\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta}\right], \qquad (5.15) $$
with
$$ \Sigma_{ij,kl} = \left\langle \hat C^{ij}_l \hat C^{kl}_l\right\rangle - C^{ij}_l C^{kl}_l = \frac{1}{2l+1}\left(C^{ik}_l C^{jl}_l + C^{il}_l C^{jk}_l\right). \qquad (5.16) $$
While it may not be immediately obvious this time, it has been noted (Hu and Jain, 2004, e.g.) that the first term in (5.15) is rigorously equivalent to the field perspective expression (5.14). The estimator perspective under the assumption of a multivariate Gaussian distribution for $\hat C_l$ thus still violates the Cramér-Rao inequality, due to the presence of the second term. Since this term is not enhanced by a factor of $2l+1$, we expect it to be subdominant again. However, this is less true than in the one-dimensional setting. Using the explicit form of the inverse covariance matrix,
$$ \Sigma^{-1}_{ij,kl} = (2l+1)\left(C_l^{-1,ik}C_l^{-1,jl} + C_l^{-1,il}C_l^{-1,jk}\right)\left(1 - \frac{1}{2}\left(\delta_{ij}+\delta_{kl}\right) + \frac{1}{4}\delta_{ij}\delta_{kl}\right), \qquad (5.17) $$
one can derive, with some lengthy but straightforward algebra, the following expression for the violating term, valid for any number $n$ of fields:
$$ \frac{1}{2}\mathrm{Tr}\left[\Sigma^{-1}\frac{\partial\Sigma}{\partial\alpha}\,\Sigma^{-1}\frac{\partial\Sigma}{\partial\beta}\right] = \frac{1}{2}(n+2)\,\mathrm{Tr}\left[C_l^{-1}\frac{\partial C_l}{\partial\alpha}\,C_l^{-1}\frac{\partial C_l}{\partial\beta}\right] + \frac{1}{2}\mathrm{Tr}\left[C_l^{-1}\frac{\partial C_l}{\partial\alpha}\right]\mathrm{Tr}\left[C_l^{-1}\frac{\partial C_l}{\partial\beta}\right]. \qquad (5.18) $$
For $n = 1$ we indeed recover (5.6). While the term is still subdominant at high $l$, the situation is a bit less comfortable this time. The number of fields is not necessarily very small in cosmologically relevant situations, such as a tomographic joint analysis of shear and galaxy densities in redshift slices, to which one may also add magnification, flexion fields, etc. Writing schematically $n = N_f N_{bin}$, where $N_{bin}$ is the number of bins and $N_f$ the number of fields per bin, we have e.g. $N_f = 3$ for the galaxy density and the two shear fields, $N_f = 4$ including magnification, $N_f = 8$ adding hypothetically the four flexion fields, and so on.
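The matrix identity (5.18) is easy to verify numerically. The following sketch draws a random positive definite $C_l$ and an arbitrary symmetric derivative, takes $\alpha = \beta$, and drops the $1/(2l+1)$ factors, which cancel inside the trace:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3                                           # number of fields (arbitrary)
M = rng.normal(size=(n, n))
C = M @ M.T + n * np.eye(n)                     # random positive definite C_l
dC = rng.normal(size=(n, n)); dC = dC + dC.T    # arbitrary symmetric dC_l/dalpha

pairs = [(i, j) for i in range(n) for j in range(i, n)]

def sigma(Cm):
    # Estimator covariance (5.16) for the n(n+1)/2 non-redundant spectra,
    # dropping the overall 1/(2l+1), which cancels in the trace below.
    return np.array([[Cm[i, k] * Cm[j, m] + Cm[i, m] * Cm[j, k]
                      for (k, m) in pairs] for (i, j) in pairs])

eps = 1e-6   # central difference; exact here since sigma is quadratic in C
dS = (sigma(C + eps * dC) - sigma(C - eps * dC)) / (2 * eps)
Si = np.linalg.inv(sigma(C))
lhs = 0.5 * np.trace(Si @ dS @ Si @ dS)         # left-hand side of (5.18)

X = np.linalg.inv(C) @ dC
rhs = 0.5 * (n + 2) * np.trace(X @ X) + 0.5 * np.trace(X) ** 2
print(lhs, rhs)                                 # the two sides agree
```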
Comparing (5.14) and (5.18), and neglecting the second term in (5.18), we see that at
$$ l \sim \frac{1}{2}N_f N_{bin} \qquad (5.19) $$
the Cramér-Rao violating term is actually still the dominant one. Note that this is still optimistic. Due to the product of two traces in the second term of (5.18), one can expect roughly the same scaling with $n$ as for the first term. The correct $l$ in (5.19) may thus generically be closer to
$$ l \sim N_f N_{bin}. \qquad (5.20) $$
From the discussion in section 5.2, we can easily guess what went wrong. Consider the information content of the means of the estimators exclusively. For any probability density function this is given by weighting the derivatives of the means with the inverse covariance matrix, and it is thus equal to the correct, first term in (5.15). Since the means of the estimators already exhaust the information in the field, we can conclude that the total information content of the estimators must be equal to that of their means, and in particular that the covariance does not contribute to the information. As before, the second term in the estimator perspective is an artifact of the Gaussian assumption. It is nonetheless interesting to derive more explicitly, as above, why only the means carry information, from the shape of the joint probability density of the estimators. The remainder of this section sketches how this can be done, leading to equation (5.26). For the sake of notation we restrict ourselves to the case of two fields, $n = 2$, but the following argumentation holds for any $n$. The exact joint distribution of the three estimators $\hat C_l = (\hat C^{11}_l, \hat C^{12}_l, \hat C^{22}_l)$ is given from the rules of probability theory as
$$ p(\hat C_l|\alpha,\beta) = \left\langle \prod_{i\le j=1}^{2} \delta^D\!\left(\hat C^{ij}_l - \frac{1}{2l+1}\sum_{m=-l}^{l} a^i_{lm}\,a^{j*}_{lm}\right)\right\rangle, \qquad (5.21) $$
where $\delta^D$ is the Dirac delta function. The average is over the joint probability density of the two sets of harmonic coefficients $a^1_{lm}$ and $a^2_{lm}$. Define the vector
$$ a_l = \left(a^1_{l,-l},\cdots,a^1_{ll},\, a^2_{l,-l},\cdots,a^2_{ll}\right). \qquad (5.22) $$
Since the $a_{lm}$ are zero-mean Gaussian variables with correlations as given in (5.12), this probability density function is given by
$$ \frac{1}{Z(\alpha,\beta)}\exp\left(-\frac{1}{2}\,a_l^\dagger\cdot\mathcal{C}_l^{-1}a_l\right), \qquad (5.23) $$
with
$$ \mathcal{C}_l = \begin{pmatrix} C^{11}_l\,\mathbb{1}_{2l+1} & C^{12}_l\,\mathbb{1}_{2l+1} \\ C^{21}_l\,\mathbb{1}_{2l+1} & C^{22}_l\,\mathbb{1}_{2l+1} \end{pmatrix}, \qquad (5.24) $$
where $\mathbb{1}_{2l+1}$ is the unit matrix of size $2l+1$. $Z(\alpha,\beta)$ is the normalization of the density for $a_l$, which does depend on the model parameters through the determinant of the $\mathcal{C}_l$ matrix. The inverse matrix $\mathcal{C}_l^{-1}$ has the same block structure, with entries being those of $C_l^{-1}$. In the following we are not really interested in keeping track of the exact values of the components of this matrix, but only in the fact that they depend on the model parameters. With the understanding that $C_l^{-1} =: D_l$, we have thus, due to the sparse structure of the $\mathcal{C}_l^{-1}$ matrix and the Dirac delta functions in (5.21),
$$ -\frac{1}{2}\,a_l^\dagger\cdot\mathcal{C}_l^{-1}a_l = -\frac{1}{2}(2l+1)\sum_{i,j=1,2} D^{ij}_l\,\hat C^{ij}_l. \qquad (5.25) $$
Due to the presence of the Dirac delta functions, we can thus take the exponential (5.23) out of the integral in (5.21). Writing explicitly the dependency of the different terms on $\hat C_l$ and the model parameters, we obtain the following form,
$$ p(\hat C_l|\alpha,\beta) = \frac{f(\hat C_l)}{Z(\alpha,\beta)}\exp\left(-\frac{1}{2}(2l+1)\sum_{i,j=1,2} D^{ij}_l(\alpha,\beta)\,\hat C^{ij}_l\right), \qquad (5.26) $$
which generalizes the gamma distribution (5.7) to this multidimensional case. The factor $f(\hat C_l)$ is what is left of the integral (5.21) when the density for the set of $a_{lm}$ is taken out, i.e. the volume of the space spanned by the $a_{lm}$'s that satisfy the constraints set by the Dirac delta functions. It is thus a factor that depends on $\hat C_l$ but, importantly for us, not on the model parameters¹. The point of the representation (5.26) is that it is immediate that $\partial_\alpha \ln p(\hat C_l)$ is a polynomial of first order in the components of $\hat C_l$.
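The key step (5.25) can be checked directly on a toy real-valued realisation: the Gaussian quadratic form in the $a_{lm}$ depends on them only through the estimators (5.13). All numbers below are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(6)
l = 4
C = np.array([[2.0, 0.7], [0.7, 1.5]])         # 2x2 spectra matrix C_l (arbitrary)
D = np.linalg.inv(C)                           # D_l := C_l^{-1}

# One realisation of the two fields' harmonic coefficients (toy real-valued case).
L = np.linalg.cholesky(C)
a = L @ rng.normal(size=(2, 2 * l + 1))        # rows: a^1_lm and a^2_lm

# Estimators (5.13) and the two sides of identity (5.25) (without the minus signs).
C_hat = a @ a.T / (2 * l + 1)
lhs = a.flatten() @ np.kron(D, np.eye(2 * l + 1)) @ a.flatten()
rhs = (2 * l + 1) * np.sum(D * C_hat)

print(lhs, rhs)   # the quadratic form depends on a only through C_hat
```

Since the block matrix $\mathcal{C}_l^{-1}$ is the Kronecker product of $D_l$ with the unit matrix, the quadratic form collapses onto the estimators exactly as claimed.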
Second-order terms, corresponding to information within the covariance matrix, never appear, however close to a Gaussian the exact density function might be. It follows that the total Fisher information matrix is always equal to that of the means, even though we did not derive the exact shape of the distribution.

¹The prefactors in (5.26) can be obtained in closed form, leading to the Wishart density function; see (Hamimeche and Lewis, 2008, e.g.).

5.4 Summary and conclusions

We discussed two common perspectives (the 'field' and 'estimator' perspectives) on the Fisher information content of cosmological power spectra, and why in the estimator perspective the assumption of a Gaussian likelihood for the spectra estimators violates the Cramér-Rao inequality, assigning the estimators more information than there is in the full underlying fields. Under the assumption of Gaussianity of the estimators, their means and covariance matrix are artificially rendered uncorrelated, creating an additional piece of information in their covariance, which we showed to be nonexistent by calculating the exact information content of the estimators' true probability density function. We showed that this violating term can become dominant in the limit of a large number of fields. Using Gaussian likelihoods consistently, i.e. with parameter-dependent covariance matrices, as argued for example in (Eifler et al., 2009), therefore assigns far too much information to the spectra in this regime, and should thus be avoided. In the estimator perspective of the derivation of the Fisher information matrix, this term is usually neglected. This note clarifies why it should not be present in the first place, and how the agreement between the field and estimator perspectives can thus arguably be seen as a happy cancellation of two inconsistencies.
It is interesting to note that the reason we still find the exact result in the estimator perspective, without this wrong piece, is that this expression is also the exact Fisher information content of the exact distribution of the estimators, which is strongly non-Gaussian at low $l$; the central limit theorem actually plays no role. The other lesson we can take from this work is that, in general, when in doubt about the joint distribution of a set of estimators, a safe choice of information content is always that of their means exclusively, which requires only the knowledge of their covariance. Provided the covariance matrix is correctly chosen, the properties of Fisher information guarantee, for any probability density function, that this is a conservative evaluation relying on no further assumptions about its shape. Thus, leaving aside the question of the accuracy of the approximation itself, using a Gaussian likelihood with a parameter-independent covariance matrix, which places the entire information in the means, while not entirely consistent, remains a safe prescription in the sense that a conservative information content is always assigned to the estimators.

6 N-point functions in lognormal density fields

In this chapter we discuss extensively the information content of the lognormal field as a model for the matter fluctuation field in cosmology. It is built out of published as well as yet unpublished elements. The text in sections 6.1 and 6.4 follows closely that of Carron (2011), and that of sections 6.1.1, 6.1.2, 6.3 and 6.5 that of Carron and Neyrinck (2012). Sections 6.2, 6.6 and 6.7, on the other hand, present unpublished material.
6.1 Introduction

The cosmological matter density field is becoming more and more directly accessible to observations with the help of weak lensing (Bartelmann and Schneider, 2001; Munshi et al., 2006; Refregier, 2003; Schneider et al., 1992). Its statistical properties are the key element in trying to optimize future large galaxy surveys aimed at answering fundamental cosmological questions, such as the nature of the dark components of the universe (Caldwell and Kamionkowski, 2009; Frieman et al., 2008). To this aim, Fisher's measure of information on parameters (Fisher, 1925; Rao, 1973; van den Bos, 2007) has naturally become of standard use in cosmology. It provides a handy framework in which the statistical power of some experimental configuration aimed at some observable can be evaluated quantitatively (Albrecht et al., 2006; Amara and Réfrégier, 2007; Bernstein, 2009; Hu and Jain, 2004; Hu and Tegmark, 1999; Parkinson et al., 2007; Tegmark, 1997; Tegmark et al., 1997, e.g.). Such studies are in the vast majority of cases limited to Gaussian probability density functions, or perturbations therefrom, and deal mostly with the prominent members of the correlation function hierarchy (Peebles, 1980), or equivalently their Fourier transforms, the polyspectra, such as the matter power spectrum.

The approach via the correlation function hierarchy is very sensible in the nearly linear regime, for at least two reasons. First, the correlations are, in principle, the very elements that cosmological perturbation theory is able to predict in a systematic manner (see Bernardeau et al. (2002) for a review, or the more recent Matsubara (2011) and the numerous references therein). Second, primordial cosmological fluctuation fields are believed to be accurately described by Gaussian statistics.
It is well known that correlations at the two-point level provide a complete description of Gaussian fields. It is therefore natural to expect this approach to be adequate throughout the linear and mildly non-linear regimes, where departures from Gaussianity are small. Deeper in the non-linear regime, fluctuations grow substantially in size, and tails form in the matter probability density function. A standard prescription for the statistics of the matter field in these conditions is the lognormal distribution, various properties of which are discussed in detail in an astrophysical context in (Coles and Jones, 1991). It was later shown to be reproduced accurately, both from the observational point of view as well as in comparison to standard perturbation theory and N-body simulations (Bernardeau, 1994; Bernardeau and Kofman, 1995; Kayo et al., 2001; Taylor and Watts, 2000; Wild et al., 2005), in low-dimensional settings. The lognormal assumption is also very much compatible with numerical works (Neyrinck et al., 2009, 2011) showing that the spectrum of the logarithm of the field, $\ln(1+\delta)$, carries much more information than the spectrum of $\delta$ itself. The first evaluation of the former within the framework of perturbation theory appeared recently (Wang et al., 2011).

Lognormal statistics (Aitchison and Brown, 1957, for a textbook presentation) are not innocuous. More specifically, the lognormal distribution is only one among many distributions leading to the very same series of moments. This fact indicates that, in going from the distribution to the moments, one may be losing information in some way or another. A fundamental limitation of the correlation function hierarchy in extracting the information content of the field in the non-linear regime could therefore exist, if its statistics are indeed similar to the lognormal.
This important fact was already mentioned qualitatively in (Coles and Jones, 1991), but it seems no quantitative analysis is available at present. In this chapter we provide first answers to these issues, looking at the details of the structure of the information within the lognormal field and its N-point moments.

This chapter is built as follows. We start by exposing briefly two fundamental limitations of the lognormal assumption in section 6.2. We continue in section 6.3 by defining explicit families of fields that have the very same hierarchy of N-point moments as the lognormal field. This makes obvious that the hierarchy does not provide a complete description of the field, and therefore does not contain the entire information. In section 6.4, we quantify the importance of this aspect in terms of Fisher information: we solve in that section for the statistical power of the moments of the lognormal distribution exactly at all orders, and for their efficiency in capturing information. We also compare these predictions to standard perturbation theory. In section 6.5 we then use the results of section 6.4 to make a successful connection to the N-body simulation results mentioned above, which showed that the statistical power of the spectrum of the logarithm of the field is larger than that of the original fluctuation field. In section 6.6, we extend the results of 6.4 by allowing parameters to create correlations between the variables, and discuss in this light the fact that the improvements seen in simulations are mostly parameter independent. We conclude in section 6.7 with a remark on the derivation of the statistical power of the N-point moments for an arbitrary lognormal field.

6.1.1 Notation and conventions

We will be dealing throughout this chapter with random vectors $\rho = (\rho_1,\cdots,\rho_d)$, being the sample of a density field $\rho$,
$$ \rho_i = \rho(x_i) > 0. \qquad (6.1) $$
We place ourselves in 3-dimensional cartesian space for convenience. For a vector $n = (n_1,\cdots,n_d)$ of non-negative integers (multiindex), we write, as in chapter 3, $\rho^n$ for the monomial in $d$ variables,
$$ \rho^n = \rho(x_1)^{n_1}\cdots\rho(x_d)^{n_d}. \qquad (6.2) $$
Throughout this chapter, we reserve bold letters for vectors of integers exclusively. Let $p_\rho(\rho)$ be a $d$-dimensional probability density function such that all correlations of the form $\langle\rho^n\rangle$ are finite. We write the moment $\langle\rho^n\rangle$ as $m_n$. Explicitly,
$$ m_n = \left\langle \rho^{n_1}(x_1)\cdots\rho^{n_d}(x_d)\right\rangle. \qquad (6.3) $$
Correlations of order $n$ are given by moments such that the order $|n|$ of the multiindex, defined as
$$ |n| := \sum_{i=1}^{d} n_i, \qquad (6.4) $$
is equal to $n$. These moments coincide with the values of a continuous $n$-point correlation function on the grid sampled by $(x_1,\cdots,x_d)$. We write $\delta$ for the dimensionless fluctuation field, and $A$ for the field defined by $\ln\rho$:
$$ A := \ln\rho, \qquad \delta := \frac{\rho-\bar\rho}{\bar\rho}. \qquad (6.5) $$
Such assignments involving ratios or logarithms of $d$-dimensional quantities should always be understood component by component.

6.1.2 Definition and basic properties of correlated lognormal variables

We say the $d$-dimensional vector $\rho := (\rho(x_1),\cdots,\rho(x_d))$ is lognormal if the $d$-dimensional probability density function of $A$ is Gaussian. Explicitly,
$$ p_A(A) = \frac{1}{(2\pi)^{d/2}|\xi_A|^{1/2}}\exp\left(-\frac{1}{2}\left(A-\bar A\right)\cdot\xi_A^{-1}\left(A-\bar A\right)\right), \qquad (6.6) $$
where $\bar A$ is the mean vector of $A$, and $\xi_A$ its covariance matrix,
$$ [\xi_A]_{ij} = \left\langle\left(A(x_i)-\bar A(x_i)\right)\left(A(x_j)-\bar A(x_j)\right)\right\rangle. \qquad (6.7) $$
The probability density for the vector $\rho$ itself is then by construction a $d$-dimensional lognormal distribution. We call it $p^{LN}_\rho$ for further reference. From the rules of probability theory,
$$ p^{LN}_\rho(\rho) = \frac{p_A(\ln\rho)}{\prod_{i=1}^{d}\rho(x_i)}. \qquad (6.8) $$
The means and two-point correlations of $A$ and $\delta$ are in one-to-one correspondence. We have
$$ \bar A = \ln\bar\rho - \frac{1}{2}\sigma^2_A, \qquad (6.9) $$
where $\sigma^2_A$ is the diagonal of $\xi_A$, i.e. the variances at the individual $d$ points.
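These relations are easy to confirm by direct sampling. The sketch below (with arbitrary dimension, means and covariances) checks the component-wise mean relation (6.9), as well as a mixed moment against the closed form (6.12) stated just below:

```python
import numpy as np

rng = np.random.default_rng(4)
Abar = np.array([0.1, -0.2])                     # mean vector of A (arbitrary)
xiA = np.array([[0.30, 0.12], [0.12, 0.20]])     # covariance of A (arbitrary)

# Lognormal vectors rho = exp(A), with A Gaussian as in (6.6).
A = rng.multivariate_normal(Abar, xiA, size=2_000_000)
rho = np.exp(A)

# Eq. (6.9) rearranged: rho_bar = exp(Abar + sigma_A^2 / 2), component-wise.
rho_bar = np.exp(Abar + 0.5 * np.diag(xiA))
print(rho.mean(axis=0), rho_bar)

# A mixed moment <rho_1^2 rho_2> against exp(Abar.n + n.xiA.n/2), Eq. (6.12).
n = np.array([2, 1])
mc = np.mean(rho[:, 0] ** 2 * rho[:, 1])
exact = np.exp(Abar @ n + 0.5 * n @ xiA @ n)
print(mc, exact)
```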
Also,
$$ [\xi_A]_{ij} = \ln\left(1+[\xi_\delta]_{ij}\right), \qquad [\xi_\delta]_{ij} := \left\langle\delta(x_i)\delta(x_j)\right\rangle. \qquad (6.10) $$
In particular, the variances are related through
$$ \sigma^2_A = \ln\left(1+\sigma^2_\delta\right). \qquad (6.11) $$
In the multiindex notation, the N-point moments of $\rho$ take the following simple form,
$$ m_n = \exp\left(\bar A\cdot n + \frac{1}{2}\,n\cdot\xi_A n\right). \qquad (6.12) $$
The mapping from $A$ to $\rho$ is invertible, and $A$ has a multivariate Gaussian distribution. The total Fisher information content of $\rho$ is therefore
$$ F_{\alpha\beta} = \frac{\partial\bar A^T}{\partial\alpha}\,\xi_A^{-1}\,\frac{\partial\bar A}{\partial\beta} + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial\xi_A}{\partial\alpha}\,\xi_A^{-1}\,\frac{\partial\xi_A}{\partial\beta}\,\xi_A^{-1}\right]. \qquad (6.13) $$

6.2 Fundamental limitations of the lognormal assumption

To open this chapter, we discuss two fundamental limitations of the lognormal assumption for the matter density field. We are interested in the validity of the approximation, i.e. the mere possibility of describing the ΛCDM matter density field at all with a homogeneous, isotropic lognormal field. It is indeed not guaranteed a priori that quantities derived from this assumption will be well defined. For instance, any sensible statistical model requires covariance and information matrices to be positive, and it is not yet clear whether the lognormal assumption on the matter density field satisfies these conditions.

It is important to note that this is very different from testing the accuracy of the approximation. For instance, given a two-point correlation function (or, equivalently, a positive Fourier transform, the power spectrum), it is always possible to use the Gaussian field as a prescription, even though this approximation may be extremely inaccurate. This is because the two-point function is the only relevant ingredient, and the Gaussian field with that two-point correlation function is well defined statistically speaking in all cases. With the lognormal field this is no longer the case.
Namely, given a two-point correlation function $\xi_\delta$ with positive Fourier transform, we see from (6.10), in a continuous notation, that there is the additional constraint that
$$ \xi_A(r) = \ln\left(1+\xi_\delta(r)\right) \qquad (6.14) $$
must be a valid two-point correlation function as well, since it is that of the $A$ field. Since $A$ is by definition a Gaussian field, for the lognormal assumption to be formally valid the following Fourier transform must be positive for all $k$,
$$ \int d^3r\,\ln\left(1+\xi_\delta(r)\right)e^{-ik\cdot r} = P_A(k) \ge 0. \qquad (6.15) $$
We discuss in the following that these non-trivial constraints are satisfied neither on the largest nor on the smallest scales in our current understanding of the ΛCDM power spectrum.

6.2.1 Largest scales

An immediate issue is the power of the $A$ field on the very largest scales. Setting $k = 0$ in (6.15) gives us
$$ P_A(0) = \int d^3r\,\ln\left(1+\xi_\delta(r)\right). \qquad (6.16) $$
On the other hand, for any argument $x$ we have $\ln(1+x) \le x$, with equality if and only if $x = 0$. For this reason,
$$ P_A(0) < \int d^3r\,\xi_\delta(r) = P_\delta(0). \qquad (6.17) $$
Now, the spectrum of the fluctuation field is believed to have the scale-free shape of a power law, $P_\delta \propto k^n$, with $n$ very close to unity (Komatsu et al., 2011). Extrapolated to the smallest wavenumbers, this functional form obviously assigns zero power at zero. We therefore find the contradiction
$$ P_A(0) < 0. \qquad (6.18) $$
On the largest scales, the power of the $A$ field is thus found to be negative under the lognormal assumption, rendering it formally untenable. Nevertheless, this issue cannot be considered a real shortcoming of the model, since these large scales are in no way observable and are irrelevant to any realistic situation. In particular the power at zero, formally the variance of the mean of the field from realisation to realisation of the Universe within the statistical model, carries no meaning for observations. Fig.
6.1 — Left panel: the maximally symmetric configuration of points, including a finite range of distances, chosen to test the positivity of the spectrum of the $A$ field for the ΛCDM two-point function. Right panel: on a logarithmic scale, the number of wavenumbers for which the spectrum of the $A$ field is found to be negative. Such modes appear once scales below 5-6 Mpc are included, indicating that a lognormal description is no longer well defined below these scales.

6.2.2 Smallest scales

More interesting are the small scales, since these are the ones accessible to observations. Performing the angular integration in (6.15), the conditions that the two-point function must obey become
$$ \int_0^\infty dr\,r^2\,\ln\left(1+\xi_\delta(r)\right)j_0(kr) \ge 0 \quad \text{for all } k, \qquad (6.19) $$
where $j_0(x) = \sin(x)/x$ is the zeroth-order spherical Bessel function. These relations are, however, neither convenient to test nor the most meaningful. On the one hand, due to the $r^2$ factor, they require the correlation function at very large argument. On the other hand, the inequalities (6.19) represent the constraints put on a continuous lognormal field in an infinite volume. They are thus rather formal and do not correspond to a situation that can occur in practice. We therefore build another set of similar identities that are both simpler to test and, more importantly, allow us to investigate a range of specific scales without ambiguity. For this we use a finite but maximally symmetric configuration of points $x_i$, where the corresponding constraints become the positivity of the two-point correlation matrix