Indra: a Public Computationally Accessible Suite of Cosmological N-Body Simulations
Bridget Falck¹★, Jie Wang²,³, Adrian Jenkins⁴, Gerard Lemson¹, Dmitry Medvedev¹, Mark C. Neyrinck⁵,⁶, Alex S. Szalay¹

¹Department of Physics and Astronomy, Johns Hopkins University, 3400 N Charles St, Baltimore, MD 21218, USA
²National Astronomical Observatories, Chinese Academy of Sciences, Beijing, 100012, China
³School of Astronomy and Space Science, University of Chinese Academy of Sciences, Beijing 100049, China
⁴Institute for Computational Cosmology, Department of Physics, Durham University, Durham DH1 3LE, UK
⁵Ikerbasque, the Basque Foundation for Science, 48009, Bilbao, Spain
⁶Department of Physics, University of the Basque Country UPV/EHU, 48080, Bilbao, Spain
★ E-mail: [email protected]

Accepted XXX. Received YYY; in original form ZZZ; Draft version 5 August 2021

ABSTRACT
Indra is a suite of large-volume cosmological N-body simulations with the goal of providing excellent statistics of the large-scale features of the distribution of dark matter. Each of the 384 simulations is computed with the same cosmological parameters and different initial phases, with 1024³ dark matter particles in a box of length 1 h⁻¹ Gpc, 64 snapshots of particle data and halo catalogs, and 505 time steps of the Fourier modes of the density field, amounting to almost a petabyte of data. All of the Indra data are immediately available for analysis via the SciServer science platform, which provides interactive and batch computing modes, personal data storage, and other hosted data sets such as the Millennium simulations and many astronomical surveys. We present the Indra simulations, describe the data products and how to access them, and measure ensemble averages, variances, and covariances of the matter power spectrum, the matter correlation function, and the halo mass function to demonstrate the types of computations that Indra enables. We hope that Indra will be both a resource for large-scale structure research and a demonstration of how to make very large datasets public and computationally accessible.

Key words: dark matter – large-scale structure of Universe – methods: numerical

1 INTRODUCTION

Understanding the observations made by large-scale structure surveys such as Euclid, Roman, DESI, and the VRO requires input from theoretical predictions of structure formation in the form of numerical simulations. Cosmological N-body simulations solve for the nonlinear gravitational collapse of matter and are indispensable tools both for testing cosmological models and for planning for, and analyzing, observations. For example, simulations are necessary to predict the large-scale structure observables of dark energy and modified gravity theories in the non-linear regime (e.g. Joyce et al. 2016; Winther et al. 2015), which can be compared to simulations of the current ΛCDM (cold dark matter with a cosmological constant, Λ) cosmological paradigm. Simulations have also become an integral part of the analysis of measurements of baryon acoustic oscillations (BAO) from galaxy redshift surveys (e.g. Percival et al. 2014).

One of the pressing challenges when making inferences from large observational surveys is that we can only observe the Universe from one vantage point in a finite volume. This “cosmic variance” places limits on the precision with which we can measure statistical fluctuations on very large scales. This also presents a problem for simulations: we cannot simulate exactly the positions of the galaxies in this observational volume. We can, however, simulate an ensemble of universes, increasing the precision with which we can numerically predict the statistical properties (e.g., the covariance matrix) of the matter distribution on large scales for a given cosmological model.
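To make the ensemble idea concrete: with many independent realizations in hand, quantities such as the covariance matrix of the matter power spectrum become simple sample estimates over the runs. The sketch below is illustrative only (it is not code from the Indra pipeline); the array `pk` and its shape `(n_runs, n_bins)` are hypothetical stand-ins for spectra measured in the same k bins from each realization.

```python
import numpy as np

# Hypothetical stand-in for the measured spectra: one matter power spectrum
# per realization, all binned in the same k bins -> shape (n_runs, n_bins).
rng = np.random.default_rng(seed=42)
n_runs, n_bins = 384, 50
pk = rng.lognormal(mean=0.0, sigma=0.1, size=(n_runs, n_bins))

# Ensemble statistics are direct sample estimates over the realizations.
pk_mean = pk.mean(axis=0)                    # ensemble-average spectrum
pk_cov = np.cov(pk, rowvar=False)            # (n_bins, n_bins) covariance matrix
pk_err = np.sqrt(np.diag(pk_cov) / n_runs)   # uncertainty on the ensemble mean
```

The precision of such estimates is set by the number of independent realizations, which is exactly what a large simulation ensemble supplies.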
Ideally, one would want to simulate the full volume and resolution that the next generation of galaxy surveys will attain, repeat this thousands of times for different realizations of the same cosmological model and parameters, run this ensemble for thousands of combinations of cosmological parameters in a given model, and repeat the entire process for any number of cosmological models one expects the observations to test. Such a scenario has obvious drawbacks: it would require a prohibitively expensive amount of computing power and produce an excessive amount of data which would be impossible to analyze without equally massive computing resources. Because of these immense technical challenges, workarounds are being devised: cosmological emulators (Heitmann et al. 2014, 2016; Garrison et al. 2018; DeRose et al. 2019; Nishimichi et al. 2019) attempt to span cosmological parameter space with a few full simulations and interpolate between them; low-resolution particle mesh simulations attempt to capture features only at large scales (Takahashi et al. 2009; Blot et al. 2015); approximate simulations combine numerical shortcuts with analytic solutions to produce fast and cheap ensembles of realizations with a limited range of accuracy (see Chuang et al. 2015; Lippich et al. 2019, and references therein); and analytic methods attempt to model the statistics in order to reduce the number of full realizations required to obtain the same precision (Pope & Szapudi 2008; Taylor et al. 2013; Dodelson & Schneider 2013; Heavens et al. 2017).

This paper aims to address the technical challenges head-on by producing a suite of cosmological N-body simulations called Indra¹ hosted on SciServer, a science platform that allows the community to perform analysis where the data are stored. This method of server-side analysis was pioneered in astronomy by the Sloan Digital Sky Survey (SDSS) SkyServer (Szalay et al. 2000), which enabled astronomers to interactively explore a multi-terabyte database by leveraging state-of-the-art archiving technologies. Modelled on the success of the SkyServer, the Millennium Simulation Database (Lemson & Virgo Consortium 2006) made available the post-processing outputs of the Millennium simulation (Springel et al. 2005), including halo catalogs, merger trees, and later, semi-analytic galaxy catalogs (see also Riebe et al. 2011; Loebman et al. 2014; Bernyk et al. 2016). The dark matter particle positions and velocities, however, were not in the database, as their large size (roughly 20 TB) and usage patterns (being point clouds instead of object catalogs) presented significant technical challenges.

Though database technology has seen significant advances in the past decade, our early efforts to build a relational database for Indra were motivated by reducing the number of rows in a table – the planned full suite of 512 simulations amounts to 35 trillion output particles. We considered storing particles in spatially-sorted array-like “chunks” (Dobos et al. 2012) and developed an inverted indexing scheme (Crankshaw et al. 2013) to access individual particles in these arrays (e.g. to track their movement across many snapshots). However, many technical advances opened up the capability to release data in their native binary file format, without needing to first load them into relational databases, by mounting the data to contained compute environments running on virtual machines. Indra takes advantage of the SciServer science platform, built at Johns Hopkins University where the data are stored, which provides interactive and batch mode server-side analysis in various environments, public and private file storage in both flat files and databases, and many collaboration tools (Taghizadeh-Popp et al. 2020).

shifts and merger histories of dark matter halos are able to be built. We describe the simulations and their data products in Section 2, including details on how the initial conditions were created and an issue that affected the first 128 simulations. In Section 3, we measure ensemble averages, variances, and covariances from the three Indra data products: the dark matter particles, the halo catalogs, and the Fourier modes of the density field. In the appendix we describe how the full suite of simulations can be accessed and analyzed by the community.

¹ Recent advances have seen some tera- and peta-scale cosmological simulation suites that provide their data for download, but with no

2 THE INDRA SIMULATIONS

2.1 Overview

The Indra suite of simulations were run using the code L-Gadget2 (Springel 2005), each with a box length of 1 h⁻¹ Gpc, 1024³ particles, and WMAP7 cosmological parameters (Ω_M = 0.272, Ω_Λ = 0.728, h = 0.704, σ₈ = 0.81, n_s = 0.967) (Komatsu et al. 2011). They are purely gravitational dark-matter-only cosmological simulations; each dark matter particle has a mass of m_p = 7.031 × 10¹⁰ h⁻¹ M_⊙. The force softening length is 40 h⁻¹ kpc, and the parameters used for the force accuracy, time integration, and tree criteria are the same as used for the Millennium Run (Springel et al. 2005), which used the same lean version of Gadget. Initial conditions were generated using second-order Lagrangian perturbation theory (2LPT, Scoccimarro 1998; Jenkins 2010), with pre-initial conditions on a regular grid and a starting redshift of z = 127. 64 snapshots are saved down to z = 0 at the same redshifts as the Millennium Run. In general the box size, particle number, and other parameters of Indra are geared toward resolving halos more massive than that of the Milky Way and obtaining good statistics on BAO scales, while saving snapshots at a large number of redshifts and not exceeding a petabyte of data. Since Indra is intended to be a public resource, we hope it can serve a variety of cosmological applications.
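Both the particle mass and the "almost a petabyte" figure quoted in the abstract can be cross-checked from these parameters. The sketch below is our own back-of-the-envelope arithmetic, not code from the Indra pipeline; in particular the 32 bytes per particle (single-precision positions and velocities plus an 8-byte ID) is an assumption about the storage format.

```python
# Cross-check of the particle mass and the rough data volume from the
# parameters quoted above (back-of-the-envelope, not from the paper).

OMEGA_M = 0.272      # matter density parameter
RHO_CRIT = 2.775e11  # critical density: 2.775e11 h^2 M_sun / Mpc^3,
                     # i.e. the same number in h^-1 M_sun per (Mpc/h)^3
BOX = 1000.0         # box length in Mpc/h
NPART = 1024**3      # particles per simulation

# Particle mass = total matter mass in the box / number of particles.
m_p = OMEGA_M * RHO_CRIT * BOX**3 / NPART
print(f"particle mass ~ {m_p:.3e} h^-1 M_sun")       # ~7.03e10, matching the text

# Rough volume of the particle data: 384 runs x 64 snapshots, assuming
# ~32 bytes per particle (an assumption about the on-disk format).
bytes_total = 384 * 64 * NPART * 32
print(f"particle data ~ {bytes_total / 1e15:.2f} PB")  # ~0.85 PB, "almost a petabyte"
```

Both values land close to the figures quoted above, though the data-volume estimate of course depends on the assumed bytes per particle.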
The Indra simulations can be identified using three digits that range from 0 to 7, which we denote as x_y_z, corresponding to the coordinates of an 8 × 8 × 8 cube. Note however that each simulation is fully independent and obeys periodic boundary conditions in a (1 h⁻¹ Gpc)³ volume; see Section 2.3 for details about how the initial phases were set up.
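When scripting over the suite it can be convenient to convert between a single run number and the x_y_z label. The helper below assumes a row-major convention, run = 64x + 8y + z, which is our assumption for illustration; the text above only specifies that each digit runs from 0 to 7 over the 8 × 8 × 8 cube.

```python
def run_to_xyz(run_id: int) -> str:
    """Map a run number (0-511) to an x_y_z label on the 8x8x8 cube.

    Assumes the row-major convention run_id = 64*x + 8*y + z; this is an
    assumed convention, not one stated in the text above.
    """
    if not 0 <= run_id < 512:
        raise ValueError("run_id must be in [0, 511]")
    x, rem = divmod(run_id, 64)
    y, z = divmod(rem, 8)
    return f"{x}_{y}_{z}"


def xyz_to_run(label: str) -> int:
    """Inverse mapping under the same assumed convention."""
    x, y, z = (int(d) for d in label.split("_"))
    if not all(0 <= d <= 7 for d in (x, y, z)):
        raise ValueError("each digit must be in [0, 7]")
    return 64 * x + 8 * y + z
```

For example, xyz_to_run("2_0_0") gives 128 under this convention.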