Observing the Virtual Universes of Cosmological N-Body Simulations

Volker Springel
EURO-VO Workshop on Theory in the Virtual Observatory, Garching, April 2008

Analysing large N-body simulations
  the data volume challenge
  merger tree construction: a potential grid application
  observing galaxies with semi-analytic models
  publication of the data in a theoretical VO
Backwards light cone observations
  virtual galaxy redshift catalogues
  gravitational lensing maps
  secondary anisotropies induced in the CMB
Future challenges

The largest N-body simulations cover a sizable fraction of the observable universe
DARK MATTER CLUSTERING ALONG THE PAST LIGHT-CONE OF THE HUBBLE SIMULATION
  `Hubble-volume' simulation, Virgo Consortium (1999): CDM, 1,000,000,000 particles, m = 2.2 x 10^12 M⊙/h, Evrard et al. (2001)
  `Millennium' simulation, Springel et al. (2005): CDM, 10,077,696,000 particles, m = 8.6 x 10^8 M⊙/h

Millennium Run
  10,077,696,000 particles
  Springel et al. (2004), Max-Planck Institut für Astrophysik

The simulation produced a multi-TByte data set
RAW SIMULATION OUTPUTS
  Data size
    One simulation timeslice: 360 GByte
    We have stored: 64 outputs
    Raw data volume: 23 TByte
  Design for structure of snapshot files
    Store simulation data in 8^3 = 512 files which map to subcubes in the simulation volume. Each file has ~20 million particles, 600 MB.
    The particles of each subfile are stored in the sequence of a 32^3 Peano-Hilbert grid that covers the volume represented by the file. On average, there are 600 particles per grid cell.
    A hash-table is produced for each file of a snapshot. Each entry gives the offset to the first particle in the corresponding cell of the 32^3 grid, relative to the beginning of the file. Size of tables: 512 x 128 KB = 64 MB.
    This allows random access to particle data of subvolumes.
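
The numbers in the snapshot file design above can be checked with a few lines of arithmetic, and the hash-table lookup can be sketched alongside. What follows is a minimal Python sketch, not the simulation's actual I/O code. It assumes 4-byte offsets (which is what 32^3 entries in 128 KB implies) and offsets counted in particles rather than bytes. The names `cell_range`, `offsets` and the toy data are hypothetical. Computing the Peano-Hilbert index of a cell is left out; the cell index is taken as given.

```python
N_PARTICLES = 2160 ** 3   # 10,077,696,000 particles in the Millennium Run
N_FILES = 8 ** 3          # 512 subfiles per snapshot
N_CELLS = 32 ** 3         # cells of the Peano-Hilbert grid in each file
OFFSET_BYTES = 4          # assumption: one 4-byte offset per grid cell

particles_per_file = N_PARTICLES / N_FILES          # about 20 million
particles_per_cell = particles_per_file / N_CELLS   # about 600
table_bytes = N_CELLS * OFFSET_BYTES                # 128 KB per file
all_tables_bytes = N_FILES * table_bytes            # 64 MB per snapshot


def cell_range(offsets, n_in_file, cell):
    """Return (start, end) particle positions of one grid cell in a file.

    offsets[c] is the position of the first particle of cell c, counted
    from the start of the file's particle block (assumption: counted in
    particles, not bytes). Cells are stored consecutively in
    Peano-Hilbert order, so a cell ends where the next one begins.
    """
    start = offsets[cell]
    end = offsets[cell + 1] if cell + 1 < len(offsets) else n_in_file
    return start, end


# Toy file with 4 cells holding 3, 0, 2 and 5 particles.
toy_offsets = [0, 3, 3, 5]
toy_ranges = [cell_range(toy_offsets, 10, c) for c in range(4)]
```

With such a table, reading the particles of one cell needs only a seek to `start` and a read of `end - start` particles, which is the random access to subvolumes that the slide refers to.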

FoF group catalogues
  Are computed on the fly
  Group catalogue: length of each group and offset into particle list
  Long list of particle keys (64 bit) that make up each group
  Structure of 64-bit particle key:
    Hash-Key     15 bit
    File-Key      9 bit
    Particle-ID  34 bit
  Allows fast selective access to all particles of a given group

Analysis of many simulation outputs allows a measurement of the hierarchical build-up of dark matter halos
FOLLOWING DARK MATTER IN TIME
  [Figure: merger tree of a cluster (only progenitors above a minimum mass are shown)]

The formation time of halos depends strongly on mass and indicates a hierarchical formation of objects
AVERAGE FORMATION TIME OF HALOS AS A FUNCTION OF HALO MASS
  How can it be that the most massive ellipticals are also the oldest and reddest?

Are the statistical properties of galaxies in a halo only a function of halo mass?
  [Figure: spatial distribution of the 20% most recently formed halos of a given mass, Mhalo ~ 10^11 M⊙, slice of 30 Mpc/h thickness. Gao, Springel & White (2005)]
  [Figure: spatial distribution of the earliest forming 20% of halos of a given mass, Mhalo ~ 10^11 M⊙, slice of 30 Mpc/h thickness. Gao, Springel & White (2005)]

The bias of halos of a given mass increases smoothly with formation redshift
BIAS AS A FUNCTION OF MASS AND FORMATION TIME
  [Figure: bias as a function of formation time (Mhalo = 10^11 M⊙/h) and as a function of mass (M* = 6 x 10^12 M⊙/h), with curves for the 20% oldest halos, the mean, and the 20% youngest halos. Gao, Springel & White (2005)]
  "Assembly bias": this dependence is inconsistent both with excursion set theory and HOD

Halos formed in high-resolution simulations of cold dark matter show rich substructure
SUBHALOS IN A RICH CLUSTER
  ~20 million particles within virial radius of cluster
  Springel, White, Kauffmann, Tormen (2000)

Even in the central regions, substructures can still be found
SUBHALOS AROUND A CLUSTER CENTRE
  ~20 million particles within virial radius of cluster
  Springel, White, Kauffmann, Tormen (2000)

Finding dark matter satellites in simulations is a non-trivial task
AN ALGORITHMIC TECHNIQUE FOR SUBHALO IDENTIFICATION
  Subhalo finding (SUBFIND):
    (1) Estimate local DM density field
    (2) Find locally overdense regions with topological method
    (3) Subject each substructure candidate to a gravitational unbinding procedure

Tracking the fate of satellite galaxies in simulations is computationally and `logistically' complicated
A SKETCH OF A SUBHALO MERGING TREE
  [Figure: merging tree of subhalos]
  How do we manage to compute this for a simulation with more than 10^10 particles, and more than 20 million halos?

Semi-analytic models are currently the most powerful technique to study galaxy formation
MOST IMPORTANT INPUT PHYSICS
  Hierarchical growth of dark matter halos: understood with high accuracy
  Radiative cooling of gas within halos (dissipation): in principle well within reach of current simulations, yet plagued with numerical difficulties
  Star formation and associated feedback processes: highly uncertain physics, numerically extremely difficult
  Spectrophotometric modeling of stellar populations: some uncertainties, but no/small coupling to gas dynamics
  [Diagram: input physics -> semi-analytic machinery -> predictions]
    Input physics: dark matter merging history tree (provided by N-body sim), radiative gas cooling, star formation, feedback, spectrophotometric evolution, morphological evolution, metal enrichment
    Predictions: Tully-Fisher relation, luminosity function, star formation history, galaxy morphologies, galaxy colors, morphology-density relation, evolution to high redshift, clustering properties

Merger tree organization in the Millennium Run
The semi-analytic merger-tree in the Millennium Run connects about 800 million subhalos
SCHEMATIC MERGER TREE
  The trees are stored as self-contained objects, which are the input to the semi-analytic code
  Each tree corresponds to a FOF halo at z=0 (not always exactly)
  The collection of all trees (a whole forest of them) describes all the structures/galaxies in the simulated universe
  Legend: Halo, FOF Group, Time, Descendant, FirstProgenitor, NextProgenitor, FirstHaloInFOFGroup, NextHaloInFOFGroup

Postprocessing of the simulation data requires efficient analysis codes
VARIOUS POSTPROCESSING-TASKS
  Things done on the fly by the simulation code
    FoF group finding
    Power spectrum and correlation function measurement
  Tasks carried out as true postprocessing
    Substructure finding and halo/subhalo properties
      Done by L-SubFind in massively parallel mode. With 32 CPU/256 GB (chubby queue), it can process one clustered snapshot in ~4-5 hours.
    Construction of merger history trees
      Two-step procedure. L-BaseTree finds halo descendants in future snapshots, thereby providing horizontal links in the merger tree. Serial/OpenMP-parallel, requires ~200 GB shared RAM, fast.
      In a second step, L-HaloTrees builds up fully threaded vertical trees for each halo. These are the input objects for the semi-analytic code.
    Semi-analytic galaxy formation
      The new semi-analytic code L-Galaxies can be run in massively parallel mode on the merger trees generated for the Millennium Run. Interfacing with VO databases is in preparation.
    Data visualization
      Challenging because of the data volume. L-HsmlFind (massively parallel) determines dark matter adaptive smoothing lengths, while L-Picture (serial) makes a picture for an arbitrarily inclined and arbitrarily large slice through the periodic simulation.

  Lines of C-Code
    L-GenIC         1900
    L-Gadget2      12000
    L-SubFind       3000
    L-BaseTree       700
    L-HaloTrees      900
    L-Galaxies      2600
    L-HsmlFind      1800
    L-Picture       1600
    Lines          24500
    Characters    580000
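
The pointer names in the schematic merger tree legend (Descendant, FirstProgenitor, NextProgenitor) suggest how a self-contained tree can be walked. What follows is a minimal Python sketch under stated assumptions, not the L-HaloTrees or L-Galaxies implementation: each halo is a record in a list, pointers are list indices, and -1 marks a missing link. The record layout, the toy tree and the function names are hypothetical.

```python
NONE = -1  # assumption: -1 marks "no such halo"


class Halo:
    """One (sub)halo record; pointer fields are indices into the tree list."""

    def __init__(self, mass, descendant=NONE,
                 first_progenitor=NONE, next_progenitor=NONE):
        self.mass = mass
        self.descendant = descendant              # halo at the later output
        self.first_progenitor = first_progenitor  # a halo at the earlier output
        self.next_progenitor = next_progenitor    # sibling with same descendant


def progenitors(tree, i):
    """Yield the indices of all direct progenitors of halo i."""
    p = tree[i].first_progenitor
    while p != NONE:
        yield p
        p = tree[p].next_progenitor


def main_branch(tree, i):
    """Follow FirstProgenitor links back in time, starting at halo i."""
    branch = []
    while i != NONE:
        branch.append(i)
        i = tree[i].first_progenitor
    return branch


def count_halos(tree, root):
    """Count every halo in the tree rooted at root (iterative depth-first)."""
    stack, n = [root], 0
    while stack:
        i = stack.pop()
        n += 1
        stack.extend(progenitors(tree, i))
    return n


# Toy tree: halo 0 at z=0 has progenitors 1 and 2; halo 1 has progenitor 3.
toy = [
    Halo(10.0, NONE, 1, NONE),
    Halo(6.0, 0, 3, 2),
    Halo(3.0, 0, NONE, NONE),
    Halo(4.0, 1, NONE, NONE),
]
```

Because every pointer stays inside the list, a tree stored this way can be handed to a galaxy formation code on its own, which is one way to read the "self-contained objects" of the slide.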

A family of postprocessing codes is applied to the Millennium simulation to produce the tree-files fed to the semi-analytic code
CODE MACHINERY
  L-GenIC: generates initial conditions
  L-Gadget2: actual simulation code
  L-HsmlFind: finds distance to n-th nearest neighbour for all particles
  L-Picture: computes adaptively smoothed density field for arbitrarily inclined slice
  L-SubFind: finds dark matter substructure and determines halo properties
  L-BaseTree: finds descendant of each subhalo
  L-TreeAddIDTab: produces table with all particle IDs that at one point were a most-bound particle
  L-TreeAddPosTab: finds coordinates and velocities of these particle IDs at all output times and stores them in table
  L-TreeMakeUniqueIDs: determines unique numbering scheme for halos
  L-HaloTrees: constructs self-contained trees for all halos at z=0
  L-Galaxies: for a given tree, constructs galaxy population
  Data products shown in the diagram: IC files, snapshot files, FOF groups, hsml files, density slices, subhalo catalogues, descendant pointers, empty auxiliary tree data, auxiliary tree data, new halo IDs, tree files, galaxy catalogues
  ~20 TB, ~200000 files

A merger tree containing 800 million dark matter (sub)halos is used to compute semi-analytic models of galaxy formation
DARK MATTER AND GALAXY DISTRIBUTION IN A CLUSTER OF GALAXIES

The inclusion of AGN feedback allows the semi-analytic model to reproduce a multitude of observational data
K-BAND AND Bj-BAND LUMINOSITY FUNCTIONS
  Croton et al. (2005)

The two-point correlation function of galaxies in the Millennium run is a very good power law
GALAXY TWO-POINT FUNCTION COMPARED WITH 2dFGRS

Public data release of the Millennium galaxies on Aug 1st 2006
  http://www.mpa-garching.mpg.de/Millennium

The publicly available Millennium database has stimulated widespread use of the data in the community
ASTRO-PH PREPRINT COUNT UNTIL TODAY

Lightcone data products
  Galaxies on the backwards lightcone
  Kitzbichler & White (2007)

Ray tracing through the Millennium simulation
OBTAINING HIGH-RESOLUTION MASS MAPS FOR GRAVITATIONAL LENSING
  Hilbert, Metcalf & White (2007)
  20x20 arcmin^2 convergence maps for HI source distribution at z=12 (a), or galaxy redshift survey with median redshift z=1.23 (b,c)

We have constructed a new method to produce continuous all-sky maps from the Millennium Run
SKETCH OF THE BACKWARDS LIGHT-CONE
