
The Millennium-XXL Project: Simulating the Galaxy Population of Dark Energy Universes

Modern cosmology as encoded in the leading ΛCDM model confronts astronomers with two major puzzles. One is that the main matter component in today's Universe appears to be a yet undiscovered elementary particle whose contribution to the cosmic density is more than 5 times that of ordinary baryonic matter. This cold dark matter (CDM) interacts extremely weakly with regular atoms and photons, so that gravity alone has affected its distribution since very early times. The other is a mysterious dark energy force field, which dominates the energy content of today's Universe and has led to its accelerated expansion in recent times. In the standard model, this component is described in terms of Einstein's cosmological constant ('Λ'). Uncovering the nature of dark energy has become one of the most actively pursued goals of observational cosmology.

In particular, the arrival of the largest galaxy surveys ever made is imminent, offering enormous scientific potential for new discoveries. Experiments like SDSS-III/BOSS or Pan-STARRS have started to scan the sky with unprecedented detail, considerably improving the accuracy of existing cosmological probes. This will likely lead to challenges of the standard ΛCDM paradigm for cosmic structure formation, and perhaps even to the discovery of new physics.

One of the most important aims of these galaxy surveys is to shed light on the nature of the dark energy via measurements of the redshift evolution of its equation of state.

However, the ability of these surveys to achieve this major scientific goal crucially depends on an accurate understanding of systematic effects and on a precise way to physically model the observations, in particular the scale-dependent bias between luminous red galaxies and the underlying dark matter distribution, or the impact of mildly non-linear evolution on the so-called baryonic acoustic oscillations (BAOs) measured in the power spectrum of galaxy clustering.

Simulations of the galaxy formation process are arguably the most powerful technique to accurately quantify and understand these effects. However, this is an extremely tough computational problem, because it requires ultra-large volume N-body simulations with sufficient mass resolution to identify the halos likely to host the galaxies seen in the surveys, and a realistic model to populate these halos with galaxies. Given the significant investments involved in the ongoing galaxy surveys, it is imperative to tackle these numerical challenges to ensure that accurate theoretical predictions become available, both to help to quantify and understand the systematic effects, and to extract the maximum amount of information from the observational data.

Figure 1: Dark matter distribution in the MXXL simulation on different scales. Each panel shows the projected density of dark matter in a slice of thickness 20 Mpc.

The State of the Art

The N-body method for the collisionless dynamics of dark matter is a long-established computational technique used to follow the growth of cosmic structure through gravitational instability. The Boltzmann-Vlasov system of equations is here discretized in terms of N fiducial simulation particles, whose motion is followed under their mutual gravitational forces in an expanding background space-time. While conceptually simple, calculating the long-range gravitational forces exactly represents an N²-problem, which quickly becomes prohibitively expensive for interesting problem sizes. Fortunately, it is sufficient to calculate the forces only approximately, and a variety of algorithms for doing so have been developed over the years.
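
To make the cost argument concrete, a direct pairwise summation of the forces could be sketched as follows. This is only a minimal illustration with Plummer softening; the names, units and softening choice are ours for illustration and do not reproduce the scheme of any production code.

#include <math.h>

/* Minimal direct-summation gravity: O(N^2) pairwise forces with
 * Plummer softening eps to avoid singularities at small separations.
 * pos[3*i..3*i+2] and acc[3*i..3*i+2] hold positions/accelerations,
 * mass[i] the particle masses; G is Newton's constant in code units. */
void direct_summation(int n, const double *pos, const double *mass,
                      double *acc, double G, double eps)
{
    for (int i = 0; i < n; i++) {
        acc[3*i] = acc[3*i+1] = acc[3*i+2] = 0.0;
        for (int j = 0; j < n; j++) {      /* every pair: ~N^2 operations */
            if (j == i)
                continue;
            double dx = pos[3*j]   - pos[3*i];
            double dy = pos[3*j+1] - pos[3*i+1];
            double dz = pos[3*j+2] - pos[3*i+2];
            double r2 = dx*dx + dy*dy + dz*dz + eps*eps;
            double f  = G * mass[j] / (r2 * sqrt(r2));
            acc[3*i]   += f * dx;
            acc[3*i+1] += f * dy;
            acc[3*i+2] += f * dz;
        }
    }
}

With 3 × 10¹¹ particles, the nested loop implies of order 10²³ pairwise interactions per time step, which is why tree and mesh approximations are indispensable.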

This allowed the sizes of cosmological simulations to steadily increase since the early 1980s, roughly doubling the particle number every 17 months and hence providing progressively more faithful models for the real Universe [1, 2, 3, 4]. Such simulations have proven to be an indispensable tool to understand the low- and high-redshift Universe by comparing the predictions of CDM to observations, since these calculations are the only way to accurately calculate the outcome of non-linear cosmic structure formation.

A milestone in the development of cosmological simulations was set by the Millennium Run (MR), performed by our group in 2004 [3]. This simulation was the first, and for many years the only, run with more than 10¹⁰ particles, exceeding the size of previous simulations by almost an order of magnitude. Its success was not only computational but most importantly scientific: more than 300 research articles in the fields of theoretical and observational cosmology have used the MR data-products since. The MR has an exquisite mass resolution and accuracy but, unfortunately, its volume is insufficient to get reliable statistics on large scales at the level needed for future surveys.


Figure 2: Differential halo abundance as a function of mass at the present epoch in the MXXL simulation. Note the good sampling far into the exponential tail. Apart from the last point, the error bars from counting statistics are smaller than the plotted symbols.
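
For orientation, the differential abundance plotted in Figure 2 is conceptually just a histogram of halo masses in logarithmic bins, normalized by the simulation volume and the bin width; the counting-statistics error bars mentioned in the caption follow from the raw counts per bin. A minimal sketch (bin edges, units and function names are illustrative):

#include <math.h>

/* Differential halo mass function dn/dlog10(M): count halos in
 * logarithmic mass bins and normalize by the simulation volume
 * and the bin width.  masses[] in solar masses, volume in Mpc^3. */
void mass_function(int nhalos, const double *masses, double volume,
                   double log_m_min, double dlogm, int nbins, double *dndlogm)
{
    for (int b = 0; b < nbins; b++)
        dndlogm[b] = 0.0;

    for (int i = 0; i < nhalos; i++) {
        int b = (int) floor((log10(masses[i]) - log_m_min) / dlogm);
        if (b >= 0 && b < nbins)
            dndlogm[b] += 1.0;          /* raw counts give the Poisson errors */
    }

    for (int b = 0; b < nbins; b++)
        dndlogm[b] /= volume * dlogm;   /* halos per Mpc^3 per dex in mass */
}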

Recently, French and Korean collaborations [5, 6] have successfully carried out simulations containing almost 70 billion particles, but at considerably worse mass and spatial resolution than the MR, which did not allow them to robustly identify Milky-Way sized halos. Also, the need to manipulate and analyze a huge volume of data has proven to be a non-trivial challenge in working with simulations of this size, a fact that has made scientific exploitation of these projects difficult.

We have therefore set out to perform a new ultra-large N-body simulation of the hierarchical clustering of dark matter, featuring a new strategy for dealing with the data volume, and combining it with semi-analytical modelling of galaxy formation, which allows a prediction of all the luminous properties of the galaxies that form in the simulation. We designed the simulation project, dubbed Millennium-XXL (MXXL), to follow more than 303 billion particles (6720³) in a cosmological box 4.2 Gpc across, resolving the cosmic web with an unprecedented combination of volume and resolution. While the particle mass of the MXXL is slightly worse than that of the MR, its resolution is sufficient to accurately measure dark matter merger histories for halos hosting luminous galaxies, within a volume more than 200 times larger than that of the MR. In this way the simulation can provide extremely accurate statistics of the large-scale structure of the Universe by resolving around 500 million galaxies at the present epoch, allowing for example highly detailed clustering studies based on galaxy or cluster samples selected in a variety of different ways. This comprehensive information is indispensable for the correct analysis of future observational datasets.

The Computational Challenge

However, it was clear from the start that performing a simulation with these characteristics poses severe challenges, involving raw execution time, scalability of the algorithms employed, as well as their memory consumption and the disk space required for the output data. For example, simply storing the positions and velocities of the simulation particles in single precision consumes of order 10 TB of RAM. This figure, of course, is greatly enlarged by the extra memory required by the complex data structures and algorithms employed in the simulation code for the force calculation, domain decomposition, and halo and subhalo finding.

The code we used is a novel version of GADGET-3 that we wrote specifically for the MXXL project. GADGET computes short-range gravitational forces with a hierarchical tree algorithm, and long-range forces with an FFT-based particle-mesh scheme [7]. Both the force computation and the time stepping of GADGET are fully adaptive. The code is written in highly portable C and uses a spatial domain decomposition to map different parts of the computational domain to individual processors. In 2005, we publicly released GADGET-2, which presently is the most widely employed code for cosmic structure formation.

Ultimately, memory requirements are the most serious limiting factor for cosmological simulations of the kind studied here. We have therefore put significant effort into developing algorithms and strategies that minimize memory consumption in our code, while at the same time retaining high integration accuracy and calculational speed.
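
The idea behind such a tree/mesh force split can be illustrated with the standard Gaussian (Ewald-like) splitting of the pairwise force: the tree handles a short-range part that is smoothly truncated beyond a split scale r_s, while the complementary smooth part is obtained from the FFT mesh. The sketch below shows this textbook split only; the exact factors and scales used inside GADGET-3 are not reproduced here.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Standard TreePM force split: the exact 1/r^2 pair force is multiplied
 * by a short-range factor that decays to zero for r >> r_s, so the tree
 * only has to handle nearby particles; the complementary smooth part is
 * supplied by an FFT-based particle-mesh solver on large scales.
 * (Gaussian/Ewald-like split; r_s is the split scale.)                 */
double shortrange_factor(double r, double r_s)
{
    double x = r / (2.0 * r_s);
    return erfc(x) + (2.0 * x / sqrt(M_PI)) * exp(-x * x);
}

/* Short-range contribution to the acceleration from a particle of mass m
 * at distance r (magnitude only); the PM force supplies the remainder.  */
double shortrange_accel(double G, double m, double r, double r_s)
{
    return (G * m / (r * r)) * shortrange_factor(r, r_s);
}
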
The special "lean" version of our code that we developed for MXXL requires only 72 bytes per particle at the peak for the ordinary dynamical evolution of MXXL. The sophisticated in-lined group and substructure finders add a further 26 bytes to the peak memory consumption. This means that the memory requirement for the target size of 303 billion particles of our simulation amounts to slightly more than 29 TB of RAM.

The JuRoPa machine at the Jülich Supercomputing Centre (JSC) appeared as one of the best suited supercomputers to fulfill our computing requirements, thanks to its 3 GB of main memory per compute core. Still, on JuRoPa the simulation demanded a partition of 1,536 nodes, each equipped with two quad-core X5570 processors and 24 GB of RAM, translating to 12,288 cores in total. This represents a substantial fraction (70%) of the whole supercomputer installation. In fact, normal user operation on such a large partition is still unusual and not done on a regular basis yet. It hence required substantial support from the system administrators at JSC to allow us to carry out the MXXL production calculation on JuRoPa. In particular, severe problems with the memory usage of our simulation were encountered initially, as the code required essentially all the available physical memory of the compute nodes, leaving very little room for memory buffers allocated by the MPI library or by the parallel Lustre filesystem attached to JuRoPa. Fortunately, with help from JSC and ParTec, the software maker behind the MPI software stack on JuRoPa, these problems could be overcome.

For the MXXL simulation, we decided to try a new hybrid parallelization scheme that we recently developed for our GADGET code. Instead of using 12,288 distributed-memory MPI tasks, we only employed one MPI task per processor (3,072 in total), exploiting the 4 cores of each socket via threads. This mixture of distributed and shared memory parallelism proved ideal to limit memory and work-load imbalance losses, because a smaller number of MPI tasks reduces the number of independent spatial domains into which the volume needs to be decomposed, as well as the data volume that needs to be shuffled around with MPI calls. The thread-based parallelization itself was partially done via explicit Posix thread programming, and partially via OpenMP directives for the simpler parallel constructs. The Posix-threads approach has allowed us to achieve effectively perfect parallelizability of the gravitational tree-walk in our code, which dominates the computational cost.

Still, the total CPU time required for the simulation was substantial. This originates both from the huge number of particles and from the very dissimilar time scales and matter densities involved in the problem. Very different density configurations are encountered in the clustered state produced by the simulation, and consequently there are structures featuring a large variety of timescales. Our code tackles this problem by using spatially and temporally adaptive time-stepping, so that short time-steps are used only when particles enter the localized dense regions where dynamical times are short. In addition, we use a new domain decomposition scheme that produces almost ideal scaling on massively parallel computers.
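
A minimal sketch of such a hybrid scheme is shown below: the particles are divided among MPI ranks (here by trivial striding, standing in for the real domain decomposition), and each rank spreads its local tree walks over the cores of its socket with OpenMP threads. The function name and the flat loop are placeholders; in the production code the tree walk itself is threaded with explicit Posix threads.

#include <mpi.h>
#include <omp.h>

/* Hypothetical per-particle tree walk; in the real code this recursively
 * opens tree nodes and accumulates partial forces.                      */
extern void tree_walk_single(int target_particle);

/* Hybrid parallel force loop: the particle set is split across MPI ranks
 * (trivial striding here for brevity), and each rank then spreads its
 * local work over the cores of its socket with OpenMP threads, so that
 * only one MPI task per socket is needed.                               */
void compute_forces(int ntot)
{
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel for schedule(dynamic, 1024)
    for (int i = rank; i < ntot; i += nranks)
        tree_walk_single(i);

    MPI_Barrier(MPI_COMM_WORLD);   /* wait for all ranks before the kick */
}

With one MPI task per quad-core socket instead of one per core, the number of independent spatial domains and the volume of data shuffled around in MPI calls shrink accordingly.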

In spite of these optimizations, the scope of the computational problem remained huge. The final production run of MXXL required more than 86 trillion force calculations and took slightly more than 2.7 million CPU hours (roughly 300 CPU years), corresponding to about 9.3 days of runtime on 12,288 cores.

A significant fraction, 15%, of the time was however spent on running our sophisticated on-the-fly postprocessing software, notably the group finding, the substructure finding, and the power spectrum calculation, and another 14% was needed for the I/O operations. The long-range force calculations, based on 9216³-sized FFTs, consumed only about 3% of the time. We note that the parallel "friends-of-friends" group finding for the 303 billion particles at the final output time took just 470 seconds, which we think is a remarkable achievement. Determining all gravitationally bound substructures within these halos took a further 2,600 seconds. It is also during this substructure finding that the peak memory consumption of the code is reached.

Another demanding aspect of the MXXL project lies in the data handling. The production code is able to write data in parallel to disk. Despite the huge size of our data sets, the restart files, amounting to 10 TB in total, could be written to or read from disk in about 40 minutes on the JuRoPa system. However, of larger practical importance for us is the aggregated size of the outputs produced by the simulation. If an approach similar to the MR was taken for the MXXL project, then the disk space requirement would approach 700 TB for the storage of the dark matter particle positions, velocities and group catalogues, a prohibitively large amount of data. To cope with this problem, we have therefore carried out a significant part of the post-processing on-the-fly during the simulation, avoiding the need to store the particle phase-space variables on disk for most of the output times. This postprocessing included the identification and characterization of halos and subhalos, and the calculation of the dark matter density field and its time derivative. As a result, the total disk space required for the data products of the simulation could be shrunk to about 100 TB.

First Results and Outlook

In Figure 1, we show the projected density field of the MXXL simulation in a series of thin slices through the simulation volume. This gives a hint of the enormous statistical resolving power of the simulation, and of its large dynamic range everywhere in the simulation cube. The gravitational spatial resolution of the simulation translates to a dynamic range of 300,000 per dimension, or formally more than 10¹⁶ resolution elements in 3D.

We have found 650 million halos at redshift zero in the simulation, binding 44% of all particles into non-linear structures. In total there are more than 25 billion halos in the group catalogues that we produced and that are linked together in our merger trees. In Figure 2, we show the halo mass function at the present epoch, which is one of the most basic and important simulation predictions, describing the abundance of objects as a function of mass and epoch.
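
The "friends-of-friends" linking used in the group finding mentioned above can be written down very compactly with a union-find structure, as in the serial sketch below; the O(N²) pair search is kept only for clarity, whereas the actual parallel finder restricts the neighbour search with the tree and the distributed domains.

/* Union-find with path halving: parent[] initially points to self. */
static int find_root(int *parent, int i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

/* Friends-of-friends: link every pair of particles with separation below
 * the linking length b (typically a fixed fraction of the mean particle
 * spacing) and merge them into the same group.  Serial O(N^2) for clarity
 * only; real finders prune the pair search with a tree or a mesh.        */
void fof_groups(int n, const double *pos, double b, int *parent)
{
    for (int i = 0; i < n; i++)
        parent[i] = i;

    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            double dx = pos[3*i]   - pos[3*j];
            double dy = pos[3*i+1] - pos[3*j+1];
            double dz = pos[3*i+2] - pos[3*j+2];
            if (dx*dx + dy*dy + dz*dz < b*b)
                parent[find_root(parent, i)] = find_root(parent, j);
        }

    /* afterwards, particles sharing the same root belong to the same halo */
}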

The largest cluster of galaxies at z=0 has a mass of 9 × 10¹⁵ solar masses. Such extreme objects are so rare that they can only be found in volumes as large as that of the MXXL. In Figure 3, we show a map of the time derivative of the gravitational potential of the simulation at redshift z=12. We produced such maps on-the-fly during the simulation in order to study the integrated Sachs-Wolfe effect, which describes distortions of the cosmic microwave background by foreground structures.

The full analysis of the MXXL in terms of its predicted galaxy distributions has just begun, and will still take some time to complete. We can, for example, use the MXXL to predict the spatial distribution of galaxies in the past light-cone of a fiducial observer. Without any replication of the box, we will be able to construct all-sky synthetic galaxy catalogues up to redshift 0.57.
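
The quoted limit of z = 0.57 is simple geometry: an observer placed at the centre of the box can see out to half the box side, about 2.1 Gpc, before replication would be needed, and in a flat ΛCDM model the comoving distance reaches this value near z ≈ 0.57. The small check below assumes Millennium-like parameters (Ωm ≈ 0.25, h ≈ 0.73), chosen purely for illustration.

#include <math.h>
#include <stdio.h>

/* Dimensionless Hubble rate E(z) = H(z)/H0 for a flat LCDM cosmology. */
static double E(double z, double om)
{
    return sqrt(om * pow(1.0 + z, 3.0) + (1.0 - om));
}

/* Comoving distance D_C(z) = (c/H0) * integral from 0 to z of dz'/E(z'),
 * evaluated with a simple trapezoidal rule.                             */
static double comoving_distance(double z, double om, double h)
{
    const double c_over_H0 = 299792.458 / (100.0 * h);  /* in Mpc */
    const int nsteps = 1000;
    double dz = z / nsteps, sum = 0.0;

    for (int i = 0; i < nsteps; i++) {
        double z0 = i * dz, z1 = (i + 1) * dz;
        sum += 0.5 * (1.0 / E(z0, om) + 1.0 / E(z1, om)) * dz;
    }
    return c_over_H0 * sum;
}

int main(void)
{
    /* Illustrative Millennium-like parameters, not the exact MXXL values. */
    double d = comoving_distance(0.57, 0.25, 0.73);
    printf("D_C(z=0.57) = %.0f Mpc, about half of the 4.2 Gpc box side\n", d);
    return 0;
}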


Figure 3: Map of the time-derivative of the gravitational potential at z=12 in the MXXL simulation. Such maps can be used to calculate the integrated Sachs-Wolfe effect along the backwards light-cone to the last scattering surface of the cosmic microwave background.

These mock observations are crucial for future surveys to study cluster finding algorithms, the influence of projection effects, the evolution of the clustering along the line-of-sight, and, most importantly, the systematic effects in the cosmological constraints derived from BAO, weak lensing, or cluster counts. Using a novel scaling technique we have just developed [8], it is furthermore possible to cast the simulation results into theoretical models for the galaxy distribution that cover a significant range of cosmologies.

The Millennium-XXL is by far the largest cosmological simulation ever performed and the first multi-hundred billion particle run. The scope of the simulation project has pushed the envelope of what is feasible on current world-class HPC resources, but the expected rich scientific return from the project makes it well worth the effort. At the same time, the successful scaling of the cosmological code to well beyond ten thousand cores is an encouraging sign for computational research in cosmology, which will address yet more demanding problems on future HPC machines.

References

[1] Davis, M., Efstathiou, G., Frenk, C. S., White, S. D. M., 1985, The evolution of large-scale structure in a universe dominated by cold dark matter, Astrophysical Journal, 292, 371

[2] Navarro, J. F., Frenk, C. S., White, S. D. M., 1996, The Structure of Cold Dark Matter Halos, Astrophysical Journal, 462, 563

[3] Springel, V., White, S. D. M., Jenkins, A., et al., 2005, Simulations of the formation, evolution and clustering of galaxies and quasars, Nature, 435, 629

[4] Springel, V., Wang, J., Vogelsberger, M., et al., 2008, The Aquarius Project: the subhaloes of galactic haloes, Monthly Notices of the Royal Astronomical Society, 391, 1685

[5] Teyssier, R., Pires, S., Prunet, S., et al., 2009, Full-sky weak-lensing simulation with 70 billion particles, Astronomy and Astrophysics, 497, 335

[6] Kim, J., Park, C., Gott, J. R., Dubinski, J., 2009, The Horizon Run N-Body Simulation: Baryon Acoustic Oscillations and Topology of Large-scale Structure of the Universe, Astrophysical Journal, 701, 1547

[7] Springel, V., 2005, The cosmological simulation code GADGET-2, Monthly Notices of the Royal Astronomical Society, 364, 1105

[8] Angulo, R. E., White, S. D. M., 2010, One simulation to fit them all - changing the background parameters of a cosmological N-body simulation, Monthly Notices of the Royal Astronomical Society, 405, 143

• Volker Springel 1
• Raul Angulo, Simon D. M. White 2
• Adrian Jenkins, Carlos S. Frenk, Carlton Baugh, Shaun Cole 3

1 Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
2 Max-Planck-Institute for Astrophysics, Garching, Germany
3 Institute for Computational Cosmology, University of Durham, Durham, UK
