Version July 30, 2021 Preprint typeset using LATEX style openjournal v. 09/06/15

A DIFFERENTIABLE MODEL OF THE ASSEMBLY OF INDIVIDUAL AND POPULATIONS OF HALOS Andrew P. Hearin1,?, Jonas´ Chaves-Montero1,2, Matthew R. Becker1, and Alex Alarcon1 1HEP Division, Argonne National Laboratory, 9700 South Cass Avenue, Lemont, IL 60439, USA and 2Donostia International Centre, Paseo Manuel de Lardizabal 4, 20018 Donostia-San Sebastian, Spain Version July 30, 2021

ABSTRACT We present a new empirical model for the mass assembly of dark matter halos. We approximate the growth of individual halos as a simple power-law function of time, where the power-law index smoothly decreases as the halo transitions from the fast-accretion regime at early times, to the slow-accretion regime at late times. Using large samples of halo merger trees taken from high-resolution cosmological simulations, we demonstrate that our 3-parameter model can approximate halo growth with a typical 11 accuracy of 0.1 dex for t & 1 Gyr for all halos of present-day mass Mhalo & 10 M , including subhalos and host halos in gravity-only simulations, as well as in the IllustrisTNG hydrodynamical simulation. We additionally present a new model for the assembly of halo populations, which not only reproduces average mass growth across time, but also faithfully captures the diversity with which halos assemble their mass. Our python implementation is based on the autodiff library JAX, and so our model self- consistently captures the mean and variance of halo mass accretion rate across cosmic time. We show that the connection between halo assembly and the large-scale density field, known as halo assembly bias, is accurately captured by our model, and that residual errors in our approximations to halo assembly history exhibit a negligible residual correlation with the density field. Our publicly available source code can be used to generate Monte Carlo realizations of cosmologically representative halo histories; our differentiable implementation facilitates the incorporation of our model into existing analytical halo model frameworks. Subject headings: Cosmology: large-scale structure of ; methods: N-body simulations

1. INTRODUCTION phase, their concentrations steadily increase as they pre- In the standard cosmological model, the matter con- dominantly pile up mass onto their outskirts (see Wang tent of the Universe is dominated by Cold Dark Matter et al. 2020, for a detailed physical picture of the relation- (CDM), and gravitationally self-bound objects referred ship between halo assembly and NFW concentration). to as dark matter halos are the fundamental building Many additional characteristics of the internal struc- blocks of structure formation (see Mo et al. 2010, for a ture of dark matter halos are closely related to their as- comprehensive review). Dark matter halos are the nat- sembly histories, including substructure abundance (Gao ural sites of formation (White & Rees 1978; Blu- et al. 2004; Wechsler et al. 2006; Giocoli et al. 2010), menthal et al. 1984), and so a detailed understanding of the “splashback” feature in the outer profile (Diemer & the buildup and evolution of halos is a key ingredient of Kravtsov 2014), and ellipsoidal shape (Chen et al. 2020a; any theory of structure growth. Lau et al. 2021). The growth history of a dark matter The basic physical picture of halo evolution is now rel- halo is also tightly connected to the larger-scale environ- atively well understood. At early times, halo assembly ment in which it evolves, a phenomenon known as halo is characterized by a “fast-accretion phase” in which the assembly bias (e.g., Gao et al. 2005; Mao et al. 2018; growth of halo mass is rapid and major mergers are com- Mansfield & Kravtsov 2020). mon; mass accretion rates diminish considerably at later The mass assembly history of dark matter halos is times in a “slow-accretion phase”, during which time ma- widely used in theoretical models of galaxy formation as a fundamental quantity regulating the formation arXiv:2105.05859v2 [astro-ph.CO] 28 Jul 2021 jor mergers are comparatively rare (Bullock et al. 2001; Wechsler et al. 2002; Tasitsiomi et al. 2004). The mass rate of the halo’s resident galaxy, including early formu- density profile of dark matter halos is well-described by lations of traditional semi-analytic modeling (Kauffmann a double power law known as the NFW profile (Navarro et al. 1993; Baugh et al. 1996; Avila-Reese et al. 1998, et al. 1997), an approximation that remains reasonably SAMs, hereafter), practically all contemporary SAMs accurate across most of cosmic time (although see Lud- (e.g., Bower et al. 2006; Somerville et al. 2008; Hen- low et al. 2013, for shortcomings of this approximation). riques et al. 2015; Lagos et al. 2018), simple empirical Although the concentration parameter that defines the models (Watson et al. 2015; Behroozi & Silk 2015), and NFW profile exhibits a well-known dependence upon to- recent empirical approaches (Becker 2015; Moster et al. tal mass (e.g., Diemer & Kravtsov 2015; Child et al. 2018; Behroozi et al. 2019). Strong motivation for treat- 2018), all halos in the fast-accretion phase tend to have ing halo assembly as the backbone of galaxy formation similar values of concentration c ≈ 3 − 4 (Zhao et al. models also comes directly from hydrodynamical simu- 2003a), and as halos transition to the slow-accretion lations, which show that the accretion rate of baryonic mass closely tracks that of dark matter (Wetzel & Nagai ?E-mail:[email protected] 2015). 2 Hearin et al.

The assembly of a halo is fully characterized by its functions historically only reproduce average halo growth merger tree, which describes the mass and accretion time across time, but do not enjoy capability to predict the of the “progenitors” that merged into the main halo over diversity of halo assembly, a shortcoming made especially time, as well as the progenitors of those progenitors, glaring through comparison to EPS-based techniques. etc. The most common theoretical technique for mod- In this paper, we seek to improve upon this situa- eling halo merger trees is the (extended) Press-Schechter tion by developing an empirical approach to modeling formalism (Press & Schechter 1974; Bond et al. 1991; the growth of both individual halos, as well as the as- Bower 1991; Lacey & Cole 1993, EPS, hereafter). The sembly of cosmologically representative populations. In EPS framework yields considerable insight into many as- §3, we will introduce a new parameterization for indi- pects of the structure and buildup of dark matter halos vidual halo growth, validating that our formulation is (e.g., Somerville & Kolatt 1999; Nusser & Sheth 1999; sufficiently flexible to capture the assembly of halos in Cole et al. 2008), and has been shown to yield reason- both the gravity-only and full-physics hydro simulations ably accurate predictions for the average growth of ha- described in §2. We outline our statistical model for the los over time (van den Bosch 2002), the mass function assembly of halo populations in §4, relegating exposi- of the progenitor halos (Parkinson et al. 2008), and the tion on mathematical and computational details to the statistical distribution of halo formation times (Li et al. appendices. We discuss our results and potential appli- 2007; Giocoli et al. 2012). Merger trees predicted by EPS cations in §5, and conclude in §6 with a brief summary furthermore enable suitably implemented SAMs to gen- of our primary results. erate predictions for galaxy evolution without the need 2. SIMULATIONS AND MERGER TREES for high-resolution N-body simulations (e.g., Somerville et al. 2008; Benson 2012). We refer the reader to Zentner We have built our model for the mass assembly history (2007) for a pedagogical overview of EPS, and to Jiang (MAH hereafter) of dark matter halos to approximate re- & van den Bosch(2014) for a comparison of modern im- sults taken from the merger trees of halos identified in plementations of EPS predictions of merger trees. both gravity-only and hydrodynamical simulations. For Empirically formulated fitting functions calibrated our gravity-only simulations, we use a combination of against cosmological simulations offer a simpler route to the Bolshoi Planck simulation (BPL, Klypin et al. 2011) describing the evolutionary history of halo mass. The and MultiDark Planck 2 (MDPL2, Klypin et al. 2016). The BPL simulation was carried out using the ART code one-parameter exponential model developed in Wechsler 3 et al.(2002) is one of the first of such models, and a (Kravtsov et al. 1997) by evolving 2048 dark-matter par- 8 number of improvements have followed this early em- ticles of mass mp = 1.55 × 10 M on a simulation box pirical approach that capture richer phenomena with of 250 Mpc on a side; the MDPL2 simulation was carried greater accuracy. For example, the fitting formula in Ta- out using the L-GADGET-2 code (Springel 2005) with 3 9 sitsiomi et al.(2004) is a two-parameter generalization 3840 dark-matter particles of mass mp = 1.51 × 10 M of the exponential growth model with improved perfor- on a simulation box of 1000 Mpc on a side; both simu- mance for a greater diversity of cluster-mass halos. In lations were run under cosmological parameters closely Fakhouri et al.(2010), the authors present a simple and matching Planck Collaboration et al.(2014). For both 1 accurate fitting formula for hdMhalo(t)/dt|M0i, the aver- BPL and MDPL2, we used publicly available merger age mass accretion rate of halos as a function of cosmic trees that were identified with Rockstar and Consistent- time and present-day mass M0, and they furthermore Trees (Behroozi et al. 2013b,c; Rodr´ıguez-Pueblaet al. find that they can use their model to accurately predict 2016). hMhalo(t)|M0i by numerically integrating their parame- We additionally studied the assembly of halos in the terized mass accretion rate histories. In McBride et al. IllustrisTNG hydrodynamical simulations. IllustrisTNG (2009), the authors introduce a different two-parameter is a suite of hydro simulations that models the joint evo- fitting function for individual halo mass growth, and also lution of dark matter, gas, , and supermassive black include additional modeling ingredients for the depen- holes, and includes treatment of radiative gas cooling, dence of how the fitting parameters depend on present- star formation, galactic winds, and AGN feedback (Wein- day halo mass. berger et al. 2017; Pillepich et al. 2018). We use publicly The program to develop empirical approximations to available data from the largest hydrodynamical simula- assembly is faced with a number of tion of the suite, TNG300-1 (Nelson et al. 2018). The challenges. Each of the efforts summarized above were TNG300-1 simulation was carried out using the moving- conducted at fixed cosmology, and also for gravity-only mesh code Arepo (Springel 2010) 25003 with gas trac- N-body simulations, and the robustness of these findings ers together with the same number of dark matter par- to effects of cosmology and hydrodynamics remains rela- ticles in a simulation box of 205 Mpc on a side under tively coarsely characterized (although see van den Bosch a cosmology very similar to Planck Collaboration et al. et al. 2014, discussed further in §5 below). Efforts to im- (2014). For TNG300-1 the corresponding mass resolution 7 6 prove flexibility beyond the one-parameter exponential is 4.1×10 M and 7.7×10 M for dark matter and gas, model also face a significant difficulty of ensuring phys- respectively. Halos and subhalos in IllustrisTNG were ically consistent behavior across the full range of mass identified with the SUBFIND algorithm (Springel et al. and redshift; for example, as pointed out in Appendix 2001), and the merger trees we use were constructed with H of Behroozi et al.(2013a), predictions of the model SUBLINK (Rodriguez-Gomez et al. 2015). presented in McBride et al.(2009) exhibit unphysical For some of the results in the paper, we calculate cross- behavior at high redshift due to the crossing of average correlation functions between simulated halos and the assembly histories of halos of different present-day mass. Of special concern to the present paper, empirical fitting 1 https://www.peterbehroozi.com/data.html A Differentiable Model of Halo Assembly 3 density field, using a random downsampling of particles (αearly, αlate, τc) = (1.25, 0.05, 0.6[Gyr]); for larger val- from the appropriate snapshot. For BPL we use a ran- ues, we have (αearly, αlate, τc) = (5.0, 0.6, 2.5[Gyr]). The dom downsampling of dark matter particles provided by top row of panels shows the history of cumulative mass, halotools (Hearin et al. 2017). For IllustrisTNG, we Mpeak(t), which is parameterized directly according to use a random downsampling of TNG300-1 particles at Eq.1. The bottom row of panels shows d Mpeak/dt, the z = 0 kindly provided by Benedikt Diemer. history of mass accretion rate. While dMpeak/dt is ana- All results in the main body of the paper pertain to lytically calculable by symbolically differentiating Eq.1, the assembly histories of present-day host halos (i.e., we have computed each curve in the bottom panels of upid=-1 for Rockstar, and the “main halo” for SUBFIND). Figure1 via automatic differentiation (autodiff, see Bay- Comparable results for the case of subhalos can be found din et al. 2015, for a recent review), as our source code is in AppendixD. Throughout the paper, including the implemented based on the JAX autodiff library (Brad- present section, values of mass and distance are quoted bury et al. 2018). assuming h = 1. For example, when writing Mpeak = 12 10 M , we suppress the M /h notation and write the 3.2. Model performance units as M . In the previous section, we illustrated the basic fea- 3. MODELING INDIVIDUAL HALO ASSEMBLY tures of our model for dark matter halo assembly. Here we show the accuracy with which simulated halo his- In this section, we present results for our model of the tories can be approximated by our parametric model. assembly of individual dark matter halos. In §3.1 we de- When fitting the mass growth of an individual dark mat- scribe the functional form of the model and its differen- ter halo with the fitting function in Equation1, we allow tiable implementation, and in 3.2 we show the accuracy § the parameters αearly, αlate, and τc to vary freely, and use with which the model can approximate the formation gradient descent to minimize the logarithmic difference history of simulated halos. between the simulated and predicted values of Mpeak(t), with k held fixed to a constant value of 3.5 for all halos, 3.1. Model formulation and M0 held fixed to Mpeak(t0). We refer the reader to Our model for the growth of individual halos is de- AppendixA for a detailed description of our optimization signed to capture the evolution of cumulative peak halo technique. mass, Mpeak(t), defined as the largest mass the main pro- genitor halo has ever attained up until the time t. Thus 3.2.1. Capturing individual halo growth as a matter of definition, Mpeak(t) is a non-decreasing Figure2 shows a typical example of our model’s abil- function for all t, so that Mpeak(t) will remain constant ity to approximate dark matter halo growth. Both pan- if a halo experiences a period of mass loss. els show the assembly history of the same dark matter We model Mpeak(t) with a power-law function of cos- halo taken from the IllustrisTNG simulation; the top mic time with a rolling index, panel shows the history of the halo’s mass accretion rate, M (t) = M (t/t )α(t), (1) and the bottom panel shows the history of its cumu- peak 0 0 lative peak mass. The blue curve in each panel shows where t0 is the present-day age of the universe, M0 ≡ the assembly history of the halo as taken directly from Mpeak(t0), and α(t) is given by a sigmoid function defined the simulated merger tree, and the orange curve shows as follows: the differentiable approximation. As described in Ap- αlate − αearly pendixA, our differentiable approximation comes from α(t; τc, k, αearly, αlate) ≡ αearly + (2). fitting Mpeak(t) of the simulated halo, and our approxi- 1 + exp(−k(t − τc)) mation for dMpeak/dt is a derived quantity. The parameters αearly and αlate define the asymptotic Using the optimization techniques detailed in Ap- value of the power-law index at early and late times, pendixA, we have identified a set of best-fitting param- respectively, and τc is the transition time between the eters for every halo in the samples described in §2. Fig- early- and late-time indices. The parameter k controls ure3 displays the quality of these fits for both gravity- the speed of the transition between the two regimes; as only N-body simulations and IllustrisTNG. Each panel described in AppendixA, for all results in the paper we shows the logarithmic difference between the simulated hold k fixed to a constant value of 3.5. and best-fitting MAH plotted as a function of cosmic In Figure1, we give a visual illustration of the phys- time; results for samples of halos of different present-day ical interpretation of the three free parameters of our mass are shown in different panels as indicated by the in- model: αearly, αlate, and τc. Each of the six panels shows panel annotation. The black curves in each panel show the assembly history of halos with present-day mass the average residual difference, and the shaded band 12 M0 = 10 M , with variations of the MAH parameters shows the 1σ scatter; residuals for gravity-only simula- color-coded according to the legend. For each of the three tions are shown with the solid black curve and purple MAH parameters, αearly, αlate, and τc, smaller values of shaded region; residuals for IllustrisTNG are shown with the parameter corresponds to a halo that was accreting the dashed black curve and green shaded region. mass more rapidly at early times and more slowly today; Figure3 shows that the rolling power-law model de- thus a halo with a smaller value of a MAH parameter has fined by Eq.1 gives an unbiased approximation to halo an earlier formation time relative to a halo with a larger mass assembly with a typical error of 0.1 dex for t & 1 value of that parameter. For curves labeled “fiducial Gyr; residual errors are slightly larger at higher halo model”, we have (αearly, αlate, τc) = (2.5, 0.3, 1.25[Gyr]); mass due to the increased prominence of major merg- for parameters set to their smaller values, we have ers. In AppendixD, we show that analogous results hold 4 Hearin et al.

Fig. 1.— Physical interpretation of the parameters of our model for individual halo assembly. Each panel shows the assembly 12 history of a dark matter halo with present-day mass M0 = 10 M ; the top row of panels shows the history of cumulative peak mass, Mpeak(t); the bottom row of panels shows the history of mass accretion rate, dMpeak/dt. Our model approximates halo assembly as a α(t) power-law function of time with rolling index, Mpeak(t) ∝ t , as in Eq.1. The left columns show the influence of αearly, the power-law index at early times; the middle columns show the influence of αlate, the power-law index at late times; the right columns show τc, which controls the transition time between the fast- and slow-accretion regimes. Smaller values of each parameter correspond to halos with earlier formation times. for the case of present-day host halos that have experi- the difference correlates with the density field in a way enced a flyby event at some point in their past history that alters the tform-dependence of ξδm(r). For example, (Figure D13), for present-day subhalos (Figure D14), halos with a merger-rich history tend to reside in denser and also for subhalos that have either merged or be- regions relative to halos with a more quiescent assembly, come disrupted prior to z = 0 (Figure D15). In Fig- and the prevalence of mergers may be correlated with ure3, we have restricted attention to t > 1 Gyr for the residual errors of our best-fitting approximation. the sake of ensuring good mass resolution across the To investigate this question, we select a sample of 11.9 mass range of all our samples; this cut in cosmic time host halos in the BPL simulation with 10 M ≤ 12.1 is driven by our focus on the redshifts z . 5 that are M0 ≤ 10 M , divide the sample in half according most relevant to cosmological surveys, and by the lack to the median value of tform for the sample, and use of a publicly-available, higher-resolution simulation with halotools (Hearin et al. 2017) to compute ξδm(r) for homogeneously processed merger trees (see §A for further each subsample. We have repeated this exercise using discussion). the value of tform computed directly from each halo’s sim- ulated merger tree, and compared the clustering to the 3.2.2. Capturing halo assembly bias case when tform is defined instead based on the differ- entiable approximation to each individual halo’s MAH. Even though Figure3 shows that simulated MAHs are We display our results in Figure4; the top panel shows well approximated by the differentiable model, of course t = t , and the bottom panel shows t = t . the agreement is not perfect, and a question that natu- form 4% form 50% We find a similar level of agreement for halos of different rally arises is whether the residual errors are correlated mass in both gravity-only simulations, and Figure E17 with the cosmic density field. As mentioned in 1, at § in AppendixE shows the analogous result for the case of fixed present-day mass, the large-scale density of halos IllustrisTNG. We have also verified that we obtain a sim- exhibits a significant dependence upon assembly history, ilar level of agreement between simulation- and model- a phenomenon referred to as halo assembly bias. This based clustering when dividing our halo samples based effect is commonly quantified in terms of the two-point on quartiles of formation time rather than medians. cross-correlation between halos and dark matter parti- In summary, the results of 3 and AppendicesD-E cles, ξ (r). For example, for a sample of dark matter § δm demonstrate that the 3-parameter fitting function de- halos with the same value of M ≡ M (t ), the quan- 0 peak 0 fined by Eq.1 is sufficiently flexible to approximate tity ξ (r) varies with the formation time of the halo, δm M (t) with an accuracy of better than 0.1 dex across t , defined as the first time the halo attains some spec- peak form nearly all of cosmic time; that this level of accuracy is ro- ified fraction of M . The non-zero residual errors shown 0 bust to variations in halo growth due to hydrodynamics in Fig.3 indicate that each halo’s approximate and ac- and baryonic feedback, and applies equally well to subha- tual values of tform will differ, and it is plausible that A Differentiable Model of Halo Assembly 5

Fig. 2.— Example fit to individual halo growth. The blue curve in each panel shows the assembly history of the same dark matter halo taken directly from the merger trees of the Illus- trisTNG simulation. The top panel shows the history of the halo’s mass accretion rate, and the bottom panel shows the history of its cumulative peak mass. In each panel, the orange curve shows the approximate halo history based on our differentiable model. los and splashback centrals; and that the small residual errors of the approximation exhibit an essentially negli- gible correlation with the large-scale density field.

4. MODELING THE ASSEMBLY OF HALO POPULATIONS In the previous section, we presented a model for the assembly history of individual dark matter halos defined by Eq.1. Our model for Mpeak(t) is characterized by three parameters: τc, αearly, and αlate. In this section, we present a model for P (Mpeak(t)|M0), the statistical distribution of assembly histories for halos of present- day peak mass, M0. Having shown that individual sim- ulated MAHs can be accurately approximated by our parametric fitting function, our approach to modeling Fig. 3.— Residuals of fits to individual halos. Using on a P (Mpeak(t)|M0) is to construct a suitably accurate sta- large collection of fits to the assembly histories of simulated halos, tistical model for P (αearly, αlate, τc|M0), the distribution each panel shows the logarithmic difference between the simulated and approximate values of cumulative peak mass, Mpeak(t), as a of our MAH model parameters as a function of M0. function of time; results for samples of halos of different present- In order to motivate the form of our model for day mass are shown in different panels as indicated by the in-panel P (αearly, αlate, τc|M0), in Figure5 we show two differ- annotation. The black curve in each panel shows the average resid- ent cross sections of this multi-dimensional distribu- ual difference, and the shaded band shows the 1σ scatter; residuals 12 for gravity-only simulations are shown with the solid black curve tion, focusing on BPL halos with M0 = 10 M . The and purple shaded region; residuals for IllustrisTNG are shown top panel shows a histogram of P (τc|M0), which shows with the dashed black curve and green shaded region. The figure a roughly log-normal shape with a modest bimodality shows that the rolling power-law model defined by Eq.1 gives an in which two distinct sub-populations emerge: earlier- unbiased approximation to halo mass assembly with a typical error of 0.1 dex for t & 1 Gyr. forming halos with τc . 2 Gyr, and later-forming ha- los with τc & 2 Gyr. In the bottom panel of Figure5, we show a scatter plot of the two-dimensional distribu- 6 Hearin et al.

Fig. 4.— Assembly bias in our model for individual halo growth. Using a sample of host halos in the BPL simulation with 12 Mpeak(t0) ≈ 10 M , we divide the sample in half according to tform, the median value of halo formation time for the sample, and for each subsample we compute ξδm(r), the cross-correlation be- tween halos and dark matter particles. Red curves show ξδm(r) for early-forming halos, blue curves for late-forming halos; solid curves show results for the case where tform is computed directly Fig. 5.— Bimodal distribution of halo MAH parameters. from each halo’s simulated merger tree; dashed curves show results The figure shows the distribution of best-fitting parameters of our for the case where tform is defined by the differentiable approxima- model for individual halo growth for a sample of host halos in 12 tion to each halo’s assembly history. The top and bottom panels the BPL simulation with Mpeak(t0) ≈ 10 M . The top panel show results for different definitions of tform as indicated in the shows the distribution of τc, which presents a modest bimodality legend. The figure demonstrates that the correlation between halo of early-forming halos (τc < 2 Gyr) and late-forming halos (τc > 2 formation time and the density field is retained when simulated Gyr). The bottom panel shows a scatter plot of U(αearly), and merger trees are approximated with our differentiable model. U(αearly − αlate), where U(s) ≡ ln(exp(s) − 1), with points color- coded according to their value of τc. The two panels together high- light the bimodal structure of the three-dimensional probability tion of P (βe, β`|M0), where we define β` = U(αlate), distribution, P (τc, αearly, αlate|M0). We leverage this bimodality the variable βe = U(αearly − αlate), and the function when building our model for the growth of halo populations. U(s) ≡ ln(exp(s) − 1). As described in AppendixA, the variables βe and β` are the actual quantities that ent bimodality to build a simple two-population model we programmatically vary when seeking best-fitting ap- of halo assembly; we refer the reader to AppendixB for proximations to the growth of individual halos, as this a more expansive discussion of the full three-dimensional enforces physical constraints in the approximation to distribution, P (αearly, αlate, τc|M0). each halo’s assembly. In the bottom panel of Figure5, We model P (αearly, αlate, τc|M0) as a two-population points with τc > 2 Gyr are shown in purple, and points normal distribution of three variables: log10 τc, β` = with τc < 2 Gyr are shown in orange. The color-coding U(αlate), and βe = U(αearly − αlate). For BPL, MDPL2, in the bottom panel brings the bimodal structure of and IllustrisTNG, we find that relative abundance of P (αearly, αlate, τc|M0) into sharper relief; while the two- the two populations is roughly 50%, but exhibits a dimensional marginal distribution P (αearly, αlate|M0) is shallow mass-dependence. Each sub-population is de- rather complex, the same two-dimensional distribution scribed by an independently-defined mean, µi, and co- becomes significantly simpler when transforming to the variance matrix, Σi, each of which also have a shallow, variables βe and β`, and conditioning the distribution monotonic mass-dependence, so that µi = µi(M0) and upon whether τc < 2 Gyr. In the remainder of this sec- Σi = Σi(M0). In calibrating our model parameters for tion, we will outline how we have leveraged this appar- this mass dependence, our principal goal is to faith- A Differentiable Model of Halo Assembly 7

legend, with dashed curves showing the model, and solid curves showing the corresponding quantity in the simu- 13.5 lations; we use BPL for halos with Mpeak ≤ 10 M , and MDPL2 for more massive halos. Several of the basic physical principles of CDM struc- ture formation are on plain display in Figure6. In the top panels, we can see that for halos of all mass, the slope of Mpeak(t) is steeper at early times relative to late times; the flattening of the slope of Mpeak(t) is a mani- festation of the evolution of halos from the fast-accretion regime to the slow-accretion regime. Structure growth in CDM is said to be “hierarchical”, such that the for- mation of smaller halos precedes that of more massive halos. The hierarchical of halo assembly is visible in both panels, as we can see that the slope of the bluer curves reaches an inflection point at earlier times relative to the redder curves. In the bottom panel we can see the influence of on halo mass accretion rates: at low redshift when the cosmic acceleration becomes sig- nificant, the value of dMpeak/dt is declining for halos of all mass, with lower-mass halos becoming subject to this trend at earlier epochs. In Figure7, we show that our model for the assembly of halo populations captures both the average growth of halos, as well as the diversity of halo assembly. In the left column of panels of Fig.7, we plot hMpeak(t)|M0i, and in the right column of panels we plot hdMpeak/dt|M0i; in all panels, solid black curves show the quantity taken from the simulation, gray shaded bands show the 1σ standard deviation of the simulated quantity, and the dashed green curves show the results of our best-fitting model. Each row of panels corresponds to the history of halos with Fig. 6.— Average assembly history in our model for halo different present-day mass, as indicated by the in-panel populations. The top panel shows the average cumulative peak annotation. mass as a function of cosmic time, and the bottom panel shows av- As mentioned above, the plotted data in Figure7 are erage accretion rates. The histories of halos with different present- day mass are color-coded as indicated in the legend. Solid curves the first and second moments of P (Mpeak(t)|M0), which show the average histories of simulated halos; dashed curves show we used directly as target data to calibrate our model the corresponding quantity approximated by our model for the as- parameters. In Figure8, we show comparisons between sembly of halo populations. simulations and our model predictions for the full dis- tribution, P (Mpeak(t)|M0). In the top panel of Fig.8, fully describe P (Mpeak(t)|M0) and P (dMpeak/dt|M0). we focus on a sample of host halos in MDPL2 with 14 Thus when optimizing our model parameters, for our present-day mass of M0 = 10 M , and show the dis- target data we use measurements of the first two tribution of halo masses at several different redshifts as moments of P (Mpeak(t)|M0) and P (dMpeak/dt|M0), indicated in the legend. In the bottom panel of Fig.8, rather than attempting to directly recover the dis- we focus on host halos in BPL with present-day mass of 12 tribution P (αearly, αlate, τc|M0). Briefly, we tabulate M0 = 10 M , and show distributions of halo formation P (Mpeak(t)|M0) and P (dMpeak/dt|M0) using several time, tform, defined as the first time the halo attained narrow bins of halo mass, trimming 3σ outliers before some specified fraction of its present-day mass; different computing mean and variance; we optimize the parame- shaded histograms show different choices for the defini- ters of our model for µi(M0) and Σi(M0) by minimizing tion of tform as indicated in the legend. In both panels, the logarithmic difference between the mean and variance each shaded histogram is accompanied by a dashed black of simulated and predicted values for P (Mpeak(t)|M0) curve showing the predictions of our differentiable model and P (dMpeak/dt|M0). Our differentiable formulation in of halo populations. JAX enables us to optimize our parameters with the Fig.8 does not display the same level of agreement be- gradient-based BFGS algorithm implemented in scipy tween simulations and model as illustrated in Figure7, (Broyden 1970; Fletcher 1970; Goldfarb 1970; Shanno because in Fig.8 we show the full probability distri- 1970). We refer the reader to AppendixC for a detailed butions taken directly from the simulated merger trees description of our application of these techniques. without trimming 3σ outliers. We refer the reader to In Figure6, we compare the average assembly of halos §C.2 for a detailed description of our outlier exclusion al- in gravity-only simulations to our best-fitting model; the gorithm, which has the effect of excluding 2-3% of halos top panel shows the average cumulative peak mass as a at each mass that have recently experienced an unusu- function of cosmic time, and the bottom panel shows av- ally large major merger. Thus the shaded histograms in erage accretion rates. Assembly histories for halos of dif- Fig.8 include contributions from short-term fluctuations ferent present-day mass are color-coded as shown in the associated with recent mergers, whereas we optimized 8 Hearin et al.

Fig. 7.— Diversity of assembly histories in our model for halo populations. In the left column of panels, we plot the history of cumulative mass of simulated halos, Mpeak(t), showing its average value with the solid black curve, and the 1σ standard deviation with the gray band. The right hand column of panels shows the mean and standard deviation of halo mass accretion rates. The history of halos of different present-day mass are shown in different rows of panels as indicated by the in-panel annotation. In all panels, the dashed green curves show our model for the assembly of halo populations. our model for halo populations using a sample that ex- halo mass is a power-law function of cosmic time with α(t) cludes such contributions. This difference can be seen in a rolling index, Mpeak(t) ∝ t , where α(t) is a sigmoid two different manifestations in Fig.8. First, the PDFs function that smoothly transitions halo growth from the shown in the shaded histograms have slightly broader early-time, fast-accretion regime, to the late-time, slow- support relative to our model, reflecting the increased accretion regime (see Eqs.1-2). We note that we have variance associated with the inclusion of outliers. Sec- chosen to model peak halo mass, Mpeak(t), the largest ond, the shaded histograms tend to be shifted towards mass attained by the halo up until the time t, and so slightly earlier redshifts relative to the model; this sec- our model is not intended to capture the instantaneous ond effect is due to our election to model the evolution halo mass, Mhalo(t), which is subject to additional phys- of peak halo mass, Mpeak(t), whose associated formation ical effects whose treatment is beyond the scope of the times are asymmetrically shifted to earlier epochs by ma- present paper. jor mergers. Both of these effects are visible in Fig.8. The results in §3 in the main body of the paper demon- The full distributions for mass halos shown strate that for host halos identified with Rockstar over in the bottom panel display a somewhat better level of a wide range of present-day mass, M0 ≡ Mpeak(z = agreement relative to the full distributions for cluster- 11 0) & 10 M , our model faithfully describes the mass mass halos shown in the top panel; this is sensible since assembly of individual halos over nearly all of cosmic the assembly of more massive halos tends to be more rich time, t & 1 Gyr, with a typical accuracy of ∼ 0.1 dex in mergers. or better. In AppendixD, we show that our model for Mpeak(t) enjoys the same level of accuracy regard- 5. DISCUSSION & FUTURE WORK less of whether the host halo experiences a splashback We have presented a new model for the assembly his- event in its past history, or whether the halo eventu- tory of individual dark matter halos. In our model, ally becomes a subhalo of a more massive host, including A Differentiable Model of Halo Assembly 9

across time, P (Mpeak(t)|M0). Our model for the growth of halo populations is differentiable courtesy of its imple- mentation in the autodiff library JAX, and accurately captures the average mass accretion rate across cosmic time, hdMpeak/dt|M0i, as well as the diversity of accre- tion rates P (dMpeak/dt|M0). As a convenience, our pub- licly available python code includes functionality to gen- erate a Monte Carlo realization of the diversity of halos with the same M0. The accuracy of our approximation may not be sur- prising as previous results have shown that halo mass is predominantly assembled via a combination of smooth accretion and mergers with a very small mass ratio (e.g., Genel et al. 2010). However, our model could naturally be used as the basis of empirical techniques to predict merger rates and/or the subhalo mass function at the time of accretion (also referred to as the unevolved sub- , or USHMF hereafter). For example, in Yang et al.(2011), the authors developed an analyt- ical model that accurately predicts the USHMF; as the basis of their model, they used the fitting function of Zhao et al.(2003b) for the median assembly history of dark matter halos, supplementing this with a simple log- normal distribution to describe random scatter in the main-branch mass as a function of time. The techniques in the present work could be used to improve upon such an approach, as our model provides an accurate descrip- tion of P (Mpeak(t)|M0), as shown in Figure7. Such an extension could also prove fruitful when used in concert with subhalo orbital evolution codes (e.g., Zentner et al. 2005; Jiang et al. 2021). Recent progress in modeling subhalo orbital evolution has demonstrated the impor- Fig. 8.— Example comparisons of halo assembly distri- tance of accounting for the evolution of the host halo po- butions. In the top panel, we focus on halos with present-day tential (Ogiya et al. 2021), and so deploying our model in 14 mass M0 = 10 M , and show the distribution of masses at this context could be leveraged to capture physically re- higher redshift, Mpeak(z), with results at different redshift color- coded as indicated in the legend. In the bottom panel, for halos alistic accretion-rate correlations in the halo-to-halo vari- 12 with M0 = 10 M , we show the distribution of formation times, ance of substructure abundance and orbits across cosmic tform, using several different definitions as indicated in the leg- time (see, e.g., Jiang & van den Bosch 2017). Along end. Dashed black curves in each panel show the corresponding similar lines, it has recently been shown that scatter in predictions based on our model for the assembly of halo popula- tions. Plotted quantities shown in this figure are distinct from the thermal Sunyaev-Zel’dovich effect is largely driven those shown in Fig.7 in that here we show comparisons of the by variance in the assembly histories of cluster-mass ha- full probability distributions of quantities computed directly from los (Green et al. 2020), and so our model may also help simulated merger trees without trimming 3σ outliers. improve the predictive power of models for the mass- observable relation of galaxy clusters. cases where the subhalo is disrupted or merges with its While previous work has demonstrated that the host. In AppendixE (together with Fig.3), we show cosmology-dependence of halo assembly is relatively sim- that our model achieves the same level of accuracy in its ple (e.g., Zhao et al. 2009), all results in the present description of Mpeak(t) for both host halos and subhalos work are limited to a fixed set of cosmological param- in IllustrisTNG, a full-physics hydrodynamical simula- eters that closely match those in Planck Collaboration tion whose halos and merger trees were identified with et al.(2014). In van den Bosch et al.(2014), it was entirely different algorithms from those used in the anal- shown that halo assembly histories have a universal form ysis of the gravity-only simulations we consider. controlled entirely by the cosmology-dependence of cos- We have additionally presented a new model for the mic time, and so it may be relatively straightforward to assembly of halo populations. In typical empirical ap- extend our model for P (Mpeak(t)|M0) to capture the in- proaches to modeling halo assembly, average halo growth fluence of cosmological parameters on halo growth. Such is directly parameterized with a fitting function, and an extension could be directly incorporated into frame- variance in halo assembly is modeled simply as random works that model cosmology-dependence via physically- scatter. Our alternative approach is formulated by us- motivated rescalings (e.g., Angulo & White 2010; Ren- ing our model for individual halo growth as its funda- neby et al. 2018; Aric`oet al. 2020). In principle, our mental basis, and by building a model for the statisti- model could also be used in applications with emulators cal distribution of parameters that regulate individual built upon cosmological suites of simulations (e.g., Heit- halo assembly. As a result, our halo population model mann et al. 2016; DeRose et al. 2019; Nishimichi et al. not only reproduces hMpeak(t)|M0i, but also a cosmo- 2019), although this would require suites with high mass- logically realistic diversity of individual halo trajectories resolution that include merger trees, which remains un- 10 Hearin et al. common (although see Contreras et al. 2020, for recent fitting differentiable model. This result implies that our progress). model can approximate halo growth in such a way that Our restriction to cosmological parameters similar to residual errors in the fit are uncorrelated with the density Planck Collaboration et al.(2014) is just one example field on large scales. This finding is especially interesting of how in the present work, we have not attempted to in light of the relationship between assembly bias and provide a comprehensive study of P (Mpeak(t)|M0). As halo concentration (Wechsler et al. 2006), the latter of another example, even though we have provided sepa- which is closely connected to assembly history (Wech- rate calibrations of our model for halos in gravity-only sler et al. 2002; Chen et al. 2020b), and is known to be simulations and in IllustrisTNG, we have not attempted strongly influenced by mergers (e.g., Wang et al. 2020). a detailed characterization of the differences between the We will investigate this issue systematically in follow-up assembly histories of halos in the two simulations. For work to the present paper. such a purpose, a simulation suite such as CAMELS As discussed in §1, halo assembly history plays a fun- (Villaescusa-Navarro et al. 2020) that spans a range of damental role in contemporary semi-analytic models of baryonic feedback parameters would enable a more rig- galaxy formation (SAMs), since the star formation rate orous study than an uncontrolled comparison between of a main sequence galaxy is commonly assumed to be IllustrisTNG and N-body halos. Furthermore, when cal- proportional to the mass accretion rate of its parent ibrating our models for P (Mpeak(t)|M0), we have focused halo. We have formulated our model for the assembly 11.5 14.5 on the mass range 10 M ≤ Mpeak ≤ 10 M as this of halo populations with numerous applications to SAMs range of halo masses contains thousands of halos that squarely in mind. For example, our model has potential are resolved by over 2000 particles in the N-body sim- to considerably accelerate efforts to explore the portions ulations we use; however, a series of nested-resolution of SAM parameter space that regulate eff , the efficiency boxes would be considerably more effective towards the with which transform accreted mass into stars, goal of extending the mass range and conducting rigor- since running SAMs on main-branch histories alleviates ous resolution tests. As discussed in §A, we have simi- the need to solve galaxy formation ODEs along every larly focused on the range of cosmic time t > 1Gyr to branch of the merger tree. On the one hand, this com- avoid potential resolution issues in our simulations, but putational speedup comes at the cost of neglecting the our work could be extended to redshifts z > 5 with a role of mergers on SAM-predicted galaxy properties. On simulation suite that would permit such resolution tests. the other hand, the capability of our model to gener- Improvements such as these would require a dedicated ate a physically realistic diversity of smooth, individual effort along the lines of what has been done to calibrate halo assembly histories enables a rather comprehensive models of the halo mass function (e.g., Jenkins et al. investigation of the specific role of mergers in shaping the 2001; Tinker et al. 2008; McClintock et al. 2019; Boc- properties of the galaxy population. We intend to pursue quet et al. 2020); we thus consider these improvements these directions in future work based on the Galacticus beyond our present scope, as here we merely demonstrate semi-analytic model (Benson 2012); Galacticus already that a modeling approach such as ours has capability to includes an independent implementation of our model in deliver a precision tool. its publicly available source code2, which may be use- When fitting the growth history of large samples ful in its own right for researchers more accustomed to of simulated halos, we find that the distribution of programming in Fortran rather than Python. best-fitting parameters exhibits a clear bimodality, with As the principal aim of our future work, we are us- earlier-forming and later-forming halos occupying dis- ing our model for halo assembly as the foundation of a tinct regions of the three-dimensional parameter space new approach to forward modeling cosmological struc- of αearly, αlate, and τc. Bimodal halo growth has been ture formation, Surrogate Modeling the Baryonic Uni- reported previously in Shi et al.(2018) in terms of the verse (SBU). The basis of SBU is a minimally flexible distribution of halo formation times, although this pre- parameterized family of solutions to the differential equa- vious finding applied only to infall halos that eventually tions of galaxy formation; we make predictions for galaxy became subhalos of some more massive host. The bi- populations by building a statistical model for a cosmo- modality we find pertains to host halos at z = 0, applies logically realistic diversity of individual galaxy histories to halos of all mass, persists even when excluding halos (Alarcon et al. 2021). The approach taken in the present that have previously experienced a flyby event, and is work to modeling halo populations essentially provides a present with comparable strength in both gravity-only template for this framework. simulations and IllustrisTNG. We have leveraged this This forward modeling effort is already well under- bimodal structure in building our analytical model for way. In Chaves-Montero & Hearin(2020), we studied P (Mpeak(t)|M0), as capturing this feature considerably the influence of star formation history (SFH) on the simplifies the task of achieving the level of agreement broadband photometry of galaxies and found that, de- with simulations shown in Fig.7, particularly for the spite the complexity of the processes that impact optical case of the second moment. color, physically-motivated variations in SFH correspond The two-point clustering of dark matter halos of the to changes in a unique direction in color-color space. In same mass has well-known dependence upon halo forma- a follow-up paper (Chaves-Montero & Hearin 2021), we tion time, tform, a phenomenon known as halo assembly studied how the presence of unresolved SFH fluctuations bias. In Figure4, we showed that the tform-dependence impact model predictions for broad-band photometry, of halo clustering remains the same whether one uses finding that the statistical distributions of the broad- the actual formation time taken directly from simulated band colors of a galaxy population are quite insensitive merger trees, or the value of tform taken from our best- 2 https://github.com/galacticusorg/galacticus/wiki A Differentiable Model of Halo Assembly 11 to short-term variability. These results taken together 2. We find that the connection between halo assembly indicate that the influence of halo assembly upon galaxy and the large-scale density field, known as halo as- photometry is likely to be simple, and that the absence of sembly bias, is entirely captured by our model, and an explicit treatment of halo mergers in our model may that residual errors of our differentiable approxima- not pose a problem for the ability of SBU models to make tion to halo growth exhibit a negligible correlation accurate predictions for the color distributions observed with the density field on large scales. by imaging surveys such as the Dark Energy Survey3 (DES, Dark Energy Survey Collaboration 2016) and the 3. We have introduced a new model for the growth of Legacy Survey of Space and Time4 (LSST, Ivezi´cet al. halo populations, and shown that our model faith- 2019; LSST Science Collaboration et al. 2009). Further- fully reproduces the evolution of average halo mass, more, the results shown in Figure4 indicate that SBU hMpeak(t)|M0i, and average mass accretion rate, predictions for color-dependent clustering are also likely hdMpeak/dt|M0i, for cosmic times t & 1 Gyr. Our to be unbiased by the smooth nature of our model for model additionally captures the diversity of halo halo assembly. mass assembly, P (Mpeak(t)|M0), as well as the di- We expect that the differentiable formulation of our versity of accretion rates, P (dMpeak/dt|M0). halo assembly model based on the JAX autodiff library will play a critical role in this forward-modeling program. 4. Our python code, diffmah, is publicly available Modern analyses of large-scale structure rely upon a con- and can be installed with pip or conda. The repos- siderable number of “nuisance parameters” in order to itory for our source code is on GitHub, https:// ensure that cosmological inference is robust to systematic github.com/ArgonneCPAC/diffmah, and includes uncertainty. Conventional Bayesian techniques such as Jupyter notebooks providing demonstrations of MCMC scale poorly with the dimension of the model pa- typical use cases. A parallelized script in the rameter space, and since the statistical precision of cos- diffmah repository can be used to fit the assembly mological surveys will dramatically improve in the 2020s, histories of simulated halos with our model for in- the parameter space of models used to analyze large- dividual halo growth. The diffmah code provides scale structure measurements will only continue to in- a differentiable description of P (Mpeak(t)|M0) and crease. The capability to make differentiable predictions P (dMpeak/dt|M0) for both gravity-only simula- addresses this growing problem, since gradient-based op- tions and IllustrisTNG, and also includes a con- timization techniques such as Adam are extremely effi- venience function that can be used to generate cient even in very high dimensions (Kingma & Ba 2014), Monte Carlo realizations of cosmologically realis- as are gradient-based inference methods such as Hamil- tic populations of halo assembly histories. Pre- tonian Monte Carlo (Hoffman & Gelman 2014). In the computed fits for hundreds of thousands of ha- present work, we have shown that not only is our model los in the BPL, MDPL2, and IllustrisTNG sim- for individual halo growth differentiable, but using the ulations are available at https://portal.nersc. weighted sampling techniques described in AppendixC, gov/project/hacc/aphearin/diffmah_data/. so are one-point function predictions based on our model for the assembly of halo populations. In a closely re- ACKNOWLEDGEMENTS lated paper to the present work (Hearin et al. 2021), we We thank Raul Angulo, Andrew Benson, Joe DeRose, will introduce a new theoretical framework for making Alexie Leauthaud, and Daisuke Nagai for useful discus- differentiable predictions for two-point functions such as sions. Thanks also to Michael Buehlmann and Aurora galaxy clustering and lensing, using abundance matching Cossairt for feedback on an early release of our source (Kravtsov et al. 2004) as a toy model for the galaxy-halo code, and Benedikt Diemer for kindly providing down- connection. The halo assembly model presented here will sampled particles for IllustrisTNG. APH thanks Lauren provide a core ingredient to our program to make dif- Mastro for keeping the WWOZ Sunday Gospel Show go- ferentiable, multi-wavelength predictions for large-scale ing throughout the pandemic. structure that are physically self-consistent across cos- We thank the developers of NumPy (Van Der Walt mic time. et al. 2011), SciPy (Jones et al. 2001-2016), Jupyter (Ragan-Kelley et al. 2014), IPython (P´erez & Granger 6. SUMMARY 2007), scikit-learn (Pedregosa et al. 2011), JAX (Brad- We conclude by summarizing our primary results: bury et al. 2018), and Matplotlib (Hunter 2007) for their extremely useful free software. While writing this pa- 1. We have introduced a new fitting function for per we made extensive use of the Astrophysics Data Mpeak(t), the evolution of cumulative peak mass Service (ADS) and arXiv preprint repository. The au- of individual dark matter halos; our model approx- thors gratefully acknowledge the Gauss Centre for Super- imates halo assembly as a power-law function of computing e.V. (www.gauss-centre.eu) and the Partner- α(t) cosmic time with rolling index, Mpeak(t) ∝ t ship for Advanced Supercomputing in Europe (PRACE, (see Eq.1). Using both gravity-only simulations www.prace-ri.eu) for funding the MultiDark simulation and IllustrisTNG, we have demonstrated that our project by providing computing time on the GCS Super- model can approximate halo growth with an accu- computer SuperMUC at Leibniz Supercomputing Cen- racy of 0.1 dex over the range t & 1 Gyr for halos tre (LRZ, www.lrz.de). The Bolshoi simulations have 11 of present-day mass M0 & 10 M . been performed within the Bolshoi project of the Uni- versity of California High-Performance AstroComputing 3 https://www.darkenergysurvey.org/ Center (UC-HiPACC) and were run at the NASA Ames 4 http://www.lsst.org/ Research Center. 12 Hearin et al.

Work done at Argonne National Laboratory was sup- the Laboratory Computing Resource Center at Argonne ported under the DOE contract DE-AC02-06CH11357. National Laboratory. We gratefully acknowledge use of the Bebop cluster in

REFERENCES Alarcon, A., Hearin, A., Becker, M., & Chaves-Montero, J. 2021, Genel, S., Bouch´e,N., Naab, T., Sternberg, A., & Genzel, R. in prep 2010, ApJ, 719, 229, doi: 10.1088/0004-637X/719/1/229 Angulo, R. E., & White, S. D. M. 2010, MNRAS, 405, 143, Giocoli, C., Tormen, G., & Sheth, R. K. 2012, MNRAS, 422, 185, doi: 10.1111/j.1365-2966.2010.16459.x doi: 10.1111/j.1365-2966.2012.20594.x Aric`o,G., Angulo, R. E., Hern´andez-Monteagudo, C., et al. 2020, Giocoli, C., Tormen, G., Sheth, R. K., & van den Bosch, F. C. MNRAS, 495, 4800, doi: 10.1093/mnras/staa1478 2010, MNRAS, 404, 502, Avila-Reese, V., Firmani, C., & Hern´andez,X. 1998, ApJ, 505, doi: 10.1111/j.1365-2966.2010.16311.x 37, doi: 10.1086/306136 Goldfarb, D. 1970, Math. Comp., 23, Baugh, C. M., Cole, S., & Frenk, C. S. 1996, MNRAS, 283, 1361, doi: 10.1090/S0025-5718-1970-0258249-6 doi: 10.1093/mnras/283.4.1361 Green, S. B., Aung, H., Nagai, D., & van den Bosch, F. C. 2020, Baydin, A. G., Pearlmutter, B. A., Radul, A. A., & Siskind, J. M. MNRAS, 496, 2743, doi: 10.1093/mnras/staa1712 2015, JMLR, 18, 1 Hearin, A., Ramachandra, N., Becker, M., & DeRose, J. 2021, in Becker, M. R. 2015, arXiv:1507.03605, arXiv:1507.03605. prep https://arxiv.org/abs/1507.03605 Hearin, A. P., Campbell, D., Tollerud, E., et al. 2017, AJ, 154, Behroozi, P., Wechsler, R. H., Hearin, A. P., & Conroy, C. 2019, 190, doi: 10.3847/1538-3881/aa859f MNRAS, 488, 3143, doi: 10.1093/mnras/stz1182 Heitmann, K., Bingham, D., Lawrence, E., et al. 2016, ApJ, 820, Behroozi, P. S., & Silk, J. 2015, ApJ, 799, 32, 108, doi: 10.3847/0004-637X/820/2/108 doi: 10.1088/0004-637X/799/1/32 Henriques, B. M. B., White, S. D. M., Thomas, P. A., et al. 2015, Behroozi, P. S., Wechsler, R. H., & Conroy, C. 2013a, ApJ, 770, MNRAS, 451, 2663, doi: 10.1093/mnras/stv705 57, doi: 10.1088/0004-637X/770/1/57 Higham, N. 2009, Wiley Interdisciplinary Reviews: Behroozi, P. S., Wechsler, R. H., & Wu, H.-Y. 2013b, ApJ, 762, Computational Statistics, 1, 251, doi: 10.1002/wics.18 109, doi: 10.1088/0004-637X/762/2/109 Hoffman, M., & Gelman, A. 2014, Journal of Machine Learning Behroozi, P. S., Wechsler, R. H., Wu, H.-Y., et al. 2013c, ApJ, Research, 15, 1593 763, 18, doi: 10.1088/0004-637X/763/1/18 Hunter, J. D. 2007, Computing In Science & Engineering, 9, 90, Benson, A. J. 2012, New A, 17, 175, doi: 10.1109/MCSE.2007.55 doi: 10.1016/j.newast.2011.07.004 Ivezi´c, Z.,ˇ Kahn, S. M., Tyson, J. A., et al. 2019, ApJ, 873, 111, Blumenthal, G. R., Faber, S. M., Primack, J. R., & Rees, M. J. doi: 10.3847/1538-4357/ab042c 1984, Nature, 311, 517, doi: 10.1038/311517a0 Jenkins, A., Frenk, C. S., White, S. D. M., et al. 2001, MNRAS, Bocquet, S., Heitmann, K., Habib, S., et al. 2020, ApJ, 901, 5, 321, 372, doi: 10.1046/j.1365-8711.2001.04029.x doi: 10.3847/1538-4357/abac5c Jiang, F., Dekel, A., Freundlich, J., et al. 2021, MNRAS, 502, Bond, J. R., Cole, S., Efstathiou, G., & Kaiser, N. 1991, ApJ, 621, doi: 10.1093/mnras/staa4034 379, 440, doi: 10.1086/170520 Jiang, F., & van den Bosch, F. C. 2014, MNRAS, 440, 193, Bower, R. G. 1991, MNRAS, 248, 332, doi: 10.1093/mnras/stu280 doi: 10.1093/mnras/248.2.332 —. 2017, MNRAS, 472, 657, doi: 10.1093/mnras/stx1979 Bower, R. G., Benson, A. J., Malbon, R., et al. 2006, MNRAS, Jones, E., Oliphant, T., Peterson, P., et al. 2001-2016, 370, 645, doi: 10.1111/j.1365-2966.2006.10519.x http://www.scipy.org Bradbury, J., Frostig, R., Hawkins, P., et al. 2018 Kauffmann, G., White, S. D. M., & Guiderdoni, B. 1993, Broyden, C. G. 1970, IMA Journal of Applied Mathematics, 6, MNRAS, 264, 201, doi: 10.1093/mnras/264.1.201 76, doi: 10.1093/imamat/6.1.76 Kingma, D. P., & Ba, J. 2014, arXiv:1412.6980, arXiv:1412.6980. Bullock, J. S., Kolatt, T. S., Sigad, Y., et al. 2001, MNRAS, 321, https://arxiv.org/abs/1412.6980 559, doi: 10.1046/j.1365-8711.2001.04068.x Klypin, A., Yepes, G., Gottl¨ober, S., Prada, F., & Heß, S. 2016, Chaves-Montero, J., & Hearin, A. 2020, MNRAS, 495, 2088, MNRAS, 457, 4340, doi: 10.1093/mnras/stw248 doi: 10.1093/mnras/staa1230 Klypin, A. A., Trujillo-Gomez, S., & Primack, J. 2011, ApJ, 740, —. 2021, MNRAS, doi: 10.1093/mnras/stab1831 102, doi: 10.1088/0004-637X/740/2/102 Chen, Y., Mo, H. J., Li, C., et al. 2020a, ApJ, 899, 81, Kravtsov, A. V., Berlind, A. A., Wechsler, R. H., et al. 2004, doi: 10.3847/1538-4357/aba597 ApJ, 609, 35, doi: 10.1086/420959 —. 2020b, ApJ, 899, 81, doi: 10.3847/1538-4357/aba597 Kravtsov, A. V., Klypin, A. A., & Khokhlov, A. M. 1997, ApJS, Child, H. L., Habib, S., Heitmann, K., et al. 2018, ApJ, 859, 55, 111, 73, doi: 10.1086/313015 doi: 10.3847/1538-4357/aabf95 Lacey, C., & Cole, S. 1993, MNRAS, 262, 627, Cole, S., Helly, J., Frenk, C. S., & Parkinson, H. 2008, MNRAS, doi: 10.1093/mnras/262.3.627 383, 546, doi: 10.1111/j.1365-2966.2007.12516.x Lagos, C. d. P., Tobar, R. J., Robotham, A. S. G., et al. 2018, Contreras, S., Angulo, R. E., Zennaro, M., Aric`o,G., & MNRAS, 481, 3573, doi: 10.1093/mnras/sty2440 Pellejero-Iba˜nez,M. 2020, MNRAS, 499, 4905, Lau, E. T., Hearin, A. P., Nagai, D., & Cappelluti, N. 2021, doi: 10.1093/mnras/staa3117 MNRAS, 500, 1029, doi: 10.1093/mnras/staa3313 Dark Energy Survey Collaboration. 2016, MNRAS, 460, 1270, Li, Y., Mo, H. J., van den Bosch, F. C., & Lin, W. P. 2007, doi: 10.1093/mnras/stw641 MNRAS, 379, 689, doi: 10.1111/j.1365-2966.2007.11942.x DeRose, J., Wechsler, R. H., Tinker, J. L., et al. 2019, ApJ, 875, LSST Science Collaboration, Abell, P. A., Allison, J., et al. 2009, 69, doi: 10.3847/1538-4357/ab1085 arXiv:0912.0201, arXiv:0912.0201 Diemer, B., & Kravtsov, A. V. 2014, ApJ, 789, 1, Ludlow, A. D., Navarro, J. F., Boylan-Kolchin, M., et al. 2013, doi: 10.1088/0004-637X/789/1/1 MNRAS, 432, 1103, doi: 10.1093/mnras/stt526 —. 2015, ApJ, 799, 108, doi: 10.1088/0004-637X/799/1/108 Mansfield, P., & Kravtsov, A. V. 2020, MNRAS, 493, 4763, Fakhouri, O., Ma, C.-P., & Boylan-Kolchin, M. 2010, MNRAS, doi: 10.1093/mnras/staa430 406, 2267, doi: 10.1111/j.1365-2966.2010.16859.x Mao, Y.-Y., Zentner, A. R., & Wechsler, R. H. 2018, MNRAS, Fletcher, R. 1970, The Computer Journal, 13, 317, 474, 5143, doi: 10.1093/mnras/stx3111 doi: 10.1093/comjnl/13.3.317 McBride, J., Fakhouri, O., & Ma, C.-P. 2009, MNRAS, 398, 1858, Gao, L., Springel, V., & White, S. D. M. 2005, MNRAS, 363, doi: 10.1111/j.1365-2966.2009.15329.x L66, doi: 10.1111/j.1745-3933.2005.00084.x McClintock, T., Rozo, E., Becker, M. R., et al. 2019, ApJ, 872, Gao, L., White, S. D. M., Jenkins, A., Stoehr, F., & Springel, V. 53, doi: 10.3847/1538-4357/aaf568 2004, MNRAS, 355, 819, doi: 10.1111/j.1365-2966.2004.08360.x A Differentiable Model of Halo Assembly 13

Mo, H., van den Bosch, F. C., & White, S. 2010, Galaxy Springel, V., White, S. D. M., Tormen, G., & Kauffmann, G. Formation and Evolution (United Kingdom: Cambridge 2001, MNRAS, 328, 726, University Press) doi: 10.1046/j.1365-8711.2001.04912.x Moster, B. P., Naab, T., & White, S. D. M. 2018, MNRAS, 477, Tasitsiomi, A., Kravtsov, A. V., Gottl¨ober, S., & Klypin, A. A. 1822, doi: 10.1093/mnras/sty655 2004, ApJ, 607, 125, doi: 10.1086/383219 Navarro, J. F., Frenk, C. S., & White, S. D. M. 1997, ApJ, 490, Tinker, J., Kravtsov, A. V., Klypin, A., et al. 2008, ApJ, 688, 493, doi: 10.1086/304888 709, doi: 10.1086/591439 Nelson, D., Pillepich, A., Springel, V., et al. 2018, MNRAS, 475, van den Bosch, F. C. 2002, MNRAS, 331, 98, 624, doi: 10.1093/mnras/stx3040 doi: 10.1046/j.1365-8711.2002.05171.x Nishimichi, T., Takada, M., Takahashi, R., et al. 2019, ApJ, 884, van den Bosch, F. C., Jiang, F., Hearin, A., et al. 2014, MNRAS, 29, doi: 10.3847/1538-4357/ab3719 445, 1713, doi: 10.1093/mnras/stu1872 Nusser, A., & Sheth, R. K. 1999, MNRAS, 303, 685, van den Bosch, F. C., & Ogiya, G. 2018, MNRAS, 475, 4066, doi: 10.1046/j.1365-8711.1999.02197.x doi: 10.1093/mnras/sty084 Ogiya, G., Taylor, J., & Hudson, M. 2021, MNRAS, 503, 1233, van den Bosch, F. C., Ogiya, G., Hahn, O., & Burkert, A. 2018, doi: 10.1093/mnras/stab361 MNRAS, 474, 3043, doi: 10.1093/mnras/stx2956 Ogiya, G., van den Bosch, F. C., Hahn, O., et al. 2019, MNRAS, Van Der Walt, S., Colbert, S. C., & Varoquaux, G. 2011, 485, 189, doi: 10.1093/mnras/stz375 ArXiv:1102.1523 Parkinson, H., Cole, S., & Helly, J. 2008, MNRAS, 383, 557, Villaescusa-Navarro, F., Angl´es-Alc´azar, D., Genel, S., et al. 2020, doi: 10.1111/j.1365-2966.2007.12517.x arXiv:2010.00619, arXiv:2010.00619. Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal https://arxiv.org/abs/2010.00619 of Machine Learning Research, 12, 2825 Wang, K., Mao, Y.-Y., Zentner, A. R., et al. 2020, MNRAS, 498, P´erez,F., & Granger, B. E. 2007, Computing in Science and 4450, doi: 10.1093/mnras/staa2733 Engineering, 9, 21, doi: 10.1109/MCSE.2007.53 Watson, D. F., Hearin, A. P., Berlind, A. A., et al. 2015, Pillepich, A., Springel, V., Nelson, D., et al. 2018, MNRAS, 473, MNRAS, 446, 651, doi: 10.1093/mnras/stu2065 4077, doi: 10.1093/mnras/stx2656 Wechsler, R. H., Bullock, J. S., Primack, J. R., Kravtsov, A. V., Planck Collaboration, Ade, P. A. R., Aghanim, N., et al. 2014, & Dekel, A. 2002, ApJ, 568, 52, doi: 10.1086/338765 A&A, 571, A16, doi: 10.1051/0004-6361/201321591 Wechsler, R. H., Zentner, A. R., Bullock, J. S., Kravtsov, A. V., Press, W. H., & Schechter, P. 1974, ApJ, 187, 425, & Allgood, B. 2006, ApJ, 652, 71, doi: 10.1086/507120 doi: 10.1086/152650 Weinberger, R., Springel, V., Hernquist, L., et al. 2017, MNRAS, Ragan-Kelley, M., Perez, F., Granger, B., et al. 2014, in 465, 3291, doi: 10.1093/mnras/stw2944 American Geophysical Union Fall Meeting Abstracts, Vol. D7 Wetzel, A. R., & Nagai, D. 2015, ApJ, 808, 40, Renneby, M., Hilbert, S., & Angulo, R. E. 2018, MNRAS, 479, doi: 10.1088/0004-637X/808/1/40 1100, doi: 10.1093/mnras/sty1332 White, S. D. M., & Rees, M. J. 1978, MNRAS, 183, 341, Rodriguez-Gomez, V., Genel, S., Vogelsberger, M., et al. 2015, doi: 10.1093/mnras/183.3.341 MNRAS, 449, 49, doi: 10.1093/mnras/stv264 Yang, X., Mo, H. J., Zhang, Y., & van den Bosch, F. C. 2011, Rodr´ıguez-Puebla,A., Behroozi, P., Primack, J., et al. 2016, ApJ, 741, 13, doi: 10.1088/0004-637X/741/1/13 MNRAS, 462, 893, doi: 10.1093/mnras/stw1705 Zentner, A. R. 2007, International Journal of Modern Physics D, Romberg, W. 1955, Norske Vid. Selsk. Forh., Trondheim, 28, 30 16, 763, doi: 10.1142/S0218271807010511 Shanno, D. 1970, Math. Comp., 647, Zentner, A. R., Berlind, A. A., Bullock, J. S., Kravtsov, A. V., & doi: 10.1090/S0025-5718-1970-0274029-X Wechsler, R. H. 2005, ApJ, 624, 505, doi: 10.1086/428898 Shi, J., Wang, H., Mo, H. J., et al. 2018, ApJ, 857, 127, Zhao, D. H., Jing, Y. P., Mo, H. J., & B¨orner,G. 2009, ApJ, 707, doi: 10.3847/1538-4357/aab775 354, doi: 10.1088/0004-637X/707/1/354 Somerville, R. S., Hopkins, P. F., Cox, T. J., Robertson, B. E., & Zhao, D. H., Mo, H. J., Jing, Y. P., & B¨orner, G. 2003a, MNRAS, Hernquist, L. 2008, MNRAS, 391, 481, 339, 12, doi: 10.1046/j.1365-8711.2003.06135.x doi: 10.1111/j.1365-2966.2008.13805.x —. 2003b, MNRAS, 339, 12, Somerville, R. S., & Kolatt, T. S. 1999, MNRAS, 305, 1, doi: 10.1046/j.1365-8711.2003.06135.x doi: 10.1046/j.1365-8711.1999.02154.x Springel, V. 2005, MNRAS, 364, 1105, doi: 10.1111/j.1365-2966.2005.09655.x —. 2010, MNRAS, 401, 791, doi: 10.1111/j.1365-2966.2009.15715.x we ran the optimization algorithm described below while allowing k to be a free parameter, and then observed no APPENDIX appreciable changes in the quality of the fits when hold- A. FITTING THE ASSEMBLY HISTORY OF INDIVIDUAL ing k fixed to any value in its typical best-fitting range, HALOS 2 . k . 5. Thus our model for individual halo growth In this appendix, we give a detailed description of has a total of 3 free parameters: αearly, αlate, and τc. our parametric model for the mass assembly of indi- When evaluating the sigmoid behavior of the power- law index, we implement this scaling relation in logarith- vidual halos. As outlined in §3, our model for individ- ual halo growth is based on Eq.1, which describes a mic time, x ≡ log10 t, rather than linear time. Thus in power-law scaling between peak halo mass and cosmic practice, we calculate the power-law index α(t) via: α(t) time, Mpeak = M0(t/t0) , where the power-law index x αlate − αearly α(t) smoothly rolls between asymptotic values αearly and α(t) = α(10 ) = αearly + . (A1) αlate according to the sigmoid function in Eq.2. Thus 1 + exp(−k(x − x0)) there are a total of six numbers that fully characterize our model for individual halo growth: the normalization, In order to identify an optimal set of parameters for M0, the present-day cosmic time, t0, and the four sigmoid each halo, we searched our three-dimensional model pa- parameters, αearly, αlate, k, and τc. For all halos in each rameter space, θ, for the combination of αearly, αlate, τc simulation, we hold t0 fixed to the present-day age of that minimizes the quantity LMSE, defined as the universe in the simulation, and M0 fixed to the value Mpeak(t0) in the simulation. All results in the paper also 1 X 2 L ≡ (w(t ; θ) − v(t )) , (A2) hold k fixed to 3.5; to determine this particular value, MSE N i i i 14 Hearin et al.

§5, the most reliable way to address this and related is- sues is through a dedicated resolution-requirement study that uses homogeneously-processed, nested simulation boxes spanning a wide range of resolution. Since such a simulation suite is not currently publicly available, and since our own science applications are focused on the redshift range z . 5 that is most relevant to present-day and near-future cosmological surveys, we have opted to relegate further study of this issue to future work. To minimize LMSE for each halo, we use the JAX implementation of the Adam algorithm (Kingma & Ba 2014), which is a gradient descent technique with an adaptive learning rate. Our use of Adam requires cal- culating the gradients ∂LMSE/∂θ, which is facilitated by our JAX implementation, allowing us to efficiently compute these gradients to machine precision without Fig. A9.— Gradients of parameters in the individual halo growth model. The figure shows logarithmic derivatives reliance upon finite differencing methods. We show ex- of Mpeak(t) with respect to the three parameters of our model for ample gradients of our model predictions for Mpeak(t) in individual halo growth. This figure is similar to Fig.1, only here we Figure A9. To tune the performance of the optimization show the behavior of the autodiff-computed gradients that we use beyond the default settings recommended in Kingma & to determine the best-fitting parameters describing the assembly of each simulated halo. Ba(2014), we use 2 successive burn-in cycles with a rela- tively large step-size parameter of 0.25 for ∼ 50 updates, where w and v are the base-10 logarithm of the predicted followed by a final cycle for an additional ∼ 200 updates and simulated values of Mpeak evaluated at a set of N using a step-size parameter of 0.01. control points, ti. For the control points of each halo, we use the collection of snapshots in the halo’s merger tree A.1. Imposing physical constraints on individual halo after a time tmin, defined as growth When minimizing L , the parameters we actually tmin ≡ max(tcut, tthresh, tδm), (A3) MSE vary in the gradient descent are βe, β`, and x0, where the where we set tcut = 1 Gyr, and the quantity tthresh is the relationships between these parameters and the quanti- most-recent simulated snapshot where the halo mass falls ties that directly appear in Eq.1, αearly, αlate, and τc, below a simulation-dependent threshold mass, Mthresh; are defined as follows: 11.25 for the MDPL2 simulation, we use Mthresh = 10 M , and for the Bolshoi-Planck and TNG simulations we use x0 = log10 τc 10 Mthresh = 10 M . αlate = S(β`) (A4) In Eq. A3, the quantity tδm refers to the most-recent αearly = αlate + S(βe) snapshot where the halo mass falls below some fraction S(z) ≡ ln(1 + exp(z)) fδM of its present-day mass; for all simulations, we set 2.5 fδM = 10 . In each of the three simulations we studied, The function S(z) is the softplus function, which is for a surprisingly large number of halos, we find that at strictly positive across the real line, exhibits approxi- early times, t . 3Gyr, the behavior of Mpeak(t) suddenly mately linear behavior for s & 1, and asymptotically drops like a stone by an order of magnitude or more from approaches zero as s → −∞. For any positively-valued one snapshot to the next, even for halos that should nom- z, the inverse of the softplus function is well-defined and inally be resolved by hundreds of particles. We found computed as S−1(z) ≡ ln(exp(z) − 1), which we have through experimentation that the inclusion of the term written as U(z) in the main body of the paper and plot- tδm in Eq. A3 is reasonably effective at limiting the influ- ted on the axes of Figure5, as well as on the axes of the ence of this phenomenon on the best-fitting parameters figures in AppendixC. Figure A10 gives a visual illus- describing the halo MAH. We suspect that this behav- tration of the behavior of the softplus function we use to ior is a reflection of the difficulty of halo-finding when enforce these constraints for the case of αlate. the high-redshift density field resembles a sea of shallow Through our use of these variable transformations, we peaks, however, the root cause of this phenomenon is ensure that the best-fitting parameters determined by unclear, and other origins of this phenomenon are cer- our application of gradient descent will always respect tainly plausible. For example, recent work on structure 0 < αlate < αearly. Mathematically, this guarantees that formation has uncovered surprisingly stringent resolution in the best-fitting approximation to each individual halo, demands of N-body simulations for science applications for all t > 0 the quantity Mpeak(t) will increase monoton- that require robust tracking of the evolution of halo sub- ically, and dMpeak/dt will strictly slow down as Mpeak(t) structure (van den Bosch et al. 2018; van den Bosch & eventually attains the value M0 at the time t0. These Ogiya 2018; Ogiya et al. 2019). It could be the case that mathematical constraints embody the physical expecta- analogous results apply to the simulation requirements tion that dark matter halo growth begins with a fast- associated with reliable tracking of halo mass accretion accretion phase at early times, and transitions to a slow- history in the first Gyr of cosmic time; alternatively, halo accretion phase at late times. One can imagine employ- assembly at high redshift may simply require an even ing a more flexible fitting function such as one based more flexible fitting function than the smoothly rolling directly on a machine-learning algorithm that would not power-law model adopted here. As discussed further in hard-wire this expectation into the approximation; such A Differentiable Model of Halo Assembly 15

easily identified in terms of their clustering within the distribution of best-fitting parameters, the distribution of the actual values of the formation time of the halos, tform, shows heavy overlap between the two groupings. There is also no strong dependence of the NFW concen- tration, c, on sub-population membership, even though c is strongly correlated with tform. Moreover, when split- ting a sample of halos of the same M0 according to sub-population membership, the two-point clustering of two samples hardly differs, even for halo masses where the actual tform-dependence of ξmδ(r) is strong. Thus even though this bimodality considerably simplified our task of modeling the distribution of best-fitting param- eters (as described in detail in AppendixB), the sub- population membership of a halo may not have special physical significance. Fig. A10.— Illustration of physical constraints on αlate. The figure shows the behavior of the softplus function that controls B. MODELING THE ASSEMBLY OF HALO the relationship between αlate and β`, the actual variable that we vary in the gradient descent algorithm used to fit the assembly POPULATIONS history of each individual halo. We use a similar technique to In this appendix, we describe our model for the diver- ensure 0 < αlate < αearly, and a simple logarithmic transformation to enforce τc > 0. These transformations guarantee that the growth sity of assembly histories of halo populations. The goal of history of each model halo will always increase monotonically and our halo population model is to capture P (Mpeak(t)|M0) transition from a fast-accretion phase to a slow-accretion phase, and P (dM /dt|M ). Our approach to this problem is even for outlier halos with unusual merger trees. See Eq. A4 and peak 0 associated discussion for details. to use our model for individual halo growth as a basis, and to build a model of the statistical connection between an approach has potential to capture short-term fluctu- present-day halo mass and the distribution of parameters ations that our model does not. We note, however, that describing individual halo MAHs, P (αearly, αlate, τc|M0). even when these constraints are imposed on the fits as As outlined in §4, we model the distribution described above, dark matter halo growth can be approx- P (αearly, αlate, τc|M0) as a two-population normal dis- imated with a typical accuracy of 0.1dex or better for all tribution of 3 parameters. We model this distribution t & 1 Gyr. in terms of the same parameter transformations used In carrying out this investigation, we studied a wide in AppendixA to define the variables β` ≡ U(αlate), range of alternative implementations in which alterna- βe ≡ U(αearly − αlate), and x0 ≡ log10 τc, where U(s) ≡ tive variables were used in the optimization. For exam- ln(exp(s) − 1) is the inverse of the softplus function (see ple, rather than using βe and β`, we have also explored Eq. A4). Thus each sub-population is described by an the use of logarithmic variables, and an alternative for- independently-defined mean, µi ≡ (βe, β`, x0), and co- mulation in which the varied parameters were restricted variance matrix, Σi, defined as to finite bounds via a sigmoid function. The specific choice for which parameters are varied in the optimiza- "σ(βe, βe) σ(βe, β`) σ(βe, x0)# tion amounts, in effect, to a choice for a prior distribution Σi ≡ σ(β`, βe) σ(β`, β`) σ(β`, x0) (B1) on the parameters. We found through considerable ex- σ(x0, βe) σ(x0, β`) σ(x0, x0) perimentation that this choice can have a significant im- pact on the distribution of best-fitting parameters, even By defining our halo population model in terms of though the quality of the individual fits themselves was the transformed variables βe, β`, and x0, we ensure typically unaffected by this choice. Since one of our pri- that all values of αearly, αlate, and τc in the support mary goals was to build a simple model for the distri- of P (αearly, αlate, τc|M0) will respect the physical con- bution of best-fitting parameters, then the choice for the straints described in §A.1, even in the tails of the distri- effective priors warranted special attention. Ultimately, bution. our decision on the parameterization was driven by two The relative weight of the two populations, Flate, has considerations: first, that the differentiable approxima- shallow, monotonic mass-dependence, so that Flate = tion to each simulated halo respects the physical con- Flate(M0). Each of the three components of µi, and straints defined above, and second, that the distribution each of the six components of Σi, also have a shallow, of best-fitting parameters admits a reasonably simple em- monotonic mass-dependence, so that µi = µi(M0), and pirical description. Σi = Σi(M0). For the remainder of this appendix, we will Amongst all the variations we explored for the effec- describe our models for Flate(M0), µi(M0) and Σi(M0) in tive priors, as well as amongst several variations in the turn. actual functional form used to fit the growth of individ- We model the mass-dependence of Flate(M0) as a ual halos, we found a clear bimodality in the halo pop- power-law function of present-day mass with rolling in- ulation. The manifestation of this bimodality of course dex. We capture this behavior using the same sig- changed from variation to variation, but in all cases, we moid functional form defined in Eq.2, so that Flate has consistently found two separate sub-populations within sigmoid-type dependence upon log10 M0. We calibrate the distribution of best-fitting parameters, roughly cor- the specific parameters of the sigmoid using the tech- responding to early-forming and late-forming regions of niques described in detail in AppendixC. parameter space. While these two sub-populations are We model the mass-dependence of each component of 16 Hearin et al.

5 µi in the same way we modeled Flate, i.e., as a power-law where the PDF in the integrand is given by function of M0 with rolling index, implemented based on P (αearly, αlate, τc) = (1 − Flate) · Pearly(αearly, αlate, τc) a sigmoid function of log10 M0. Again we relegate discus- sion of our determination of the specific parameters of + Flate · Plate(αearly, αlate, τc), (C2) these sigmoid functions to AppendixC. where Pearly and Plate are the two, separately-defined In order to model the components of each Σi, we make three-dimensional Gaussian distributions described in use of the Cholesky matrix decomposition. Briefly, for AppendixB. The calculation of the second moment of any real-valued, symmetric, positive-definite matrix, Σ, P (M (t)|M ) is similar to Eq. C1, only rather than there exists a unique, real-valued, lower-triangular ma- peak 0 Mpeak(t) appearing in the integrand, the PDF-weighted trix, L, that defines the Cholesky decomposition, writ- 2 ten as Σ = L · LT. The diagonal entries of L are strictly quantity is instead [Mpeak(t) − hMpeak(t)i] . positive with a product equal to the determinant of Σ. There are a number of approaches one could take to We use JAX to calculate L so that our computations computing these nested integrals. First, we note that our will be differentiable with autodiff, but many modern model for P (αearly, αlate, τc|M0) is completely analytic linear algebra libraries have efficient implementations of and smooth, without any complex oscillatory behavior or the Cholesky decomposition (for a modern review, see sharp turns in the parameter space, and so commonly- Higham 2009). used integration routines (e.g., Romberg 1955) would have no trouble giving a high-accuracy result. Alterna- In modeling each of the two 3 × 3 matrices, Σi, the quantities we parameterize are the 6 entries of the asso- tively, each of the component ingredients of our model for ciated Cholesky matrix: the PDF in the integrand of Eq. C1 have well-established numerical algorithms for generating stochastic realiza- tions of the distributions, and so it would be straightfor- ward to use a Monte Carlo approach to compute these in- "ai 0 0# L ≡ d b 0 (B2) tegrals via moments of randomly generated samples. Ei- i i i ther calculation could also be conveniently conducted in ei fi ci python using implementations in the scipy library. How- ever, we have instead opted for a third method based on weighted sampling that simplifies gradient calculations In fitting our model for P (Mpeak(t)|M0) as described in AppendixC, we formulate our parameterization in terms with autodiff. T Our autodiff-friendly method for calculating the inte- of log10 ai, log10 bi, and log10 ci to ensure that Σ = L · L will always be positive definite, but the parameters d , e , grals in Eq. C1 can be easily understood in terms of a i i simple toy example of a generic PDF convolution: and fi can take on any value on the real line and still pro- duce a valid covariance matrix, and so we use linear vari- Z ∞ X ables for our parameterization of the off-diagonal entries y = dxP (x)f(x) ≈ δxiP (xi)f(xi), (C3) −∞ of L. xi∈D

C. FITTING THE PARAMETERS OF THE MODEL FOR where we will think of the integrand as some smooth HALO POPULATION ASSEMBLY target function, f(x), weighted by a probability distri- bution, P (x), and where in the second equation we have Our goal in §4 is to identify a point in the parameter replaced the continuous integration with a finite sam- space of our model that can be used to generate a realis- pling of points spanning the domain of support, D. In tic distribution of individual, smooth trajectories of halo other words, to calculate the integral in Eq. C3, rather assembly, and at the same time results in an accurate than using an iterative algorithm such as Romberg(1955) reproduction of P (Mpeak(t)|M0) and P (dMpeak/dt|M0). that is challenging to implement in an autodiff library, To achieve these goals, our model for P (Mpeak(t)|M0) instead one can use simple Gaussian integration with a has numerous free parameters that required tuning in sufficiently dense sampling grid. This toy problem is of order to achieve the level of agreement shown in Fig- the same form as the problem at hand in the evaluation of ures7&8. In C.1, we review the general techniques we Eq. C1 for cases where P (x) = P (x|θ) and f(x) = f(x|θ), used for making differentiable predictions for the mean only with this formulation, we can readily see how sim- and variance of P (Mpeak(t)|M0) and P (dMpeak/dt|M0), ple it is to compute gradients of y with respect to model and in C.2 we describe how we optimized the parameters parameters, θ : of our best-fitting models. X ∂ ∂y/∂θ = δx P (x |θ)f(x |θ), (C4) i ∂θ i i xi∈D C.1. Differentiable predictions for halo population By implementing the model components described in Ap- assembly pendixB in an autodiff library such as JAX, evaluat- The first moment of our model prediction for ing expressions such as C4 is entirely straightforward P (Mpeak(t)|M0) is given by and does not require laboriously working out the ana- lytical result for each contribution to the summations. One need only ensure that the integration domain D ZZZ 5 hMpeak(t)|M0i ≡ dτcdαearlydαlateMpeak(t) Note that in Eq. C1 we have written Mpeak(t|αearly, αlate, τc) simply as Mpeak(t), and in Eq. C2 we have suppressed the condi- tioning upon M , e.g., F (M ) is written simply as F . × P (αearly, αlate, τc|M0), (C1) 0 late 0 late A Differentiable Model of Halo Assembly 17 is sufficient to span the range of non-negligible support spanning the range log10 M0 ∈ [11.75, 14.5] , with 0.25dex for P (αearly, αlate, τc|M0). We found this flexibility espe- separation. cially helpful during development phases of our model, In building our model for P (Mpeak(t)|M0), our goal when the need to conduct a range of modeling experi- is to faithfully recover the diversity of smooth assembly ments required many variations of the underlying func- histories for a population of halos with the same present- tional forms. In the following section, we describe how day mass, M0. As our model for individual halo assembly we carried out these calculations for the particular case does not explicitly account for contributions from merg- of the PDF convolutions that arise from optimizing our ers, for our target data we use each individual halo’s best- model predictions for the first and second moments of fitting Mpeak(t) and dMpeak/dt to define the target mean P (Mpeak(t)|M0) and P (dMpeak/dt|M0). and variance of P (Mpeak(t)|M0) and P (dMpeak/dt|M0). If we were to instead use the directly simulated histories C.2. Optimizing predictions for halo population to define the target one-point functions, then our model assembly for P (αearly, αlate, τc|M0) would instead converge to a re- In this section, we describe how we optimized the pa- sult with unrealistically large variance in the smooth tra- rameters of our model predictions for P (M (t)|M ) jectories traced by real halos. Using the directly simu- peak 0 lated values of the MAHs as target data would require and P (dMpeak/dt|M0), where the target distributions come from the merger trees of halos in gravity-only N- building an additional model component describing the body simulations. To generate our target data for a par- variance from mergers about the smooth trajectories, ticular bin of present-day halo mass, we begin by select- which is beyond our intended scope. The predictions of our model are controlled by the ing host halos in bin of Mpeak(z = 0) with width 0.1dex. One issue that arises in making accurate predictions for parametrized behavior of Flate(M0), µi(M0) and Σi(M0), the variance of halo histories is that in our model, the and optimizing these functions requires beginning with trajectory of each halo has exactly the same final mass, an initial guess for the parameters. In order to de- termine this initial guess, we directly inspect the dis- M0, by construction, whereas the bin of simulated halos will have a source of unwanted variance due to the finite tribution of best-fitting parameters describing the indi- width of the bin. To account for this effect, we rescale vidual trajectories of the halos in each target bin, M0. the history of each simulated halo according to its actual We use the Gaussian mixture model implemented in scikit-learn to decompose the binned halos into two value of Mpeak(z = 0) in the simulation before computing the target mean and variance. weighted sub-populations defined by βe, β`, and x0; we thus use to supply an estimate for the As mentioned in §4, when defining the target data scikit-learn fraction of later-forming halos in the bin, F¯late(M0), and for a particular bin of present-day halo mass, M0, we limit the impact of wild outliers in the merger the mean and covariance of each sub-population,µ ¯i(M0) ¯ trees by trimming 3σ outliers from the halo sample. and Σi(M0), respectively. We proceed in this fashion We conduct this outlier exclusion procedure as follows. and record each result for each bin of M0. Figure C11 For each snapshot of the simulation, t, we compute shows an example comparison between initial guess for the trimmed mean, µ (t|M ), and trimmed variance, the two-component model and the true distribution of p 0 12 σ (t|M ), using scipy.stats.mstats.trimmed mean parameters for halos with M0 = 10 M . p 0 ¯ ¯ and scipy.stats.mstats.trimmed variance, respec- Using the collection of Flate(M0), µ¯i(M0) and Σi(M0) tively; within the scipy library, these quantities are com- for each of our target bins of present-day halo mass, puted by rank-ordering the objects in the sample, com- we find that each of these quantities exhibits a shallow, puting the percentile of each object, p, excluding the ob- monotonic mass-dependence that is well-approximated jects lying outside the range (p, 100 − p), and computing by the 4-parameter sigmoid function defined in Eq.2. the mean and variance from the resulting population. At To define our initial guess for the best-fitting parameters of our halo population model, the parameters of each each snapshot, and for each bin of halo mass, we use µp sigmoid can simply be hand-tuned to give a reasonable and σp to compute the z-score, approximation to each quantity, and in all cases we found log10 Mpeak(t|M0) − µp(t|M0) acceptable approximations by holding the transition- zp(t|M0) ≡ . mass parameter fixed to x = 13.5, and the transition- σp(t|M0) 0 speed parameter fixed to k = 0.5. We use p = 0.01 to define each snapshot’s trimmed mean Using the procedure described above to define the ini- and variance, and at each snapshot over the redshift tial guess, we use the L-BFGS-B algorithm implemented range 0 < z < 3, we flag any halo with |zp(t|M0)| > 3. in scipy to optimize our model parameters. The loss We exclude any flagged halo from the sample whose mean function we minimize in this case is analogous to LMSE and variance define the target data we use to optimize used in AppendixA (see Eq. A2), only here we minimize our model for halo populations; this outlier exclusion the sum of the logarithmic differences between the col- procedure typically removes 2-3% of the halos in each lection of means and variances of P (Mpeak(t)|M0) and mass bin; by construction, this excludes halos that expe- P (dMpeak/dt|M0) tabulated for each of our target mass rience an unusually large major merger after z < 3; the bins. In searching for the best-fitting point, we do not function implementing this algorithm can be found in vary all four parameters of each sigmoid function of each the measure mahs.py module of the publicly available component, but only the two parameters controlling the diffmah source code. To compute the target data de- two asymptotic bounds on each sigmoid, holding the fined in this fashion, we use the BPL simulation for bins k and x0 parameters fixed to their hand-tuned values. 13.5 of present-day halo mass centered at M0 ≤ 10 M , Moreover, we do not allow the parameters to vary arbi- and MDPL2 for larger halo masses, choosing bin centers 18 Hearin et al.

Fig. C11.— Example comparison of the distribution of parameters for individual halo assembly. For halos of present-day 12 peak mass M0 = 10 M , the figure shows cross-sections of P (αearly, αlate, τc|M0). For each individual BPL halo in the mass bin, we approximate the assembly history of the halo with the rolling power-law model described in §3, and plot the distribution of parameters U(αearly),U(αearly − αlate), and log10 τc, where U(s) ≡ ln(exp(s) − 1). The orange ellipsoids show the earlier-forming sub-population (τc . 2 Gyr) of the two-component Gaussian distributions identified by scikit-learn, and the purple ellipses show the later-forming ¯ sub-population (τc & 2 Gyr). The colored ellipses are defined byµ ¯i and Σi that we use to define the initial guess for the parameters regulating our model prediction for P (Mpeak(t)|M0) and P (dMpeak/dt|M0). See text in C.2 for details.

Fig. C12.— Model for the mass-dependence of the distribution of parameters for individual halo assembly. The axes in each row of panels are the same as those described in Figure C11. Different colored ellipses correspond to normal distributions of halos with different present-day mass as indicated in the legend. The top row (bottom row) of panels shows distributions for the early-forming (later- forming) population of halos, where the mean and variance of each population was determined according to the optimization procedure described in C.2. trarily, but rather we only permit the L-BFGS-B algo- D. ASSEMBLY HISTORIES OF SUBHALOS, ORPHANS, rithm to search within a relatively narrow range of ∼ 30% AND SPLASHBACK-CENTRALS of the initial guess. We refer the reader to our pub- In the publicly available Rockstar catalogs described licly available source code for further quotidian details in §2, the column a first infall indicates the scale on the implementation of this optimization procedure. factor of the first snapshot where the (sub)halo was Figure C12 displays the results for the mass-dependence contained within the virial radius of some more mas- of the two-component normal distribution whose pre- sive halo. For a present-day host halo (as defined by dictions for P (Mpeak(t)|M0) and P (dMpeak/dt|M0) are upid=-1), the value of a first infall will be smaller shown in the main body of the paper. We refer the reader than unity if and only if the halo experienced a flyby to the rockstar pdf model.py of our source code for the event at some point in its past history. For the re- specific best-fitting values that result from this optimiza- sults in the main body of the paper, we defined our tion exercise. host halo samples by requiring upid=-1, without regard for the a first infall column. Figure D13 in this ap- pendix shows the residual errors of our model for indi- vidual growth for samples of flyby host halos defined by A Differentiable Model of Halo Assembly 19 upid=-1 and a first infall<1, using narrow bins of used on the gravity-only simulations studied in this pa- present-day peak halo mass as indicated by the in-panel per. annotation. Figure D14 in this appendix shows the anal- Figure E16 is analogous to Fig.3 in the main body ogous results for subhalos in the publicly available Rock- of the paper, but for the case of present-day subha- star catalogs in which upid6= −1. Finally, Figure D15 los in IllustrisTNG. Figure E17 demonstrates that the shows the analogous results for samples of subhalos that residual errors in approximating halo growth are uncor- were either merged or disrupted at some point prior to related with the large-scale density field in IllustrisTNG. z = 0. These subhalos do not appear in the publicly For IllustrisTNG, we have also repeated the halo pop- available Rockstar catalogs, but their histories can be ex- ulation model calibration exercise described in detail in tracted from the publicly available merger trees using the AppendixC. Making no changes to the form of any model extract orphan info.c program in the UniverseMachine components, and varying the same set of parameters, we source code6. simply optimized the model according to a different set of target data. Figures E18-E19 show that our model E. ASSEMBLY HISTORIES OF HALOS IN IllustrisTNG for halo population growth achieves a comparable level In this appendix, we illustrate the results of calcula- of accuracy for IllustrisTNG as for gravity-only simula- tions described in the main body of the paper, only here tions. shown for halos in IllustrisTNG. When analyzing the IllustrisTNG merger trees, we compute Mpeak(t) using This paper was built using the Open Journal of As- the total halo mass defined by the sum of dark matter, trophysics LATEX template. The OJA is a journal which stars and gas. We remind the reader that halos and provides fast and easy peer review for new papers in the merger trees in IllustrisTNG were identified by SUBFIND astro-ph section of the arXiv, making the reviewing pro- and SUBLINK, respectively, which are entirely different al- cess simpler for authors and referees alike. Learn more gorithms than the Rockstar and Consistent Trees codes at http://astro.theoj.org.

6 https://bitbucket.org/pbehroozi/universemachine/src/main/ 20 Hearin et al.

Fig. D13.— Same as Fig.3, but for host halos in BPL that experienced a flyby event in their past history.

Fig. D14.— Same as Fig.3, but for subhalos in BPL that survive to z = 0.

Fig. D15.— Same as Fig.3, but for subhalos in BPL that either merged or were disrupted prior to z = 0. A Differentiable Model of Halo Assembly 21

Fig. E16.— Same as Figure3, but for subhalos in the IllustrisTNG simulation.

Fig. E17.— Same as Figure4, but for the IllustrisTNG simulation.

Fig. E18.— Same as Figure6, but for the IllustrisTNG simulation. 22 Hearin et al.

Fig. E19.— Same as Figure7, but for the IllustrisTNG simulation.