arXiv:astro-ph/9602112v1 22 Feb 1996 r ucetysal(.. htteiiilfil ssufficien is field is initial the velociti It that initial (i.e., distribution. the small provided sufficiently density that, are hypothesis initial the the give on a based from at clumps directly virialized of epoch masses the of distribution the conditions. initial general evolut more the from into pape clustering insight non- this of qualitative of in least obtained evolution at conve- results provide the a the should study of as to Many serves case which clustering. Poisson linear with model the model Poisson for toy the as nient Thus, detailed Gaussian here. as general more not studied from is clustering conditions understand- of initial growth analytic the however, of present, ing At sug- otherwise. background microwave gest the in of spectrum fluctuations the temperature of our measurements in Indeed, Poisson. clustering th were gravitational likely verse it for thinks conditions one initial because the not is complete This initially a distribution. an son providing from clustering with gravitational concerned of description mainly is paper This INTRODUCTION 1 o.Nt .Ato.Soc. Astron. R. Not. Mon. 3Spebr2018 September 23 California of Univeristy Deptartment, Astronomy Berkeley Sheth K. Ravi of growth clustering the gravitational and processes branching Galton–Watson h rs–cehe prahalw n oestimate to one allows approach Press–Schechter The 000 0–0 00)Pitd2 etme 08(NL (MN 2018 September 23 Printed (0000) 000–000 , g:ter akmatter. dark – theory ogy: hs eut omr eea asiniiilcniin sdiscusse is conditions words: initial themselves. Key Gaussian particles general clustering more the to sh of results interrelations dynamics these these but the sense, All of statistical description a complete. o in de now well evolution works theory is approach gravitational Press–Schechter queueing distribution the and Poisson of process initially final description branching and the Press–Schechter initial using of the pairs obtained other also certain is for pair that given as a same for function the partition is h clum the merger epochs Press–Schechter of relation; Poisson function scaling partition interesting given the an any calculate to de of one complete allows history expression a merger particular provide In the to Pr clump. Press–Schechter used for given the any is of that also, history t formalism, merger means in developed theory formalism models the queueing using epidemic me The understood the process be also of branching can description description to descrip Galton–Wats complete detailed studied relation a a well The including provide the clustering, to hierarchical used to of is equivalent correspondence be fro This to process. clustering shown gravitational is of distribution description Press–Schechter The ABSTRACT h itiuinfnto fcut nrnol lcdcls safunc a as cells, placed randomly in counts of function distribution The aais lseig–glxe:eouin–glxe:frain–cosm – formation galaxies: – evolution galaxies: – clustering galaxies: Pois- ekly A94720 CA Berkeley, , uni- ion tly at es n r a ne rpriso h olna itiuin(rs & (Press distribution nonlinear Bond the 1974; field, of Schechter initial the properties in infer regions can overdense stud of by statistics Thus, the ev structures. ing will nonlinear field form density to collapse initial tually the in regions overdense cold), ilsis ticles ilshsbe eie Esen18;Seh19) The 1995). Sheth 1983; has clump (Epstein Press–Schechter Poisson a derived that probability been identical has of distribution ticles Poisson initially an from tering 1994). Cole & Lacey 1988; odareetwt h itiuino lm ie htare that relevant sizes in clump in be of measured to distribution the shown with been agreement have good thes fields for Gaussian scale-free functions initially mass spheric Press–Schechter that exactly seldom the show is Nevertheless, regions fields overdense random of collapse Gaussian the scale-free initially from as peial.Wieti a eago approximation good a col- mean, to be the may assumed in this are While regions spherically. overdense lapse Press–Schechter the the that of is assumptions approach simplifying main the of h rs–cehe xuso e ecito fclus- of description set excursion Press–Schechter The N bd iuain fgaiainlclustering gravitational of simulations -body A T E tl l v1.3) file style X N tal. et bd iuain (Efstathiou simulations -body 91 ae oe19) One 1993). Cole & Lacey 1991; antpoieadetailed a provide cannot naayi expression analytic an , niiilyPoisson initially an m lseigfo an from clustering f ino h evolution the of tion soytes tobeys It trees. istory sotie.This obtained. is p d. n a oextend to way One esuyo queues. of study he epochs. fiiiladfinal and initial of grhsoytree. history rger citos Thus, scriptions. cito fthe of scription ess–Schechter nbranching on ino time, of tion wwythe why ow N tal. et par- par- one en- ol- al. y- e 2 Ravi K. Sheth

− − (Nb)N 1e Nb interesting scaling relation. In Section 4, this scaling rela- η(N,b)= , (1) N! tion is exploited to provide insight into the details of the where N 1 and 0 b < 1, and b is related to the Press– growth of hierarchical clustering. Section 4 also shows that ≥ ≤ Schechter overdensity threshold δc: this Poisson Galton–Watson, Press–Schechter description is in qualitative agreement with the results of the numerical al- b = 1/(1 + δc). (2) gorithm developed by Kauffmann & White (1993). Section 5 discusses ways in which the extension For an initially cold , the threshold over- of the Press–Schechter approach that is developed in this density δ decreases as the Universe expands in such a way c paper can be extended to provide a detailed description of that b = 0 initially, and b 1 as the clustering develops. If → clustering from initially Gaussian random fields. Appendices clumps collapse spherically in a universe with critical den- A, B and C provide details of some of the calculations. sity, and both growing and decaying modes are present in the Appendix D contains a derivation of the distribution linear perturbation theory, then δ = (5/3) 1.69/a, where a c of counts in randomly placed cells by extending some of the is the expansion factor (e.g. Bond et al. 1991; Lacey & Cole ideas developed in this paper. Although all the results of this 1993). Thus, b changes rapidly at first, but at late times its paper are independent of those derived in Appendix D, it evolution is slowed by the expansion of the Universe. (See has been included because the final result in it (equation D3) section 2.3 in Sheth 1995 for another description of the evo- is known to be accurate. Thus, Appendix D provides one lution of b that shows this same behaviour.) Since b is known natural way in which the usual Press–Schechter analysis may to increase monotonically with time, it will be treated as a be extended to provide additional information about the psuedo-time variable in the remainder of this paper. evolution of nonlinear gravitational clustering. Equation (1) is known as a Borel distribution (Borel 1942). Epstein (1983) derived this distribution by studying the properties of level excursions of a Poisson distribution. His approach is the discrete analog of that which was used 2 A RELATION BETWEEN THE SPREAD OF later by Bond et al. (1991) in their analysis of the initially DISEASE AND THE PRESS–SCHECHTER Gaussian random fields. The Borel distribution can also be APPROACH derived using simple ‘cloud–in–cloud’ conditional probabili- ties for a Poisson distribution (Sheth 1995). This approach Consider a disease that is spread in accordance with the fol- is the discrete analog of Jedamzik’s (1995) treatment of the lowing model. Assume that, initially, there is a single carrier Gaussian case. of the disease. Assume that this carrier is capable of infect- If the probability that a randomly chosen clump con- ing others, and that the probability that it infects k others tains N particles is a Borel distribution, then the probabil- is given by a Poisson distribution. If the initial carrier is ity that a randomly chosen particle is in such a clump is thought of as belonging to the zeroth generation, then each Nη(N,b)/ N = (1 b)Nη(N,b), since the average num- of these newly infected carriers belongs to the first genera- h i − ber of particles in a Borel clump is N = 1/(1 b). In the tion. Assume that each of the members of the first generation h i − is, in turn, capable of infecting still others, who will make limit of large N and small δc, Stirling’s approximation for the factorial term implies that up the second generation. The members of the second gen- eration infect still others who make up the third generation, δ N N−1 e−N/(1+δc) (1 b) N η(N,b) = c who infect still others, and so on. Assume that, in any gener- − (1 + δc) 1+ δc (N 1)! ation, the probability that a carrier is able to infect k others −   2 is given by a Poisson distribution that is specified by the δc N δc exp (3) parameter, b, say. It is possible that, by chance, none of the → √2πN − 2   carriers in the nth generation infects any new members. In (Epstein 1983; Sheth 1995). The final expression is precisely this case the number of members in the (n+1)th generation that which obtains for a Gaussian density field with white is zero. If this should happen, the spread of the disease will noise initial fluctuations (e.g. Bond et al. 1991). This shows be said to be halted, and we can ask for the probability that that the Poisson distribution studied in the remainder of there were N people infected in total, including the initial this paper can be thought of as the discrete analog of the member in the zeroth generation. white noise Gaussian studied by Bond et al. (1991), and by This model for the spread of disease has been studied Lacey & Cole (1993, 1994). in some detail. It is a Galton–Watson branching process in Sections 2 and 3 study other derivations of the Borel which the distribution of the number of progeny of a given distribution. These derivations provide new insight into the member of each generation is a Poisson distribution with accuracy and applicability of the Press–Schechter approach. mean b < 1 (see, e.g., Harris 1963). This Galton–Watson An analytic expression that is, essentially, the partition func- process has an analytic solution. The probability that N tion that describes all possible merger histories of a given people were infected in total is given by a Borel distribution Press–Schechter clump is derived in Section 3. Its properties with parameter b (Otter 1949; Good 1960; Consul 1989). are consistent with those inferred previously (Sheth 1995). It is possible to use this Poisson Galton–Watson branch- In particular, it is consistent with the physical requirement ing process to study the growth of gravitational clustering that, in the limit of very small time steps, the probability from an initially Poisson distribution. Consider a randomly that a clump has two progenitors should be an infinitesi- chosen point in a Poisson distribution; this point comprises mal of smaller order than the probability that it has three, the zeroth generation. All points that are within a given or more, progenitors. Section 3 also shows that the parti- ‘contagious’ volume, say vc, around this point will be con- tion function, and so the growth of clustering, satisfies an sidered to be infected. Since the distribution is Poisson, the Galton–Watson and gravitational clustering 3

Figure 2. A branching process model of hierarchical clustering. The number of particles in each oval is a deter- mined by the (Poisson) initial conditions. The two ovals for each particle represent two different epochs, corresponding to two dif- Figure 1. A clump with six member particles identified in the ferent link lengths, or two different over-density thresholds, with initial particle distribution using the friends-of-friends percolation oval size increasing (over-density threshold decreasing) with time. model. In the percolation model, clumps merge with each other In this example, the clump of six particles was composed of four as the link length used to define friends-of-friends increases. The single particles and one pair at the earlier epoch. The fact that overlapping of volumes makes the percolation model difficult to different volumes are assumed to not overlap makes this branch- treat analytically. ing process analytically tractable. probability that this point is able to infect k others is Pois- is distinct from the Galton–Watson process described above. son, and the parameter that specifies this Poisson distribu- The percolation model as formulated here can be developed tion is related to the distance from the carrier out to which as an alternative model for the growth of clustering. In prin- the disease is contagious. For convenience, assume that vc ciple, friends–of–friends clump mass functions can be calcu- is defined so that the parameter of the Poisson distribu- lated from the initial distribution; they are functions of the tion is b =nv ¯ c wheren ¯ is the the average density. These k percolation link length (which is simply related to the size points make up the members of the first generation. How- of the ‘contagious’ volume vc), and are different from the ever, each member of the first generation will also have been clump mass functions determined using the Galton–Watson able to ‘infect’, say, j others, corresponding to the j neigh- model. Moreover, mergers are also well-defined in the per- bours that could have been (Poisson distributed) within vc colation model, as is the partition function of merger trees. from it. The set of all these neighbours of all the k mem- Thus, the percolation model is at least as well defined as the bers of the first generation makes up the second generation, Press–Schechter excursion set model. However, at present, and so on. It is clear that this situation is similar to the neither the distribution of friends–of–friends clump sizes, one described in the previous paragraph. This means that nor the associated merger probabilities can be calculated this Galton–Watson model for clustering from an initially analytically. Since the Galton–Watson model provides the Poisson distribution implies that the distribution of nonlin- same distribution of clump masses as do the excursion set, ear clump sizes is a Borel distribution. In other words, this or the cloud-in-cloud, analyses of the Press–Schechter ap- Galton–Watson model provides the same description of non- proach, the percolation model will not be considered further linear clustering as the better known Press–Schechter type in this paper. analyses described earlier. For this description to be exactly like the Galton– Watson process described above, we must assume that the probability that one of the members in the first generation 3 THE GALTON–WATSON DESCRIPTION OF HIERARCHICAL CLUSTERING has j neighbours within vc from it is independent of the fact that it is one of k particles within vc from the initial parti- The excursion set formulation of the Press–Schechter theory cle in the zeroth generation. This is the same as assuming shows clearly how to formulate and solve for a description that the volume vc centered on the initial particle does not of merging and hierarchical clustering (Bond et al. 1991; even partially overlap the volume vc centered on each of the Lacey & Cole 1993). Essentially, the merging problem re- k members in the first generation. Clearly, this assumption duces to solving a two-barrier problem that is analogous to is false; the assumption that the volumes never overlap is a the one-barrier level crossing problem that was solved to gross simplification. Nevertheless, we will continue consid- obtain the Press–Schechter multiplicity function. For an ini- ering this simplified model, since it provides the same mass tially Poisson distribution the two-barrier problem has also function as the Press–Schechter Borel distribution. been solved. The probability that, at the epoch b1, one of One can argue that the simplification which enabled us the progenitors of a clump with k particles at the epoch to pose the problem in terms of this Galton–Watson model b2 >b1, was of size j k, is easily related to the conditional ≤ suggests one way in which the Press–Schechter mass func- distribution tions could be modified. Accounting for the fact that it is j j−1 k j b1 b2 b1 possible for the volume vc centered on the initial particle f(j, b1 k,b2) = k k − | j k b2 b2 to partially overlap the volume vc centered on one of the k       k−j−1 members in the first generation, and so on, is the problem b2 b1 b1 known as friends–of–friends percolation. Clearly, percolation k − +(k j) (4), × b2 − b2 h   i 4 Ravi K. Sheth where b1

Appendix B shows that this partition function is con- that the head of the tree is always the initial male, and other sistent with the merger probabilities of equation (4). It also males start subfamilies within the tree. So, the number, m, provides additional insight into the origin of the various of subfamilies within a family tree having k members in total terms in equation (5). Appendix C describes a queueing pro- is equal to the number of male members in the tree. Now, the cess that provides another way to derive this expression. probability any given child is male is q = 1 p =(b2 b1)/b2. − − Since it is certain that one member of the tree is a male (the head of the family tree is always a male), the probability 3.1 Some properties of the partition structure that there are m males in a family with k members should − − be [(k 1)!/(k m)!(m 1)!] pk m qm 1. This is identical to Some limits of equation (5) are worth studying. As b2 − − − n1 nk → the Binomial distribution of equation (7). Alternatively, this b1, p(1 k k) 0, except when m = 1, for which 1 · · · | → expression for the number of subclumps of a k-sized clump, p(k k) 1 as required. As b1 0, then η(l,b1) 0 except | → → → for l = 1, so that which is the same as the number of sons in a k sized family, can be obtained directly from Good’s (1960) treatment of lim p(1n1 knk k) = p(1k k) the Galton–Watson process with many types of progeny. The b1→0 · · · | | brute force calculation is given in Appendix A. It, too, shows k−1 −kb2 (kb2) e 1 that n(m k) is given by equation (7). = = 1. (6) | k! η(k,b2) Equation (7) is also consistent with the physical re- quirement that, in the limit of very small time steps, the In other words, in the limit as b1 0, all progenitors are → certainly single particles, which is also the expected result. probability that a clump has two progenitors should be an The probability n(m k) that a clump of size k at the infinitesimal, the probability that the clump has three pro- | genitors should be an infinitesimal of the next higher order, epoch b2 has m progenitors at the epoch b1 is given by sum- ming p(1n1 knk k) over all distinct sets of m integers that and so on. It also implies that massive clumps form pref- · · · | erentially from massive progenitor clumps: on average, the add up to k. Now, any set specified by n1, , nk with { · · · } size, at some earlier epoch, of the largest progenitor clump of n1 + + nk = m can be written explicitly in terms of its m · · · k a clump at some given final epoch depends significantly on members as l1, lm . Thus, l1 + +lm = jnj = k. { · · · } · · · j=1 So, how massive the final clump is relative to the characteristic P mass at the final epoch (eq. 55 in Sheth 1995). n1 nk η(1,b1) η(k,b1) m! n(m k) = · · · In the limit of large k and large subclumps li, equa- | η(k,b2) n1! nk! tion (5) has an interesting interpretation. Stirling’s formula m parts. · · · X for the factorials implies that m−1 −k(b2−b1) 1 [k(b2 b1)] e − l1−1 lm−1 m−1 × m (m 1)! l1 lm k! k p(l1, ,lm k) = − n1 − − n · · · k−1 11 1 kk 1 k · · · | l1! lm! k n1! nk! = · · · k−m · · · m−1 1! · · · k! b1 b1 m parts.     1 X × b2 − b2 1n +··· +kn −n −···−n b 1 k 1 k m−1 k−m m−1 1 k  b1  b1 1n1b1+··· +knkb1 1 × e → n1! nk! b2 − b2 k! m! · · · 3 m  2   k−1 −kb √2πk 1 × (kb2) e 2 × n1! nk! m 3/2 − · · · × 2 m 1 −k(b2−b1) (2π) li 1 [k(b2 b1)] e i=1 − kY−m m−1 × m (m 1)! 1 b1 b1 − 1 k−m m−1 k! 1 b1 b1 → n1! nk! b2 − b2 · · · m = k−m 1     m! k b2 − b2 1 1 − − , (8)    l1 1 l2 1 lm−1 (m−1)/2 3/2 l l lm × (2πk) x 1 2 · · · i=1 i × l1!l2! lm! Y dist.perms. · · · X where we have defined xi li/k. The first term on the right k−m m−1 k−m ≡ k! 1 b1 b1 m k is the combinatorial term that is included to insure that each = k−m 1 m! k b2 − b2 k (k m)! distinct combination of subclumps is counted only once. The −  k−1−(m−1)  m−1 second set of terms accounts for the probability that there k 1 b1 b1 = − 1 . (7) are exactly m subclumps. The final product term is the most m 1 b2 − b2  −      interesting. It shows that, for a given value of the final clump The sum in the third equality is over all sets of m integers size k, and if m k, the case in which the subclumps are all ≪ which satisfy l1 + + lm = k, and over all the distinct approximately the same size is much less likely than the case · · · permutations of each set (which accounts for the multino- in which one of the subclumps is very much more massive mial factor m!/n1! nk!). The second from last equality than all the others. · · · follows from a combinatorial identity (note the similarity to The partition function (eq. 5) exhibits an interesting the Borel–Tanner distribution of equation B3; also see Sheth scaling. It depends on the initial and final epochs, b1 and & Saslaw 1994). The final expression for n(m k) is the same b2, only through the combination p = b1/b2. As one would | as that obtained previously (eq. 52 in Sheth 1995). expect, this is also true of the simpler statement given in It is easy to see that this expression is sensible. Referring equation (4). This means that the probability p(l1 ,lm k), · · · | back to the Galton–Watson family tree description, recall given the two epochs b1 and b2, will be the same as the 6 Ravi K. Sheth

′ ′ ′ probability p (l1 ,lm k), given b1 and b2, provided that ′ ′ · · · | b1/b2 = b1/b2 = p. In this sense, the clustering evolves in a self–similar fashion. This scaling can be exploited when comparing equation (5) with merger histories of clumps in N-body simulations. When b1 1 and b2 1, the ratio → → b1/b2 = (1+ δc2)/(1+ δc1) 1+ δc2 δc1. Thus, the scaling → − in b1/b2 corresponds to the scaling in (δc1 δc2) discussed, − for example, by Bond et al. (1991) and by Lacey & Cole (1993).

4 APPLICATIONS One example of the type of statistical question that we can Figure 3. Probability, f(l1|k), that the largest subclump of a k now answer is as follows. Suppose we are interested in the particle clump has l1 particles, for two choices of the ratio p = distribution of sizes of the largest progenitor clump at some b1/b2 of the initial and final epochs, and for two choices of k. epoch b1, of a given clump at the epoch b2. This might be Solid lines show the distribution for k = 10 with p = 0.3 and of interest, for example, in studies of the Butcher–Oemler 0.9. Dashed lines show the distribution for k = 50, with the same effect (Bower 1991; Kauffmann 1995). It is straightforward values of p. to compute the relevant sums over the partition function numerically. As an example, Fig. 3 shows the probability that the largest subclump of a clump with k particles has l1 particles, for two choices of k, and for two choices of the specifies this distribution of progenitor subclumps. So, we ratio p = b1/b2 of the initial and final epochs. can compute, e.g., the average size of the largest progeni- Three features of Fig. 3 are obvious. First, for given k, tor subclump. Now consider the primed clump. The scaling the curves depend strongly on p, the ratio of the two epochs property of the partition function shows that its subclump b1 and b2. As p increases, the curves peak at higher values distribution will be the same as that of the unprimed clump ′ ′ of l1/k. This means that the largest progenitor of a given at the epoch when b1 = pb2 < b1, where p = b1/b2 has the clump is a smaller fraction of the total mass as the time same value as for the unprimed clump. Consider the aver- between the final epoch and the epoch at which the pro- age size of the largest progenitor of these two clumps as a ′ genitors were identified increases. This simply reflects the function of ‘lookback time’ from the epochs b2 and b2. fact that the clustering is hierarchical; small clumps merge Since p is the same for the two clumps, they will have to form big clumps, and on average, clumps were smaller the same average size for the largest subclump at the epochs ′ in the more distant past than they are at present. Second, b1 and b1

Lacey C., Cole S., 1994, MNRAS, 271, 676 that the probability of having N members in the tree is LeGall J. F., 1993, Probab. Th. Rel. Fields, 96, 369 Neveu J., Pitman J., 1989, Lecture Notes Math, 1372, 248, P (N) = η(N,b2) − S´eminaire de Probabiliti´es XXIII. Springer–Verlag, Berlin N 1 n N−1−n N 1 b1 b1 Otter R., 1948, Ann. Math. Stat., 20, 206 − 1 Press W., Schechter P., 1974, ApJ, 187, 425 × n b2 − b2 n=0   Riordan J., 1979, Combinatorial Identities. Robert E. Krieger X     = η(N,b2), (A2) Publ. Co., New York Saslaw W. C., 1989, ApJ, 341, 588 where n denotes the number of girls in the tree. Clearly, this Saslaw W. C., Hamilton A. J. S., 1984, ApJ, 276, 13 is the same result as that given by equation (7) in the main Sheth R. K., 1995, MNRAS, 276, 796 text. Sheth R. K., Saslaw W. C., 1994, ApJ, 437, 35 Tanner J. C., 1953, Biometrika, 40, 58 Tanner J. C., 1961, Biometrika, 48, 222 APPENDIX B: MERGER PROBABILITIES AND THE PARTITION FUNCTION APPENDIX A: THE POISSON Some of the results of this Appendix were suggested by GALTON–WATSON PROCESS WITH TWO the recent work of Pitman (in preparation). This Appendix TYPES OF PROGENY shows explicitly that the partition function derived in the main text, equation (5) and the merger probabilities of equa- The Poisson Galton–Watson process with m different types tion (4) are consistent with each other. Demonstration of of progeny has been studied by Good (1960). He shows that this consistency is useful because, in the appropriate limit, if the branching process starts with ri individuals of type i, the merger rates implied by equation (4) are in good agree- for all 1 i m, then the probability that the whole tree ≤ ≤ ment with N-body simulations of gravitational clustering contains exactly n1 of type 1, n2 of type 2, and so on, is (Lacey & Cole 1994). For completeness, this limit is derived equal to the coefficient of below. In addition, an expression for the mean number of n1−r1 n2−r2 nm−rm z1 z2 zm j-sized subclumps of k-sized clumps is derived directly from · · · the partition function. The argument is generalized, at the in end of this Appendix, to obtain an expression for the facto- rial moments of this distribution. n1 nm ν zµ ∂fµ f1 fm δµ , Lacey & Cole (1993) define a merger rate by taking the · · · − fµ ∂zν limit as δ1 δ2 in f(k, δ2 j, δ1). That is, the merger rate is ν → | where δµ is the identity matrix and dP (j k δ) (1 b2) k η(k,b2) f(j, δ1 k, δ2) → | dδ lim − | z dδ ≡ δ1→δ2 (1 b1) j η(j, b1) fµ( )= exp aµν (1 zν ) − − − k δ (δ δ ) ν = lim 2 1 2 Y   − 2 δ1→δ2 (k j)! δ1 (1 + δ2) reflects the fact that the probability that an individual of − k−j−1 type µ has a child of type ν is a Poisson distribution with k j parameter aµν independently of the other individuals. × 1+ δ2 − 1+ δ1 j − k +  For our problem, m = 2, and subscript 1 is for girls e 1+δ2 1+δ1 and subscript 2 is for boys. Then a = a = b and a = × − 11 21 1 12 k dδ eδ(k j)/(1+δ) a22 = b2 b1, and the problem is to extract the relevant − − → √2π(k j)3/2 (1 + δ) (1 + δ)k j coefficients of − 3/2 − 2 − dδ k2 e δ (k j)/2 2 (B1) exp n1 b1(z1 1)+(b2 b1)(z2 1) → √2π (k j) k (1 + δ) ( − − −  −     (Sheth 1995). The third expression on the right follows from setting δ1 = δ2 = δ and δ1 δd = dδ, and considering the + n2 b1(z1 1)+(b2 b1)(z2 1) − − − − limit when k 1, j 1 and k j 1. Stirling’s approx- ) ≫ ≫ − ≫   imation for the factorials simplifies the expression consider- ably, and the final expression follows from assuming δ 1. 1 z b 1 z (b b ) z b z (b b ) . (A1) ≪ 1 1 2 2 1 2 1 1 2 1 Except for the (1 + δ) term in the denominator, the final × ( − − − − − ) h i h i expression is the same as the expression derived by Lacey Suppose that we are interested in the family tree that was & Cole (1993). In their notation S1 1/j, S2 1/k, and ∝ ∝ started by one male ancestor, in which there are N mem- ω = δ, since the relevant Gaussian corresponding to the bers of the tree in total. Then we can use equation (A1) Poisson is the white noise case. So, in the limit where Stir- to calculate the probability of having a tree with exactly N ling’s approximation for the factorials is valid, and when members. Let n denote the number of females in the tree. δ 1, equation (B1) here reduces to their equation (2.17). ≪ Then set n1 = n and n2 = N n and let r1 = 0 and r2 = 1. In this limit, the merger rate implied by equation (4) de- − The answer is obtained by summing up all the coefficients of scribes the simulations well (Lacey & Cole 1994). n N−n−1 terms of order z1 z2 . This involves straightforward but One other limit of equation (4) is also interesting. When tedious algebra which is not reproduced here. The result is k j, Stirling’s approximation for k! and (k j)! in equa- ≫ − 10 Ravi K. Sheth

tion (4) implies that = η(l1,b)η(l2,b) η(lm,b) · · · j−1 j−1 all m parts. b1 j b1 −j(b1/b2) X f(j, b1 k,b2) 1 e . (B2) n1 nk | → − b2 (j 1)! b2 = η(1,b) η(k,b) , (B5) − · · ·     all m parts. This shows that the probability that a randomly chosen X member of a Borel clump that has exactly k members at k where n1 + + nk = m, and l1 + + lm = j=1 j nj = k, the epoch b2 was in a Borel clump with j k particles at · · · · · · ≪ and the sum in the two final expressions is over all distinct the epoch b1 is, to an excellent approximation, given by a ordered partitions of k that have exactly mPparts. Notice Borel distribution with parameter b1/b2. When b2 1, this → that not all the terms in this sum are different. For exam- means that the probability that a particle is in a clump with ple, when m = 3 and k = 6, then the set 123 occurs six j 1 other particles is given by a Borel distribution with pa- { } − times, and when m = 3 and k = 7, then the set 223 oc- rameter b1. Since the limit b2 1 is equivalent to requiring { } → curs thrice. In general, a given set n1, , nk , will occur δ2 0, this is exactly what is required by the derivation { · · · } → m!/(n1! nk!) times. of f(j k) from the two barrier (excursion set) problem con- · · · | Let p(n1 nk m,k) denote the probability that a given sidered in Sheth (1995). Another application of Stirling’s · · · | set n1, , nk occured, given that there were exactly m approximation (to the j! term), with the limits b1 1 and { · · · } → terms which added up to k. Then p(n1 nk m,k) is given b2 1 shows that equation (4) is similar to the Lacey & · · · | → by summing up all the terms in equation (B5) correspond- Cole (1993) expression for the white noise Gaussian case. ing to it, and normalizing by P (b,Sm = k). If it is not Fig. 2 in Sheth (1995) also shows this to be true. certain that there were exactly m terms in the partition, Before deriving the merger probabilities f(j k) of equa- then we must multiply p(n1 nk m,k) by the probability | · · · | tion (4) directly from the partition function (equation 5), that there were exactly m terms that added up to k. Thus, it is useful to consider some combinatorial identities. First, p(n1 nk k), which is conditioned on k only and not on m · · · | consider the random variable X, and assume that the dis- as well, is given by an expression like tribution of X is Borel with parameter b. Then the distribu- η(1,b)n1 η(2,b)n2 η(k,b)nk m! tion of Sm = X1 + + Xm, where the Xi are independent · · · n(m k), · · · P (b,Sm = k) n1! nk! × | random variables, each drawn from a Borel distribution with · · · parameter b, is given by the Borel–Tanner distribution. That where n(m k) denotes the probability that k is the sum of ex- | is, the probability that Sm = k (i.e., the sum of m indepen- actly m integers. Now, Appendix A showed that the branch- dent Borel variables equals k) is ing process we are considering requires n(m k) to have the − − | m (kb)k m e kb Binomial distribution of equation (7). Thus, the probability P (b,Sm = k)= , where k m (B3) that a given set n1, , nk occurs is k (k m)! ≥ { · · · } − n1 nk η(1,b1) η(k,b1) m! (Tanner 1953, 1961). It is easy to see that this expression is p(n1 nk k) = · · · · · · | P (b1,Sm = k) n1! nk! sensible, since · · · k−m m−1 k 1 b1 b1 k−m+1 − 1 × m 1 b2 − b2 η(j, b) P (b,Sm−1 = k j)  −  − m−1 kb1   j=1 k! (b2 b1) e X = −k−1 k−m k−m+1 − − − − − m! b k (jb)j 1 e jb m 1 [(k j)b]k m e (k j)b 2 = − − k j! k j (k j m + 1)! m! nj j=1 − − − η(j, b1) . (B6) X − − × n1! nk! bk me kb · · · j=1 =(m 1) Y − (k m)! − Simple algebra shows that this final expression is equiva- k−m lent to equation (5). This shows explicitly how the partition k m − − − − − (1 + i)i 1 (k 1 i)k m i 1 function is related to sums of Borel-distributed random vari- × i − − i=0 ables. X   = P (b,Sm = k), (B4) We are finally in a position to show that equation (4) follows from equation (5). The probability that a particle as it should. The final expression follows from Abel’s gener- which is chosen at random from a clump with exactly k alization of the Binomial theorem (see equations 14 and 20 particles at the epoch b2 was in a subclump with j particles in section 1.5 of Riordan 1979 or equation 46 of Sheth 1995). at the epoch b1 is Iterating this process shows that j nj − p(n1, , nk k), k m+1 k · · · | P (b,Sm = k) = η(l1,b) P (b,Sm−1 = k l1) all part. − X l =1 X1 where the sum is over all partitions of k. This sum can be k−m+1 written as

= η(l1,b) n1 nj −1 nk × j η(1,b1) η(j, b1) η(k,b1) l1=1 η(j, b1) · · · · · · X k P (b1,Sm = k) − − all part. k l1 m+2 X η(l2,b) P (b,Sm−2 = k l1 l2) (m 1)! − − − m n(m k), (B7) l2=1 × n1! (nj 1)! nk! | X · · · − · · · Galton–Watson and gravitational clustering 11 which is the same as APPENDIX C: MERGING, BRANCHING, AND k−j+1 THE THEORY OF QUEUES j P (b1,Sm−1 = k j) η(j, b1) − m n(m,b1 k,b2),(B8) k P (b1,Sm = k) | The Borel distribution (eq. 1) also arises in studies of the m=1 X distribution of waiting times in queues (Borel 1942; Tan- where it is understood that when m = 1 then P (b1,Sm−1 = ner 1953; Tanner 1961). The fact that queueing theories k j) = P (b1,S0 = 0) = 1, and P (b1,S0 = k j) = 0 for and the Galton–Watson branching process are closely re- − − j = k. Writing all the terms in the sum explicitly gives lated (Kendall 1951) will be exploited in this Appendix. 6 k−j+1 Consider a counter at which customers are served. As- j−1 −jb1 k−j−m+1 −(k−j)b1 j (jb1) e m 1 [(k j)b1] e − − sume that customers arrive at the counter in a Poisson pro- k j! k j (k j m + 1)! m=1 − − − cess with parameter unity (an average of one arrival per X k−m m−1 unit time) and that the service time is the same constant, k (k m)! k 1 b1 b1 k−m −kb m − 1 0 b 1, for each customer. If the counter is busy when a × m (kb1) e 1 m 1 b2 − b2 ≤ ≤  −  customer arrives, the customer joins the back of a queue. So, −     k j+1 j j−1 service is on a first-come-first-served basis. In such a system, k j b1 b1 = k k 1 we can ask for the probability that exactly N customers are j k b2 − b2 × m=2       served before the queue is first emptied, given that there X m−2 k−j−m+1 k j 1 b1 b1 was only one customer in the queue initially. This probabil- − − k 1 (k j) . (B9) m 2 − b2 − b2 ity is given by the Borel distribution with parameter b (Borel  −  h  i h i 1942; Tanner 1953; Consul 1989). The relation of this queue The Binomial theorem reduces this final expression to equa- to the Galton–Watson branching process described above is tion (4). This shows explicitly that clear. Simply view the N customers that were served before

j nj the queue was first emptied as the descendents, the ones f(j, b1 k,b2)= p(n1, , nk k) (B10) | k · · · | who were ‘infected’ by, the initial customer. all part. X Now modify the queue system as follows. Assume, as as expected. Thus, the merger probabilities of equation (4) before, that customers arrive at a counter in a Poisson pro- can be derived directly from the partition function (equa- cess with unit rate, and that the service time is the same tion 5), which means that the merger probabilities and the constant, say b2, for each customer. However, in this case, partition function are mutually consistent. the customers form two queues in accordance with the fol- The steps leading to equation (B10) imply that the lowing prescription. If a new customer arrives within a time mean number of subclumps each having exactly j particles b1 b2 of the most recent commencement of service, they ≤ that are incorporated in a clump with exactly k particles is join the back of a high priority queue, H. If they arrive after later than b of the most recent commencement of ser- k 1 nj p(n1, , nk k)= f(j, b1 k,b2). (B11) vice, they join the back of a low priority queue, L. All cus- · · · | j | all part. tomers in queue H are serviced, in the order in which they X arrived, until the queue is empty. When this happens, the This is the same result as that obtained using a different first customer in queue L moves into queue H and is serviced argument (equation 45 in Sheth 1995). However, with the immediately. Thus, while customers within each queue are partition function, it is now possible to compute the higher serviced on a first-come-first-served basis, it is possible for order moments of this distribution as well. For completeness some customers in queue H to receive service before oth- the factorial moments are computed below. These higher ers in queue L, even though they may have arrived later. order moments are useful for estimating the scatter around In this sense, the modified system does not operate on a the mean number of j-clumps per k-clump. Also, they may strictly first-come-first-served basis. be good discriminators between different partition functions that yield the same f(j k) merger probabilities. The factorial For such a system, we can ask for the probability that | moments are exactly k customers are served before both queues are com- pletely emptied for the first time, given that there was only k−ij+i nj ! i P (b1,Sm−i = k ij) one customer in queue H and none in queue L initially. = η(j, b1) − (nj i)! P (b1,Sm = k) Clearly, the answer to this question is no different than be- − m=i+1   X fore: this probability must be given by the Borel distribu- m! n(m,b1 k,b2) tion with parameter b2. However, in the time before both × (m i)! | − queues were emptied for the first time, having serviced ex- j−1 i ij−i i k! (j ) b1 b1 actly k customers, queue H may have been emptied a num- = i k−1 k 1 j! (k ij)! k b2 − b2 ber of times (though certainly not more than k times). Sup- −   hk−ij−1 i pose that it was emptied m times. Define a batch of cus- b1 b1 k 1 +(k ij) . (B12) tomers as the number, l, of customers served between two × − b2 − b2 h   i successive empty periods of queue H. Then we can ask for Since the partition function is known, it is also straightfor- the probability, p(l1,l2, ,lm k), that queue H was emp- · · · | ward to compute ‘cross–correlation’ type moments of the tied m times, and that customers were served in batches form ninj , and the associated factorial moments, though of l1,l2, ,lm (not necessarily in that order), given that h i · · · we have not done so here. l1 + l2 + + lm = k, and that there was only one customer · · · in queue H initially. Then p(l1,l2, ,lm k) is the same as · · · | that defined in the Galton–Watson process considered ear- 12 Ravi K. Sheth lier in this paper. So, for this queue system, it is given by This way of deriving the partition function, by gener- equation (5). alizing Tanner’s (1961) argument, has another connection In the context of this queue system, equation (5), η(l,b) to previous work. In effect, it is an alternative derivation of is the Borel distribution with parameter b (eq. 1), and nj the excursion set scaling solution derived in section 3.2 of denotes the number of times exactly j customers passed Sheth (1995). This follows because Tanner showed how his through queue H before it was emptied, given that k passed queueing system could be formulated in terms of an excur- through it before both H and L were emptied. The first sion set process. Here we have described a queueing system term on the right of equation (5) accounts for the fact that that is associated with the Poisson Galton–Watson branch- the probability of serving exactly li customers between two ing process. Comparison of this queue with the excursion set successive empty periods in the H queue is a Borel distri- process considered in section 3.2 of Sheth (1995) shows that bution, and so it weights the occurence of each li in the they are equivalent. sequence of service batches with the probability, η(li,b1), that li occured. The second term on the right accounts for the different permutations of the different service sequences, APPENDIX D: COUNTS-IN-CELLS FROM THE since the various permutations of l1, ,lm all contribute { · · · } GALTON–WATSON BRANCHING PROCESS to the same p(l1,l2, ,lm k). The final term looks like a · · · | Poisson weighting term, and is obtained by an argument The branching process extension of the Press–Schechter ap- that is similar to that used by Tanner (1961) in his elegant proach is very powerful. In the main text it was used to derivation of the Borel distribution. provide a description of the merger history tree. However, Consider the case in which both queues are empty, hav- following a calculation suggested by Consul (1989), it is also ing serviced k customers since the last time they were both relatively straightforward to use the Galton–Watson branch- empty. If the H queue was emptied m times during this busy ing process to provide a derivation of the distribution of period, then m 1 customers must have passed through counts in randomly placed cells, at any epoch characterized − queue L during this time. So, we can ask for the probability by b. Assume that Press–Schechter Borel clumps collapse that m 1 customers arrived at and passed through queue completely to points, so that if the cluster center is included − L during this time. However, in Tanner’s (1961) language, in a cell, all associated particles are also. Now assume that not all possible arrival patterns are ‘admissible’, since the all clumps evolve in accordance with the Press–Schechter arrivals in queue L must be consistent with the pattern of description developed in the main text. service times in queue H. Namely, during each of the first The analogy with the Galton–Watson process is as fol- m 1 instants at which queue H is emptied, there must be lows. The number of particles in a given cell is the same as − at least one customer in queue L, but the mth time that H the total number of progeny of the Galton–Watson process. is emptied, queue L must also be empty. The problem is to However, unlike the Press–Schechter mass function consid- list all possible sequences of arrivals at queue L, and then ered in the main text, in this case the number of initial to determine the fraction of these that are admissible. ancestors, X0, is not unity. Rather, it is the number of clus- Recall that customers arrive in a Poisson process with ter centers, X0 = m, say, that happen to be in the cell. unit rate at a queueing system that has constant service So, for a cell containing m cluster centers, rather than con- time b2 1 per customer, and the first customer joins queue sidering the total number of progeny of one ancestor (the ≤ H without passing through queue L. If a customer arrives Borel distribution with parameter b), we need to calculate during the first fraction, b1/b2, of the service time, then they the total number of progeny given that there are X0 = m join queue H, otherwise they join queue L. If k customers ancestors in the zeroth generation. This is the same as the were served in total, then there were k opportunities for Poisson Galton–Watson process, conditioned on the num- customers to join queue L. Each of these k opportunities was ber of ancestors being X0 = m, rather than unity. Both of duration b2 b1. Since the arrival of customers is random, the branching process (Consul 1989) and the queueing the- − the probability that m 1 customers passed through queue L ory (Tanner 1953) formulations of this problem show that − during the time in which k customers were served is given by the probability that there are exactly N particles in the cell m−1 −k(b2−b1) the Poisson distribution, [k(b2 b1)] e /(m 1)!. given that there are exactly m cluster centers in the cell is − − However, the probability that the m 1 arrivals were in − − − m (Nb)N me Nb an admissible order introduces an additional factor of 1/m. P (N m)= , (D1) | N (N m)! This sets the final term in equation (5). − In terms of the Galton–Watson branching process con- which is the Borel–Tanner distribution of Appendix C. sidered in the main text (in which the probability that any When m = 1 it reduces to the Borel distribution of equa- parent has n1 daughters is a Poisson distribution with pa- tion (1). rameter b1, and the probability that that parent also had n2 The final key idea is to assume that, since the initial dis- sons is a Poisson distribution with parameter b2 b1), equa- tribution is Poisson, any randomly placed cell will include a − tion (5) lists the probability that a family of size k is made random number X0 of cluster centers. That is, the distribu- of the m subfamilies (each with a male at the head and only tion of X0 will be Poisson, with a parameter that is specified females in the subsequent generations), having l1,l2, ,lm by the epoch, labelled by b, at which the cell is placed, and · · · members in each. So, for the gravitational clustering process, by the size of the cell. This model for the number of cluster the probability that a clump of size k at the epoch b2 had centers in a randomly placed cell, given that the initial dis- the m progenitors l1,l2, ,lm at the epoch b1 is given by tribution was Poisson, is consistent with linear theory (e.g. · · · equation (5). As noted in section 3, equation (5) can be used Peebles 1980), and is motivated by a scaling argument that to generate the partition function of merger history trees. can be applied to the Poisson Press–Schechter description Galton–Watson and gravitational clustering 13

(section 3.2 in Sheth 1995). However, the scaling argument the clustered distribution, nor can one construct the nonlin- is equivalent to the interpretation of the ear counts in cells distribution function. In this respect, the Galton–Watson partition function of merger history trees Press–Schechter description does not provide a complete de- (see discussion at the end of Appendix C). Thus, the as- scription of nonlinear clustering. Therefore, it is very inter- sumption that the number of clusters in a randomly placed esting that numerical simulations of gravitational clustering cell is a Poisson random variable is consistent with the par- from an initially Poisson distribution confirm the accuracy tition structure derived earlier in this paper. of equation (D3) on all scales, as well as the scale dependence When the distribution of X0 is Poisson with parame- and temporal evolution of b (Sheth & Saslaw 1994 and ref- ter N¯cl, then the probability that any randomly placed cell erences therein). This measured accuracy of equation (D3), contains exactly N particles is and the way in which it can be derived from the branching process, suggests one way in which the Press–Schechter ap- ∞ ∞ − ¯ N¯ me Ncl f(N) = p(m) P (N m)= cl P (N m) proach may be extended to provide some information about | m! | the counts in cells distribution. m=0 m=0 X ∞ X Before concluding, we note that the partition structure N¯cl N 1 − − − ¯ − = − N¯ m 1 (Nb)N m e Ncl Nb derived in the main text is independent of the accuracy or N! m 1 cl applicability of the results of this Appendix. That is, the m=1  −  X results of the main text are independent of whether or not N¯ − − ¯ − = cl (N¯ + Nb)N 1 e Ncl Nb. (D2) the Borel clusters have a Poisson spatial distribution. N! cl

Setting N¯cl =nV ¯ (1 b), wheren ¯ is the average density of − particles and V is the size of the cell, is required by nor- malization. This shows that equation (D2) implies that the probability that a randomly placed cell of size V contains exactly N particles is

N−1 N¯(1 b) − ¯ − − f(N,V )= − N¯(1 b)+Nb e N(1 b) Nb, (D3) N! − h i where the left hand side now shows the volume dependence explicitly. Equation (D3) is also a solution to the Saslaw & Hamil- ton (1984) thermodynamic model of nonlinear gravitational clustering. It can be understood as describing a Poisson distribution of point sized clumps, where the probability a clump has N associated particles is given by a Borel distribution with parameter b (Saslaw 1989). Thus, equa- tions (1) and (D3) can be derived from a thermodynamic model (Saslaw & Hamilton 1984; Sheth 1995), from an anal- ysis of the excursion set statistics of overdense regions of a Poisson distribution (Sheth 1995), and from the Poisson Galton–Watson branching process (Consul 1989). In equation (D3), b is constant, independent of cell size V . This is a consequence of assuming that all clumps are point sized. Relaxing this assumption means that the branching process is no longer straightforward to implement. Nevertheless, we can use the Poisson cluster interpretation of this branching process derivation of f(N,V ) to gain some understanding of the shape of the counts in cells distribution when the point sized approximation is relaxed. The counts in cells distribution of a Poisson distribution of Borel (with parameter b) Press–Schechter clumps that have nontrivial sizes and shapes is well approximated by equation (D3), ex- cept that b becomes scale dependent. It tends to zero as V 0 and it tends to a constant value as V becomes larger → than the typical clump size (Sheth & Saslaw 1994). Whereas the usual Press–Schechter analysis of excur- sion set mass functions provides information about the dis- tribution of virialized clump masses, it does not provide information about the internal structure of these clumps, nor does it describe how these clumps are distributed rela- tive to each other in space. The clumps may be correlated with each other, or distributed uniformly at random. Thus, one cannot compute the N-point correlation functions of