Kinetic Theory of Random Graphs: from Paths to Cycles

E. Ben-Naim^{1,*} and P. L. Krapivsky^{2,†}

^1 Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545
^2 Center for Polymer Studies and Department of Physics, Boston University, Boston, Massachusetts 02215

* Electronic address: [email protected]
† Electronic address: [email protected]

Structural properties of evolving random graphs are investigated. Treating linking as a dynamic aggregation process, rate equations for the distribution of node-to-node distances (paths) and of cycles are formulated and solved analytically. At the gelation point, the typical length of paths and cycles, l, scales with the component size k as l ~ k^{1/2}. Dynamic and finite-size scaling laws for the behavior at and near the gelation point are obtained. Finite-size scaling laws are verified using numerical simulations.

PACS numbers: 05.20.Dd, 02.10.Ox, 64.60.-i, 89.75.Hc

I. INTRODUCTION

A random graph is a set of nodes that are randomly joined by links. When there are sufficiently many links, a connected component containing a finite fraction of all nodes, the so-called giant component, emerges. Random graphs, with varying flavors, arise naturally in statistical physics, chemical physics, combinatorics, probability theory, and computer science [1-5].

Several physical processes and algorithmic problems are essentially equivalent to random graphs. In gelation, monomers form polymers via chemical bonds until a giant polymer network, a "gel", emerges. Identifying monomers with nodes and chemical bonds with links shows that gelation is equivalent to the emergence of a giant component [6-8]. A random graph is also the most natural mean-field model of percolation [9, 10]. In computer science, satisfiability, in its simplest form, maps onto a random graph [11]. Additionally, random graphs are used to model social networks [12, 13].

Random graphs have been analyzed largely using combinatorial and probabilistic methods [3-5]. An alternative statistical physics methodology is kinetic theory or, equivalently, the rate equation approach. The formation of connected components from disconnected nodes can be treated as a dynamic aggregation process [14-17]. This kinetic approach was used to derive primarily the size distribution of components [18-20].

Recently, we have shown that structural characteristics of random graphs can be analyzed using the rate equation approach [21]. In this study, we present a comprehensive treatment of paths and cycles in evolving random graphs. The rate equation approach is formulated by treating linking as a dynamic aggregation process. This approach allows an analytic calculation of the path length distribution. Since a cycle is formed when two connected nodes are linked, the path length distribution yields the cycle length distribution. More subtle statistical properties of cycles in random graphs can be calculated as well. In particular, the probability that the system contains no cycles and the size distributions of the first, second, etc. cycles are obtained analytically.

We focus on the behavior near and at the phase transition point, namely, when the gel forms. We show that the path and the cycle length distributions approach self-similar distributions near the gelation transition. At the gelation point, these distributions develop algebraic tails. The exact results obtained for an infinite system allow us to deduce scaling laws for finite systems. Using heuristic and extreme statistics arguments, the size of the giant component at the gelation point is obtained. This size scale characterizes the size distribution of components and it leads to a number of scaling laws for the typical path size and cycle size. Extensive numerical simulations validate these scaling laws for finite systems.

The rest of the paper is organized as follows. First, the evolving random graph process is introduced (Sec. II), and then the size distribution of all components is analyzed in Sec. III. Statistical properties of paths are derived in Sec. IV and then used to obtain statistical properties of all cycles (Sec. V) and of the first cycle (Sec. VI). We conclude in Sec. VII. Finally, in an appendix, some details of the contour integration used in the body of the paper are presented.

II. EVOLVING RANDOM GRAPHS

A graph is a collection of nodes joined by links. In a random graph, links are placed randomly. Random graphs may be realized in a number of ways. The links may be generated instantaneously (static graph) or sequentially (evolving graph); additionally, a given pair of nodes may be connected by at most a single link (simple graph) or by multiple links (multi-graph).

We consider the following version of the random graph model. Initially, there are N disconnected nodes. Then, a pair of nodes is selected at random and a link is placed between them (Fig. 1). This linking process continues ad infinitum and it creates an evolving random graph.

FIG. 1: An evolving random graph. Links are indicated by solid lines and the newly added link by a dashed line.

The process is realized dynamically. Links are generated with a constant rate in time, set equal to (2N)^{-1} without loss of generality. There are no restrictions associated with the identity of the two nodes. A pair of nodes may be selected multiple times, i.e., a multi-graph is created. Additionally, the two nodes need not be different, so self-connections are allowed.

At time t, the total number of links is on average Nt/2, the average number of links per node (the degree) is t, and the average number of self-connections per node is N^{-1} t/2. Therefore, whether or not self-connections are allowed is a secondary issue. Since the linking process is completely random, the degree distribution is Poissonian with a mean equal to t.

III. COMPONENTS

The evolving random graph model has several virtues that simplify the analysis. First, the linking process is completely random as there is no memory of previous links. Second, having at hand a continuous variable (time) allows us to use continuum methods, particularly the rate equation approach. This is best demonstrated by the determination of the size distribution of connected components.

As linking proceeds, connected components form. When a link is placed between two distinct components, the two components join. For example, the latest link in Fig. 1 joins two components of size i = 2 and j = 4 into a component of size k = i + j = 6. Generally, there are i × j ways to join disconnected components. Hence, components undergo the following aggregation process
(i, j) \xrightarrow{\ ij/2N\ } i + j.   (1)
Two components aggregate with a rate proportional to the product of their sizes.

A. Infinite Random Graph

Let c_k(t) be the density of components containing k nodes at time t. In terms of N_k(t), the total number of components with k nodes, c_k(t) = N_k(t)/N. For finite random graphs, both N_k(t) and c_k(t) are random variables, but in the N → ∞ limit the density c_k(t) becomes a deterministic quantity. It evolves according to the nonlinear rate equation (the explicit time dependence is dropped for simplicity)
\frac{dc_k}{dt} = \frac{1}{2}\sum_{i+j=k} (i c_i)(j c_j) - k c_k.   (2)
The initial condition is c_k(0) = δ_{k,1}. The gain term accounts for components generated by joining two smaller components whose sizes sum up to k. The second term on the right-hand side of Eq. (2) represents loss due to linking of components of size k to other components. The corresponding gain and loss rates follow from the aggregation rule (1).

The rate equations can be solved using a number of techniques. Throughout this investigation, we use a convenient method in which the time dependence is eliminated first. Solving the rate equations recursively yields c_1 = e^{-t}, c_2 = (1/2) t e^{-2t}, c_3 = (1/2) t^2 e^{-3t}, etc. These explicit results suggest that c_k(t) = C_k t^{k-1} e^{-kt}. Substituting this form into (2), we find that the coefficients C_k satisfy the recursion relation
(k-1)\, C_k = \frac{1}{2}\sum_{i+j=k} (i C_i)(j C_j)   (3)
subject to C_1 = 1. This recursion is solved using the generating function approach. The form of the right-hand side of Eq. (3) suggests utilizing the generating function of the sequence k C_k rather than C_k, i.e., G(z) = \sum_k k C_k e^{kz}. Multiplying Eq. (3) by k e^{kz} and summing over all k, we find that the generating function satisfies the nonlinear ordinary differential equation
(1 - G)\,\frac{dG}{dz} = G.   (4)
Integrating this equation, z = \ln G - G + A, and using the asymptotics G → e^z as z → -∞ fixes the constant A = 0. Thus, we arrive at an implicit solution for the generating function
G\, e^{-G} = e^z.   (5)
The coefficients C_k can be extracted from (5) via the Lagrange inversion formula, or using contour integration as detailed in Appendix A. Substituting r = 1 in Eq. (A1) yields C_k = k^{k-2}/k!, reproducing the well-known result for the size distribution [18, 19]
c_k(t) = \frac{k^{k-2}}{k!}\, t^{k-1} e^{-kt}.   (6)
In the following, we shall often use the generating function for the size distribution c(z, t) = \sum_k k c_k(t) e^{kz}. This generating function is readily expressed via the auxiliary generating function G(z) = \sum_k k C_k e^{kz}:
c(z, t) = t^{-1}\, G(z + \ln t - t).   (7)
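As a concrete illustration of the hierarchy (2), the following minimal Python sketch (ours, not part of the original analysis) integrates a truncated set of rate equations with a forward Euler step and compares the result against the exact solution (6) in the pre-gel regime; the cutoff kmax, the time step, and the comparison time are arbitrary choices.

import math

kmax = 60        # truncate the hierarchy at components of size kmax
dt = 1e-4        # Euler time step
t_final = 0.5    # compare in the pre-gel regime, t < 1

c = [0.0] * (kmax + 1)
c[1] = 1.0       # initial condition c_k(0) = delta_{k,1}

for _ in range(int(t_final / dt)):
    dc = [0.0] * (kmax + 1)
    for k in range(1, kmax + 1):
        gain = 0.5 * sum(i * c[i] * (k - i) * c[k - i] for i in range(1, k))
        dc[k] = gain - k * c[k]          # Eq. (2): gain by merging, loss by linking
    for k in range(1, kmax + 1):
        c[k] += dt * dc[k]

# exact solution, Eq. (6): c_k(t) = k^{k-2}/k! t^{k-1} e^{-kt}
for k in (1, 2, 3, 5, 10):
    exact = k ** (k - 2) / math.factorial(k) * t_final ** (k - 1) * math.exp(-k * t_final)
    print(k, c[k], exact)

Since the gain term for size k involves only smaller sizes, the truncated system is closed from below, so the truncation itself does not affect c_k for k ≤ kmax; the residual discrepancy comes only from the finite time step.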

Let us consider the fraction of nodes in finite components, M_1 = \sum_k k c_k(t). This quantity is merely the first moment of the size distribution (hence the notation). Equivalently, M_1 = c(z = 0, t). From (7) we find M_1 = τ/t with τ = G(\ln t - t). Using (5), we express τ through t:
τ e^{-τ} = t e^{-t}.   (8)
For t < 1, there is a single root τ = t, and all nodes reside in finite components, M_1 = 1. For t > 1, the physical root satisfies τ < t and only a fraction of the nodes resides in finite components, M_1 < 1. Thus, at time t = 1, the system undergoes a gelation transition with a finite fraction of the nodes contained in infinite components. We term this time the gelation time, t_g = 1. In the late stages of the evolution, t ≫ 1, one has τ ≃ t e^{-t} and M_1 ≃ c_1 = e^{-t}, so the system consists of a single giant component and a small number of isolated nodes.

The behavior at and near the transition point is of special interest. The critical behavior of the component size distribution is echoed by other quantities, as will be shown below. Size distributions become algebraic near the critical point. Moreover, there is a self-similar behavior as a function of time (dynamical scaling) and as a function of the system size (finite-size scaling).

At the gelation point, the component size distribution has an algebraic large-size tail, obtained using the Stirling formula,
c_k ≃ C k^{-5/2},   (9)
with C = (2π)^{-1/2}. [Throughout this paper, bold letters are used for critical distributions, so c_k ≡ c_k(t = 1).] In the vicinity of the gelation time, the size distribution is self-similar, c_k(t) → (1-t)^5 Φ_c(k(1-t)^2), with the scaling function
Φ_c(ξ) = (2π)^{-1/2}\, ξ^{-5/2} \exp(-ξ/2).   (10)
Thus, the characteristic component size diverges near the gelation point, k ~ (1-t)^{-2}.

B. Finite Random Graphs

In the previous subsection, we applied kinetic theory to an infinite system. This approach can be extended to finite systems. Unfortunately, such treatments are very cumbersome [22, 23]. Since the number of components is finite, the fluctuations are no longer negligible, and instead of a deterministic rate equation approach, a stochastic approach is needed. Here we follow an alternative path, employing the exact infinite system results in conjunction with scaling and extreme statistics arguments.

The characteristic size of components at the gelation point exhibits a nontrivial dependence on the system size. This is conveniently seen via the cumulative size distribution. The size of the largest component in the system, k_g, is estimated from the extreme statistics criterion, N \sum_{k ≥ k_g} c_k ~ 1, to be
k_g ~ N^{2/3}.   (11)
The largest component in the system grows sub-linearly with the system size [3]. The time by which this component emerges approaches unity for large enough systems, as follows from the diverging characteristic size scale k_g ~ (1 - t_g)^{-2},
1 - t_g ~ N^{-1/3}.   (12)

The maximal component size (11) underlies the entire size distribution. Let c_k(N, t) be the size distribution in a system of size N at time t. At the gelation point, the size distribution c_k(N) ≡ c_k(N, t = 1) obeys the finite-size scaling form (Figs. 2 and 3)
c_k(N) ~ N^{-5/3}\, Ψ_c(k N^{-2/3}).   (13)
The scaling function has the following extremal behaviors
Ψ_c(ξ) ≃ (2π)^{-1/2} ξ^{-5/2}  for ξ ≪ 1;   Ψ_c(ξ) ≃ \exp(-ξ^γ)  for ξ ≫ 1.   (14)
The small-ξ behavior corresponds to sizes well below the characteristic size and thus reflects the infinite system behavior (9). The large-ξ behavior was obtained numerically, with γ ≅ 3. To appreciate the large-ξ asymptotic, let us estimate the probability that the system managed to generate the largest possible component, of size N/2, at time t = 1. A lower bound for this probability can be established via a "greedy" evolution which assumes that after k linking events the graph is composed of a tree of size k+1 and N-k-1 disconnected nodes. Such an evolution occurs with probability
\frac{2}{N}\cdot\frac{N-2}{N} \times \frac{3}{N}\cdot\frac{N-3}{N} \times \cdots \times \frac{N/2}{N}\cdot\frac{N/2}{N} \sim \frac{N!}{N^N},
that scales as e^{-N}. While this lower bound is not necessarily optimal, it suggests that the actual probability is exponentially small. The scaling variable ξ = k N^{-2/3} becomes ξ ~ N^{1/3} for k = N/2, so \exp(-N^{γ/3}) matches the probability \exp(-N) when γ = 3.

To check the critical behavior in finite systems, we performed numerical simulations. In the simulations, N/2 links are placed randomly and sequentially among the N nodes as follows. A node is drawn randomly, and then another node is drawn randomly. Last, these two nodes are linked. Self-connections are therefore allowed. The simulations differ slightly from the above random graph model in that the number of links is not a stochastic variable. For large N, this simulation is faithful to the evolving random graph model because the number of links is self-averaging.
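A minimal Monte Carlo sketch of the procedure just described (our illustration, not the authors' code): N/2 links are placed between randomly drawn nodes, components are tracked with a union-find structure, and the resulting component size distribution at the gelation point can be compared with the infinite-system result (6) at t = 1 and with the scaling form (13). The system size and the number of runs are arbitrary.

import math
import random
from collections import Counter

def component_counts(N, rng):
    """Place N/2 random links (self-connections allowed) and return {k: N_k}."""
    parent = list(range(N))
    size = [1] * N

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for _ in range(N // 2):
        a, b = find(rng.randrange(N)), find(rng.randrange(N))
        if a != b:                          # a self-connection leaves components unchanged
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a
            size[a] += size[b]

    node_weighted = Counter(size[find(x)] for x in range(N))   # k -> k * N_k
    return {k: nodes // k for k, nodes in node_weighted.items()}

rng = random.Random(1)
N, runs = 10 ** 5, 10
hist = Counter()
for _ in range(runs):
    for k, Nk in component_counts(N, rng).items():
        hist[k] += Nk

# c_k(N) = <N_k>/N versus the infinite-system value c_k(1) = k^{k-2}/k! e^{-k}, Eq. (6)
for k in (1, 2, 4, 8, 16, 32):
    exact = k ** (k - 2) / math.factorial(k) * math.exp(-k)
    print(k, hist[k] / (runs * N), exact)

Deviations from the infinite-system curve set in around the characteristic size k ~ N^{2/3} of Eq. (11), where the finite-size scaling function (13) departs from its small-argument form.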

FIG. 2: The size distribution for a finite system at the gelation point. Shown is c_k(N) versus k for various N. The infinite system behavior, Eq. (9), is shown for reference. The data represents an average over 10^6 independent realizations.

FIG. 3: Finite-size scaling of the size distribution. Shown is (2πξ^5)^{1/2} Ψ_c(ξ) versus ξ, obtained from simulations with various N.

The simulation results are consistent with the postulated finite-size scaling form (13). We note that the scaling function Ψ_c(ξ) converges slowly as a function of N. The simulations reveal an interesting behavior of the finite-size scaling function. The function c_k(N) has a "shoulder", a non-monotonic behavior compared with the pure algebraic behavior (9) characterizing infinite systems (Fig. 2). The properly normalized scaling function (2πξ^5)^{1/2} Ψ_c(ξ) is a non-monotonic function of ξ (Fig. 3). Obtaining the full functional form of the scaling function Ψ_c(ξ) remains a challenge. A very similar shoulder has been observed for the degree distribution of finite random networks generated by preferential attachment [24-27].

IV. PATHS

Structural characteristics of components can be investigated in a similar fashion. By definition, every two nodes in a component are connected. In other words, there is a path consisting of adjacent links between two such nodes. We investigate statistical properties of paths in components. Characterization of paths yields useful information regarding the connectivity of components as well as internal structures such as cycles.

For every node in the graph, there are (generally) multiple paths that connect it with all other nodes in the respective component. With new links, new paths are formed. For every pair of paths of lengths n and m originating at two separate nodes, a new path is formed as follows
(n, m) → n + m + 1.   (15)
In Fig. 1, linking two paths of respective lengths n = 1 and m = 2 generates a path of length n + m + 1 = 4. Thus, paths also undergo an aggregation process. However, this aggregation process is simpler than (1) because the aggregation rate is independent of the path length.

Let q_l(t) be the density of distinct paths containing l links at time t. By distinct we mean that the two paths connecting two nodes are counted separately. By definition, q_0(t) = 1. The rest of the densities grow according to the rate equation
\frac{dq_l}{dt} = \sum_{n+m=l-1} q_n q_m   (16)
for l > 0. The initial condition is q_l(0) = δ_{l,0}. This rate equation reflects the uniform aggregation rate. Another notable feature is the lack of a loss term: once a path is created, it remains forever. Solving recursively gives q_1 = t, q_2 = t^2, etc. By induction, the path length density is
q_l(t) = t^l.   (17)
Indeed, this expression satisfies both the rate equation and the initial condition. The first quantity, q_1 = t, is consistent with the facts that the link density is equal to t/2 and that every link corresponds to two distinct paths of length one.

The above path density represents an aggregate over all nodes and all components. Characterization of path statistics in a component of a given size is achieved via p_{l,k}, the density of paths of length l in components of size k. Note the obvious length bounds 0 ≤ l ≤ k-1 and the sum rule \sum_l p_{l,k} = k^2 c_k, reflecting that there are k^2 distinct paths in a component of size k (every pair of nodes is connected). The density of the linkless paths is p_{0,k} = k c_k, because k c_k is the probability that a node belongs to a component of size k.
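The prediction (17) is straightforward to test in the pre-gel regime: generate a graph at some t < 1 and, for every node, count by breadth-first search the nodes at each distance l; since almost all components are trees, this count is essentially the number of distinct paths of length l. The sketch below is our illustration; the values of N and t are arbitrary.

import random
from collections import defaultdict, deque

N, t = 10 ** 5, 0.5
rng = random.Random(2)

adj = defaultdict(list)
for _ in range(int(N * t / 2)):            # Nt/2 links corresponds to "time" t
    a, b = rng.randrange(N), rng.randrange(N)
    adj[a].append(b)
    adj[b].append(a)

paths = defaultdict(int)                    # paths[l]: ordered node pairs at distance l
for source in range(N):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    for d in dist.values():
        if d > 0:
            paths[d] += 1

# density of distinct paths of length l: q_l = paths[l]/N; prediction q_l = t^l, Eq. (17)
for l in range(1, 6):
    print(l, paths[l] / N, t ** l)

The rare components that contain a cycle (of order one of them in the whole system) make the true path count differ slightly from the breadth-first count, a correction that vanishes in the N → ∞ limit.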

We have seen that components and paths form via the aggregation processes (1) and (15), respectively. The joint distribution p_{l,k} therefore undergoes a bi-aggregation process [28]. In the present case,
(n, i) + (m, j) → (n + m + 1, i + j),   (18)
where the first index corresponds to the path length and the second to the component size. The joint distribution evolves according to the rate equation
\frac{dp_{l,k}}{dt} = \sum_{i+j=k}\ \sum_{n+m=l-1} p_{n,i}\, p_{m,j} + \sum_{i+j=k} (i p_{l,i})(j c_j) - k p_{l,k}.   (19)
The initial conditions are p_{l,k}(0) = δ_{k,1} δ_{l,0}. The first term on the right-hand side of Eq. (19) describes newly formed paths due to linking. The last two terms correspond to paths that do not contain the newly placed link.

We now repeat the steps used to determine the size distribution. The time dependence is eliminated using the ansatz p_{l,k} = P_{l,k} t^{k-1} e^{-kt}. The corresponding coefficients P_{l,k} satisfy the recursion
(k-1)\, P_{l,k} = \sum_{i+j=k}\ \sum_{n+m=l-1} P_{n,i}\, P_{m,j} + \sum_{i+j=k} (i P_{l,i})(j C_j).   (20)
The generating function P_l(z) = \sum_k P_{l,k} e^{kz} satisfies the recursion relation (1 - G)\, dP_l/dz = \sum_{n+m=l-1} P_n P_m + P_l for l > 0. Dividing this equation by (4) yields
G\,\frac{dP_l}{dG} = \sum_{n+m=l-1} P_n P_m + P_l   (21)
for l > 0. As noted above, P_{0,k} = k C_k, so P_0(z) = G(z). Solving Eq. (21) recursively gives P_1 = G^2, P_2 = G^3, etc. In general,
P_l(z) = G^{l+1}(z).   (22)
This solution can be validated directly. The time-dependent generating function p_l(z) = \sum_k p_{l,k} e^{kz} is therefore p_l(z) = t^{-1} G^{l+1}(z + \ln t - t). The total density of paths of length l, p_l(z = 0) = t^l, coincides with (17) prior to the gelation transition (t < 1) because all components are finite. However, the total number of paths is reduced, p_l(z = 0) = t^{-1} τ^{l+1}, past the gelation time (t > 1).

One may also obtain the bivariate generating function p(z, w) = \sum_{l,k} p_{l,k} w^l e^{kz}. Using (22) one gets
p(z, w) = t^{-1}\,\frac{G(z + \ln t - t)}{1 - w\, G(z + \ln t - t)}.   (23)
The total density of paths in finite components is of course g = \sum_{l,k} p_{l,k}, so g ≡ p(z = 0, w = 1). Generally, g = τ/[t(1 - τ)]; for t < 1 the total density of paths is g(t) = (1 - t)^{-1}.

The coefficients are found via the contour integration P_{l,k} = (2πi)^{-1} ∮ dy\, P_l\, y^{-k-1} (see Appendix A). Substituting r = l + 1 in Eq. (A1) yields P_{l,k} = (l + 1)\, k^{k-l-2}/(k - l - 1)!. As a result, the density of paths of length l in components of size k is
p_{l,k} = (l + 1)\,\frac{k^{k-l-2}}{(k - l - 1)!}\, t^{k-1} e^{-kt}.   (24)
Comparing (24) and (6), we notice that the densities of the two shortest paths satisfy p_{0,k} = k c_k and p_{1,k} = 2(k - 1) c_k. The latter reflects that there are k - 1 links in a tree of size k and that with unit probability all components are trees (as discussed in the next section). Note also that the longest possible path, l = k - 1, corresponds to linear (chain-like) components. According to Eq. (24), the density of such paths is p_{k-1,k} = t^{k-1} e^{-kt}. This density decays exponentially with length, so these components are typically small, their length being of the order one.

The path length density can be simplified in the large-k limit by considering the properly normalized ratio of factorials
\frac{k!}{(k-l)!\, k^l} = \prod_{j=1}^{l-1}\left(1 - \frac{j}{k}\right) = \exp\left(-\sum_{j=1}^{l-1}\left[\frac{j}{k} + \frac{j^2}{2k^2} + \ldots\right]\right) \simeq \exp(-l^2/2k).
Using the Stirling formula, in the limits k ≫ 1 and l ≫ 1, the path density becomes
p_{l,k} ≃ l\,(2πk^3)^{-1/2}\, t^{k-1} e^{k(1-t)}\, e^{-l^2/2k}.   (25)
As was the case for the component size distribution, the path length density is self-similar in the vicinity of the gelation point, p_{l,k} → (1-t)^2 Φ_p(k(1-t)^2, l(1-t)), with the scaling function
Φ_p(ξ, η) = η\,(2πξ^3)^{-1/2} \exp(-η^2/2ξ).   (26)
Thus, the characteristic path length diverges near the gelation point, l ~ (1-t)^{-1}.

At the critical point, the path length density becomes
p_{l,k} ≃ l\,(2πk^3)^{-1/2} \exp(-l^2/2k).   (27)
It is evident that the typical path length scales as the square root of the component size,
l ~ k^{1/2}.   (28)
For finite systems, the scaling law for the typical path length (28), combined with the characteristic component size (11), leads to the following characteristic path length
l ~ N^{1/3}.   (29)
One can deduce several other scaling laws and finite-size scaling functions underlying the path density. For example, substituting the gelation time 1 - t_g ~ N^{-1/3} into the total number of paths g = (1 - t)^{-1} yields g ~ N^{1/3}.
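The closed form quoted before Eq. (24) can be cross-checked without any contour integration by iterating the recursion (20) in exact rational arithmetic, with C_k = k^{k-2}/k! and P_{0,k} = k C_k as input. The sketch below is ours; the cutoff kmax and the sampled (l, k) pairs are arbitrary.

from fractions import Fraction
from math import factorial

kmax = 12
C = {k: Fraction(k ** (k - 2), factorial(k)) for k in range(1, kmax + 1)}

P = {(0, k): k * C[k] for k in range(1, kmax + 1)}     # P_{0,k} = k C_k
zero = Fraction(0)
for l in range(1, kmax):
    P[(l, 1)] = zero                                   # a single node carries no path of length l >= 1
    for k in range(2, kmax + 1):
        gain = sum(P.get((n, i), zero) * P.get((l - 1 - n, k - i), zero)
                   for n in range(l) for i in range(1, k))
        transfer = sum(i * P.get((l, i), zero) * (k - i) * C[k - i]
                       for i in range(1, k))
        P[(l, k)] = (gain + transfer) / (k - 1)        # Eq. (20) solved for P_{l,k}

# closed form: P_{l,k} = (l+1) k^{k-l-2} / (k-l-1)!
for l, k in [(1, 3), (2, 5), (3, 8), (5, 10)]:
    exact = Fraction((l + 1) * k ** (k - l - 2), factorial(k - l - 1))
    print(l, k, P[(l, k)] == exact)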

V. CYCLES

Each component has a certain number of nodes and links. The complexity of a component is defined as the number of links minus the number of nodes. Components with complexity -1 are trees; components with complexity 0 and 1 are termed unicyclic and bicyclic, correspondingly. Finite components are predominantly trees. We have seen that the overall number of links is proportional to N and that the overall number of self-links is of the order unity. The overall numbers of trees and of unicyclic components mirror this behavior. Generally, the number of components of complexity R is proportional to N^{-R} (this result is well known, see e.g. [5, 21] and especially [29]). Therefore, it suffices to characterize trees and unicyclic components only.

Each unicyclic component contains a single cycle. Cycles are an important characteristic of a graph [30, 31]. In this section, we analyze cycles and unicyclic components using the rate equation approach. We first note that cycles in random graphs were also studied using various other approaches: Janson [32, 33] employs probabilistic and combinatorial techniques; Marinari and Monasson [31] assign an Ising spin to each node and deduce certain properties of loops from the partition function of the Ising model; Burda et al. [34] modify a random graph model to favor the creation of short cycles, and examine the model using a diagrammatic technique. A number of authors also studied cycles on information networks like the Internet (see [35] and references therein).

A. Infinite System

There is a significant difference between the distribution of trees and that of unicyclic components. In the thermodynamic limit, the number of trees is extensive and, as a result, it is a deterministic, or self-averaging, quantity. The number of unicyclic components is not extensive, but rather of the order unity; as a result, it is a random quantity with a nontrivial distribution even for infinite random graphs. In what follows, we study the average number of unicyclic components of a given size or cycle length.

The average number of cycles follows directly from the path length density. Quite simply, when the two extremal nodes in a path are linked, a cycle is born. Let the number of cycles of size l at time t be w_l(t). It grows according to the rate equation
\frac{dw_l}{dt} = \frac{1}{2}\, q_{l-1}.   (30)
The right-hand side equals the link creation rate 1/(2N) times the total number of paths N q_{l-1}; indeed, the total number of cycles of a given length is of the order one. The cycle length distribution is
w_l = \frac{t^l}{2l}.   (31)
In particular, at the gelation point, the cycle length distribution is inversely proportional to the cycle length [5],
w_l = (2l)^{-1}.   (32)
This result can alternatively be obtained using combinatorics.

To characterize cycles in a given component size, we consider the joint distribution u_{l,k}, the average number of unicyclic components of size k containing a cycle of length l, with 1 ≤ l ≤ k. This joint distribution evolves according to the linear rate equation
\frac{du_{l,k}}{dt} = \frac{1}{2}\, p_{l-1,k} + \sum_{i+j=k} (i u_{l,i})(j c_j) - k u_{l,k}   (33)
for l ≥ 1. Initially there are no cycles, and therefore u_{l,k}(0) = 0. Eliminating the time dependence via the substitution u_{l,k} = U_{l,k} t^k e^{-kt}, the coefficients satisfy the recursion
k\, U_{l,k} = \frac{1}{2}\, P_{l-1,k} + \sum_{i+j=k} (i U_{l,i})(j C_j).   (34)
Using the generating function U_l(z) = \sum_k e^{kz} U_{l,k}, this recursion is recast into the differential equation (1 - G)\, dU_l/dz = \frac{1}{2} P_{l-1}. Dividing by (4), we obtain
\frac{dU_l}{dG} = \frac{1}{2}\, G^{l-1}.   (35)
Integrating this equation yields the generating function
U_l(z) = \frac{1}{2l}\, G^l(z).   (36)
Consequently, the cycle length distribution restricted to finite components is τ^l/(2l), in agreement with (31) prior to the gelation time (t < 1).

Additionally, the joint generating function defined as u(z, w) = \sum_{l,k} e^{kz} w^l u_{l,k} is given by
u(z, w) = \frac{1}{2} \ln \frac{1}{1 - w\, G(z + \ln t - t)}.   (37)
As for paths, statistics of cycles are directly coupled to statistics of components via the generating function G(z). The total number of unicyclic components of finite size, h = \sum_{l,k} u_{l,k}, is therefore
h(t) = \frac{1}{2} \ln \frac{1}{1 - τ}.   (38)
Below the gelation point, h(t) = \frac{1}{2} \ln \frac{1}{1-t} for t < 1. The total number of unicyclic components can alternatively be obtained by noting that (i) it satisfies the rate equation dh/dt = \frac{1}{2}\sum_k k^2 c_k = \frac{1}{2} M_2, and (ii) the second moment of the size distribution is M_2 = (1 - t)^{-1} for t < 1, as follows from (7).
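The moment argument for h(t) is easy to check numerically: integrating dh/dt = M_2/2, with M_2(t) computed directly from the explicit size distribution (6), should reproduce h(t) = (1/2) ln[1/(1-t)] of Eq. (38) for t < 1. A small sketch (ours; the series cutoff and the quadrature step are arbitrary):

import math

def M2(t, kmax=3000):
    """Second moment M_2 = sum_k k^2 c_k(t) with c_k(t) taken from Eq. (6)."""
    total = 0.0
    for k in range(1, kmax + 1):
        # k^2 c_k(t) = (k^k / k!) t^{k-1} e^{-kt}; evaluate in logs to avoid overflow
        log_term = k * math.log(k) - math.lgamma(k + 1) + (k - 1) * math.log(t) - k * t
        total += math.exp(log_term)
    return total

def h_numeric(T, steps=300):
    """Integrate dh/dt = M_2/2 from 0 to T with the midpoint rule."""
    dt = T / steps
    return sum(0.5 * M2((i + 0.5) * dt) * dt for i in range(steps))

for T in (0.3, 0.6, 0.9):
    print(T, h_numeric(T), 0.5 * math.log(1.0 / (1.0 - T)))   # compare with Eq. (38)

Near the gelation point the series for M_2 converges slowly, so the cutoff kmax matters more as t → 1.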

The coefficients underlying the cycle distribution are found using contour integration. Writing U_{l,k} = (2πi)^{-1} ∮ U_l\, y^{-k-1} dy and substituting r = l in (A1) gives U_{l,k} = \frac{1}{2}\, k^{k-l-1}/(k-l)! [4]. The cycle length-size distribution is therefore
u_{l,k}(t) = \frac{1}{2}\,\frac{k^{k-l-1}}{(k-l)!}\, t^k e^{-kt}.   (39)
The smallest cycle, l = 1, is a self-connection, and the average number of such cycles is u_{1,k} = \frac{t}{2}\, k c_k. The largest cycles are rings, l = k, and their total number is on average u_{k,k} = \frac{1}{2k}\, t^k e^{-kt}.

The large-k behavior of the cycle length distribution is found following the same steps leading to (25),
u_{l,k}(t) ≃ (8πk^3)^{-1/2}\, t^k e^{k(1-t)}\, e^{-l^2/2k}.   (40)
This distribution is self-similar in the vicinity of the gelation transition, u_{l,k}(t) → (1-t)^3 Φ_u(k(1-t)^2, l(1-t)), with the scaling function
Φ_u(ξ, η) = (8πξ^3)^{-1/2} \exp(-η^2/2ξ).   (41)
We see that the cycle length is characterized by the same scale as the path length, l ~ (1-t)^{-1}. At the gelation point, the distribution is
u_{l,k} ≃ (8πk^3)^{-1/2} \exp(-l^2/2k).   (42)
Fixing the component size, the typical cycle length behaves as the typical path length, l ~ k^{1/2}.

The size distribution of unicyclic components is found from the joint distribution, v_k = \sum_l u_{l,k}. Using (39) we get [21]
v_k(t) = \frac{1}{2}\left(\sum_{n=0}^{k-1} \frac{k^{n-1}}{n!}\right) t^k e^{-kt}.   (43)
This distribution can alternatively be derived from the linear rate equation
\frac{dv_k}{dt} = \frac{1}{2}\, k^2 c_k + \sum_{i+j=k} (i v_i)(j c_j) - k v_k.   (44)
This equation is obtained from (33) using the equality k^2 c_k = \sum_l p_{l,k}. It reflects that linking a pair of nodes in a component generates a unicyclic component. Integrating (42) over the cycle length, the critical size distribution of unicyclic components has an algebraic tail
v_k ≃ (4k)^{-1}.   (45)

B. Finite Systems

We turn now to finite systems, restricting our attention to the gelation point. The total number of unicyclic components is obtained by estimating h(N, t_g). Substituting (12) into (38) shows that the average number of unicyclic components (and hence, cycles) grows logarithmically with the system size (Fig. 4),
h(N) ≃ \frac{1}{6} \ln N.   (46)

FIG. 4: The total number of unicyclic components versus the system size at the gelation point. Shown is h versus N. Each data point represents an average over 10^6 independent realizations.

Comparing the path length distribution (27) and the cycle length distribution (42), we conclude that the characteristic cycle length and the characteristic path length obey the same scaling law, l ~ N^{1/3}. This implies that the cycle length distribution in a finite system of size N, w_l(N), obeys the finite-size scaling law
w_l(N) ~ N^{-1/3}\, Ψ_w(l N^{-1/3}).   (47)
Numerical simulations confirm this behavior (Fig. 5). In the simulations, analysis of cycle statistics requires us to keep track of all links. Cycles are conveniently identified using the standard "shaving" algorithm. Dangling links, i.e., links involving a single-link node, are removed from the system sequentially. The link removal procedure is carried out until no dangling links remain. At this stage, the system contains no trees. Simple cycles are those components with an equal number of links and nodes.

The extremal behaviors of the finite-size scaling function are as follows
Ψ_w(η) ≃ (2η)^{-1}  for η → 0;   Ψ_w(η) ≃ \exp(-C η^{3/2})  for η → ∞.   (48)
The small-η behavior follows from (32). Statistics of extremely large cycles can be understood by considering the largest possible cycles. When there are n = N/2 links, the largest possible cycle has length l = N/2. Its likelihood, w(n, 2n), is obtained using combinatorics
w(n, 2n) = \binom{2n}{n} \times \frac{n!}{2n} \times (2n)^{-n}.   (49)

There are \binom{2n}{n} ways to choose the nodes participating in the cycle, and the next term is the number of ways to arrange them in a cycle. The corrective factor 2n accounts for rotation and reflection symmetries. The last term is the probability that each pair of consecutive nodes are linked. The large-n asymptotic behavior is
w(n, 2n) ≃ \frac{1}{\sqrt{2n}}\left(\frac{2}{e}\right)^n.   (50)
Therefore, w(n, 2n) ~ \exp(-C N). Substituting l ~ N into the scaling form (47) leads to the super-exponential behavior Ψ_w(η) ~ \exp(-C η^{3/2}), see Fig. 6.

FIG. 5: Finite-size scaling of the cycle-length distribution. Shown is 2η Ψ_w(η) versus η, obtained using systems with size N = 10^4, 10^5, and 10^6. The data represents an average over 10^6 independent realizations.

FIG. 6: The tail of the scaling function. Shown is 2η Ψ_w(η) versus η^{3/2}.

FIG. 7: The average cycle size at the gelation point. Shown is ⟨l(N)⟩ h(N) versus N. Each data point represents an average over 10^6 independent realizations.

Typically, cycles are of size N^{1/3}. The average moments ⟨l^n(N)⟩ = \sum_l l^n w_l(N)/\sum_l w_l(N) reflect this law. However, the algebraic divergence, w_l ~ l^{-1}, leads to a logarithmic correction, as follows from (46)-(48):
⟨l^n(N)⟩ ~ N^{n/3} [\ln N]^{-1}.   (51)
The behavior of the average cycle length is verified numerically (Fig. 7).

Finite-size scaling of other cycle statistics, such as the joint distribution, can be constructed following the same procedure. For example, the size distribution of unicyclic components should follow the scaling form
v_k(N) ~ N^{-2/3}\, Ψ_v(k N^{-2/3}).   (52)
The scaling function diverges, Ψ_v(ξ) ≃ (4ξ)^{-1} for ξ → 0.

VI. THE FIRST CYCLE

The above statistical analysis of cycles characterizes the average behavior but not necessarily the typical one, because the number of cycles is a fluctuating quantity. There are numerous interesting features concerning cycles that are not captured by the average number of cycles. For instance, what is the probability that the system does not contain a cycle up to time t? It suffices to answer this question in the pre-gel regime, as the giant component certainly contains cycles.

Let s_0(t) be the (survival) probability that the system does not contain a cycle at time t. The cycle production rate is J = dh/dt = \frac{1}{2(1-t)}. The number of cycles is finite in the pre-gel regime, since cycles are independent of each other in the N → ∞ limit. This assertion (supported by numerical simulations, see Fig. 8) implies that the cycle production process is completely random.
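Both the Poisson picture and the logarithmic growth (46) can be probed with the shaving algorithm described in Sec. V B. The sketch below (our illustration) places N/2 links, strips dangling links iteratively, and counts the independent cycles of what remains; at the gelation point nearly all cyclic components are unicyclic, so this cycle rank is essentially the number of cycles discussed here. The system size and the number of realizations are arbitrary.

import math
import random
from collections import Counter, deque

def count_cycles(N, rng):
    """Place N/2 random links, shave degree-1 nodes, and return the cycle rank
    (number of independent cycles) of the remaining graph."""
    edges, adj, deg = [], [[] for _ in range(N)], [0] * N
    for eid in range(N // 2):
        a, b = rng.randrange(N), rng.randrange(N)
        edges.append((a, b))
        adj[a].append((b, eid))
        adj[b].append((a, eid))
        deg[a] += 1
        deg[b] += 1                        # a self-connection contributes 2 to deg[a]
    alive = [True] * len(edges)

    queue = deque(v for v in range(N) if deg[v] == 1)
    while queue:                           # shaving: remove dangling links sequentially
        v = queue.popleft()
        if deg[v] != 1:
            continue
        for w, eid in adj[v]:
            if alive[eid]:
                alive[eid] = False
                deg[v] -= 1
                deg[w] -= 1
                if deg[w] == 1:
                    queue.append(w)
                break

    live = [e for e, ok in zip(edges, alive) if ok]
    nodes = {v for e in live for v in e}
    parent = {v: v for v in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in live:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    comps = len({find(v) for v in nodes})
    return len(live) - len(nodes) + comps  # cycle rank = E - V + C

rng = random.Random(3)
N, runs = 10 ** 5, 100
counts = Counter(count_cycles(N, rng) for _ in range(runs))
mean = sum(n * c for n, c in counts.items()) / runs
print(sorted(counts.items()), mean, math.log(N) / 6)   # compare the mean with Eq. (46)

The histogram of counts can be compared with a Poisson distribution of the same mean, as in Fig. 8; agreement with (1/6) ln N is only logarithmic in N and therefore improves slowly with the system size.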

The cycle production rate characterizes the survival probability s_0 as follows:
\frac{ds_0}{dt} = -J s_0.   (53)
The initial condition is s_0(0) = 1. As a result, the survival probability is
s_0(t) = (1 - t)^{1/2}   (54)
for t ≤ 1. The survival probability vanishes beyond the gelation point, s_0(t) = 0 for t > 1. This reiterates that in the thermodynamic limit, a cycle is certain to form prior to the gelation transition [5].

Since the number of cycles produced is of the order of one in the pre-gel regime, one may expect that statistical properties of cycles strongly depend on their generation number or, alternatively, on their creation time. This is manifested by the first cycle. The quantity dt\, s_0\, \frac{dw_l}{dt} is the probability that (i) the system contains no cycles at time t, (ii) a cycle is produced during the time interval (t, t + dt), and (iii) its length is l. Summing these probabilities gives the probability that the first cycle, produced sometime during the pre-gel regime, has length l:
f_l = \int_0^1 dt\, s_0\, \frac{dw_l}{dt} = \frac{1}{2}\int_0^1 dt\, (1-t)^{1/2}\, t^{l-1}.   (55)
Summing these quantities, we verify the normalization \sum_{l≥1} f_l = \frac{1}{2}\int_0^1 dt\,(1-t)^{-1/2} = 1. The length distribution of the first cycle can be expressed in terms of the beta function, f_l = \frac{1}{2} B(3/2, l), or alternatively
f_l = \frac{\sqrt{π}}{4}\,\frac{Γ(l)}{Γ(l + 3/2)}.   (56)
The probability distribution f_l has an algebraic tail,
f_l ≃ C l^{-3/2},   (57)
with C = \frac{\sqrt{π}}{4} for l ≫ 1. The tail exponent characterizing the distribution of the first cycle is larger compared with the exponent characterizing all cycles, reflecting the fact that the first cycle is created earlier.

Similarly, one can obtain additional properties of the first cycle. We mention the probability F_k that the first unicyclic component has size k,
F_k = \int_0^1 dt\, s_0\, \frac{1}{2}\, k^2 c_k = \frac{1}{2}\,\frac{k^k}{k!}\, I_k   (58)
with the integral I_k = \int_0^1 dt\,(1-t)^{1/2}\, t^{k-1} e^{-kt}. This integral can be expressed in terms of the confluent hypergeometric function. Its asymptotic behavior can be readily found by noting that the integrand has a sharp maximum in the region 1 - t ~ k^{-1/2}, leading to I_k ≃ 2^{-1/4} Γ(3/4)\, k^{-3/4} e^{-k}. Using this in conjunction with the Stirling formula, the size distribution has the algebraic tail
F_k ≃ C k^{-5/4}   (59)
with C = 2^{-7/4} π^{-1/2} Γ(3/4) for k ≫ 1.

Under the assumption that cycle production is completely random, the number of cycles obeys Poisson statistics. The probability that there are n cycles, s_n, then satisfies the straightforward generalization of Eq. (53), viz. \frac{ds_n}{dt} = J[s_{n-1} - s_n], with the initial condition s_n(0) = δ_{n,0}. The solution is the Poisson distribution s_n = \frac{h^n}{n!} e^{-h}, see Fig. 8. Explicitly, the distribution reads
s_n = \frac{(1-t)^{1/2}}{n!}\left[\frac{1}{2}\ln\frac{1}{1-t}\right]^n.   (60)
The cumulative distribution S_n(t) = s_0(t) + ... + s_n(t) is plotted in Fig. 9.

FIG. 8: The distribution of the number of cycles. Shown is s_n versus n at the gelation point. The system size is N = 10^5 and an average over 10^5 realizations has been performed. A Poissonian distribution with an identical average is also shown for reference.

The Poisson distribution (60) can also be used to calculate f_{n,l}, the size distribution of the nth cycle. We merely quote the large-l tail behavior
f_{n,l} ~ l^{-3/2}\,\frac{1}{(n-1)!}\left[\frac{1}{2}\ln l\right]^{n-1}.   (61)
Indeed, summation over the cycle generation reproduces the overall cycle distribution (32).

In finite systems, it is possible that no cycle is created by the gelation time. This probability decreases algebraically with the system size, as seen by substituting (12) into (54),
s_0 ~ N^{-1/6}.   (62)
This prediction agrees with simulations, see Fig. 10. In practice, this slow decay indicates that a relatively large system may contain no cycles after N/2 links are placed.
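The first-cycle length distribution (56) and its tail (57) are simple enough to evaluate directly; the following sketch (ours) computes f_l from the gamma function, checks the slow convergence of the normalization, and compares the tail with (√π/4) l^{-3/2}.

import math

def f(l):
    """f_l = (1/2) B(3/2, l) = (sqrt(pi)/4) Gamma(l)/Gamma(l + 3/2), Eq. (56)."""
    return 0.5 * math.exp(math.lgamma(1.5) + math.lgamma(l) - math.lgamma(l + 1.5))

partial = sum(f(l) for l in range(1, 200001))
print(partial)                     # approaches 1 slowly because of the l^{-3/2} tail

for l in (10, 100, 1000):
    print(l, f(l), math.sqrt(math.pi) / 4 * l ** -1.5)   # tail, Eq. (57)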

FIG. 9: The cumulative distribution S_n(t) = \sum_{0≤j≤n} s_j(t) versus t for n = 0, 1, 2, 3.

FIG. 10: The survival probability versus the system size. Shown is s_0(N) versus N at the gelation point, i.e., when N/2 links are placed. Each data point represents an average over 10^6 realizations.

Generally, the probability that there is a finite number of cycles increases with the number of cycles,
s_n ~ \frac{1}{n!}\, N^{-1/6}\left[\frac{1}{6}\ln N\right]^n.   (63)
The length distribution of the first cycle is characterized by the same l ~ N^{1/3} size scale as is the overall cycle distribution. We focus on the behavior of the moments
⟨l^n⟩ ~ N^{n/3 - 1/6}.   (64)
This behavior is obtained from the distribution (57), which should be integrated up to the appropriate cutoff, i.e., ⟨l^n⟩ ~ \int_1^{N^{1/3}} dl\, l^n\, l^{-3/2}. As a result, the average size of the first cycle is much smaller than the characteristic cycle size, ⟨l⟩ ~ N^{1/6}. Moments corresponding to the size of the first unicyclic component grow as follows,
⟨k^n⟩ ~ N^{2n/3 - 1/6},   (65)
as obtained from (59). Consequently, the average size of the first unicyclic component is smaller than the characteristic component size, ⟨k⟩ ~ N^{1/2}.

VII. CONCLUSIONS

In summary, we have extended the kinetic theory description of random graphs to structures such as paths and cycles. Modeling the linking process dynamically leads to an aggregation process for both components and paths. The density of paths in finite components is coupled to the component size distribution via nonlinear rate equations, while the average number of cycles is coupled to the path density via linear rate equations. Both path and cycle length distributions are coupled to the component size distribution.

Generally, size distributions decay exponentially away from the gelation point, but at the gelation time, algebraic tails emerge. As the system approaches this critical point, the size distributions follow a self-similar behavior characterized by diverging size scales.

The kinetic theory approach is well suited for treating infinite systems. The complementary behavior for finite systems can be obtained from heuristic scaling arguments. This approach yields scaling laws for the typical component size, path length, and cycle length at the gelation point. These scaling laws can be formalized using finite-size scaling forms, i.e., self-similarity as a function of the system size rather than time. Obtaining the exact form of these scaling functions is a nice challenge, in particular for the most fundamental quantity, the component size distribution, which is characterized by a non-monotonic scaling function.

The kinetic theory approach seems artificial at first sight. Indeed, graphs are discrete in nature and therefore combinatorial approaches appear more natural. Yet, once the rate equations are formulated, the analysis is straightforward. Utilizing the continuous time variable allows us to employ powerful analysis tools. Moreover, some of the kinetic theory results are less cumbersome compared with the combinatorial results.

The same methodology can be expanded to analyze other features of random graphs. For example, correlations between the node degree and the cluster size can be analyzed using bi-aggregation rate equations [36]. It is quite possible that structural properties in other aggregation processes, for example polymerization with a sum kernel [17], and in other variants of random graphs, such as small-world networks [37], can be analyzed using the kinetic theory.

One could try to utilize kinetic theory to probe the distribution of various families of subgraphs. We have limited ourselves to cycles since they, alongside trees, do appear in random graphs, while more interconnected families of subgraphs are very rare [29]. Yet, in biological and technological networks, certain interconnected families of subgraphs do appear. Such populated families of subgraphs, motifs, are believed to carry information processing functions [38, 39]. It will be interesting to use kinetic theory to analyze motifs in special random graphs.

This research was supported by the DOE (W-7405-ENG-36).

APPENDIX A: CONTOUR INTEGRATION

Let A(z) = \sum_k A_k e^{kz} be the generating function of the coefficients A_k. For the family of generating functions A(z) = G^r(z), with G(z) satisfying G e^{-G} = e^z, the coefficients A_k can be obtained via contour integration in the complex y plane, where y = e^z, as follows:
A_k = \frac{1}{2πi}\oint dy\,\frac{G^r}{y^{k+1}}
    = \frac{1}{2πi}\oint dG\, G^r\,\frac{e^{(k+1)G}}{G^{k+1}}\,\frac{dy}{dG}
    = \frac{1}{2πi}\oint dG\, G^{r-k-1}(1 - G)\, e^{kG}
    = \frac{1}{2πi}\oint dG\,\sum_n \frac{k^n}{n!}\left(G^{n+r-k-1} - G^{n+r-k}\right)
    = r\,\frac{k^{k-r-1}}{(k-r)!}.   (A1)
Since G e^{-G} = e^z, it is convenient to perform the integration in the complex G plane. In writing the third line, we used dy/dG = (1 - G) e^{-G}.
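The result (A1) can be verified numerically without performing the contour integral: for any 0 < G_0 < 1, setting y_0 = G_0 e^{-G_0} and summing A_k y_0^k with A_k = r k^{k-r-1}/(k-r)! should reproduce G_0^r. A short sketch (ours; the sample point G_0 and the cutoff are arbitrary):

import math

def series_sum(r, G0, kmax=400):
    """Sum_{k >= r} [r k^{k-r-1}/(k-r)!] y0^k with y0 = G0 exp(-G0)."""
    y0 = G0 * math.exp(-G0)
    total = 0.0
    for k in range(r, kmax + 1):
        log_term = (math.log(r) + (k - r - 1) * math.log(k)
                    - math.lgamma(k - r + 1) + k * math.log(y0))
        total += math.exp(log_term)
    return total

G0 = 0.3
for r in (1, 2, 3):
    print(r, series_sum(r, G0), G0 ** r)   # the two columns should agree

For r = 1 this is the familiar tree-function series G(y) = \sum_k (k^{k-1}/k!) y^k, consistent with C_k = k^{k-2}/k! of Eq. (6).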

[1] R. Solomonoff and A. Rapaport, Bull. Math. Biophys. 13, 107 (1959).
[2] P. Erdős and A. Rényi, Publ. Math. Inst. Hungar. Acad. Sci. 5, 17 (1960).
[3] B. Bollobás, Random Graphs (Academic Press, London, 1985).
[4] S. Janson, T. Łuczak, and A. Rucinski, Random Graphs (John Wiley & Sons, New York, 2000).
[5] S. Janson, D. E. Knuth, T. Łuczak, and B. Pittel, Rand. Struct. Alg. 3, 233 (1993).
[6] P. J. Flory, J. Amer. Chem. Soc. 63, 3083 (1941).
[7] W. H. Stockmayer, J. Chem. Phys. 11, 45 (1943).
[8] P. J. Flory, Principles of Polymer Chemistry (Cornell University Press, Ithaca, 1953).
[9] D. Stauffer, Introduction to Percolation Theory (Taylor & Francis, London, 1985).
[10] T. Kalisky, R. Cohen, D. ben-Avraham, and S. Havlin, Lect. Notes Phys. 650, 3 (2004).
[11] B. Bollobás, C. Borgs, J. T. Chayes, J. H. Kim, and D. B. Wilson, Rand. Struct. Alg. 18, 201 (2001).
[12] M. E. J. Newman, S. Strogatz, and D. J. Watts, Phys. Rev. E 64, 026118 (2001).
[13] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. 99, 7821 (2002).
[14] M. V. Smoluchowski, Physik. Zeits. 17, 557 (1916); Zeits. Phys. Chem. 92, 129 (1917).
[15] S. Chandrasekhar, Rev. Mod. Phys. 15, 1–89 (1943).
[16] D. J. Aldous, Bernoulli 5, 3 (1999).
[17] F. Leyvraz, Phys. Rep. 383, 95 (2003).
[18] J. B. McLeod, Quart. J. Math. Oxford 13, 119 (1962); ibid. 13, 193 (1962); ibid. 13, 283 (1962).
[19] E. M. Hendriks, M. H. Ernst, and R. M. Ziff, J. Stat. Phys. 31, 519 (1983).
[20] E. Ben-Naim and P. L. Krapivsky, Europhys. Lett. 65, 151 (2004).
[21] E. Ben-Naim and P. L. Krapivsky, J. Phys. A 37, L189 (2004).
[22] A. A. Lushnikov, J. Colloid Inter. Sci. 65, 276 (1977).
[23] P. G. J. van Dongen and M. H. Ernst, J. Stat. Phys. 49, 879 (1987).
[24] D. H. Zanette and S. C. Manrubia, Physica A 295, 1 (2001).
[25] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Phys. Rev. E 63, 062101 (2001).
[26] Z. Burda, J. D. Correia, and A. Krzywicki, Phys. Rev. E 64, 046118 (2001).
[27] P. L. Krapivsky and S. Redner, J. Phys. A 35, 9517 (2003).
[28] P. L. Krapivsky and E. Ben-Naim, Phys. Rev. E 53, 291 (1996).
[29] S. Itzkovitz, R. Milo, N. Kashtan, G. Ziv, and U. Alon, Phys. Rev. E 68, 026127 (2003).
[30] H. D. Rozenfeld, J. E. Kirk, E. M. Bollt, and D. ben-Avraham, cond-mat/0403536.
[31] E. Marinari and R. Monasson, cond-mat/0407253.
[32] S. Janson, Rand. Struct. Alg. 17, 343 (2000).
[33] S. Janson, Combin. Probab. Comput. 12, 27 (2003).
[34] Z. Burda, J. Jurkiewicz, and A. Krzywicki, Phys. Rev. E 69, 026106 (2004); ibid. 70, 026106 (2004).
[35] G. Bianconi and A. Capocci, Phys. Rev. Lett. 90, 078701 (2003); G. Bianconi, G. Caldarelli, and A. Capocci, cond-mat/0408349.
[36] E. Ben-Naim and P. L. Krapivsky, unpublished.
[37] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
[38] R. Milo et al., Science 298, 824–827 (2002); ibid. 303, 1538–1542 (2004).
[39] V. Spirin and L. A. Mirny, Proc. Natl. Acad. Sci. 100, 12123–12128 (2003).