GROWTH OF COMMON FRIENDS IN A PREFERENTIAL ATTACHMENT MODEL

By Bikramjit Das and Souvik Ghosh Singapore University of Technology and Design† and LinkedIn‡

The number of common friends (or connections) in a graph is a commonly used measure of proximity between two nodes. Such measures are used in link prediction algorithms and recommendation systems in large online social networks. We obtain the rate of growth of the number of common friends in a linear preferential attachment model. We apply our result to develop an estimate for the number of common friends. We also observe a phase transition in the limiting behavior of the number of common friends; depending on the range of the parameters of the model, the growth is either power-law, logarithmic, or static with the size of the graph.

1. Introduction. Networking platforms like LinkedIn, Facebook, Instagram and Twitter form a big part of our culture. These networks have facilitated an increasing number of personal as well as professional interactions. The networking platforms strive to grow the network (graph) both in terms of the number of users (nodes) and the number of friendships or connections (edges), since a more densely connected user network typically results in a more engaged user base. The platforms often use recommendation systems like People You May Know (LinkedIn, Facebook) or Who to Follow (Twitter) (Gupta et al., 2013), which recommend individual users to connect with other users on the platform. Such recommendation systems look for signals that indicate that two individuals might know each other. For example, having a common friend between two users is a signal that they know each other. Furthermore, if two users have many friends in common then there is a high chance that they know each other. A generalization of this problem is that of link prediction in a network, which is well studied in the literature (Liben-Nowell and Kleinberg, 2007). In this paper we establish the rate of growth of common friends for a fixed pair of nodes in a linear preferential attachment model, a commonly used generative graph model. The preferential attachment model, made popular in Barabási and Albert (1999), is a very well studied class of graph models. Studies have covered the behavior of the degree sequence (Bollobás et al., 2001, Resnick and Samorodnitsky, 2015, Samorodnitsky et al., 2016), the maximal degrees in a graph, second-order degree sequences (size of the network of friends of friends) (van der Hofstad, 2017, Section 8), generalizations to sublinear preferential attachment (Dereich and Mörters, 2009) and the limiting structure of networks (Elwes, 2016).
Other generalizations and extensions of these scale-free models have been studied in Bollobás et al. (2003), Cooper and Frieze (2003). See van der Hofstad (2017) for a nice overview and proper definitions of the models. To the best of our knowledge this is the first theoretical study of the number of common friends in a graph. Two important observations follow from our result:

∗The authors gratefully acknowledge support from MOE Tier 2 grant MOE2017-T2-2-161.
AMS 2000 subject classifications: Primary 60F15, 60G42; secondary 90B15, 91D30.
Keywords and phrases: heavy-tail, limit theorem, link prediction, preferential attachment.

• There is a phase transition in the asymptotic behavior of common friends. Depending on parameter values of the preferential attachment model, the number of common friends can exhibit a power-law or logarithmic growth or be static with the growth of the graph.

• A corollary of our result is that we can use sampling techniques to estimate the common friends in a large network. This is helpful because computing the number of common friends for every pair of nodes in a graph is computationally expensive, especially for large networks with hundreds of millions of nodes and hundreds of billions of edges.

This paper is organized as follows. In Section 2 we describe the linear preferential attachment model we work with and state the main result. In Section 3 we show some simulated results providing intuition for our results. We provide the proof of the main result and some required supplementary results in Section 4. We conclude indicating future directions of work in Section 5.

2. Growth of Common Friends: Main Result. The model paradigm we work with is a version of the well-known undirected linear preferential attachment graph. The idea is that at every time instance when a new node comes to the network it creates C independent edges and attaches to the previous nodes following a preferential attachment rule. The process is described as follows. At any time n ≥ 1, the graph sequence is denoted PA_n^{δ,C}, where C ∈ N* = {1, 2, ...} and δ > −C. Initially, the graph PA_1^{δ,C} has one node v_1 with C self-loops. Then PA_n^{δ,C} evolves to PA_{n+1}^{δ,C} as follows: at the (n+1)-th stage, a new node named v_{n+1} is added along with C edges, each of which has v_{n+1} as one of its vertices, and the other is selected from V_n := {v_1, v_2, ..., v_n} with probability proportional to the degree of the vertex (shifted by a parameter δ) in PA_n^{δ,C}. For 1 ≤ i ≤ n:

p_{i,n+1} := P[v_i ↔ v_{n+1} | PA_n^{δ,C}] = (D_i(n) + δ) / Σ_{j∈V_n} (D_j(n) + δ) = (D_i(n) + δ) / ((2C + δ)n).   (2.1)

Here D_i(n) := degree of v_i in PA_n^{δ,C}. The evolution of D_i(n) occurs as:

D_i(n+1) = D_i(n) + ∆_i(n+1),   (2.2)

where ∆_i(n+1) := the number of stubs of v_{n+1} (out of C) which attach to v_i. Moreover, at any stage n, for 1 ≤ i < j ≤ n, call

N_ij(n) := # friends common to v_i and v_j.

We ignore multi-edges when counting N_ij(n) in the graph PA_n^{δ,C}; that is, v_k counts as one common friend (vertex) between v_i and v_j in PA_n^{δ,C} for 1 ≤ i ≠ j ≠ k ≤ n if v_i ↔ v_k and v_k ↔ v_j, regardless of multiplicity. Our goal is to understand the behavior of N_ij(n) for 1 ≤ i < j ≤ n as n becomes large. Observe that in our model the growth of N_ij(n) occurs via the recurrence relation

N_ij(n+1) = N_ij(n) + 1_{B_ij(n+1)},   (2.3)

where B_ij(n+1) is the event {v_i ↔ v_{n+1} ↔ v_j}. Also, note that the possible range of parameters is C ≥ 2 and δ > −C. The power-law growth behavior for the degree of a specific node in a linear preferential attachment model is well known (Bollobás et al., 2001), (van der Hofstad, 2017, Section 8).
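The dynamics in (2.1)–(2.3) can be simulated directly. The sketch below is our own illustrative helper, not part of the paper: the function name `simulate_pa` and the bookkeeping are assumptions. It grows PA_n^{δ,C} one node at a time, drawing the C stubs of each new node independently with the probabilities in (2.1), and adds one to the common-friend count exactly when the event B_ij of (2.3) occurs.

```python
import random

def simulate_pa(n, C=2, delta=0.0, i=1, j=2, seed=0):
    """Grow PA_n^{delta,C} and track the increments of N_ij from (2.3).

    deg[k-1] holds D_k(m); v1 starts with C self-loops (degree 2C).
    Each of the C stubs of v_{m+1} picks v_k independently with
    probability (D_k(m) + delta) / ((2C + delta) m), as in (2.1).
    """
    rng = random.Random(seed)
    deg = [2 * C]              # degree sequence after m = 1 nodes
    common = 0                 # common friends of v_i, v_j gained from new nodes
    for m in range(1, n):      # add node v_{m+1}
        total = (2 * C + delta) * m   # = sum_k (D_k(m) + delta)
        targets = set()
        hits = [0] * m
        for _ in range(C):     # C independent stubs (Binomial attachment)
            r = rng.random() * total
            acc = 0.0
            for k in range(m):
                acc += deg[k] + delta
                if r < acc:
                    targets.add(k + 1)
                    hits[k] += 1
                    break
            else:              # guard against float round-off at the boundary
                targets.add(m)
                hits[m - 1] += 1
        for k in range(m):     # degrees are updated only after all C draws
            deg[k] += hits[k]
        deg.append(C)          # the newcomer v_{m+1} has degree C
        if i in targets and j in targets:
            common += 1        # event B_ij(m+1): a new common friend
    return deg, common
```

Since every edge adds 2 to the total degree, `sum(deg)` stays equal to 2Cn, which gives a quick sanity check on the sampler.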

Proposition 2.1. For any fixed node vi, we have

lim_{n→∞} D_i(n)/n^γ = D_i(∞) a.s.,

where γ = C/(2C+δ), and D_i(∞) is a non-negative random variable with E[D_i(∞)] = (C+δ) Γ(i)/Γ(i+γ). Proposition 2.1 can be derived using arguments from (van der Hofstad, 2017, Proposition 8.2) or using arguments similar to those in the proof of Proposition 4.4 provided in Section 4. Our main contribution is the following theorem, which provides the growth rate of the number of common friends of two nodes in such a model. The proof is given in Section 4.
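The mean in Proposition 2.1 can be checked numerically. As derived in Section 4, the shifted degree X_i(n) = D_i(n) + δ satisfies E[X_i(n+1)] = E[X_i(n)](1 + γ/n) with E[X_i(i)] = C + δ, and this recursion telescopes to the Gamma-ratio formula. The two helpers below are our own illustrative names; they should agree to floating-point accuracy.

```python
import math

def mean_shifted_degree(i, n, C=2, delta=0.5):
    """E[D_i(n) + delta] via the recursion E[X_i(m+1)] = E[X_i(m)](1 + gamma/m),
    started from E[X_i(i)] = C + delta."""
    gamma = C / (2 * C + delta)
    x = C + delta
    for m in range(i, n):
        x *= 1 + gamma / m
    return x

def mean_shifted_degree_gamma(i, n, C=2, delta=0.5):
    """Closed form (C+delta) * Gamma(i)Gamma(n+gamma) / (Gamma(i+gamma)Gamma(n)),
    computed stably through log-gamma."""
    gamma = C / (2 * C + delta)
    log_ratio = (math.lgamma(i) + math.lgamma(n + gamma)
                 - math.lgamma(i + gamma) - math.lgamma(n))
    return (C + delta) * math.exp(log_ratio)

print(mean_shifted_degree(3, 200), mean_shifted_degree_gamma(3, 200))
```

Dividing either quantity by n^γ and letting n grow recovers E[D_i(∞)] = (C+δ)Γ(i)/Γ(i+γ), consistent with Proposition 2.1.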

Theorem 2.2. Under the linear preferential attachment model (PA_n^{δ,C})_{n≥1} with C ≥ 2, for any two fixed nodes v_i, v_j, we have

(1) lim_{n→∞} N_ij(n) = N_ij(∞) a.s. if δ > 0,

(2) lim_{n→∞} N_ij(n)/log n = [C(C−1)/(2C+δ)²] D_i(∞)D_j(∞) a.s. if δ = 0,

(3) lim_{n→∞} N_ij(n)/(n^{2γ−1}/(2γ−1)) = [C(C−1)/(2C+δ)²] D_i(∞)D_j(∞) a.s. if δ < 0,

where γ = C/(2C+δ), γ_1 = (1 − 1/√C)γ, γ_2 = (1 + 1/√C)γ. Furthermore, E[N_ij(∞)] < ∞ and Y_ij(∞) = D_i(∞)D_j(∞) with E[Y_ij(∞)] = (C+δ)² Γ(i)Γ(j)Γ(j+γ)/(Γ(i+γ)Γ(j+γ_1)Γ(j+γ_2)); D_i(∞) is the limit of the scaled degree sequence of node v_i as defined in Proposition 2.1.

Remark 2.3. An interesting observation is the presence of different regimes in the growth rate of the number of common friends, depending on the parameter δ.

1. When δ > 0, we are in a mildly preferential regime. In this regime, nodes with low degree also acquire enough new friends; as δ increases, more nodes have a similar chance of being selected. Although the degree of a fixed node still grows like a power law, the number of common friends between two fixed nodes has a finite expectation even in the limit.

2. For δ ≤ 0, and especially for δ close to −C, new nodes prefer to befriend nodes with a high degree. In this case the number of common friends tends to grow with the number of nodes: as a power law for δ < 0 and at a logarithmic rate for δ = 0.

Corollary 2.4. Under the preferential attachment model (PA_n^{δ,C})_{n≥1} with C ≥ 2, for any two fixed nodes v_i, v_j and k > 1, we have

(1) lim_{n→∞} N_ij(n)/N_ij(⌊n/k⌋) = 1 a.s. if δ > 0 and lim_{n→∞} N_ij(n) > 0,

(2) lim_{n→∞} N_ij(n)/N_ij(⌊n/k⌋) = 1 a.s. if δ = 0 and Y_ij(∞) > 0,

(3) lim_{n→∞} N_ij(n)/(N_ij(⌊n/k⌋) k^{2γ−1}) = 1 a.s. if δ < 0 and Y_ij(∞) > 0.

Proof. The result is an easy application of the almost sure convergences of Nij(n) observed in Theorem 2.2.

The above corollary states that we can consistently estimate the number of common friends for a given pair of nodes using an earlier state of the graph, i.e., for any k > 1

N̂_ij^k(n) = N_ij(⌊n/k⌋)            if δ ≥ 0,
N̂_ij^k(n) = N_ij(⌊n/k⌋) k^{2γ−1}   if δ < 0.
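In code, the estimator is a one-liner. The wrapper below is a hypothetical sketch (the function name and signature are ours); it rescales a common-friend count taken from the smaller snapshot PA_{⌊n/k⌋}^{δ,C}.

```python
def estimate_common_friends(n_earlier, k, C, delta):
    """Corollary 2.4 estimator \\hat N_ij^k(n): rescale the count
    N_ij(floor(n/k)) observed in the earlier, smaller graph.

    n_earlier : N_ij(floor(n/k)) from the snapshot
    k         : shrink factor, k > 1
    """
    if delta >= 0:
        # static (delta > 0) and logarithmic (delta = 0) regimes:
        # N_ij(n) / N_ij(floor(n/k)) -> 1, so no scaling is needed
        return float(n_earlier)
    gamma = C / (2 * C + delta)
    # power-law regime: N_ij(n) grows like n^{2*gamma - 1}
    return n_earlier * k ** (2 * gamma - 1)
```

For example, with C = 2 and δ = −1 we get γ = 2/3, so a count of 10 in a snapshot 4 times smaller is scaled up by 4^{1/3}.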

For a large value of k, the graph PA_{⌊n/k⌋}^{δ,C} can be significantly smaller than PA_n^{δ,C}, and it is significantly cheaper to estimate the number of common friends on it.

3. Simulation Study. We illustrate the key idea behind Theorem 2.2 with the following simulated examples. Figure 1 shows instances of graphs simulated from the preferential attachment model with 20 nodes and C = 2; the left one with δ = −1.5 and the right one with δ = 1.5. When δ = −1.5 we observe that the graph grows quite preferentially: new nodes tend to connect with the same few nodes, and hence the number of common friends for them keeps growing fast. When δ = 1.5, the graph is more distributed: new nodes tend to connect with different nodes, and hence the number of common friends does not grow as much.

Fig 1. Graphs with 20 nodes simulated from PA_20^{δ,2}. On the left we have δ = −1.5, and on the right δ = 1.5.


Fig 2. Simulation results for the number of common friends for two fixed nodes for PA_500^{−1.5,2} (δ = −1.5, C = 2, n = 500). The left plot is the histogram obtained from 2500 simulations. The right plot shows the growth paths for 5 arbitrarily chosen simulations.


Fig 3. Simulation results for the number of common friends for two fixed nodes for PA_500^{1.5,2} (δ = 1.5, C = 2, n = 500). The left plot is the histogram obtained from 2500 simulations. The right plot shows the growth paths for 5 arbitrarily chosen simulations.

To understand the asymptotic behavior of common friends, we simulate larger graphs and replicate the exercise multiple times. The left plot in Figure 2, the histogram of the number of common friends for two fixed nodes (v_10 and v_20) for PA_500^{−1.5,2} over 2500 replications, shows the heavy-tailed phenomenon. We also show trajectories of the number of common friends for 5 arbitrary simulations as the size of the network grows in the right plot of Figure 2. Figure 3 shows the behavior of common friends when δ = 1.5. As expected, we see that common friends do not grow as fast in this case.


Fig 4. Simulation results for estimating the number of common friends for a PA_n^{1.5,2} model. The first row shows the histogram of N̂_ij^k(n)/N_ij(n) for k = 2 and n = 1000, 2000, 4000. The second row shows the same for k = 4.

We also check the validity of Corollary 2.4 using simulations. Figure 4 provides simulation results for the estimator N̂_ij^k(n) for a PA_n^{1.5,2} model. The first row shows the histogram over 500 simulations of N̂_ij^k(n)/N_ij(n) for k = 2 and n = 1000, 2000, 4000. The second row shows the same for k = 4. We see the concentration of N̂_ij^k(n)/N_ij(n) near 1 as n increases.

4. Proof of Main Result. In this section, we prove Theorem 2.2 by studying the asymptotic behavior of D_i(n), of (D_i(n), D_j(n)) jointly, and of functions thereof. Recall that our preferential attachment graph sequence is (PA_n^{δ,C})_{n≥1}. We assume that C ≥ 2 and δ > −C for all the results in this section. For convenience we use the following notation:

Xi(n) := Di(n) + δ,

Yij(n) := (Di(n) + δ)(Dj(n) + δ) = Xi(n)Xj(n).

From Proposition 2.1 we get

lim_{n→∞} X_i(n)/n^γ = D_i(∞) a.s.,   (4.1)

where γ = C/(2C+δ). In the next few steps we apply the Martingale Convergence Theorem to show almost sure convergence of the appropriately scaled sequences of random variables {Y_ij(n)}. We also prove uniform integrability of the sequences {X_i(n)} and {Y_ij(n)} so that we additionally have convergence in L¹ and can compute the expectation of the limit.

Lemma 4.1. For any k ≥ 1 and for a fixed i, we have

sup_{n≥i} E[X_i(n)^k]/n^{kγ} < ∞.

Proof. We prove the result by induction on k. We have X_i(n+1) = X_i(n) + ∆_i(n+1) and ∆_i(n+1) | PA_n^{δ,C} ∼ Binomial(C, p_{i,n+1}). Hence for k = 1 with 1 ≤ i ≤ n we get

E[X_i(n+1) | PA_n^{δ,C}] = X_i(n) + E[∆_i(n+1) | PA_n^{δ,C}] = X_i(n) + C p_{i,n+1}
= (D_i(n) + δ) + C · (D_i(n) + δ)/((2C+δ)n) = X_i(n) (1 + γ/n).

We also have E[X_i(i)] = D_i(i) + δ = C + δ. Now

E[X_i(n+1)] = E[X_i(n)] (n+γ)/n
= E[X_i(i)] ((i+γ)/i) ((i+1+γ)/(i+1)) ··· ((n+γ)/n)
= (C+δ) (Γ(i)/Γ(i+γ)) (Γ(n+1+γ)/Γ(n+1)) = (C+δ) (Γ(i)/Γ(i+γ)) (n+1)^γ (1 + O(1/n)),

using Stirling's formula (Abramowitz and Stegun, 2012) given by

Γ(n+a)/Γ(n) = n^a (1 + O(1/n)).   (4.2)
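Equation (4.2) is easy to verify numerically. The snippet below is our own check, using only the standard library's log-gamma; the relative error of the n^a approximation shrinks at rate O(1/n).

```python
import math

def gamma_ratio(n, a):
    """Gamma(n + a) / Gamma(n), computed stably via lgamma."""
    return math.exp(math.lgamma(n + a) - math.lgamma(n))

# Relative error of the approximation (4.2) for a = 0.4:
errors = [abs(gamma_ratio(n, 0.4) / n ** 0.4 - 1) for n in (10, 100, 1000)]
print(errors)   # decreasing roughly by a factor of 10 per step
```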

Hence for any n ≥ i,

E[X_i(n)]/n^γ = (C+δ) (Γ(i)/Γ(i+γ)) (1 + O(1/n)) < ∞.

Thus the result holds for k = 1. By the induction hypothesis, let the result be true for j = 1, ..., k−1, so that we have constants C_j such that

sup_{n≥i} E[X_i(n)^j]/n^{jγ} ≤ C_j < ∞.   (4.3)

Denoting p = p_{i,n+1} = X_i(n)/((2C+δ)n) we get

E[X_i(n+1)^k] = E[E[(X_i(n) + ∆_i(n+1))^k | PA_n^{δ,C}]]
= E[X_i(n)^k + k X_i(n)^{k−1} E[∆_i(n+1) | PA_n^{δ,C}] + (k choose 2) X_i(n)^{k−2} E[∆_i(n+1)² | PA_n^{δ,C}] + ···]
= E[X_i(n)^k + k X_i(n)^{k−1} Cp + (k choose 2) X_i(n)^{k−2} (Cp(1−p) + (Cp)²) + ···]
= E[X_i(n)^k + (kγ/n) X_i(n)^k + (k(k−1)(C−1)/(2C)) (γ/n)² X_i(n)^k + (k(k−1)/2)(γ/n) X_i(n)^{k−1} + ···]
= E[X_i(n)^k] (1 + kγ/n + O(1/n²)) + Σ_{r=1}^{k−1} E[X_i(n)^{k−r}] (α_{k−r} γ/n + O(1/n²))
≤ E[X_i(n)^k] (1 + kγ/n + O(1/n²)) + Σ_{r=1}^{k−1} C_{k−r} n^{(k−r)γ} (α_{k−r} γ/n + O(1/n²))   (using (4.3))
≤ E[X_i(n)^k] (1 + kγ/n + O(1/n²)) + C* n^{(k−1)γ−1}
=: a_n E[X_i(n)^k] + b_n,   (4.4)

where the α's and C* are (appropriately chosen) constants, and we denote a_n = 1 + kγ/n + O(1/n²), b_n = C* n^{(k−1)γ−1}. Now using (4.4) recursively we get

E[X_i(n+1)^k] ≤ E[X_i(i)^k] ∏_{r=i}^n a_r + Σ_{ℓ=i}^n b_ℓ ∏_{r=ℓ+1}^n a_r.

Using Stirling's formula we have, for any ℓ < n,

∏_{r=ℓ+1}^n a_r = (n^{kγ}/(a_1 ··· a_ℓ)) (1 + O(1/n))   (n → ∞)
= (n^{kγ}/ℓ^{kγ}) (1 + O(1/ℓ))   (n > ℓ → ∞).
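This product asymptotic can be sanity-checked numerically: ignoring the O(1/r²) part of a_r, the partial product ∏_{r=ℓ+1}^n (1 + kγ/r) behaves like (n/ℓ)^{kγ}. In the sketch below the exponent c and the cutoffs are arbitrary choices of ours.

```python
def partial_product(l, n, c):
    """prod_{r=l+1}^{n} (1 + c/r), with c standing in for k*gamma."""
    p = 1.0
    for r in range(l + 1, n + 1):
        p *= 1 + c / r
    return p

c, l, n = 1.2, 50, 5000
ratio = partial_product(l, n, c) / (n / l) ** c
print(ratio)   # close to 1, with an O(1/l) discrepancy
```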

Therefore,

E[X_i(n+1)^k] ≤ ((C+δ)^k/(a_1 ··· a_{i−1})) n^{kγ} (1 + O(1/n)) + Σ_{ℓ=i}^n C* ℓ^{(k−1)γ−1} (n^{kγ}/ℓ^{kγ}) (1 + O(1/ℓ))
≤ A_1 n^{kγ} (1 + O(1/n)) + C* n^{kγ} Σ_{ℓ=i}^n (1/ℓ^{1+γ}) (1 + O(1/ℓ)),

where A_1 = (C+δ)^k/(a_1 ··· a_{i−1}). Hence dividing both sides by (n+1)^{kγ} we get

E[X_i(n+1)^k]/(n+1)^{kγ} ≤ A_1 (1 + O(1/n)) + C* Σ_{ℓ=i}^n (1/ℓ^{1+γ}) (1 + O(1/ℓ)) < ∞, for any n ≥ i.

Hence the result holds for j = k.

Lemma 4.2. For any k ≥ 1, we have for a fixed i < j,

sup_{n≥j} E[Y_ij(n)^k]/n^{2kγ} < ∞.

Proof. This follows from Lemma 4.1 and the Cauchy-Schwarz inequality.

Remark 4.3. Since both sequences {X_i(n)/n^γ}_{n≥i} and {Y_ij(n)/n^{2γ}}_{n≥j} are L^k bounded for some k > 1 by Lemmas 4.1 and 4.2, they are also uniformly integrable; see (Durrett, 2019, Theorem 4.6.2).

The next proposition describes the asymptotic behavior of the product of the degrees of two nodes, which as expected also exhibits power-law growth.

Proposition 4.4. For any i < j we have

lim_{n→∞} Y_ij(n)/n^{2γ} = Y_ij(∞) = D_i(∞)D_j(∞) a.s.,

where E[Y_ij(∞)] = (C+δ)² Γ(i)Γ(j)Γ(j+γ)/(Γ(i+γ)Γ(j+γ_1)Γ(j+γ_2)) =: C_ij with γ = C/(2C+δ), γ_1 = (1 − 1/√C)γ, γ_2 = (1 + 1/√C)γ. Here D_i(∞), D_j(∞) are as defined in Proposition 2.1.

Proof. Note that

p_{i,n+1} p_{j,n+1} = Y_ij(n)/((2C+δ)²n²),

and writing p_i = p_{i,n+1}, p_j = p_{j,n+1} we have

E[Y_ij(n+1) | PA_n^{δ,C}] = Y_ij(n) + E[Y_ij(n+1) − Y_ij(n) | PA_n^{δ,C}]
= Y_ij(n) + Σ_{0≤k,l≤C, k+l≤C} [k(D_j(n)+δ) + l(D_i(n)+δ) + kl] (C!/(k! l! (C−k−l)!)) p_i^k p_j^l (1−p_i−p_j)^{C−k−l}
= Y_ij(n) (n+γ_1)(n+γ_2)/n²,

where γ = C/(2C+δ), γ_1 = (1 − 1/√C)γ, γ_2 = (1 + 1/√C)γ. Moreover, for i < j we have

E Y_ij(j) = E[(D_i(j)+δ)(D_j(j)+δ)] = (C+δ)² Γ(i)Γ(j+γ)/(Γ(j)Γ(i+γ)).

Therefore,

E Y_ij(n+1) = E Y_ij(n) (n+γ_1)(n+γ_2)/n²
= E Y_ij(j) ((j+γ_1)(j+γ_2)/j²) ··· ((n+γ_1)(n+γ_2)/n²)
= (C+δ)² (Γ(i)Γ(j+γ)/(Γ(j)Γ(i+γ))) ((Γ(j))²/(Γ(j+γ_1)Γ(j+γ_2))) (Γ(n+1+γ_1)Γ(n+1+γ_2)/(Γ(n+1))²)
=: C_ij Γ(n+1+γ_1)Γ(n+1+γ_2)/(Γ(n+1))².   (4.5)

Define

W_ij(n) := (Y_ij(n)/C_ij) (Γ(n))²/(Γ(n+γ_1)Γ(n+γ_2)) = (Y_ij(n)/(C_ij n^{2γ})) (1 + O(1/n)),

using Stirling's formula (note γ_1 + γ_2 = 2γ). Hence by Lemma 4.2, {W_ij(n)}_{n≥j} is uniformly integrable. Moreover W_ij(n) ≥ 0, E W_ij(n) = 1, and E[W_ij(n+1) | PA_n^{δ,C}] = W_ij(n) for 1 ≤ i ≠ j ≤ n. Hence by Doob's Martingale Convergence Theorem (Durrett, 2019, Theorem 4.2.11 and Theorem 4.6.4),

lim_{n→∞} W_ij(n) = W_ij(∞) a.s. and in L¹,

where W_ij(∞) := lim sup_n W_ij(n) and E W_ij(∞) < ∞. Hence we have

lim_{n→∞} Y_ij(n)/n^{2γ} = lim_{n→∞} C_ij W_ij(n) (1 + O(1/n)) = C_ij W_ij(∞) =: Y_ij(∞),

both almost surely and in L¹, with E(Y_ij(∞)) = C_ij. From Proposition 2.1 and (4.1) we can check that, as n → ∞, Y_ij(n)/n^{2γ} → D_i(∞)D_j(∞) a.s.; hence Y_ij(∞) = D_i(∞)D_j(∞) a.s.

Lemma 4.5. For any i < j, we have

(1) lim_{n→∞} (1/log n) Σ_{k=j}^n Y_ij(k)/k² = Y_ij(∞) a.s. if δ = 0,

(2) lim_{n→∞} (1/(n^{2γ−1}/(2γ−1))) Σ_{k=j}^n Y_ij(k)/k² = Y_ij(∞) a.s. if δ < 0,

where Yij(∞) is as defined in Proposition 4.4.

Proof. Let all our random variables be defined on the probability space (Ω, A, P). From Proposi- tion 4.4, for ω ∈ Ω,

lim_{n→∞} Y_ij(n,ω)/n^{2γ} = Y_ij(∞,ω)   (4.6)

holds with probability 1. Fix such an ω ∈ Ω. Then given any small ε > 0, there exists n_0 ∈ N such that for any n ≥ n_0,

|Y_ij(n,ω)/n^{2γ} − Y_ij(∞,ω)| < ε.   (4.7)

(1) If δ = 0, we have γ = 1/2. Also, as n → ∞, we have Σ_{k=1}^n 1/k^{2−2γ} = Σ_{k=1}^n 1/k ∼ log n. Hence

|(1/log n) Σ_{k=j}^n Y_ij(k,ω)/k² − Y_ij(∞,ω)|
≤ (1/log n) Σ_{k=j}^{n_0} Y_ij(k,ω)/k²
+ (1/log n) Σ_{k=n_0+1}^n |Y_ij(k,ω)/k² − Y_ij(∞,ω)/k|
+ |(Y_ij(∞,ω)/log n) Σ_{k=n_0+1}^n 1/k − Y_ij(∞,ω)|

=: I_n + II_n + III_n.

Check that as n → ∞, I_n → 0 and III_n → 0. Since γ = 1/2, using (4.7) we get

II_n = (1/log n) Σ_{k=n_0+1}^n (1/k) |Y_ij(k,ω)/k^{2γ} − Y_ij(∞,ω)| ≤ ε (1/log n) Σ_{k=n_0+1}^n 1/k → ε   (as n → ∞).

Therefore we have

(1/log n) Σ_{k=j}^n Y_ij(k,ω)/k² → Y_ij(∞,ω).   (4.8)

By Proposition 4.4, (4.6) holds almost surely, implying that (4.8) holds almost surely, and hence Lemma 4.5(1) holds.

(2) For δ < 0, we get 1/2 < γ = C/(2C+δ) < 1, which means 0 < 2−2γ < 1. Note that for any 0 < α < 1, we have

Σ_{k=1}^n 1/k^α ∼ n^{1−α}/(1−α)   (n → ∞).

Now we can prove Lemma 4.5(2) in the same manner as we proved (1).
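The partial-sum asymptotic used in part (2) is also easy to check numerically (the value of α and the sample sizes below are our arbitrary choices):

```python
def partial_sum(n, alpha):
    """sum_{k=1}^{n} k^(-alpha), for 0 < alpha < 1."""
    return sum(k ** -alpha for k in range(1, n + 1))

alpha = 0.6   # plays the role of 2 - 2*gamma for some delta < 0
for n in (10**3, 10**4, 10**5):
    print(n, partial_sum(n, alpha) / (n ** (1 - alpha) / (1 - alpha)))
# the printed ratios approach 1 as n grows
```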

Lemma 4.6. For any i < j, we have

(1) lim_{n→∞} (1/log n) Σ_{k=j}^n (Y_ij(k)/k²) (1 − ((C−2)/2) (X_i(k)+X_j(k))/((2C+δ)k)) = Y_ij(∞) a.s. if δ = 0,

(2) lim_{n→∞} (1/(n^{2γ−1}/(2γ−1))) Σ_{k=j}^n (Y_ij(k)/k²) (1 − ((C−2)/2) (X_i(k)+X_j(k))/((2C+δ)k)) = Y_ij(∞) a.s. if δ < 0,

where Y_ij(∞) is as defined in Proposition 4.4.

Proof. Define

Y*_ij(n) := Y_ij(n) (1 − ((C−2)/2) (X_i(n)+X_j(n))/((2C+δ)n)).

Note that for 0 < γ < 1,

Y*_ij(n)/n^{2γ} = (Y_ij(n)/n^{2γ}) (1 − ((C−2)/2) (X_i(n)+X_j(n))/((2C+δ)n))
= (Y_ij(n)/n^{2γ}) (1 − ((C−2)/2) ((X_i(n)+X_j(n))/n^γ) (1/((2C+δ)n^{1−γ})))
→ Y_ij(∞) (1 − ((C−2)/2) (D_i(∞)+D_j(∞)) × 0) = Y_ij(∞)   (as n → ∞),

which holds a.s. by Propositions 2.1 and 4.4. Now we can prove the statements using the same arguments as in Lemma 4.5, replacing Y_ij with Y*_ij.

With the aid of all the results above we are in a position to prove Theorem 2.2.

Proof of Theorem 2.2. Using (2.3) recursively we have

N_ij(n+1) = N_ij(j) + Σ_{k=j+1}^{n+1} 1_{B_ij(k)}.   (4.9)

For any 1 ≤ i < j ≤ n with C ≥ 2,

E[N_ij(n+1) | PA_n^{δ,C}] = N_ij(n) + E[N_ij(n+1) − N_ij(n) | PA_n^{δ,C}]
= N_ij(n) + [1 − (1 − (D_i(n)+δ)/((2C+δ)n))^C − (1 − (D_j(n)+δ)/((2C+δ)n))^C + (1 − (D_i(n)+D_j(n)+2δ)/((2C+δ)n))^C].
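The inclusion–exclusion expression above expands, for small attachment probabilities, to the C(C−1)p_i p_j leading term that drives (4.10) and (4.11) below. A quick numeric check (the values of p_i, p_j, C are arbitrary choices of ours):

```python
def new_common_friend_prob(p_i, p_j, C):
    """Exact P(v_{n+1} attaches to both v_i and v_j) when the C stubs
    choose independently: inclusion-exclusion over the events
    'misses v_i' and 'misses v_j'."""
    return 1 - (1 - p_i) ** C - (1 - p_j) ** C + (1 - p_i - p_j) ** C

p_i, p_j, C = 1e-3, 2e-3, 3
exact = new_common_friend_prob(p_i, p_j, C)
leading = C * (C - 1) * p_i * p_j
print(exact / leading)   # close to 1; the gap is O(p_i + p_j)
```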

We can check that

E[N_ij(n+1) | PA_n^{δ,C}] ≤ N_ij(n) + C(C−1) Y_ij(n)/((2C+δ)²n²),   (4.10)

with equality for C = 2, and

E[N_ij(n+1) | PA_n^{δ,C}] ≥ N_ij(n) + C(C−1) (Y_ij(n)/((2C+δ)²n²)) (1 − ((C−2)/2) (X_i(n)+X_j(n))/((2C+δ)n)).   (4.11)

Proof of part (1). First we prove the case δ > 0. Clearly, as n → ∞, N_ij(n+1) ↑ N_ij(∞), where

N_ij(∞) := N_ij(j) + Σ_{k=j+1}^∞ 1_{B_ij(k)}.

We want to show that N_ij(∞) < ∞ a.s. Taking expectations in (4.10) and using (4.5) we get

E[N_ij(n+1)] ≤ E[N_ij(n)] + (C(C−1)/((2C+δ)²n²)) E[Y_ij(n)]
= E[N_ij(n)] + (C(C−1)C_ij/(2C+δ)²) Γ(n+γ_1)Γ(n+γ_2)/(Γ(n+1))².

Applying this argument recursively we get

E[N_ij(n+1)] ≤ E[N_ij(j)] + (C(C−1)C_ij/(2C+δ)²) Σ_{k=j}^n Γ(k+γ_1)Γ(k+γ_2)/(Γ(k+1))²
< E[N_ij(j)] + C̃ Σ_{k=j}^∞ Γ(k+γ_1)Γ(k+γ_2)/(Γ(k+1))²
≤ E[N_ij(j)] + C̃ Σ_{k=j}^∞ (1/k^{2−(γ_1+γ_2)}) (1 + O(1/k)),

where C̃ = C(C−1)C_ij/(2C+δ)². Therefore for any n we have

Σ_{k=j+1}^{n+1} P(B_ij(k)) = E[N_ij(n+1)] − E[N_ij(j)] < C̃ Σ_{k=j}^∞ (1/k^{2−2γ}) (1 + O(1/k)) < ∞,   (4.12)

since 2 − (γ_1+γ_2) = 2 − 2γ = 1 + δ/(2C+δ) > 1 for δ > 0. Since the right hand side of (4.12) does not depend on n, Σ_{k=j+1}^∞ P(B_ij(k)) < ∞. Using the Borel–Cantelli Lemma (Durrett, 2019, Theorem 2.3.1) this implies

P(B_ij(k) happens infinitely often) = 0,

and hence Σ_{k=j+1}^∞ 1_{B_ij(k)} < ∞ a.s. Since N_ij(j) ≤ max(j−2, C), we have

N_ij(∞) = N_ij(j) + Σ_{k=j+1}^∞ 1_{B_ij(k)} < ∞ a.s.

Proof of part (2). Here we address the case where δ = 0. Define

Q_n := P(B_ij(n+1) | PA_n^{δ,C}) = E[1_{B_ij(n+1)} | PA_n^{δ,C}] = E[N_ij(n+1) − N_ij(n) | PA_n^{δ,C}].

Using the conditional Borel-Cantelli Lemma (Durrett, 2019, Theorem 4.4.5) we have

lim_{n→∞} (Σ_{k=j+1}^n 1_{B_ij(k)}) / (Σ_{k=j+1}^n Q_k) = 1 a.s. on {Σ_{k=j+1}^∞ Q_k = ∞}.   (4.13)

Note that, using (4.10) and (4.11), we have for i < j < n,

L_n := C(C−1) (Y_ij(n)/((2C+δ)²n²)) (1 − ((C−2)/2) (X_i(n)+X_j(n))/((2C+δ)n)) ≤ Q_n, and

R_n := C(C−1) Y_ij(n)/((2C+δ)²n²) ≥ Q_n.

Using the above recursively we obtain

Σ_{k=j+1}^n L_k ≤ Σ_{k=j+1}^n Q_k ≤ Σ_{k=j+1}^n R_k.

Now, from Lemmas 4.5(1) and 4.6(1) we have

lim_{n→∞} (Σ_{k=j+1}^n L_k)/log n = lim_{n→∞} (Σ_{k=j+1}^n R_k)/log n = (C(C−1)/(2C+δ)²) Y_ij(∞) a.s.

Therefore we get

lim_{n→∞} (Σ_{k=j+1}^n Q_k)/log n = (C(C−1)/(2C+δ)²) Y_ij(∞) and Σ_{k=j+1}^∞ Q_k = ∞ a.s.   (4.14)

Hence from (4.13) and (4.14) we have

lim_{n→∞} N_ij(n)/log n = lim_{n→∞} N_ij(j)/log n + lim_{n→∞} (Σ_{k=j+1}^n 1_{B_ij(k)})/log n
= 0 + lim_{n→∞} [(Σ_{k=j+1}^n 1_{B_ij(k)}) / (Σ_{k=j+1}^n Q_k)] · [(Σ_{k=j+1}^n Q_k)/log n]
= (C(C−1)/(2C+δ)²) Y_ij(∞) = (C(C−1)/(2C+δ)²) D_i(∞)D_j(∞) a.s.

Proof of part (3). The case where δ < 0 can be shown using the same technique as for δ = 0 by using Lemmas 4.5(2) and 4.6(2) in place of Lemmas 4.5(1) and 4.6(1).

5. Conclusion. In this paper we establish the rate of growth of the number of common friends for two fixed nodes in a linear preferential attachment model. The growth rate is shown to be static, logarithmic or of power-law type depending on the parameter: δ > 0, δ = 0 or δ < 0, respectively. We use this result to prove consistency of an estimator of the number of common friends that is less expensive to compute. Such results will be applicable both in link prediction problems for large dynamic networks and in detection methods for a preferential attachment model. This is a first step towards a more general result on the growth behavior of common friends for any randomly chosen pair of nodes and towards uniform convergence bounds for estimators of common friends. Further properties of such models and estimation issues are under current investigation.

6. Acknowledgement. The authors are very grateful to the referee for insightful comments and also for providing us with precise ideas to fill gaps in parts of the proof of Theorem 2.2.

REFERENCES

Abramowitz, M. and Stegun, I. A. (2012). Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables. Courier Corporation.
Barabási, A. and Albert, R. (1999). Emergence of scaling in random networks. Science 286 509-512.
Bollobás, B., Riordan, O., Spencer, J. and Tusnády, G. (2001). The degree sequence of a scale-free random graph process. Random Structures Algorithms 18 279-290.
Bollobás, B., Borgs, C., Chayes, J. and Riordan, O. (2003). Directed scale-free graphs. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, 2003) 132-139.
Cooper, C. and Frieze, A. (2003). A general model of web graphs. Random Structures & Algorithms 22 311-335.
Dereich, S. and Mörters, P. (2009). Random networks with sublinear preferential attachment: Degree evolutions. Electronic Journal of Probability 43 1222-1267.
Durrett, R. T. (2019). Probability: Theory and Examples, fifth ed. Cambridge Series in Statistical and Probabilistic Mathematics 49. Cambridge University Press, Cambridge.
Elwes, R. (2016). A Linear Preferential Attachment Process Approaching the Rado Graph. http://arxiv.org/abs/1603.08806v2.
Gupta, P., Goel, A., Lin, J., Sharma, A., Wang, D. and Zadeh, R. (2013). WTF: The Who to Follow Service at Twitter. In Proceedings of the 22nd International Conference on World Wide Web. WWW '13 505-514. ACM, New York, NY, USA.
Liben-Nowell, D. and Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 58 1019-1031.
Resnick, S. I. and Samorodnitsky, G. (2015). Tauberian Theory for Multivariate Regularly Varying Distributions with Application to Preferential Attachment Networks. Extremes 18 349-367.
Samorodnitsky, G., Resnick, S., Towsley, D., Davis, R., Willis, A. and Wan, P. (2016). Nonstandard regular variation of in-degree and out-degree in the preferential attachment model. Journal of Applied Probability 53 146-161.
van der Hofstad, R. (2017). Random Graphs and Complex Networks. Vol. 1. Cambridge Series in Statistical and Probabilistic Mathematics 43. Cambridge University Press, Cambridge.

Singapore University of Technology and Design, 20 Dover Drive, Singapore 138682
E-mail: [email protected]

LinkedIn Corporation, 700 E. Middlefield Road, Mountain View, CA 94043, USA
E-mail: [email protected]