
Linked Decompositions of Networks and the Power of Choice in Polya Urns

Henry Lin∗  Christos Amanatidis†  Martha Sideri‡  Richard M. Karp§  Christos H. Papadimitriou¶

October 12, 2007

Abstract

A linked decomposition of a graph with n nodes is a set of subgraphs covering the n nodes such that all pairs of subgraphs intersect; we seek linked decompositions such that all subgraphs have about √n vertices, logarithmic diameter, and each vertex of the graph belongs to either one or two subgraphs. A linked decomposition enables many control and management functions to be implemented locally, such as resource sharing, maintenance of distributed directory structures, deadlock-free routing, failure recovery and load balancing, without requiring any node to maintain information about the state of the network outside the subgraphs to which it belongs. Linked decompositions also enable efficient routing schemes with small routing tables, which we describe in Section 5. Our main contribution is to show that "Internet-like graphs" (e.g. the preferential attachment model proposed by Barabasi et al. [10] and other similar models) have linked decompositions with the parameters described above with high probability; moreover, our experiments show that the Internet topology itself can be so decomposed. Our proof proceeds by analyzing a novel process, which we call Polya urns with the power of choice, which may be of great independent interest. In this new process, we start with n nonempty bins containing O(n) balls total, and each arriving ball is placed in the least loaded of m bins, drawn independently at random with probability proportional to load. Our analysis shows that in our new process, with high probability the bin loads become roughly balanced some time before O(n^{2+ε}) further balls have arrived, and stay roughly balanced, regardless of how the initial O(n) balls were distributed, where ε > 0 can be arbitrarily small, provided m is large enough.

∗ Computer Science Division, University of California at Berkeley. Supported by NSF grants CCF-0635319 and CCF-0515259. E-mail: [email protected]
† Univ. of Economics and Business, Athens, Greece. Supported by a project co-financed by E.U. European Social Fund and the Greek Ministry of Development - GSRT. E-mail: amanatidis [email protected]
‡ Univ. of Economics and Business, Athens, Greece. Supported by a project co-financed by E.U. European Social Fund and the Greek Ministry of Development - GSRT. E-mail: [email protected]
§ Computer Science Division, University of California at Berkeley. Supported by NSF grant CCF-0515259. E-mail: [email protected]
¶ Computer Science Division, University of California at Berkeley. Supported by NSF grant CCF-0635319 and a gift from Yahoo! Research. E-mail: [email protected]

1 Introduction

Throwing balls into bins uniformly at random is well-known to produce relatively well-balanced occupancies with high probability. In contrast, when each new ball is added to a bin selected with probability proportional to its current occupancy, we get the much-studied model of Polya urns, which is known to produce large imbalances: with n urns, the largest urn is expected to have log n times more balls, and the smallest urn n times fewer balls, than the average one. It is a celebrated result in our field that if, while throwing balls into bins, we choose two random bins and add a ball only to the smaller one, then this power of two choices makes the already small imbalances significantly smaller [9, 15, 19]. But what about utilizing the power of choice in the Polya urns model? Does selecting m ≥ 2 bins according to the Polya urn distribution, and adding a ball to the least loaded bin, result in balanced loads?

In this paper, we show that the power of choice in the Polya urn model does balance bin loads. In particular, we show (Theorem 3.2) that if n nonempty bins start with O(n) balls total, then throwing O(n^{2+ε}) more balls according to our new Polya urns process with the power of choice balances all bins within a multiplicative factor of (1 + ε) with high probability, where ε > 0 can be made arbitrarily small, provided we make the number of choices m large enough. (See Section 3 for the exact dependence on m.) Moreover, once the bins become roughly balanced, they stay roughly balanced as more balls are added.
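The process is easy to simulate. The following is a minimal sketch (ours, not from the paper) of Polya urns with the power of choice; with m = 1 it degenerates to the classical Polya urn, so comparing the max/min load ratio for m = 1 against m ≥ 2 illustrates the balancing effect claimed above. Function and parameter names are our own.

```python
import random

def polya_power_of_choice(n, m, steps, rng=random):
    """Polya urns with the power of choice: each arriving ball goes to the
    least loaded of m bins drawn i.i.d. with probability proportional to
    current load.  m = 1 recovers the classical Polya urn."""
    loads = [1] * n  # n nonempty bins, O(n) balls total
    for _ in range(steps):
        picks = rng.choices(range(n), weights=loads, k=m)
        dest = min(picks, key=lambda i: loads[i])  # least loaded pick
        loads[dest] += 1
    return loads

if __name__ == "__main__":
    random.seed(0)
    n = 100
    for m in (1, 2, 8):
        loads = polya_power_of_choice(n, m, steps=n * n)
        print(f"m = {m}: max/min load ratio = {max(loads) / min(loads):.2f}")
```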

As m increases, our result becomes essentially tight, since Ω(n²) balls are required to balance an initial distribution that starts with Ω(n) balls in one bin and 1 ball in every other bin. We can also show that the bin loads balance when m = 2 choices are used and poly(n) balls are thrown, but it remains an interesting open question whether the loads also balance when O(n^{2+ε}) new balls are thrown with m = 2 choices.

The motivation for this investigation comes from a graph-theoretic problem related to the Internet. Let G be a graph, and a, b, c, d be integer parameters. We say that a graph G has an (a, b, c, d)-linked decomposition if there are c connected subgraphs of G, which we call components, such that

1. each component has between a/4 and a nodes;

2. each node belongs to at least one and at most b components;

3. the diameter of each component is at most d;

4. most importantly, any two components have a node in common.

Our notion of linked decomposition is related to the notion of sparse covers defined in [8], which decomposes the graph into connected subgraphs satisfying properties (2) and (3). Other related concepts of network decomposition are introduced in [7] and [20].

For our purposes, we would like a, c ≈ √n, b = 2, and d = O(log n), where n is the number of nodes in G. It is not hard to see that a linked decomposition, if it exists for graph G, would yield a routing scheme with unstructured node names with O(d) delay (worst-case number of hops), O(d) traffic (total messages sent per packet), and Õ(ab + n/a) storage per node (see Section 5 for details), which is about √n for the parameters discussed. Although there is an existing routing scheme [4] known to achieve Õ(√n)-size routing tables while maintaining short routing paths (stretch 3), our new routing scheme is a bit simpler, and does not require the use of landmark nodes, which may provide benefits in terms of robustness and maintaining balanced congestion (see Section 5 for more details). Moreover, using a more relaxed form of linked decomposition and structured node names, we can achieve about O(n^{1/3}) storage per node and Õ(log n) delay/messages sent as well. In other words, a network with our linked decomposition property can be decomposed into subnetworks of balanced size that can be administered relatively independently, with very little harm to communication. For more details regarding our routing schemes and their implications for network routing, see Section 5.

But why should we expect that we can decompose networks in this way? Rather surprisingly, our experiments indicate that the existing Internet graph does have linked decompositions with these approximate parameters, both at the router and autonomous system level. For more details regarding our experiments, our experimental results can be found in a longer version of this paper to be posted online.

A linked decomposition with parameters a, c ≈ √n, b = 2, and d = O(log n) is a very demanding requirement, and hence an unlikely property of graphs; on the other hand the Internet seems to have it. We suspect that this is not a coincidence, and that "any Internet-like graph" is very likely to have a linked decomposition with these parameters. Towards this goal, we turn to the simplest and perhaps most influential model of Internet-like graphs, namely the preferential attachment model PA(m) proposed by [10] in 1999, and since then studied extensively (see [11] for a survey). In this model, nodes arrive one after the other, and when a node arrives, it is connected via m edges to previously arrived nodes. In particular, for each new node t, we pick m previous nodes, i.i.d. at random and with replacement, with probability proportional to the degree of the node at the current time, and create an edge from node t to each of the m nodes picked, with parallel edges possible.

Our main result is Theorem 2.1, which states that for any ε > 0, there is an m ≥ 2 such that PA(m) produces a graph which has a linked decomposition with parameters a = Θ(n^{1/2+ε}), b = 2, c = Θ(n^{1/2−ε}), and d = O(log n), with probability 1 − n^{−ε}.

The proof of our main result is based on the Polya urns insight from the beginning of the Introduction. To find our decomposition, the idea is to follow the preferential attachment process, and assign the first c = Θ(n^{1/2−ε}) nodes to their own component. Then as the next n/2 − c nodes arrive, we look at the nodes/components to which the m edges of the arriving node point, and assign the new node to the component whose total degree (the sum of the degrees of the nodes in the component) is lowest. Now, provided the components are roughly balanced in terms of total degree after these first n/2 nodes have arrived, we can assign the last n/2 nodes to two components each in a simple way, such that a coupon collector argument can be used to show each pair of components intersects at some node (i.e. property (4) of linked decomposition holds) with high probability. As we show in Section 2, it is not hard to prove that our procedure finds a linked decomposition with high probability, by showing that the other three properties of linked decomposition hold as well with high probability.
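The four properties above translate directly into a checker. Below is a small verifier of our own (not from the paper, with hypothetical names), assuming the graph is given as an adjacency dict and the decomposition as a list of node sets.

```python
from collections import deque
from itertools import combinations

def component_diameter(adj, S):
    """Diameter of the subgraph induced by node set S (BFS from every node);
    returns infinity if the induced subgraph is disconnected."""
    diameter = 0
    for src in S:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in S and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(S):
            return float("inf")
        diameter = max(diameter, max(dist.values()))
    return diameter

def is_linked_decomposition(adj, components, a, b, d):
    """Check the four properties of an (a, b, c, d)-linked decomposition,
    where c is implicitly len(components)."""
    # (1) each component has between a/4 and a nodes
    if any(not (a / 4 <= len(S) <= a) for S in components):
        return False
    # (2) each node belongs to at least 1 and at most b components
    count = {u: 0 for u in adj}
    for S in components:
        for u in S:
            count[u] += 1
    if any(not (1 <= k <= b) for k in count.values()):
        return False
    # (3) each component is connected with diameter at most d
    if any(component_diameter(adj, S) > d for S in components):
        return False
    # (4) any two components share at least one node
    return all(S & T for S, T in combinations(components, 2))
```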

The main challenge in proving Theorem 2.1 is proving that the components obtain roughly balanced total degree after the first n/2 nodes arrive (Theorem 3.1), which is needed to prove that property (4) of linked decomposition holds. To prove that the degrees of the components become roughly balanced when running PA(m), note that if we think of the total degree of each component as the occupancy of a Polya urn, then our random process is very much like the Polya urns process with the power of choice, defined previously, with c bins and m choices. Although Polya urns with the power of choice is not quite the same process as the one we would like to analyze, the analysis used in the proof of Theorem 3.2, which shows that Polya urns with the power of choice produces roughly balanced bin loads when O(c^{2+ε}) balls are thrown, can be modified to prove that the degrees of the components balance when O(c^{2+ε}) new nodes arrive (Theorem 3.1). Here again, in order to make ε arbitrarily small, we need to make m sufficiently large. We expect the dependence of ε on m in Theorem 3.1 to be similar to the dependence given for Polya urns with the power of choice (see Section 3), but some details remain to be checked. By proving Theorem 3.1, we complete the last step needed to prove that a linked decomposition with the required parameters exists with high probability.

One negative aspect of the proof above is the fact that we need to increase m in order to make ε arbitrarily small. However, if we relax the definition of linked decomposition so that each component may contain at most one special node violating property (2) of the linked decomposition — that is, some nodes may be allowed to belong to more than two components — then we can find a linked decomposition even when m = 2 (Theorem 4.1). Moreover, it is easy to see that routing is not harmed by this exception, and we obtain a slightly lower value for a = Θ(√(n log n)). The proof proceeds by using a dynamic programming algorithm to decompose the network formed by the first Θ(n/log n) nodes into c = Θ(√(n/log n)) components of approximately equal size. Once we have c balanced components, it is not very hard to use a coupon collector argument to show that the remaining c² log c nodes are enough to link together these components as required by property (4) of linked decomposition, and the other three properties of linked decomposition are not difficult to prove as well.

2 Linked Decompositions in PA(m) Graphs

The preferential attachment model of random graphs PA(m) [10, 11] creates a sequence of random graphs G_1, G_2, ..., G_t, ... with multiple edges, where G_t has t nodes. G_1 is a single node with no edges, and G_2 is a graph with m edges between two nodes. To generate G_{t+1} from G_t for t ≥ 2, a new node t + 1 is added, and then m nodes are selected i.i.d. at random with replacement from among nodes 1, ..., t, where each node i ≤ t is selected with probability d_t(i)/(2mt), and d_t(i) is the degree of node i in G_t. Then edges are added in G_{t+1} from node t + 1 to the m selected nodes. This is a very influential graph-theoretic model in the context of the Internet (even though it is understood that it does not satisfy all known properties of the Internet graph).

By the phrase "with high probability" (whp) in this paper we shall mean with probability greater than or equal to 1 − n^{−α}, for some α > 0. In most cases, α can be made arbitrarily large by deteriorating the other parameters (e.g. by increasing m in Theorem 2.1). Following the definition of linked decomposition described in the Introduction, we can prove one of our main theorems:

Theorem 2.1. For any ε > 0, there exists an m ≥ 2 such that a graph G_n generated by PA(m) has a linked decomposition (whp) with parameters a = Θ(n^{1/2+ε}), b = 2, c = Θ(n^{1/2−ε}), and d = O(log n).

Proof. To find a linked decomposition in a PA(m) graph with n nodes, we follow the preferential attachment process, and as nodes arrive, we use a very simple procedure to assign each node t to one or two components (sketched in code after this proof).

(1) For node t ∈ {1, ..., n^{1/2−ε}}, we assign node t to its own component.

(2) For node t ∈ {n^{1/2−ε} + 1, ..., n/2}, we look at the nodes/components to which the m edges of node t point, and assign node t to the component whose total degree (i.e. the sum of the degrees of its nodes) is lowest so far.

(3) For node t ∈ {n/2 + 1, ..., n}, we look at where the first two edges of node t point, say to nodes u and v, and assign node t to a component to which u belongs, and a component to which v belongs. (If u or v belongs to two components, then we assign t to a random component of u or v.)

It is easy to see that we have decomposed the graph into connected components, and that we have satisfied property 2 of linked decompositions with b = 2. Furthermore, since each node t has a constant probability (dependent on m) of joining the component of a node v ≤ t/2, it is easy to prove that each component has diameter O(log n) (whp).

To prove the final two properties of our linked decomposition, the key step is to show that at the end of step (2), the total degree of each component is roughly balanced (whp).
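For concreteness, here is a simulation sketch of the procedure (our own code, not from the paper; it assumes m ≥ 2 and c < n/2, and bootstraps the first two nodes crudely). Nodes are picked proportionally to degree by sampling from a list in which each node j appears degree(j) times.

```python
import random

def pa_with_components(n, m, c, rng=random):
    """Run PA(m) while assigning each arriving node to one or two components
    using steps (1)-(3) above.  Returns the components of each node and the
    total degree of each component."""
    degree = {1: m, 2: m}
    targets = [1] * m + [2] * m          # node j appears degree[j] times
    comps_of = {}                        # node -> list of component ids
    comp_degree = []                     # component id -> total degree

    def join(node, k):                   # put node into component k
        comps_of.setdefault(node, []).append(k)
        comp_degree[k] += degree[node]

    def bump(node):                      # node gained one unit of degree
        degree[node] += 1
        for k in comps_of.get(node, ()):
            comp_degree[k] += 1

    comp_degree.extend([0, 0])           # nodes 1 and 2 seed two components
    join(1, 0)
    join(2, 1)

    for t in range(3, n + 1):
        picks = [rng.choice(targets) for _ in range(m)]  # prob. prop. to degree
        degree[t] = m
        for j in picks:                  # record the m edges (t, j)
            bump(j)
            targets += [t, j]
        if t <= c:                                       # step (1)
            comp_degree.append(0)
            join(t, len(comp_degree) - 1)
        elif t <= n // 2:                                # step (2)
            incident = {k for j in picks for k in comps_of[j]}
            join(t, min(incident, key=lambda k: comp_degree[k]))
        else:                                            # step (3)
            u, v = picks[0], picks[1]
            ku, kv = rng.choice(comps_of[u]), rng.choice(comps_of[v])
            join(t, ku)
            if kv != ku:
                join(t, kv)
    return comps_of, comp_degree
```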

In the next section, we define a Polya urns process which models the degrees of the components, and analyze it to prove one of our main theorems, Theorem 3.1, which states that at the end of step (2), the total degrees of any two components differ by at most a multiplicative factor of (1 + ε) (whp).

Theorem 3.1 can be used to prove that the final two properties hold (whp), because if we can show that the total degree is roughly the same for each component at the end of step (2), then it is not hard to apply Azuma's inequality to show that the total degree of each component remains roughly balanced within a constant (whp) throughout step (3). (For more details, see a longer version of this paper to be posted online.) Furthermore, if the components remain balanced within a multiplicative constant throughout step (3), then at the end of step (3), each component must have total degree Θ(n^{1/2+ε}). Therefore, each component contains at most O(n^{1/2+ε}) nodes, and we can conclude that property 1 holds (whp) with a = Θ(n^{1/2+ε}), provided Theorem 3.1 holds.

Moreover, Theorem 3.1 can also be used to show that property 4 holds (whp). Recall that property 4 states that each pair of components must intersect at some node. If we view these Θ(n^{1−2ε}) pairs of intersections as coupons we wish to collect, then step (3) provides us with Θ(n) opportunities to collect Θ(n^{1−2ε}) coupons. Even though we are not collecting coupons uniformly at random, Theorem 3.1 (and Azuma's inequality) implies that the degrees of the components remain balanced throughout step (3) (whp). Furthermore, this implies that each coupon is still collected with probability at least Ω(n^{−(1−2ε)}), and it is not hard to see that, given n/2 opportunities, our process collects all coupons (intersections) within the required steps (whp). Therefore, as long as we can prove Theorem 3.1, all four properties of our linked decomposition hold with high probability.

3 Polya Urns with the Power of Choice

To prove that the total degrees of the components become balanced, we analyze an equivalent random process P on n bins, where each bin represents a component and each bin's load represents the size of a component. (Here, n should be thought of as the number of components, called c in the definition of linked decomposition, and should not be confused with the number of nodes in our graph.) Analogous to the preferential attachment process, our random process starts with 2mn balls distributed arbitrarily among n bins, such that each bin contains at least m balls. Furthermore, it is known that with high probability our process starts with at most O(n^{1/2+ε}) balls in each bin [14], where ε > 0 can be arbitrarily small, a fact which will be useful later on. At each step of our random process P, 2m new balls are thrown into the n bins, according to the following rule:

• Pick m bins independently at random, with probability proportional to the bin load.

  (I) Throw 1 ball into each of the m random bins picked.

  (II) Throw m more balls into the least loaded of the m random bins picked (break ties arbitrarily).

We can prove the following theorem about our random process P, for any arbitrarily small ε > 0, provided m is a sufficiently large constant:

Theorem 3.1. Given the starting conditions described above, any time after Ω(n^{2+ε}) iterations of our process P, the loads of any two bins differ by a multiplicative factor of at most (1 + ε) (whp).

Remark: Note that proving Theorem 3.1 is sufficient for proving that properties (1) and (4) of linked decomposition hold (whp), as we described previously, and completes the proof of Theorem 2.1.

Even though step (II) of our random process tends to balance bin loads, proving Theorem 3.1 is a bit challenging since step (I) tends to preserve load imbalances. For these reasons, we study a simpler random process P̄ on n bins, which we call Polya urns with the power of choice. By proving that P̄ balances bin loads (whp), we illustrate the main techniques needed to prove that P produces balanced bin loads (Theorem 3.1). Our new process, Polya urns with the power of choice P̄, starts with n nonempty bins containing N_0 = ĉn balls total. At each step of our random process P̄, we throw one more ball into one of the n bins according to the following rule:

(a) Pick m bins i.i.d. at random, with probability proportional to bin load.

(b) Throw 1 ball into the least loaded of the m random bins picked.

We can prove the following theorem about Polya urns with the power of choice, for any ε > 0, provided m is a sufficiently large constant:

Theorem 3.2. Given the starting conditions described above, any time after Ω(n^{2+ε}) balls have been thrown, the loads of any two bins differ by a multiplicative factor of at most (1 + ε) (whp).
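For comparison with the simulation sketch in the Introduction (which implements P̄'s rule directly), one iteration of the component process P can be written as follows. This is our own sketch; whether "least loaded" in step (II) is judged before or after step (I) is immaterial up to ties, and we judge it before.

```python
import random

def process_P_step(loads, m, rng=random):
    """One iteration of process P: 2m balls are thrown.  Pick m bins i.i.d.
    proportional to load, give one ball to each pick (step I), and m more
    balls to the least loaded pick (step II, ties broken arbitrarily)."""
    picks = rng.choices(range(len(loads)), weights=loads, k=m)
    dest = min(picks, key=lambda i: loads[i])  # least loaded pick
    for i in picks:
        loads[i] += 1                          # step (I)
    loads[dest] += m                           # step (II)
```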

To be more precise, we show that if each bin starts with fractional load at least 1/(ĉn) for ĉ > 1, then all bins become balanced within (1 + ε̂) (whp) some time before O(n^{1/(1−(1−1/ĉ)^{m−1})+1+ε̂}) new balls arrive, and they stay balanced within (1 + ε̂) (whp) at all later times, where ε̂ > 0 can be arbitrarily small, provided n is large enough. Note that for sufficiently large m or sufficiently small ĉ, we can conclude that the bin loads balance in O(n^{2+ε}) steps (whp) for any ε > 0.

Due to space limitations, we only provide a sketch of the proof of Theorem 3.2 in the next subsection; a full proof can be found in a longer version of this paper to be posted online. Additionally, we defer the full proof of Theorem 3.1 to the longer version of this paper. The proof of Theorem 3.1 follows roughly the same steps as the proof of Theorem 3.2, although the details are slightly different, since the random process is slightly different.

3.1 An Alternate Random Process. To prove Theorem 3.2, we start by defining a new random process P̄_0 which is somewhat easier to analyze than the Polya urn process with the power of choice P̄. Our new random process P̄_0 also throws balls into bins one at a time, and for each ball thrown, m random bins are generated in the same manner as described for P̄. However, our new process P̄_0 sometimes ignores the power of m choices and throws a ball into a bin that is not the least loaded of the m bins. Eventually, our goal will be to show that for any arbitrarily small ε > 0, after N_F = Θ(n^{2+ε}) balls are thrown according to P̄_0, the numbers of balls in the bins are within a factor of (1 + ε) of one another, with high probability.

It is easy to see that if ā(t) represents the bin loads of the Polya urns process with the power of choice P̄ at time t, and b̄(t) represents the bin loads of the new process P̄_0 (or any process that sometimes ignores the power of m choices) at time t, then there exists a coupling of the two processes such that ā(t) always majorizes b̄(t); we omit the formal details of the coupling, which are not difficult. Hence, to prove the theorem, we only need to analyze our new random process P̄_0, which throws balls in two phases, defined below (both phases' placement rules are also sketched in code after the claims below).

In phase A, we throw N_A = Θ(n^{2+ε_0}) balls into bins, where ε_0 > 0 is an arbitrarily small constant. At any time in phase A, we classify the bins into two types: a bin is considered to be low if it contains less than a c_1/n fraction of the balls; otherwise it is high. Here, c_1 is a constant greater than 1, dependent on ε_0, to be defined later. Given these two definitions, we throw each new ball into a bin as follows:

• If at least one of the random bins generated for the new ball is a low bin, then we throw the new ball into the first low bin generated.

• If all the bins generated are high bins, then we throw the new ball into the first random (high) bin generated.

In phase B, we throw balls into bins until the bins contain a total of N_B = Θ(N_A^{1+ε_0}) balls. Let M be the (n/c_2)th smallest bin at the end of phase A, where c_2 > 1 is a constant dependent on ε_0. We use bin M to classify the bins into three types: A bin is considered to be a middle bin at the current time if it has obtained the same load as bin M at some point in time in phase B, prior to or equal to the current time. A bin is low if it has never obtained the same load as bin M at any time during phase B prior to the current time, and it currently contains strictly less load than bin M. The remaining bins, those that have so far always been strictly higher than bin M in phase B, are high bins. In phase B, we throw each new ball into a bin as follows:

• We look at the random bins generated for the new ball, and we let T ∈ {low, middle, high} be the lowest bin type generated.

• We throw the ball into the first bin generated of type T.

To complete the proof, we prove the following five claims about the two phases:

• At the end of phase A:

  Claim 1: Each bin contains at least ω(log n) balls with high probability.

  Claim 2: Each bin contains at most 1.01(c_1/n) fractional load with high probability.

• If each bin contains at least ω(log n) balls and at most 1.01(c_1/n) fractional load at the end of phase A, then with high probability:

  Claim 3: Every high bin becomes a middle bin sometime during phase B.

  Claim 4: Every low bin becomes a middle bin sometime during phase B.

  Claim 5: If a bin becomes a middle bin (i.e. obtains the same load as bin M) sometime during phase B, then at the end of phase B, the load of that bin is at most (1 + ε_1) times the load of bin M, and at least (1 − ε_1) times the load of bin M, where ε_1 > 0 can be arbitrarily small, provided n is sufficiently large.
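As promised above, here is how one ball of P̄_0 is placed in each phase, in code (our own sketch; maintaining the low/middle/high types relative to the marker bin M is assumed to happen outside these functions).

```python
import random

def phase_A_step(loads, total, m, c1, rng=random):
    """Phase A: prefer the first low pick (fractional load below c1/n);
    if all m picks are high, use the first pick.  Returns the new total."""
    n = len(loads)
    picks = rng.choices(range(n), weights=loads, k=m)
    low_picks = [i for i in picks if loads[i] / total < c1 / n]
    dest = low_picks[0] if low_picks else picks[0]
    loads[dest] += 1
    return total + 1

def phase_B_step(loads, bin_type, m, rng=random):
    """Phase B: throw the ball into the first pick of the lowest type present,
    where low < middle < high (bin_type[i] is updated elsewhere whenever a
    low or high bin reaches the load of the marker bin M)."""
    rank = {"low": 0, "middle": 1, "high": 2}
    picks = rng.choices(range(len(loads)), weights=loads, k=m)
    dest = min(picks, key=lambda i: rank[bin_type[i]])  # first pick of lowest type
    loads[dest] += 1
```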

Note that the five claims imply Theorem 3.2, and by making ε_1 and ε_0 arbitrarily small, we also make ε arbitrarily small. Proving the five claims requires applying a variety of probability theory techniques, but due to space limitations, we only provide a rough proof sketch for each claim in the next subsection. The full details can be found in an online version of this paper.

3.2 A Proof Sketch for Each Claim. To prove Claim 1, we first note that a coupling can be defined such that the bin loads arising from the P̄_0 process majorize the bin loads arising from the standard Polya process, which throws balls into bins with probability proportional to load but does not utilize the power of choice. We can then utilize a known result which lower bounds the load of the lowest bin arising from the standard Polya urn process [13]. This lower bound states that the lowest bin will contain ω(log n) balls (whp) after N_A balls have been thrown according to the standard Polya urns process. Furthermore, this lower bound and the coupling argument then imply that at the end of phase A of process P̄_0, each bin also must contain ω(log n) balls (whp), and thus Claim 1 holds.

To prove Claim 2, we first note that the probability of throwing a ball into a high bin is (h_t)^m, where h_t is the total fractional load of all high bins at time t, and the probability of throwing a ball into a particular high bin with fractional load p_t at time t is equal to (h_t)^m · (p_t/h_t) = (h_t)^{m−1} p_t. Provided each bin starts with fractional load at least 1/(ĉn), we can then do some work to upper bound h_t ≤ (1 − 1/ĉ) + ε_1 at any time t in phase A (whp), for any arbitrarily small ε_1 > 0, provided we set c_1 large enough. Thus, we can upper bound the probability of throwing a ball into any high bin by γ p_t (whp), where γ can be made arbitrarily close to (1 − 1/ĉ)^{m−1}. Now note that since γ < 1 for small ε_1, the high bins are in essence suffering a shrinkage condition, which causes their fractional load to decrease (whp). Given that the probability of throwing a ball into a high bin is bounded by γ p_t, we can then prove a lemma which states that after Ω(N_0 · n^{1/(1−(1+ε_1)γ)}) = Ω(n^{1/(1−(1−1/ĉ)^{m−1})+1+ε̂}) new balls have been thrown, each high bin contains at most 1.01(c_1/n) fractional load (whp), where ε̂ can be made arbitrarily small by setting c_1 large enough. Note that our last lemma then implies that Claim 2 holds (whp), and only N_A = Θ(n^{2+ε_0}) new balls need to be thrown, where ε_0 > 0 can be arbitrarily small, provided m is sufficiently large.

To prove Claim 3, we follow the same steps as for Claim 2, and show that (whp) at any time in phase B the probability that a high bin receives a new ball is at most γ p_t, where γ is a constant strictly less than 1. This shrinkage condition can then be used with a lemma to show that the fractional load of each high bin decreases (whp), and subsequently each high bin must become a middle bin (whp) after only O(N_A) new balls have been thrown. Thus, Claim 3 also holds.

To prove Claim 4, we first note that the probability of throwing a ball into a low bin at time t is 1 − (1 − l_t)^m, where l_t is the total fractional load of the low bins at time t, and the probability of throwing a ball into a particular low bin with fractional load p_t at time t is (1 − (1 − l_t)^m) · (p_t/l_t) = ((1 − (1 − l_t)^m)/l_t) · p_t. Now, with some more work, we can prove that (whp) at any time in phase B, (1 − (1 − l_t)^m)/l_t ≥ γ, where γ can be arbitrarily large, provided we set c_2 large enough. Given that this growth condition holds for any low bin at any time t in phase B (whp), we can then prove a lemma which states that after O(N_A^{1+1/(.99γ−1)}) more balls have been thrown, each low bin must become a middle bin sometime during phase B (whp). Thus, after N_B = Θ(N_A^{1+ε_0}) more balls have been thrown in phase B, all low bins must become middle bins (whp), where ε_0 > 0 can be arbitrarily small, provided we set c_2 large enough, and Claim 4 follows.

To prove Claim 5, we just need to apply Azuma's inequality to an appropriately defined martingale. Define the random variable a_t to be the number of balls in the marker bin M at time t, and b_t to be the number of balls in any other middle bin B at time t. It is not hard to show that the random variable X_t ≡ a_t/(a_t + b_t) is a martingale. Now, to prove that all middle bins contain roughly the same load as bin M at the end of phase B (whp), we just need to note that each middle bin obtains the same load as bin M at some point t_0 in phase B. Furthermore, if we apply Azuma's inequality to the sequence of random variables X_t, starting from time t_0 and ending at the end of phase B, then we can conclude that the fractional load of marker bin M and the fractional load of bin B remain close at the end of phase B (whp), and thus Claim 5 follows.

4 Linked Decompositions with Exceptions

The proof in the previous section that PA(m) graphs have linked decompositions requires m to be a large constant, and the graphs produced have expected degree 2m. The Internet, however, has average degree about four; as a result, the PA(m) model is not considered a realistic model of the Internet unless m is very small.

Our next result holds when m ≥ 2, and achieves a slightly better a = √(n log n), but it achieves a weaker form of linked decomposition. In particular, we define a linked decomposition with exceptions and parameters a, b, c, d to be a decomposition satisfying the four requirements, except that each subgraph is allowed to contain one "promiscuous" node, which may belong to more than b subgraphs. It is not hard to see that the routing properties of the decomposition (see Section 5) are preserved in the face of a single exception per component (the remaining nodes essentially "route around it"). We can show:

Theorem 4.1. A graph G_n generated by the PA(2) model has, with high probability, a linked decomposition with exceptions and a = Θ(√(n log n)), b = 2, c = Θ(√(n/log n)), and d = O(log n).

Proof. We start by running PA(2) for t = n/log n steps, and then we decompose the resulting G_t into √(n/log n) small-diameter subgraphs, each with a total degree (sum of the degrees of its nodes) between s and 3s, where s = √(n log n). These subgraphs will be completely disjoint, except for at most one exceptional node in each.

Due to space limitations, we only provide a rough sketch of how to obtain this starting decomposition. We start by computing a breadth-first search tree of G_t starting from the first node, thus achieving log n/log log n diameter with high probability [11]. From here, it is not hard to show that one can start from the bottom of the tree and iteratively find connected subtrees of total degree between s and 2s, until one subtree remains of total degree at most 3s (see the sketch at the end of this section). These subtrees cover all nodes, and each subtree intersects other subtrees only at one of its nodes (the node closest to the root).

Once we have this initial decomposition, with total degrees balanced within a constant, we continue the PA(m) process for the remaining n − t steps. Notice that, for each new node i and edge [i, j] generated out of it, the subgraph to which the other endpoint j belongs is generated by a distribution that is, initially, approximately uniform at random; that is, each has probability Θ(1/c). We show next that, with high probability, this approximately uniform distribution will be maintained throughout the process. This follows from the following lemma, which can be proved using the techniques described in a longer version of this paper (essentially applying Azuma's inequality):

Lemma 4.1. Let x_1, ..., x_c be the fractional loads of c bins containing at least ω(log c) balls each, and let x′_1, ..., x′_c be the fractional loads of the bins at any later time, after running the Polya urns process. Then with high probability |x′_k − x_k| ≤ ε for all k ∈ {1, ..., c}, where ε > 0 can be arbitrarily small, provided c is large enough.

How do we assign new nodes to (up to b = 2 of) the c subgraphs? For each new node i, suppose that its two edges are [i, j] and [i, j′]; we assign i to both subgraphs to which j and j′ belong; this way we ensure that these two subgraphs have a node in common. In other words, each new node i can be seen as a draw of one of the c² possible pairs of subgraphs, each with probability that is Ω(1/c²). By the coupon collector principle, after Θ(c² log c) < n − t nodes, all components will intersect with high probability, completing the proof.

Another important model of Internet-like graphs [5, 18] is the one in which we start with a degree sequence d_1 ≥ d_2 ≥ ··· ≥ d_n, such that d_i = d_1 · i^{−α} for some α between 1/2 and 1, and then an edge is added between nodes i and j with probability proportional to d_i · d_j. It is clear that, in such a graph, the degrees will be, in expectation, proportional to the d_i's. We call this the degree sequence model. By a similar argument and construction, which we omit, we can show the following:

Theorem 4.2. Linked decompositions with exceptions and a = Θ(√(n log n)) can be obtained in the degree sequence model, as well as for G_{n,p} with p = Θ(1/n).

Finally, we note that by another simple argument, random graphs in the G_{n,p} model have linked decompositions (without exceptions), provided that p is above log n/n.
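The bottom-up carving of the BFS tree mentioned in the proof of Theorem 4.1 can be sketched as follows. This is our own illustration under simplifying assumptions (it ignores the dynamic-programming variant mentioned in the Introduction); the node at which a piece is detached stays behind to appear in the next piece as well, which is exactly the allowed "promiscuous" node.

```python
def carve_subtrees(children, degree, root, s):
    """Carve a rooted BFS tree into connected subtrees of total degree in
    roughly [s, 2s], merging one light remainder at the end (so the last
    piece has total degree at most about 3s).  `children[u]` lists u's
    children in the tree; `degree[u]` is u's degree in the underlying graph."""
    pieces = []

    def carve(u):
        # Returns (total degree, nodes) of the uncarved part of u's subtree.
        weight, nodes = degree[u], {u}
        for v in children.get(u, ()):
            w, part = carve(v)
            weight += w
            nodes |= part
            if weight >= s:           # each child contributes < s, so the
                pieces.append(nodes)  # piece has total degree < 2s + degree[u]
                weight, nodes = degree[u], {u}  # u stays behind to link pieces
        return weight, nodes

    weight, leftover = carve(root)
    if weight < s and pieces:
        pieces[-1] |= leftover        # merge the light remainder: at most ~3s
    else:
        pieces.append(leftover)
    return pieces
```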
5 The Internet, Routing, and Experiments

In recent years, we have seen a surge of research activity aiming at a theoretical and foundational understanding of the Internet. The motivation for such a research agenda is twofold: first, the Internet is a novel, fascinating, and intellectually challenging computational artifact of central importance to computing technology and society in general, and hence it is naturally an attractive subject for theoreticians. Second, even though the Internet has emerged without much deliberate design, such design may become necessary in the future; foundational understanding and theoretical insights would come in handy at such a juncture. Indeed, as challenges of scale accumulate, there are several serious efforts underway to "redesign" the Internet [2, 3, 1, 12]. It is within this framework that we see the concept of linked decompositions, and the theoretical and experimental evidence presented here suggests that it is feasible in the context of the Internet.

It is not hard to see that a linked decomposition with parameters a, b, c, d enables a novel form of Internet routing with routing tables of size Õ(ab + n/a), delay O(d), and O(d) packets sent per message routed. To show how this can be done, first note that each node v only needs Õ(ab) space to store enough information to route to any node u which belongs to one of v's components.

Furthermore, in order to route to any node u which does not belong to one of v's components, v just needs to find an intermediate node w which belongs both to one of u's components and to one of v's components. Note that such a node w must exist due to property (4) of our linked decomposition. Once we have found our node w, it is easy to route our message from v to u through w.

Conceptually, storing an intermediate node w for each destination node u appears to require a table of size O(n), but fortunately node v does not have to store the entire table, as it is not hard to distribute this table among the nodes in v's components so that each node only has to store Õ(n/a) information. In particular, we can use a hash function to distribute the information in a balanced manner among the nodes, such that v can find the required intermediate node w for any destination node u, while only storing Õ(n/a) information per node (this lookup is sketched in code below). Thus, we have a routing protocol that uses Õ(ab + n/a) storage and routes with delay O(d). Provided we can find linked decompositions with a ≈ √n, b = 2, and d = O(log n), as described previously, we can now define a simple routing scheme with routing tables of size about √n and delay O(log n).
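A sketch of the lookup logic at a node v (our own pseudocode-level illustration; `home_node` and `intermediate_at` are hypothetical names, and the actual protocol would send messages rather than read a shared table):

```python
import hashlib

def home_node(component, dest):
    """The member of `component` (a sorted list of node ids) that stores the
    intermediate-node entry for destination `dest`, chosen by hashing so the
    O(n) entries spread evenly, leaving about n/a entries per node."""
    h = int(hashlib.sha256(str(dest).encode()).hexdigest(), 16)
    return component[h % len(component)]

def next_action(v, dest, components_of, intermediate_at):
    """Routing decision at node v.  `components_of[v]` lists the node sets v
    belongs to; `intermediate_at[w][dest]` is the entry stored at node w:
    a node shared by a component of v and a component of dest, which must
    exist by property (4)."""
    for S in components_of[v]:
        if dest in S:
            return ("route inside component", dest)  # local Õ(ab)-size table
    S = components_of[v][0]
    holder = home_node(sorted(S), dest)              # target of lookup message
    w = intermediate_at[holder][dest]
    return ("route via intermediate", w)
```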
For a concrete example, see Figure 1 below. To route a packet from vertex 1 to vertex 2 in the given example, vertex 1 knows how to reach any node in component S1 and can route the packet entirely within component S1 by following the links in S1. To route a packet from vertex 1 to vertex 3, which is not in any component of vertex 1, we send a lookup message within component S1 to determine the intermediate vertex to use, in this case vertex 4. The packet is then routed to vertex 4, and vertex 4 uses its routing table to complete the route to vertex 3.

[Figure 1 omitted: a small graph covered by three overlapping components S1, S2, and S3, with labeled vertices 1–4.]
Figure 1: In this example, we show a decomposition with a = 6, b = 2, c = 3, and d = 4.

Note that the above routing scheme can be implemented even if the addresses of the vertices are totally amorphous strings (called "flat labels" in the Internet redesign literature [12]), as opposed to today's hierarchical and geographically specific IP addresses — a key feature of today's Internet, which is also the source of some of the most challenging problems of scale and evolution.

By giving up "flat labels" and loosening our linked decomposition requirements, we can decrease the table size to about n^{1/3}, as follows: Instead of requiring that all components intersect, we only require that any two components either intersect with one another, or both intersect some other component. That is, the intersection graph of the components is no longer a clique, but has diameter two. It turns out that, if the address of each node also indicates a component to which the node belongs, then routing tables of size about a suffice. And now, by applying both a coupon collector and a birthday paradox argument (details omitted), it can be shown that PA(m) graphs have such a decomposition with a = n^{1/3} (whp).

The routing implications of our results explained in this section are related to two recent ideas. The one that provided direct inspiration is the routing on flat labels (rofl) proposal [12], a novel routing architecture borrowing methodologically from peer-to-peer networks. We came up with the concept of linked decompositions while trying to identify the limits of their approach. The rofl work is validated experimentally, and does not provide nontrivial performance guarantees. Another related body of work is that on compact routing [6, 4, 16, 17], seeking routing with small routing tables and small stretch (worst-case ratio between routing delay and distance in the network). The strongest known such result achieves, for any graph, Õ(√n) tables and stretch 3 [4], and it is in fact known empirically [16] that this algorithm runs better on the real Internet.

Although the existing compact routing work dominates our results, our approach is conceptually simpler, and does not require the use of landmark nodes. Avoiding the use of landmark nodes may provide benefits in terms of congestion and robustness against failures, since as many as Ω(n^{3/2}) pairs of nodes may route packets through a single landmark node in the existing Õ(√n) routing scheme. As we describe in the open problems section, we hope to explore the potential benefits which linked decompositions may provide in terms of balanced congestion and robustness. In our experiments, the stretch of our routing algorithm is very rarely over 3. Furthermore, our work provides a rigorous explanation for the surprising empirical finding (see the longer version of this paper online for experiments) that the actual Internet can be decomposed in such a demanding way.

6 Open Problems

Our proofs are in some sense existential and non-constructive: even though we give a decomposition algorithm, this algorithm needs to "see" the actual running of the PA(m) process in order to work with high probability; what if we are given ex post a graph that has been generated by PA(m), but with its nodes permuted? Or if we are actually given the graph of the Internet? In our experiments with permuted PA(m) graphs and Internet graphs, we run our decomposition algorithms on the graph with nodes ordered by decreasing degree. The intuition is that this is our best guess for the creation order. This works well in practice (see the longer version of this paper online for more details), and we would love to prove that it works with high probability on PA(m) graphs.

There are also interesting questions regarding the existence of linked decompositions with parameters a ≈ √n, b = 2, c ≈ √n, d ≈ diameter(G). We have found that graphs generated by the G_{n,p} model, the degree sequence model, and the PA(m) model all have linked decompositions (whp). Yet if the graph G is a tree, a linked decomposition with the described parameters cannot exist, essentially because the required component intersections cannot be created without forcing at least one node to be included in too many components. This begs the question: what are necessary and sufficient conditions for a graph G to have a linked decomposition with the parameters described? Can all expander graphs be decomposed in this way?

In addition, some other open questions we would like to answer are:

• Does the Polya urn process with the power of two choices also balance after O(n^{2+ε}) steps? In the case of interest, when each bin starts with fractional load at least 1/(2n), we can prove that the power of two choices balances loads after n^{3+ε} steps (whp), but is an exponent arbitrarily close to two possible?

• Does our routing algorithm yield better worst-case congestion than the existing Õ(√n) routing scheme? We have removed the use of landmark nodes, but is it enough to balance congestion?

• Are there efficient ways to update the routing tables with the addition of new nodes and edges, or, more importantly, node and edge deletions?

• Can autonomous systems in a network be appropriately incentivized to organize themselves into linked decompositions?

Acknowledgment: We want to thank Kevin Fall, Robert Kleinberg, Dima Krioukov, Satish Rao, Luca Trevisan, Grant Schoenbeck, Scott Shenker, Ion Stoica, and Mikkel Thorup for valuable discussions. In particular, Satish Rao provided the distributed storage idea in our routing scheme.

References

[1] NewArch Project: Future-Generation Internet Architecture. http://www.isi.edu/newarch/.
[2] FIND: Future Internet Network Design. http://find.isi.edu/, 2005.
[3] GENI: Global Environment for Network Innovations. http://www.geni.net/, 2005.
[4] I. Abraham, C. Gavoille, D. Malkhi, N. Nisan, and M. Thorup. Compact name-independent routing with minimum stretch. In Proceedings of the Sixteenth ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 04), 2004.
[5] W. Aiello, F. Chung, and L. Lu. A random graph model for massive graphs. In STOC, 2000.
[6] Marta Arias, Lenore J. Cowen, Kofi A. Laing, Rajmohan Rajaraman, and Orjeta Taka. Compact routing with name independence. In Proceedings of the Fifteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA 03), pages 184–192, New York, NY, USA, 2003. ACM Press.
[7] Baruch Awerbuch, Andrew Goldberg, Michael Luby, and Serge Plotkin. Network decompositions and locality in distributed computation. 1989.
[8] Baruch Awerbuch and David Peleg. Sparse partitions. 1990.
[9] Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. Balanced allocations (extended abstract). In STOC '94: Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, pages 593–602, New York, NY, USA, 1994. ACM Press.
[10] A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
[11] B. Bollobás and O. Riordan. Mathematical results on scale-free random graphs. In Handbook of Graphs and Networks. Wiley-VCH, Berlin, 2002.
[12] M. Caesar, T. Condie, J. Kumar Kannan, K. Lakshminarayanan, I. Stoica, and S. Shenker. ROFL: Routing on Flat Labels. SIGCOMM, 2006.
[13] F. Chung, S. Handjani, and D. Jungreis. Generalizations of Polya's urn problem. Annals of Combinatorics, 7:141–153, 2003.
[14] A. Frieze, A. Flaxman, and T. Fenner. High degree vertices and eigenvalues in the preferential attachment graph. Internet Mathematics, 2:1–20, 2005.
[15] Richard M. Karp, Michael Luby, and Friedhelm Meyer auf der Heide. Efficient PRAM simulation on a distributed memory machine. In STOC '92: Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing, pages 318–326, New York, NY, USA, 1992. ACM Press.

[16] D. Krioukov, K. Fall, and X. Yang. Compact routing on Internet-like graphs. INFOCOM, 2004.
[17] D. Krioukov and kc claffy. Toward compact interdomain routing. arXiv:cs.NI/0508021.
[18] Milena Mihail, Christos Papadimitriou, and Amin Saberi. On certain connectivity properties of the Internet topology. In FOCS '03: Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, page 28, Washington, DC, USA, 2003. IEEE Computer Society.
[19] Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems, 12(10):1094–1104, 2001.
[20] David Peleg and Alejandro A. Schäffer. Graph spanners. 1989.
