Censorship Resistant Peer-to-Peer Content Addressable Networks

Amos Fiat*  Jared Saia†

Thomas Hobbes, Rene Descartes, Francis Bacon, Benedict Spinoza, John Milton, John Locke, Daniel Defoe, David Hume, Jean-Jacques Rousseau, Blaise Pascal, Immanuel Kant, Giovanni Casanova, John Stuart Mill, Emile Zola, Jean-Paul Sartre, Victor Hugo, Honore de Balzac, A. Dumas pere, A. Dumas fils, Gustave Flaubert, Rabelais, Montaigne, La Fontaine, Voltaire, Denis Diderot, Pierre Larousse, Anatole France

Partial list of authors in Index Librorum Prohibitorum (Index of Prohibited Books) from the Roman Office of the Inquisition, 1559-1966.

Abstract

We present a censorship resistant peer-to-peer Content Addressable Network for accessing n data items in a network of n nodes. Each search for a data item in the network takes O(log n) time and requires at most O(log² n) messages. Our network is censorship resistant in the sense that even after adversarial removal of an arbitrarily large constant fraction of the nodes in the network, all but an arbitrarily small fraction of the remaining nodes can obtain all but an arbitrarily small fraction of the original data items. The network can be created in a fully distributed fashion. It requires only O(log n) memory in each node. We also give a variant of our scheme that has the property that it is highly spam resistant: an adversary can take over complete control of a constant fraction of the nodes in the network and yet will still be unable to generate spam.

1 Introduction

Web content is under attack by state and corporate efforts to censor it, for political and commercial reasons ([7, 18, 16]).

Peer-to-peer networks are considered more robust against censorship than standard web servers ([19]). However, while it is true that many suggested peer-to-peer architectures are fairly robust against random faults, the censors can attack carefully chosen weak points in the system. For example, the Napster ([28]) file sharing system has been effectively dismembered by legal attacks on the central server. Additionally, the Gnutella ([27]) file sharing system, while specifically designed to avoid the vulnerability of a central server, is highly vulnerable to attack by removing a very small number of carefully chosen nodes ([24]).

A significant performance problem with Gnutella is that it requires up to O(n) messages for a search (search is performed via a general broadcast); this has effectively limited the size of Gnutella networks to about 1000 nodes ([6]).

A more principled approach than the centralized approach taken by Napster or the broadcast search mechanism of Gnutella is the use of a content addressable network ([23]). A content addressable network is defined as a distributed, scalable, indexing scheme for peer-to-peer networks. Plaxton, Rajaraman and Richa ([22]) give a scheme to implement a content addressable network (prior to its definition) in a web cache environment. The content addressable network scheme of ([22]), as modified in ([29]), has been used in implementations such as Oceanstore ([11]). Subsequent content addressable networks have been suggested in ([23, 26, 29]). These papers seek to improve upon ([22]) by adding fault resistance and load balancing mechanisms. However, while these schemes are robust to a large number of random faults, none of these schemes are robust against adversarial faults.

1.1 Resistance to Adversarial Node Deletion

We present a content addressable network with n nodes used to store n distinct data items.¹ As far as we know this is the first such scheme of its kind. The scheme is robust to adversarial deletion of up to half² of the nodes in the network and has the following properties:

* Department of Computer Science, Tel Aviv University; work done while on Sabbatical at University of Washington, Seattle. email: fiat@math.tau.ac.il
† Department of Computer Science, University of Washington, Seattle. email: saia@cs.washington.edu
¹ For simplicity, we've assumed that the number of items and the number of nodes is equal. However, for any n nodes and m ≤ n data items, our scheme will work, where the search time remains O(log n), the number of messages remains O(log² n), and the storage requirements are O(log n × m/n) per node.
² For simplicity, we give the proofs with this constant equal to 1/2. However, we can easily modify the scheme to work for any constant less than 1. This would change the constants involved in storage, search time, and messages sent, by a constant factor.

1. With high probability, all but an arbitrarily small fraction of the nodes can find all but an arbitrarily small fraction of the data items.

2. Search takes (parallel) time O(log n).

3. Search requires O(log² n) messages in total.

4. Every node requires O(log n) storage.

As stated above, in the context of state or corporate attempts at censorship, it seems reasonable to consider adversarial attacks rather than random deletion. Our scheme is a content addressable network that is robust against adversarial deletion.

We remark that such a network is clearly resilient to having up to 1/2 of the nodes removed at random (in actuality, its random removal resiliency is much better). We further remark that if nodes come up and down over time, our network will continue to operate as required so long as at least n/2 of the nodes are alive.

1.2 Spam Resistance

Another problem with peer-to-peer networks has been that of spamming ([5, 6]). Because the data items reside in the nodes of the network, and pass through nodes while in transit, it is possible for nodes to invent alternative data and pass it on as though it was the sought after data item.

We now describe a spam resistant variant of our content addressable network. To the best of our knowledge this is the first such scheme of its kind. As before, assume n nodes used to store n distinct data items. The adversary may choose up to some constant c < 1/2 fraction of the nodes in the network. These nodes under adversary control may be deleted, or they may collude and transmit arbitrary false versions of the data item; nonetheless:

1. With high probability, all but an arbitrarily small fraction of the nodes will be able to obtain all but an arbitrarily small fraction of the true data items. To clarify this point, the search will not result in multiple items, one of which is the correct item. The search will result in one unequivocal true item.

2. Search takes (parallel) time O(log n).

3. Search requires O(log³ n) messages in total.

4. Every node requires O(log² n) storage.

The rest of our paper is structured as follows. We review related work in Section 2. We give the algorithm for creation of our robust content addressable network, the search mechanism, and properties of the content addressable network in Section 3. The proof of our main theorem, Theorem 3.1, is given in Section 4. In Section 5 we sketch the modifications required in the algorithms and the proofs to obtain spam resistance; the main theorem with regard to spam resistant content addressable networks is Theorem 5.1. We conclude and give directions for future work in Section 6. Acknowledgements are in Section 7.

2 Related Work

2.1 Peer-to-peer Networks (not Content Addressable)

Peer-to-peer networks are a relatively recent and quickly growing phenomenon. The average number of Gnutella users in any given day is no less than 10,000 and may range as high as 30,000 ([6]). Napster software has been downloaded by 50 million users ([23]).

Pandurangan, Raghavan, and Upfal ([20]) address the problem of maintaining a connected network under a probabilistic model of node arrival and departure. They do not deal with the question of searching within the network. They give a protocol which maintains a network on n nodes with diameter O(log n). The protocol requires constant memory per node and a central hub with constant memory with which all nodes can connect.

Experimental measurements of a connected component of the real Gnutella network have been studied ([24]), and it has been found to still contain a large connected component even with a 1/3 fraction of random node deletions.

2.2 Content Addressable Networks -- Random Faults

There are several papers that address the problem of creating a content addressable network. As mentioned above, Plaxton, Rajaraman and Richa ([22]) give a content addressable network for web caching. Search time and the total number of messages is O(log n), and storage requirements are O(log n) per node.

Tapestry ([29]) is an extension to the ([22]) mechanism, designed to be robust against faults. It is used in the Oceanstore ([11]) system. Experimental evidence is supplied that Tapestry is robust against random faults.

Ratnasamy et al. ([23]) describe a system called CAN which has the topology of a d-dimensional torus. As a function of d, storage requirements are O(d) per node, whereas search time and the total number of messages is O(dn^(1/d)). There is experimental evidence that CAN is robust to random faults.

Finally, Stoica et al. introduce yet another content addressable network, Chord ([26]), which, like ([20]) and ([29]), requires O(log n) memory per node and O(log n) search time. Chord is provably robust to a constant fraction of random node failures.

2.3 Faults on Networks

2.3.1 Random Faults

There is a large body of work on node and edge faults that occur independently at random in a general network. Håstad, Leighton and Newman ([9]) address the problem of routing when there are node and edge faults on the hypercube which occur independently at random with some probability p < 1. They give an O(log n) step routing algorithm that ensures the delivery of messages with high probability even when a constant fraction of the nodes and edges have failed. They also show that a faulty hypercube can emulate a fault-free hypercube with only constant slowdown.

Karlin, Nelson and Tamaki ([10]) explore the fault tolerance of the butterfly network against edge faults that occur independently at random with probability p. They show that there is a critical probability p* such that if p is less than p*, the faulted butterfly almost surely contains a linear-sized component, and that if p is greater than p*, the faulted butterfly does not contain a linear sized component.

Leighton, Maggs and Sitaraman ([12]) show that a butterfly network whose nodes fail with some constant probability p can emulate a fault-free network of the same size with a slowdown of 2^O(log* n).

2.3.2 Adversarial Faults

It is well known that many common network topologies are not resistant to a linear number of adversarial faults. With a linear number of faults, the hypercube can be fragmented into components all of which have size no more than O(n/√log n) ([9]). The best known lower bound on the number of adversarial faults a hypercube can tolerate and still be able to emulate a fault free hypercube of the same size is O(log n) ([9]).

Leighton, Maggs and Sitaraman ([12]) analyze the fault tolerance of several bounded degree networks. One of their results is that any n node butterfly network containing n^(1-ε) (for any constant ε > 0) faults can emulate a fault free butterfly network of the same size with only constant slowdown. The same result is given for the shuffle-exchange network.

2.4 Other Related Work

One attempt at censorship resistant web publishing is the Publius system ([15]); while this system has many desirable properties, it is not a peer-to-peer network. Publius makes use of many cryptographic elements and uses Shamir's threshold secret sharing scheme ([25]) to split the shares amongst many servers. When viewed as a peer-to-peer network, with n nodes and n data items, to be resistant to n/2 adversarial node removals, Publius requires Ω(n) storage per node and Ω(n) search time per query.

Alon et al. ([1]) give a method which safely stores a document in a decentralized storage setting where up to half the storage devices may be faulty. The application context of their work is a storage system consisting of a set of servers and a set of clients where each client can communicate with all the servers. Their scheme involves distributing specially encoded pieces of the document to all the servers in the network.

Aumann and Bender ([3]) consider tolerance of pointer-based data structures to worst case memory failures. They present fault tolerant variants of stacks, lists and trees. They give a fault tolerant tree with the property that if r adversarial faults occur, no more than O(r) of the data in the tree is lost. This fault tolerant tree is based on the use of expander graphs.

Quorum systems ([8, 13, 14]) are an efficient, robust way to read and write to a variable which is shared among n servers. Many of these systems are resistant to up to some number b < n/4 of Byzantine faults. The key idea in such systems is to create subsets of the servers called quorums in such a way that any two quorums contain at least 2b + 1 servers in common. A client that wants to write to the shared variable will broadcast the new value to all servers in some quorum. A client that wants to read the variable will get values from all members in some quorum and will keep only that value which has the most recent time stamp and is returned by at least b + 1 servers. For quorum systems that are resistant to O(n) faults the load on the servers can be high. In particular, O(n) servers will be involved in a constant fraction of the queries.

Recently Malkhi et al. ([14]) have introduced a probabilistic quorum system. This new system relaxes the constraint that there must be 2b + 1 servers shared between any two quorums and remains resistant to Byzantine faults only with high probability. The load on servers in the probabilistic system is less than the load in the deterministic system. Nonetheless, for a probabilistic quorum system which is resistant to O(n) faults, there still will be at least one server involved in a constant fraction of the queries.

3 The Content Addressable Network

We now state our mechanism for providing indexing of n data items by n nodes in a network that is robust to removal of any n/2 of the nodes. We make use of a butterfly network of depth log n − log log n; we call the nodes of the butterfly network supernodes (see Figure 1). Every supernode is associated with a set of nodes. We call a supernode at the topmost level of the butterfly a top supernode, one at the bottommost level of the network a bottom supernode, and one at neither the

topmost nor the bottommost level a middle supernode.

Figure 1: The butterfly network of supernodes.

To construct the network we do the following:

• We choose an error parameter ε > 0, and as a function of ε we determine constants C, B, T, D, α and β (see Theorem 3.1).

• Every node chooses at random C top supernodes, C bottom supernodes and C log n middle supernodes to which it will belong.

• Between two sets of nodes associated with two supernodes connected in the butterfly network, we choose a random constant degree expander graph of degree D (see Figure 2). (We do this only if both sets of nodes are of size at least αC ln n and no more than βC ln n.)

Figure 2: The expander graphs between supernodes.

• We also map the n data items to the n/log n bottom supernodes in the butterfly. Every one of the n data items is hashed to B random bottom supernodes.³ (Typically, we would not hash the entire data item but only its title, e.g., "Singing in the Rain".)

• The data item is stored in all the component nodes of all the (bottom) supernodes to which it has been hashed. (If any bottom supernode has more than βB ln n data items hashed to it, it drops out of the network.)

• In addition, every one of the nodes chooses T top supernodes of the butterfly and points to all component nodes of these supernodes.

³ We use the model of ([4]) for this hash function; it would have sufficed to have a weaker assumption, such as that the hash function is expansive.

To perform a search for a data item, starting from node v, we do the following:

1. Take the hash of the data item and interpret it as a sequence of indices i_1, i_2, ..., i_B, 0 ≤ i_l < n/log n.

2. Let t_1, t_2, ..., t_T be the top supernodes to which v points.

3. Repeat in parallel for all values of k between 1 and T:

   (a) Let l = 1.
   (b) Repeat until successful or until l > B:
       i. Follow the path from t_k to the supernode at the bottom level whose index is i_l:
          • Transmit the query to all of the nodes in t_k. Let W be the set of all such nodes.
          • Repeat until a bottom supernode is reached: the nodes in W transmit the query to all of their neighbors along the (unique) butterfly path to i_l. Let W be this new set of nodes.
          • When the bottom supernode is reached, fetch the content from whatever node has been reached.
          • The content, if found, is transmitted back along the same path as the query was transmitted downwards.
       ii. Increment l.

3.1 Properties of the Content Addressable Network

Following is the main theorem, which we will prove in Section 4.

THEOREM 3.1. For all ε > 0, there exist constants k_1(ε), k_2(ε), k_3(ε) which depend only on ε such that:

• Every node requires k_1(ε) log n memory.

• Search for a data item takes no more than k_2(ε) log n time.

• Search for a data item requires no more than k_3(ε) log² n messages.

• All but εn nodes can reach all but εn data items.
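As a concrete illustration of the search procedure, here is a short Python sketch (our illustration, not code from the paper) of step 1 and of the downward routing in step 3: hashing a data item's title into B bottom-supernode indices, and computing the unique butterfly path of (level, index) pairs from a top supernode down to a target bottom supernode. SHA-256 and the bit-fixing routing rule are assumptions made for the sketch; the paper only requires a suitable hash family ([4]) and the standard butterfly wiring.

```python
import hashlib

def hash_to_indices(title: str, B: int, num_bottom: int):
    """Interpret the hash of a data item's title as B bottom-supernode
    indices i_1..i_B (step 1 of the search). SHA-256 stands in for the
    hash family assumed in the paper."""
    digest = hashlib.sha256(title.encode()).digest()
    # Carve the digest into B chunks; reduce each modulo the number
    # of bottom supernodes.
    chunk = len(digest) // B
    return [int.from_bytes(digest[k * chunk:(k + 1) * chunk], "big") % num_bottom
            for k in range(B)]

def butterfly_path(top_row: int, bottom_row: int, depth: int):
    """(level, row) pairs on the unique butterfly path from top supernode
    (0, top_row) down to bottom supernode (depth, bottom_row): at level l
    the edge taken may correct bit l of the current row."""
    path = [(0, top_row)]
    row = top_row
    for level in range(depth):
        bit = 1 << level
        # Set bit `level` of `row` to match the target index.
        row = (row & ~bit) | (bottom_row & bit)
        path.append((level + 1, row))
    return path
```

For example, with depth 3 the path from top supernode 0 to bottom supernode 5 fixes one bit of the row index per level; in the real scheme the query is broadcast across the expander edges between the node sets of each successive pair of supernodes on this path.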

3.2 Some Comments

1. Distributed creation of the content addressable network. We note that our content addressable network can be created in a fully distributed fashion with n broadcasts, or transmission of n² messages in total, assuming O(log n) memory per node. We briefly sketch the protocol that a particular node will follow to do this. The node first randomly chooses the supernodes to which it belongs. Let S be the set of supernodes which neighbor supernodes to which the node belongs. For each s ∈ S, the node chooses a set N_s of D random numbers between 1 and βC ln n. The node then broadcasts a message to all other nodes which contains the identifiers of the supernodes to which the node belongs. Next, the node will receive messages from all other nodes giving the supernodes to which they belong. For every s ∈ S, the node will link to the i-th node that belongs to s from which it receives a message if and only if i ∈ N_s.

If for some supernode to which the node belongs, the node receives less than αC ln n or greater than βC ln n messages from other nodes in that supernode, the node removes all out-going connections associated with that supernode. Similarly, if for some supernode in S, the node receives less than αC ln n or greater than βC ln n messages from other nodes in that supernode, the node removes all out-going connections to that neighboring supernode. Connections to the top supernodes and storage of data items can be handled in a similar manner.

2. Insertion of a New Data Item. One can insert a new data item simply by performing a search, and sending the data item along with the search. The data item will be stored at the nodes of the bottommost supernodes in the search. We remark that such an insertion may fail with some small constant probability.

3. Insertion of a New Node. Our network does not have an explicit mechanism for node insertion. It does seem that one could insert the node by having the node choose at random appropriate supernodes and then forging the required random connections with the nodes that belong to neighboring supernodes. The technical difficulty with proving results about this insertion process is that not all live nodes in these neighboring supernodes may be reachable and thus the probability distributions become skewed. We note though that a new node can simply copy the links to top supernodes of some other node already in the network and will thus very likely be able to access almost all of the data items. This insertion takes O(log n) time. Of course the new node will not increase the resiliency of the network if it inserts itself in this way. We assume that a full reorganization of the network is scheduled whenever sufficiently many new nodes have been added in this way.

4. Load Balancing Properties. Because the data items are searched for along a path from a random top supernode to the bottom supernodes containing the item, and because these bottom supernodes are chosen at random, the load will be well balanced as long as the number of requests for different data items is itself balanced. This follows because a uniform distribution on the search for data items translates to a uniform distribution on top to bottom paths through the butterfly.

4 Proofs

4.1 Proof Overview

Technically, the proof makes extensive use of random constructions and the Probabilistic Method [2]. We first show that with high probability, all but an arbitrarily small constant times n/log n of the supernodes are good, where good means that (a) they have O(log n) nodes associated with them, and (b) they have Ω(log n) live nodes after adversarial deletion. This implies that all but a small constant fraction of the paths through the butterfly contain only good supernodes.

Search is performed by broadcasting the search to all the nodes in (a constant number of) top supernodes, followed by a sequence of broadcasts between every successive pair of supernodes along the paths between one of these top supernodes and a constant number of bottom supernodes. Fix one such path. The broadcast between two successive supernodes along the path makes use of the expander graph connecting these two supernodes. When we broadcast from the live nodes in a supernode to the following supernode, the nodes that we reach may be both live and dead (see Figure 3).

Assume that we broadcast along a path, all of whose supernodes are good. One problem is that we are not guaranteed to reach all the live nodes in the next supernode along the path. Instead, we reduce our requirements to ensure that at every stage, we reach at least δ log n live nodes, for some constant δ. The crucial observation is that if we broadcast from δ log n live nodes in one supernode, we are guaranteed

to reach at least δ log n live nodes in the subsequent supernode, with high probability. This follows by using the expansion properties of the bipartite expander connection between two successive supernodes. Recall that the nodes are connected to a constant number of random top supernodes, and that the data items are stored in a constant number of random bottom supernodes. The fact that we can broadcast along all but an arbitrarily small fraction of the paths in the butterfly implies that most of the nodes can reach most of the content.

Figure 3: Traversal of a path through the butterfly.

In several statements of the lemmata and theorems in this section, we require that n, the number of nodes in the network, be sufficiently large to get our result. We note that, technically, this requirement is not necessary since if it fails then n is a constant and our claims trivially hold.

4.2 Definitions

DEFINITION 4.1. A top or middle supernode is said to be (α, β)-good if it has at most β log n nodes mapped to it and at least α log n nodes which are not under control of the adversary.

DEFINITION 4.2. A bottom supernode is said to be (α, β)-good if it has at most β log n nodes mapped to it, at least α log n nodes which are not under control of the adversary, and no more than βB ln n data items that map to it.

DEFINITION 4.3. An (α, β)-good path is a path through the butterfly network from a top supernode to a bottom supernode all of whose supernodes are (α, β)-good supernodes.

DEFINITION 4.4. A top supernode is called (γ, α, β)-expansive if there exist γn/log n (α, β)-good paths that start at this supernode.

4.3 Technical Lemmata

Following are three technical lemmata about bipartite expanders that we will use in our proofs. The proof of the first lemma is well known [21] (see also [17]) and the proofs of the next two lemmata are slight variants on the proof of the first. Due to space constraints, the proofs of all three lemmata are omitted from this extended abstract.

LEMMA 4.1. Let l, r, l', r', d and n be any positive values where l' ≤ l, r' ≤ r and

  d ≥ (r / (l'r')) (l' ln(le/l') + r' ln(re/r') + 2 ln n).

Let G be a random bipartite multigraph with left side L and right side R where |L| = l and |R| = r and each node in L has edges to d random neighbors in R. Then with probability at least 1 − 1/n², any subset of L of size l' shares an edge with any subset of R of size r'.

LEMMA 4.2. Let l, r, l', r', d, Δ and n be any positive values where l' ≤ l, r' ≤ r, 0 < Δ < 1 and

  d ≥ (2r / (l'r'(1 − Δ)²)) (l' ln(le/l') + r' ln(re/r') + 2 ln n).

Let G be a random bipartite multigraph with left side L and right side R where |L| = l and |R| = r and each node in L has edges to d random neighbors in R. Then with probability at least 1 − 1/n², for any set L' ⊆ L where |L'| = l', there is no set R' ⊆ R, where |R'| = r', such that all nodes in R' share less than Δl'd/r edges with L'.

LEMMA 4.3. Let l, r, r', d, β' and n be any positive values where r' ≤ r, β' > 1 and

  d ≥ (4r / (lr'(β' − 1)²)) (r' ln(re/r') + 2 ln n).

Let G be a random bipartite multigraph with left side L and right side R where |L| = l and |R| = r and each node in L has edges to d random neighbors in R. Then with probability at least 1 − 1/n², there is no set R' ⊆ R, where |R'| = r', such that all nodes in R' have degree greater than β'ld/r.

4.4 (α, β)-good Supernodes

LEMMA 4.4. Let α, δ', n be values where α < 1/2 and δ' > 0, let k(δ', α) be a value that depends only on α, δ', and assume n is sufficiently large. Let each node participate in k(δ', α) ln n random middle supernodes. Then removing any set of n/2 nodes still leaves all but δ'n/ln n middle supernodes with at least αk(δ', α) ln n live nodes.

Proof. For simplicity, we will assume there are n middle supernodes (we can throw out any excess supernodes). Let l = n, l' = n/2, r = n, r' = δ'n/ln n, Δ = 2α and d = k(δ', α) ln n in Lemma 4.2. We want probability less than 1/n² of being able to remove n/2 nodes and having a set of δ'n/ln n supernodes all with less than αk(δ', α) ln n live nodes. This happens provided that the number of connections from each node is bounded as in Lemma 4.2, which happens when:

  k(δ', α) ≥ 2 ln(2e) / (δ'(1 − 2α)²) + o(1).

LEMMA 4.5. Let β, δ', n, k be values such that β > 1, δ' > 0 and assume n is sufficiently large. Let each node participate in k ln n of the middle supernodes, chosen uniformly at random. Then all but δ'n/ln n middle supernodes have less than βk ln n participating nodes with probability at least 1 − 1/n².

Proof. For simplicity, we will assume there are n middle supernodes (we can throw out any excess supernodes and the lemma will still hold). Let l = n, r = n, r' = δ'n/ln n, d = k ln n and β' = β in Lemma 4.3. Then the statement in this lemma holds provided that:

  k ≥ (4 / ((β − 1)² ln n)) (ln(e ln n/δ') + 2 ln² n/(δ'n)).

The right hand side of this equation goes to 0 as n goes to infinity.

LEMMA 4.6. Let α, δ', n be values such that α < 1/2, δ' > 0, let k(δ', α) be a value that depends only on δ' and α, and assume n is sufficiently large. Let each node participate in k(δ', α) top (bottom) supernodes. Then removing any set of n/2 nodes still leaves all but δ'n/ln n top (bottom) supernodes with at least αk(δ', α) ln n live nodes.

Proof. Let l = n, l' = n/2, r = n/ln n, r' = δ'n/ln n, Δ = 2α and d = k(δ', α) in Lemma 4.2. We want probability less than 1/n² of being able to remove n/2 nodes and having a set of δ'n/ln n supernodes all with less than αk(δ', α) ln n live nodes. We get this provided that the number of connections from each node is bounded as in Lemma 4.2:

  k(δ', α) = 2 ln(2e) / (δ'(1 − 2α)²) + o(1).

LEMMA 4.7. Let β, δ', n, k be values such that β > 1, δ' > 0 and n is sufficiently large. Let each node participate in k of the top (bottom) supernodes (chosen uniformly at random). Then all but δ'n/ln n top (bottom) supernodes consist of less than βk ln n nodes with probability at least 1 − 1/n².

Proof. Let l = n, r = n/ln n, r' = δ'n/ln n, d = k and β' = β in Lemma 4.3. Then the statement in this lemma holds provided that:

  k ≥ (4 / ((β − 1)² ln n)) (ln(e/δ') + 2 ln² n/(δ'n)).

The right hand side of this equation goes to 0 as n goes to infinity.

COROLLARY 4.1. Let β, δ', n, k be values such that β > 1, δ' > 0 and n is sufficiently large. Let each data item be stored in k of the bottom supernodes (chosen uniformly at random). Then all but δ'n/ln n bottom supernodes have less than βk ln n data items stored on them with probability at least 1 − 1/n².

Proof. Let the data items be the left side of a bipartite graph and the bottom supernodes be the right side. The proof is then the same as Lemma 4.7.

COROLLARY 4.2. Let δ' > 0, α < 1/2, β > 1. Let k(δ', α) be a value depending only on δ' and α, and assume n is sufficiently large. Let each node appear in k(δ', α) top supernodes, k(δ', α) bottom supernodes and k(δ', α) ln n middle supernodes. Then all but δ'n/ln n of the supernodes are (αk(δ', α), βk(δ', α))-good with probability 1 − O(1/n²).

Proof. Use

  k(δ', α) = (10/3) · 2 ln(2e) / (δ'(1 − 2α)²)

in Lemma 4.6. Then we know that no more than 3δ'n/(10 ln n) top supernodes and no more than 3δ'n/(10 ln n) bottom supernodes have less than αk(δ', α) ln n live nodes. Next, plugging k(δ', α) into Lemma 4.4 gives that no more than 3δ'n/(10 ln n) middle supernodes have less than αk(δ', α) ln n live nodes. Next, using k(δ', α) in Lemma 4.7 and Lemma 4.5 gives that no more than δ'n/(20 ln n) of the supernodes can have more than βk(δ', α) ln n nodes in them. Finally, using k(δ', α) in Corollary 4.1 gives that no more than δ'n/(20 ln n) of the bottom supernodes can have more than βk(δ', α) ln n data items stored at them. If we put these results together, we get that no more than δ'n/ln n supernodes are not (αk(δ', α), βk(δ', α))-good with probability 1 − O(1/n²).

4.5 (γ, α, β)-expansive Supernodes

THEOREM 4.1. Let δ > 0, α < 1/2, 0 < γ < 1, β > 1. Let k(δ, α, γ) be a value depending only on δ, α, γ and assume n is sufficiently large. Let each node participate in k(δ, α, γ) top supernodes, k(δ, α, γ) bottom supernodes and k(δ, α, γ) ln n middle supernodes. Then all but δn/ln n top supernodes are (γ, αk(δ, α, γ), βk(δ, α, γ))-expansive with probability 1 − O(1/n²).

Proof. Assume that for some particular k(δ, α, γ), more than δn/ln n top supernodes are not (γ, αk(δ, α, γ), βk(δ, α, γ))-expansive. Then each of these bad top supernodes has (1 − γ)n/ln n paths that are not (αk(δ, α, γ), βk(δ, α, γ))-good. So the total number of paths that are not (αk(δ, α, γ), βk(δ, α, γ))-good is more than

  δ(1 − γ)n² / ln² n.

We will show there is a k(δ, α, γ) such that this event will not occur with high probability. Let δ' = δ(1 − γ) and let

  k(δ, α, γ) = (10/3) · 2 ln(2e) / (δ(1 − γ)(1 − 2α)²).

Then we know by Lemma 4.2 that with high probability, there are no more than δ(1−γ)n/ln n supernodes that are not (αk(δ,α,γ), βk(δ,α,γ))-good. We also know that each of these supernodes which are not good causes at most n/ln n paths in the butterfly to be not (αk(δ,α,γ), βk(δ,α,γ))-good. Hence the number of paths that are not (αk(δ,α,γ), βk(δ,α,γ))-good is no more than δ(1−γ)n²/ln²n, which is what we wanted to show.

4.6 (α,β)-good Paths to Data Items

We will use the following lemma to show that almost all the nodes are connected to some appropriately expansive top supernode.

LEMMA 4.8. Let δ > 0, ε > 0 and n be sufficiently large. Then there exists a constant k(δ,ε) depending only on ε and δ such that if each node connects to k(δ,ε) random top supernodes then with high probability, any subset of the top supernodes of size (1−δ)n/ln n can be reached by at least (1−ε)n nodes.

Proof. We imagine the n nodes as the left side of a bipartite graph and the n/ln n top supernodes as the right side, with an edge between a node and a top supernode in this graph if and only if the node and the supernode are connected. For the statement in the lemma to be false, there must be some set of εn nodes on the left side of the graph and some set of (1−δ)n/ln n top supernodes on the right side of the graph that share no edge. We can find k(δ,ε) large enough that this event occurs with probability no more than 1/n² by plugging l = n, l′ = εn, r = n/ln n and r′ = (1−δ)n/ln n into Lemma 4.1. The bound found is:

k(δ,ε) ≥ (1/(1−δ)) ln(e/ε) + o(1).

We will use the following lemma to show that if we can reach γn/ln n bottom supernodes that have some live nodes in them then we can reach most of the data items.

LEMMA 4.9. Let γ, n, ε be any positive values such that ε > 0, γ > 0. There exists a k(ε,γ) which depends only on ε, γ such that if each bottom supernode holds k(ε,γ) ln n random data items, then any subset of bottom supernodes of size γn/ln n holds (1−ε)n unique data items.

Proof. We imagine the n data items as the left side of a bipartite graph and the n/ln n bottom supernodes as the right side, with an edge between a data item and a bottom supernode in this graph if and only if the supernode contains the data item. The bad event is that there is some set of γn/ln n supernodes on the right that share no edge with some set of εn data items on the left. We can find k(ε,γ) large enough that this event occurs with probability no more than 1/n² by plugging l = n, l′ = εn, r = n/ln n and r′ = γn/ln n into Lemma 4.1. We get

k(ε,γ) ≥ (1/γ) ln(e/ε) + o(1).

4.7 Connections between (α,β)-good Supernodes

LEMMA 4.10. Let α, β, α′, n be any positive values where α′ < α, α > 0 and let C be the number of supernodes to which each node connects. Let X and Y be two supernodes that are both (αC,βC)-good. Let each node in X have edges to k(α,β,α′) random nodes in Y where k(α,β,α′) is a value depending only on α, β and α′. Then with probability at least 1 − 1/n², any set of α′C ln n nodes in X has at least α′C ln n live neighbors in Y.

Proof. Consider the event where there is some set of α′C ln n nodes in X which do not have α′C ln n live neighbors in Y. There are αC ln n live nodes in Y, so for this event to happen, there must be some set of (α−α′)C ln n live nodes in Y that share no edge with some set of α′C ln n nodes in X. We note that the probability that there are two such sets which share no edge is largest when X and Y have the most possible nodes. Hence we will find a k(α,β,α′) large enough to make this bad event occur with probability less than 1/n² if in Lemma 4.1 we set l = βC ln n, r = βC ln n, l′ = α′C ln n and r′ = (α−α′)C ln n. When we do this, we get that k(α,β,α′) must be greater than or equal to:

(β/(α−α′)) ln(eβ/α′) + o(1).

4.8 Putting it All Together

We are now ready to give the proof of Theorem 3.1.

Proof. Let δ, ε, α, γ, α′, β be any values such that 0 < δ < 1, ε > 0, 0 < α′ < α < 1/2, β > 1 and 0 < γ < 1. Let

C = 10 · 2 ln(2e) / (δ(1−γ)(1−2α)²);
T = (1/(1−δ)) ln(e/ε);
B = (1/γ) ln(e/ε);
D = (β/(α−α′)) ln(eβ/α′).

Let each node connect to C top, C bottom and C middle supernodes. Then by Theorem 4.1, at least (1−δ)n/ln n top supernodes are (γ, αC, βC)-expansive. Let each node connect to T top supernodes. Then by Lemma 4.8, at least (1−ε)n nodes can connect to some (γ, αC, βC)-expansive top supernode. Let each
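The constants C, T, B and D in the proof of Theorem 3.1 are explicit and can be evaluated for concrete parameter choices. The sketch below is a best-effort reading of those formulas; the exact expressions, the function name, and the parameter values are assumptions for illustration, so treat the numbers as approximate.

```python
import math

def theorem31_constants(delta=0.5, alpha=0.25, alpha_p=0.125,
                        beta=2.0, gamma=0.5, eps=0.1):
    """Evaluate the constants used in the proof of Theorem 3.1 for one
    illustrative choice of delta, alpha, alpha', beta, gamma and eps."""
    C = 10 * 2 * math.log(2 * math.e) / (delta * (1 - gamma) * (1 - 2 * alpha) ** 2)
    T = (1 / (1 - delta)) * math.log(math.e / eps)    # Lemma 4.8 style bound
    B = (1 / gamma) * math.log(math.e / eps)          # Lemma 4.9 style bound
    D = (beta / (alpha - alpha_p)) * math.log(math.e * beta / alpha_p)  # Lemma 4.10 style bound
    return C, T, B, D

C, T, B, D = theorem31_constants()
# Per-node memory is then T + 2*D*C + C*ln(n)*(2*D + B*beta), i.e. O(log n)
# for fixed eps, since C, T, B and D are constants independent of n.
print(all(x > 0 for x in (C, T, B, D)))
```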

data item map to B bottom supernodes. Then by Lemma 4.9, at least (1−ε)n nodes have (αC,βC)-good paths to at least (1−ε)n data items.

Finally, let each node in a middle supernode have D random connections to nodes in neighboring supernodes in the butterfly network. Then by Lemma 4.10, at least (1−ε)n nodes can broadcast to enough bottom supernodes so that they can reach at least (1−ε)n data items.

Each node requires T links to connect to the top supernodes; 2D links for each of the C top supernodes it plays a role in; 2D ln n links for each of the C middle supernodes it plays a role in; and Bβ ln n storage for each of the C bottom supernodes it plays a role in. The total amount of memory required is thus

T + 2DC + C ln n(2D + Bβ),

which is less than k₁(ε) log n for some k₁(ε) dependent only on ε.

Our search algorithm will find paths to at most B bottom supernodes for a given data item, and each of these paths has less than log n hops in it, so the search time is no more than

k₂(ε) log n = B log n.

Each supernode contains no more than β ln n nodes, and in each search exactly T top supernodes send no more than B messages, so the total number of messages transmitted during a search is no more than

k₃(ε) log² n = (TBβC) log² n.

5 Modifications for Spam Resistant Content Addressable Network

We only sketch the changes in the network and the proofs needed to obtain a spam resistant content addressable network. The arguments are based on slight modifications to the proofs of Section 4.

The first modification is that rather than have a constant degree expander between two supernodes connected in the butterfly, we will have a full bipartite graph between the nodes of these two supernodes. Since we've insisted that the total number of adversary controlled nodes be strictly less than n/2, we can guarantee that a 1 − ε fraction of the paths in the butterfly have all supernodes with a majority of good (non-adversary controlled) nodes. In particular, by substituting appropriate values in Lemma 4.2 and Lemma 4.3 we can guarantee that all but εn/log n of the supernodes have a majority of good nodes. This then implies that no more than an ε fraction of the paths pass through such "adversary-majority" supernodes. As before, this implies that most of the nodes can access most of the content through paths that don't contain any "adversary-majority" supernodes.

Search is performed as with the original construction, and after the bottommost supernodes are reached, the data content flows back along the same links the search went down. We modify the protocol so that along this return flow, every node passes up a data value only if a majority of the values it received from the nodes below it are the same. This means that if there are no "adversary-majority" supernodes on the path, then all good nodes will take a majority value from a set in which good nodes are a majority. Thus, along such a path, only the correct data value will be passed upwards by good nodes. At the top, the node that issued the search takes the majority value amongst the O(log n) values it receives as the final search result.

To summarize, the main theorem for spam resistant content addressable networks is as follows:

THEOREM 5.1. For any constant c < 1/2 such that the adversary controls no more than cn nodes, and for all ε > 0, there exist constants k₁(ε), k₂(ε), k₃(ε) which depend only on ε such that:

• Every node requires k₁(ε) log² n memory.

• Search for a data item takes no more than k₂(ε) log n time. (This is under the assumption that network latency overwhelms processing time for one message; otherwise the time is O(log² n).)

• Search for a data item requires no more than k₃(ε) log² n messages.

• All but εn nodes can search successfully for all but εn of the true data items.

6 Discussion and Open Problems

We conclude with some open issues:

1. For the deletion resistant content addressable network: Is there a mechanism for dynamically maintaining our network when large numbers of nodes are deleted from or added to the network? Is it possible to reduce the number of messages that are sent in a search for a data item from O(log² n) to O(log n)?

2. Can one improve on the construction for the spam resistant content addressable network?

3. Can one deal efficiently with more general Byzantine faults? For example, the adversary could use nodes under his control to flood the network with irrelevant searches; this is not dealt with by either of our solutions.
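The majority filtering on the return flow is simple to state in code. The following toy sketch (function names and values are invented for illustration; this is not the paper's implementation) shows the rule each node applies, and why an adversary-majority supernode on the path can still corrupt the result:

```python
from collections import Counter

def majority_or_none(values):
    """A node passes a value up the return flow only if a strict
    majority of the values it received from the level below agree."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None

def search_result(received):
    """The node that issued the search takes the majority among the
    values it finally receives (missing values are ignored)."""
    return majority_or_none([v for v in received if v is not None])

# Good-majority supernode: 3 of 5 nodes report the true value.
assert majority_or_none(['doc', 'doc', 'doc', 'spam', 'spam']) == 'doc'
# No strict majority: nothing is passed up.
assert majority_or_none(['doc', 'spam']) is None
# Adversary-majority supernode: the wrong value wins, which is why the
# analysis must bound the fraction of paths through such supernodes.
assert majority_or_none(['spam', 'spam', 'doc']) == 'spam'
print(search_result(['doc', None, 'doc']))
```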

4. We conjecture that our network has the property that it is poly-log competitive with any fixed degree network. I.e., we conjecture that given any fixed degree network topology, where n items are distributed amongst n nodes, and any set of access requests that can be dealt with using fixed sized buffers, our network will also deal with the same set of requests while introducing no more than a polylog slowdown.

7 Acknowledgments

We gratefully thank , Prabhakar Raghavan, Stefan Saroiu, and Steven Gribble for their great help with this paper.

References

[1] Noga Alon, Haim Kaplan, Michael Krivelevich, Dahlia Malkhi, and Julien Stern. Scalable secure storage when half the system is faulty. In Proceedings of the 27th International Colloquium on Automata, Languages and Programming, 2000.
[2] Noga Alon and Joel Spencer. The Probabilistic Method, 2nd Edition. John Wiley & Sons, 2000.
[3] Yonatan Aumann and Michael Bender. Fault tolerant data structures. In IEEE Symposium on Foundations of Computer Science, 1996.
[4] M. Bellare and P. Rogaway. Random oracles are practical: a paradigm for designing efficient protocols. In The First ACM Conference on Computer and Communications Security, pages 62-73, 1993.
[5] John Borland. Gnutella girds against spam attacks. CNET News.com, August 2000. http://news.cnet.com/news/0-1005-200-2489605.html.
[6] Clip2. Gnutella: To the bandwidth barrier and beyond. http://dss.clip2.com/gnutella.html.
[7] Electronic Frontier Foundation. EFF -- censorship -- internet censorship legislation & regulation (CDA, etc.) -- archive. http://www.eff.org/pub/Censorship/Internet_censorship_bills.
[8] D.K. Gifford. Weighted voting for replicated data. In Proc. of the Seventh ACM Symposium on Operating Systems Principles, pages 150-159, 1979.
[9] J. Hastad, T. Leighton, and M. Newman. Fast computation using faulty hypercubes. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, 1989.
[10] Anna R. Karlin, Greg Nelson, and Hisao Tamaki. On the fault tolerance of the butterfly. In ACM Symposium on Theory of Computing, 1994.
[11] John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao. OceanStore: An architecture for global-scale persistent storage. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000), 2000.
[12] Thomson Leighton, Bruce Maggs, and Ramesh Sitaraman. On the fault tolerance of some popular bounded-degree networks. SIAM Journal on Computing, 1998.
[13] Dahlia Malkhi, Michael Reiter, and Avishai Wool. The load and availability of Byzantine quorum systems. SIAM Journal on Computing, 29(6):1889-1906, 2000.
[14] Dahlia Malkhi, Michael Reiter, Avishai Wool, and Rebecca N. Wright. Probabilistic Byzantine quorum systems. In Symposium on Principles of Distributed Computing, 1998.
[15] Marc Waldman, Aviel D. Rubin, and Lorrie Faith Cranor. Publius: A robust, tamper-evident, censorship-resistant, web publishing system. In Proc. 9th USENIX Security Symposium, pages 59-72, August 2000.
[16] Robert Marquand. China's web users kept on their toes. http://www.csmonitor.com/durable/2000/12/07/fp7s1-csm.shtml.
[17] Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[18] Index on Censorship. Index on Censorship homepage. http://www.indexoncensorship.org.
[19] Andy Oram, editor. Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly & Associates, July 2001.
[20] Gopal Pandurangan, Prabhakar Raghavan, and Eli Upfal. Building low-diameter p2p networks. In STOC 2001, Crete, Greece, 2001.
[21] M. Pinsker. On the complexity of a concentrator. In 7th International Teletraffic Conference, 1973.
[22] C. Plaxton, R. Rajaraman, and A.W. Richa. Accessing nearby copies of replicated objects in a distributed environment. In Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1997.
[23] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A Scalable Content-Addressable Network. In Proceedings of the ACM SIGCOMM 2001 Technical Conference, San Diego, CA, USA, August 2001.
[24] Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. In Proceedings of Multimedia Computing and Networking, 2002.
[25] Adi Shamir. How to share a secret. Communications of the ACM, 22, pp. 612-613, 1979.
[26] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, and Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. In Proceedings of the ACM SIGCOMM 2001 Technical Conference, San Diego, CA, USA, August 2001.
[27] Gnutella website. http://gnutella.wego.com/.
[28] Napster website. http://www.napster.com/.
[29] B.Y. Zhao, J.D. Kubiatowicz, and A.D. Joseph. Tapestry: An Infrastructure for Fault-Resilient Wide-Area Location and Routing. Technical Report UCB//CSD-01-1141, University of California at Berkeley, April 2001.