Censorship Resistant Peer-To-Peer Content Addressable Networks
Total Page:16
File Type:pdf, Size:1020Kb
Censorship Resistant Peer-to-Peer Content Addressable Networks Amos Fiat* Jared Saiat Thomas Hobbes, Rene Descartes, t;~ancis Bacon, points in the system. For example, the Napster ([28]) Benedict Spinoza, John Milton, John Locke, Daniel file sharing system has been effectively dismembered by Defoe, David Hume, Jean-Jacques Rousseau, Blaise legal attacks on the central server. Additionally, the Pascal, lmmanual Kant, Giovanni Co~anova, John Stuart Mill, Emile Zola, Jean-Paul Sartre, Victor Gnutella ([27]) file sharing system, while specifically Hugo, Honore de Balzac, A. Dumas pete, A. Dumas designed to avoid the vulnerability of a central server, ill, Gus~a~e Flaubert, Rabelais, Montaigne, La is highly vulnerable to attack by removing a very small Fontaine, Voltaire, Denis Diderot, Pierre Larousse, number of carefully chosen nodes ([24]). Anatole France A significant performance problem with Gnutella Partial List o] authors in Index Librorum Pro- is that it requires up to O(n) messages for a search hibitorum (Index o] Prohibited Books) from the (search is performed via a general broadcast), this has Roman O~ice of the In~luisition , 1559-1966. effectively limited the size of Gnntella networks to about 1000 nodes ([6]). Abstract A more principled approach than the centralized ap- proach taken by Napster or the broadcast search mecha- We present a censorship resistant peer-to-peer Content nism of Gnutella is the use of a content addressable net- Addressable Network for accessing n data items in a work ([23]). A content addressable network is defined network of n nodes. Each search for a data item in as a distributed, scalable, indexing scheme for peer-to- the network takes O(logn) time and requires at most peer networks. Plaxton, Rajaram and Richa ([22]) give O(log 2 n) messages. Our network is censorship resistant a scheme to implement a context addressable network in the sense that even after adversarial removal of an (prior to its definition) in a web cache environment. The arbitrarily large constant fraction of the nodes in the content addressable network scheme of ([22]), as mod- network, all but an arbitrarily small fraction of the ified in ([29]), has been used in implementations such remaining nodes can obtain all but an arbitrarily small as Oceanstore ([11]). Subsequent content addressable fraction of the original data items. The network can be networks have been suggested in ([23, 26, 29]). These created in a fully distributed fashion. It requires only papers seek to improve upon ([22]) in adding fault resis- O(log n) memory in each node. We also give a variant of tance and load balancing mechanisms. However, while our scheme that has the property that it is highly spare these schemes are robust to a large number of random resistant: an adversary can take over complete control faults, none of these schemes are robust against adver- of a constant fraction of the nodes in the network and sarial faults. yet will still be unable to generate spare. 1.1 Resistance to Adversarial Node Deletion 1 Introduction We present a content addressable network with n nodes Web content is under attack by state and corporate used to store n distinct data items 1. As far as we know efforts to censor it, for political and commercial reasons this is the first such scheme of its kind. The scheme is ([7, 18, 16]). robust to adversarial deletion of up to 2 half of the nodes Peer-to-peer networks are considered more robust in the network and has the following properties: against censorship than standard web servers ([19]). However, while it is true that many suggested peer- to-peer architectures are fairly robust against random ]For simplicity, we've assumed that the number of items and the number of nodes is equal. However, for any n nodes and faults, the censors can attack carefully chosen weak m ~ n data items, our scheme will work~ where the search time remains O(log n), the number of messages remulna O(log 2 n), and -*-]~paxtment of Computer Science, Tel Avlv University, work the storage requirements are O(logn x re~n) per node. done while on Sabbatical at University of Washington, Seattle. 2For simplicity, we give the proofs with this constant equal to emaih fiat~math.tau.ac.il 1/2. Howeverwe can easily modify the scheme to work for any f Department of Computer Science, University of Washington, constant less than 1. This would change the constants involved Seattle. ema~l: saia~}cs.washington.edu. in storage, search time, and messages sent, by a constant factor. 95 1. With high probability, all but an arbitrarily small theorem, Theorem 3.1, is given in Section 4. In Section fraction of the nodes can find all but an arbitrarily 5 we sketch the modifications required in the algorithms small fraction of the data items. and the proofs to obtain spam resistance, the main the- orem with regard to spare resistant content addressable 2. Search takes (parallel) time O(log n). networks is Theorem 5.1. We conclude and give direc- tions for future work in Section 6. Acknowledgements 3. Search requires O(log 2 n) messages in total. are in section 7. 4. Every node requires O(logn) storage. 2 Related Work As stated above, in the context of state or corporate 2.1 Peer-to-peer Networks (not Content Ad- attempts at censorship, it seems reasonable to consider dressable) Peer-to-peer networks are a relatively re- adversar/a/attacks rather than random deletion. Our cent and quickly growing phenomena. The average scheme is a content addressable network that is robust number of Gnutella users in any given day is no less than against adversarial deletion. 10,000 and may range as high as 30,000 ([6]). Napster We remark that such a network is clearly resilient software has been downloaded by 50 million users ([231). to having up to 1/2 of the nodes removed at random, (in Pandurangam, Raghavan, and Upfal ([20l) address actuality, its random removal resiliency is much better). the problem of maintaining a connected network under We further remark that if nodes come up and down over a probabilistic model of node arrival and departure. time, our network will continue to operate as required They do not deal with the question of searching within so long as at least n/2 of the nodes are alive. the network. They give a protocol which maintains a network on n nodes with diameter O(logn). The 1.2 Spare Resistance Another problem with peer- protocol requires constant memory per node and a to-peer networks has been that of spamming ([5, 6]). central hub with constant memory with which all nodes Because the data items reside in the nodes of the can connect. network, and pass through nodes while in transit, it Experimental measurements of a connected compo- is possible for nodes to invent alternative data and pass nent of the real Gnutella network have been studied it on as though it was the sought after data item. ([24]), and it has been found to stillcontain a large con- We now describe a spare resistant variant of our nected component even with a 1/3 fraction of random content addressable network. To the best of our knowl- node deletions. edge this is the first such scheme of its kind. As before, assume n nodes used to store n distinct data items. 2.2 Content Addressable Networks -- Random The adversary may choose up to some constant c < 1/2 Faults There are several papers that address the prob- fraction of the nodes in the network. These nodes under lem of creating a content addressable network. As men- adversary control may be deleted, or they may collude tioned above, Plaxton, Rajaram and Richa ([22]) give and transmit arbitrary false versions of the data item, a context addressable network for web caching. Search nonetheless: time and the total number of messages is {)(log n), and 1. With high probability, all but an arbitrarily small storage requirements are O(logn) per node. fraction of the nodes will be able to obtain all but Tapestry ([29]) is an extension to the ([22]) mecha- an arbitrarily small fraction of the true data items. nism, designed to be robust against faults. It is used in To clarify this point, the search will not result in the Oceanstore ([11]) system. Experimental evidence is multiple items, one of which is the correct item. supplied that Tapestry is robust against random faults. The search will result in one unequivocal true item. lq~tnasamy et. M. ([23]) describe a system called CAN which has the topology of a d-dimensional torus. 2. Search takes (parallel) time 0(log n). As a function of d, storage requirements axe O(d) per node, whereas search time and the total number of 3. Search requires O(log 3 n) messages in total. messages is O(dnl/d). There is experimental evidence that CAN is robust to random faults. 4. Every node requires O(log 2 n) storage. Finally, Stoica et. a]. introduce yet another content The rest of our paper is structured as follows. We addressable network, Chord ([26]), which, like ([20]) and review related work in Section 2. We give the algorithm ([29]), requires O(log n) memory per node and O(log n) for creation of our robust content addressable network, search time. Chord is provably robust to a constant the search mechanism, and properties of the content ad- fraction of random node failures. dressable network in Section 3. The proof of our main 96 2.3 Faults on Networks storage per node and ~/(n) search time per query.