Peer-to-Peer Networking

Protecting Free Expression Online with Freenet

Freenet uses a decentralized P2P architecture to create an uncensorable and secure global information storage system.

Ian Clarke and Scott G. Miller, Uprizer
Theodore W. Hong, Imperial College of Science, Technology, and Medicine
Oskar Sandberg and Brandon Wiley, Freenet Project Inc.

The growth of censorship and erosion of privacy on the Internet increasingly threatens freedom of expression in the digital age. Personal information flows are becoming subject to pervasive monitoring and surveillance, and various state and corporate actors are trying to block access to controversial information and even destroy certain materials altogether. Recent incidents such as the publication of Monica Lewinsky’s deleted personal e-mails in a U.S. congressional report further point to an unprecedented level of intrusion into private life.1 These trends cause concern not only to political dissidents, but to anyone disturbed by the thought of others reading their e-mail or following their Web activities.

Fortunately, concurrent advances in the power of personal computers have made it possible to develop peer-to-peer technologies to respond to these challenges. Our project, Freenet, is a distributed information storage system designed to address information privacy and survivability concerns.2 A beta version of the software is currently available as open source at http://www.freenetproject.org/.

In simulations of up to 200,000 nodes, Freenet has proved scalable and fault tolerant. It operates as a self-organizing P2P network that pools unused disk space across potentially hundreds of thousands of desktop computers to create a collaborative virtual file system. To increase network robustness and eliminate single points of failure, Freenet employs a completely decentralized architecture. Given that the P2P environment is inherently untrustworthy and unreliable, we must assume that participants could operate maliciously or fail without warning at any time. Therefore, Freenet implements strategies to protect data integrity and prevent privacy leaks in the former instance, and to provide graceful degradation and redundant data availability in the latter. The system is also designed to adapt to usage patterns, automatically replicating and deleting files to make the most effective use of available storage in response to demand.

40 JANUARY • FEBRUARY 2002 http://computer.org/internet/ 1089-7801/02/$17.00 ©2002 IEEE IEEE INTERNET COMPUTING Freenet

Design Motivation

As documented by Human Rights Watch (http://www.hrw.org/advocacy/internet/) and the Global Internet Liberty Campaign (http://www.gilc.org/), governments around the world have undertaken efforts to force Internet service providers to block access to content deemed unsuitable or subversive, or to make them liable for such material hosted on their servers. The Electronic Privacy Information Center (http://www.epic.org/) has also raised privacy and civil liberties questions about developments like the Federal Bureau of Investigation’s Carnivore electronic monitoring system and the European Union’s new “Convention on Cybercrime,” which gives authorities broad powers to intercept and record digital communications.

Though seemingly separate, the prevention of censorship and the maintenance of privacy are both fundamental to free expression in a potentially hostile world. Preserving the availability of controversial information is only half the problem; individuals can often be subject to adverse personal consequences for writing or reading such information and might need to conceal their activity to protect themselves. Indeed, the U.S. Supreme Court, among others, has long recognized the important role of anonymous speech in political dissent.

A common objection to mechanisms for secure communication is that criminals might use them to evade law enforcement. Freenet is not particularly attractive for such purposes, as it is designed to broadcast content to the world, which is of little use for secret criminal plots. In any case, anonymous electronic communication is simply a tool, like payphones or postal mail, to be used for good or bad. A terrorist might use it to plan an attack, or an informant could use it to turn the terrorist in to the authorities. Most importantly, the freedom to communicate is a fundamental value in a democratic society. There is no way to deny it to the “bad guys” without also denying it to the “good guys”: civil rights activists, minority religious groups, or ordinary citizens who simply wish to keep their affairs private.

In designing Freenet, we focused on privacy for information producers, consumers, and holders; resistance to information censorship; high availability and reliability through decentralization; and efficient, scalable, and adaptive storage and routing.

Maintaining privacy for creating and retrieving files means little without also protecting the files themselves, in particular by keeping their holders hidden from attack. We have thus made it hard to discover exactly which computers store which files. Together with redundant replication of data, holder privacy makes it extremely difficult for censors to block or destroy files on the network.

Freenet does not, however, explicitly try to guarantee permanent data storage. Because disk space is finite, a tradeoff exists between publishing new documents and preserving old ones. Many systems solve this problem by requiring payment (in disk space or money, for example), but we would rather encourage publishing than keep out authors who can’t run peer nodes themselves or are too poor to pay for storage. To keep junk documents from filling all available space or overwriting existing data, we implement a probabilistic storage policy. We hope, however, that Freenet will attract sufficient resources from participants to preserve most files indefinitely.

Related Work in P2P

The best-known systems similar to Freenet are Napster (http://www.napster.com/) and Gnutella (http://gnutella.wego.com/), which both implement large-scale pooling of disk space among individual users. The major difference is that whereas Freenet provides a file-storage service, these systems provide a file-sharing service. That is, participants make files available to others but do not push files to other nodes for storage. This architecture means that data is not persistent in the network; rather, files are available only when their originators (or subsequent requesters) are online. Another difference is that neither system attempts to provide anonymity. Gnutella is also extremely inefficient, broadcasting thousands of messages per request.

Freenet more closely resembles the Eternity service, which was described in a proposal for a highly survivable network for permanently and anonymously archiving information.1 However, the proposal lacked specifics on how to efficiently implement such a service. Free Haven is an Eternity-like anonymous P2P publication system that uses trust mechanisms and file trading to enforce server accountability and user anonymity.2 Unfortunately, it can take a very long time (even days) to retrieve files from it.

Security Issues

Several recently developed P2P file-storage systems focus on efficient data location rather than privacy and security against malicious participants. Systems such as OceanStore,3 Cooperative File System (CFS),4 and PAST5 are all based on routing models in which each node is assigned a fixed identity and maintains some knowledge of nodes whose identities vary in specified ways from its own. These systems deterministically place data on nodes that most closely match the data’s globally unique identifier (GUID). A user can thus locate data by progressively visiting nodes whose identities match more and more bits of the desired GUID. The main advantage of these systems is that they can provide strong guarantees that data will be located within certain time bounds (generally logarithmic) if it exists. Thus, they can provide better handling of issues like storage management.

The main disadvantage of these systems relative to Freenet is that they are more difficult to secure against attack. It is easier for a malicious node to manipulate its identity to gain responsibility for a particular piece of data and suppress it. Links and routing are also more visible and deterministically structured, making it easier to trace messages and harder to route around malicious nodes that sabotage requests (for example, by pretending data could not be found). PAST, as currently constituted, also requires users to trust external smart cards.

Privacy Issues

Systems focusing on privacy for information consumers include browser proxy services such as the Anonymizer (http://www.anonymizer.com/) and SafeWeb/Triangle Boy (http://www.safeweb.com/). Both provide anonymity by proxying requests for Web content on the user’s behalf, although users are vulnerable to logging by the services themselves. Crowds6 improves anonymity over simple proxying through a request-chaining technique similar to the one we use. None of these systems directly stores information; they only provide anonymized access to information available on the Web.

On the producer-holder side, the Rewebber (http://www.rewebber.de/) provides some privacy for information holders with an encrypted URL service that is the inverse of a browser proxy, but is similarly vulnerable to logging by the service operator. TAZ (temporary anonymous zone) servers7 extend this idea with chains of nested encrypted URLs that point to successive Rewebber-like servers to be contacted. Neither system protects information producers or provides redundant information storage. Publius8 enhances robustness and protects producer anonymity by distributing files as redundant partial shares among many holders; however, because the identity of the holders is not anonymized, an adversary could still destroy information by attacking a sufficient number of shares. None of these systems protects information consumers, although Rewebber also operates a separate browser proxy service.

References

1. R.J. Anderson, “The Eternity Service,” Proc. 1st Int’l Conf. Theory and Applications of Cryptology, CTU Publishing House, Prague, Czech Republic, 1996, pp. 242-252.
2. R. Dingledine, M.J. Freedman, and D. Molnar, “The Free Haven Project: Distributed Anonymous Storage Service,” Designing Privacy Enhancing Technologies, H. Federrath, ed., Lecture Notes in Computer Science 2009, Springer-Verlag, Berlin, 2001, pp. 67-95.
3. S. Rhea et al., “Maintenance-Free Global Data Storage,” IEEE Internet Computing, vol. 5, no. 5, Sep./Oct. 2001, pp. 40-49.
4. F. Dabek et al., “Wide-Area Cooperative Storage with CFS,” Proc. 18th ACM Symp. Operating System Principles (SOSP 2001), ACM Press, New York, 2001.
5. A. Rowstron and P. Druschel, “Storage Management and Caching in PAST, a Large-Scale, Persistent Peer-to-Peer Storage Utility,” Proc. 18th ACM Symp. Operating System Principles (SOSP 2001), ACM Press, New York, 2001.
6. M.K. Reiter and A.D. Rubin, “Anonymous Web Transactions with Crowds,” Comm. ACM, vol. 42, no. 2, 1999, pp. 32-38.
7. I. Goldberg and D. Wagner, “TAZ Servers and the Rewebber Network: Enabling Anonymous Publishing on the World Wide Web,” First Monday, vol. 3, no. 4, 1998; available at http://www.firstmonday.dk/issues/issue3_4/goldberg/.
8. M. Waldman, A.D. Rubin, and L.F. Cranor, “Publius: A Robust, Tamper-Evident, Censorship-Resistant, Web Publishing System,” Proc. 9th Usenix Security Symp., Usenix Assoc., Berkeley, Calif., 2000, pp. 59-72.

Freenet Architecture

Freenet participants each run a node that provides the network some storage space. To add a new file, a user sends the network an insert message containing the file and its assigned location-independent globally unique identifier (GUID), which causes the file to be stored on some set of nodes. During a file’s lifetime, it might migrate to or be replicated on other nodes. To retrieve a file, a user sends out a request message containing the GUID key. When the request reaches one of the nodes where the file is stored, that node passes the data back to the request’s originator.

GUID Keys

Freenet GUID keys are calculated using SHA-1 secure hashes. The network employs two main types of keys: content-hash keys, used for primary data storage, and signed-subspace keys, intended for higher-level human use. The two are analogous to inodes and filenames in a conventional file system.

Content-hash keys. The content-hash key (CHK) is the low-level data-storage key and is generated by hashing the contents of the file to be stored. This process gives every file a unique absolute identifier (SHA-1 collisions are considered nearly impossible) that can be verified quickly. Unlike with URLs, you can be certain that a CHK reference will point to the exact file intended. CHKs also permit identical copies of a file inserted by different people to be automatically coalesced because every user will calculate the same key for the file.

Signed-subspace keys. The signed-subspace key (SSK) sets up a personal namespace that anyone can read but only its owner can write to. You could create a subspace for an archive on the Vietnam War, for example, by first generating a random public-private key pair to identify it. To add a file, you first choose a short text description, such as politics/us/pentagon-papers. You would then calculate the file’s SSK by hashing the public half of the subspace key and the descriptive string independently before concatenating them and hashing again. Signing the file with the private half of the key provides an integrity check, as every node that handles a signed-subspace file verifies its signature before accepting it.

To retrieve a file from a subspace, you need only the subspace’s public key (perhaps stored on your “keyring”) and the descriptive string, from which you can recreate the SSK. Adding or updating a file, on the other hand, requires the private key in order to generate a valid signature. SSKs thus facilitate trust by guaranteeing that the same pseudonymous person created all files in the subspace, even though the subspace is not tied to a real-world identity. For example, you can use SSKs to send out a newsletter, to publish a Web site, or (operated in reverse) to receive e-mail.

Typically, SSKs are used to store indirect files containing pointers to CHKs rather than to store data files directly. Indirect files combine the human readability and publisher authentication of SSKs with the fast verification of CHKs. They also allow data to be updated while preserving referential integrity. To perform an update, the data’s owner first inserts a new version of the data, which will get a new CHK because the file contents are different. The owner then updates the SSK to point to the new version. The new version will be available through the original SSK, and the old version will remain accessible through the old CHK. Indirect files can also be used to split large files into multiple pieces by inserting each part under a separate CHK and creating an indirect file that points to all the parts.
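To make the two key types concrete, here is a minimal Python sketch of the CHK and SSK derivations described under GUID Keys, using only the standard hashlib module. It assumes hex-encoded SHA-1 digests and a placeholder byte string in place of a real public key. The ToyStore class, its method names, and its overwrite-on-publish behavior are hypothetical illustrations of the indirect-file update described above, not Freenet’s key format or API; signatures are omitted because they need a public-key library, and the article notes that real updates were not yet implemented.

```python
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def content_hash_key(contents: bytes) -> str:
    # CHK: SHA-1 of the file contents; identical files always get the same key.
    return sha1(contents).hex()

def signed_subspace_key(public_key: bytes, description: str) -> str:
    # SSK: hash the public key and the description independently,
    # concatenate the two digests, then hash the result again.
    return sha1(sha1(public_key) + sha1(description.encode("utf-8"))).hex()

class ToyStore:
    """Hypothetical in-memory stand-in for the network: no routing, no signatures."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}    # CHK -> file contents
        self.pointers: dict[str, str] = {}  # SSK -> CHK (an indirect file)

    def insert_chk(self, contents: bytes) -> str:
        chk = content_hash_key(contents)
        self.data.setdefault(chk, contents)  # duplicate inserts coalesce
        return chk

    def publish(self, public_key: bytes, description: str, contents: bytes) -> str:
        # Insert the data under its CHK, then (re)point the SSK at that CHK.
        ssk = signed_subspace_key(public_key, description)
        self.pointers[ssk] = self.insert_chk(contents)
        return ssk

    def fetch_ssk(self, public_key: bytes, description: str) -> bytes:
        # Readers need only the public key and the description to rebuild the SSK.
        chk = self.pointers[signed_subspace_key(public_key, description)]
        contents = self.data[chk]
        assert content_hash_key(contents) == chk  # CHKs are self-verifying
        return contents

if __name__ == "__main__":
    store = ToyStore()
    pub = b"placeholder public key"  # hypothetical stand-in for a real key
    desc = "politics/us/pentagon-papers"
    store.publish(pub, desc, b"version 1")
    store.publish(pub, desc, b"version 2")             # update: new CHK, same SSK
    print(store.fetch_ssk(pub, desc))                  # b'version 2'
    print(store.data[content_hash_key(b"version 1")])  # b'version 1'
```

The point of the pointer indirection is visible in the usage: the SSK stays stable across versions while each version remains reachable by its own CHK.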

Finally, you can use indirect files to create hierarchical namespaces from directory files that point to other files and directories.

SSKs can also be used to implement an alternative system for nodes that change address frequently. Each such node would have its own subspace, and you could contact it by looking up its public key (its address-resolution key) to retrieve the current address.

Messaging and Privacy

Freenet was designed from the beginning under the assumption of hostile attack from both inside and out. Therefore, it intentionally makes it difficult for nodes to direct data toward themselves and keeps its routing topology dynamic and concealed. Unfortunately, these considerations have had the side effect of hampering changes that might improve Freenet’s routing characteristics. To date, we have not discovered a way to guarantee better data locatability without compromising security.

Privacy in Freenet is maintained using a variation of Chaum’s mix-net scheme for anonymous communication.3 Rather than move directly from sender to recipient, messages travel through node-to-node chains, in which each link is individually encrypted, until the message finally reaches its recipient. Because each node in the chain knows only about its immediate neighbors, the end points could be anywhere among the network’s hundreds of thousands of nodes, which are continually exchanging indecipherable messages. Not even the node immediately after the sender can tell whether its predecessor was the message’s originator or was merely forwarding a message from another node. Similarly, the node immediately before the receiver can’t tell whether its successor is the true recipient or will continue to forward it. This arrangement is intended to protect not only information producers and consumers (at the beginning of chains), but also information holders (at the end of chains). By protecting the latter, we can prevent an adversary from destroying a file by attacking all of its holders. Of course, ensuring privacy is not enough; queries must be able to locate data as well.

Routing

Routing queries to data is the most important element of the Freenet system. The simplest routing method, used by services like Napster, is to maintain a central index of files, so that users can send requests directly to information holders. Unfortunately, centralization creates a single point of failure that is easy to attack. For example, if you were trying to phone Michael Jordan, the simplest way to get his number would ordinarily be to call directory assistance. However, because directory assistance is centralized, your access can be easily blocked if Jordan or someone else decides to remove his directory entry, or if the service goes down.

Systems like Gnutella broadcast queries to every connected node within some radius. Using this method, you would ask all of your friends if any of them knew Jordan’s number, get them to ask their friends, and so on. Within a few steps, thousands of people could be looking for his number. Although this process would eventually find your answer, it is clearly wasteful and unscalable.

Freenet avoids both problems by using a steepest-ascent hill-climbing search: each node forwards queries to the node that it thinks is closest to the target. You might start searching for Jordan by asking a friend who once played college basketball, for example, who might pass your request on to a former coach, who could pass it to a talent scout, who might pass it to Jordan’s agent, who could put you in touch with the man himself. With this approach, the request homes in closer with each hop until the key is found.

Requesting files. Every node maintains a routing table that lists the addresses of other nodes and the GUID keys it thinks they hold. When a node receives a query, it first checks its own store, and if it finds the file, returns it with a tag identifying itself as the data holder. Otherwise, the node forwards the request to the node in its table with the closest key to the one requested. That node then checks its store, and so on. If the request is successful, each node in the chain passes the file back upstream and creates a new entry in its routing table associating the data holder with the requested key. Depending on its distance from the holder, each node might also cache a copy locally.

To conceal the identity of the data holder, nodes will occasionally alter reply messages, setting the holder tags to point to themselves before passing them back up the chain. Later requests will still locate the data because the node retains the true data holder’s identity in its own routing table and forwards queries to the correct holder. Routing tables are never revealed to other nodes.

To limit resource usage, the requester gives each query a time-to-live (TTL) limit that is decremented at each node. If the TTL expires, the query fails, although the user can try again with a higher TTL (up to some maximum). Because the TTL can give clues about where in the chain the requester is, Freenet offers the option of enhancing security by adding an initial mix-net route before normal routing. This effectively repositions the start of the chain away from the requester.

If a node sends a query to a recipient that is already in the chain, the message is bounced back and the node tries to use the next-closest key instead. If a node runs out of candidates to try, it reports failure back to its predecessor in the chain, which then tries its second choice, and so on.

Figure 1 depicts a typical request sequence. The user initiates a request at node A, which forwards the request to B, which forwards it to C. Node C is unable to contact any other nodes and returns a “request failed” message to B. Node B then tries its second choice, E, which forwards the request to F. Node F forwards the request to B, which detects a loop and bounces the message back. Unable to contact any additional nodes, node F backtracks one step to E, which forwards the request to its second choice, D, and locates the file. D returns the file via E and B back to A, which sends it to the user. Along the way, E, B, and A might also cache the file.

Figure 1. Typical request sequence. The request moves through the network from node to node, backing out of a dead end (step 3) and a loop (step 7) before locating the desired file. [Diagram: nodes a through f (requester a, data holder d) connected by arrows numbered 1 to 12; the legend distinguishes data requests, data replies, and failed requests.]

A subsequent query for this key will tend to approach the first request’s path, and a locally cached copy can satisfy the query after the two paths converge. Subsequent queries for similar keys will also jump over intermediate nodes to one that has previously supplied similar data. Nodes that reliably answer queries will be added to more routing tables and hence will be contacted more often than nodes that do not.
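The request procedure described under Requesting files (closest-key-first forwarding, loop bounces, backtracking on failure, TTL expiry, and learning and caching on the reply path) can be sketched as a recursive depth-first search. This is a single-process Python sketch under stated assumptions, not Freenet’s implementation: keys are small integers and “closest” means smallest absolute difference; a shared `seen` set stands in for loop detection; every node on the reply path caches the file (the article says this depends on distance from the holder); holder-tag rewriting and mix-net routing are omitted; and the routing tables in the hypothetical `figure1_network` are chosen so the search retraces the Figure 1 sequence.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    table: dict = field(default_factory=dict)  # routing table: key -> neighbor name
    store: dict = field(default_factory=dict)  # local data store: key -> file data

def route_request(nodes, start, target, ttl, log):
    """Closest-key-first depth-first search with loop bounces and backtracking.
    Returns (holder, data) on success, or None if every route fails."""
    seen = {start}  # nodes this request has already reached

    def visit(name, ttl_left):
        node = nodes[name]
        if target in node.store:
            return name, node.store[target]      # this node is the data holder
        if ttl_left == 0:
            return None                          # TTL expired: the query fails here
        tried = set()
        # Steepest ascent: try the neighbor listed under the closest key first.
        for _, nxt in sorted(node.table.items(), key=lambda kv: abs(kv[0] - target)):
            if nxt in tried:
                continue
            tried.add(nxt)
            if nxt in seen:
                log.append((name, nxt, "bounced"))  # loop: recipient already in chain
                continue
            seen.add(nxt)
            log.append((name, nxt, "forward"))
            result = visit(nxt, ttl_left - 1)
            if result is not None:
                holder, data = result
                node.table[target] = holder      # learn which node holds this key
                node.store[target] = data        # cache a copy on the reply path
                return result
        return None                              # dead end: report failure upstream

    return visit(start, ttl)

def figure1_network():
    # Hypothetical routing tables chosen so a request for key 50 retraces Figure 1.
    return {
        "A": Node(table={48: "B"}),
        "B": Node(table={49: "C", 45: "E"}),
        "C": Node(),
        "E": Node(table={51: "F", 55: "D"}),
        "F": Node(table={50: "B"}),
        "D": Node(store={50: b"file"}),
    }

if __name__ == "__main__":
    nodes = figure1_network()
    log = []
    print(route_request(nodes, "A", 50, ttl=20, log=log))  # ('D', b'file')
    for hop in log:
        print(hop)
```

The logged hops follow the narrative above: A to B, B to C (dead end), B to E, E to F, F to B (bounced as a loop), then E to D, after which D’s reply teaches E, B, and A where key 50 lives.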


Inserting files. An insert message follows the same path that a request for the same key would take, sets the routing table entries in the same way, and stores the file on the same nodes. Thus, new files are placed where queries would look for them.

To insert a file, a user assigns it a GUID key and sends an insert message to the user’s own node containing the new key with a TTL value that represents the number of copies to store. Upon receiving an insert, a node checks its data store to see if the key already exists. If so, the insert fails, either because the file is already in the network (for CHKs) or because the user has already inserted another file with the same description (for SSKs). In the latter case, the user should choose a different description or perform an update rather than an insert. (Note that we have not yet implemented updates because we are still working on a mechanism to ensure that all old copies get replaced.)

If the key does not already exist in the node’s data store, the node looks up the closest key and forwards the message to the corresponding node as it would for a query. If the TTL expires without collision, the final node returns an “all clear” message. The user then sends the data down the path established by the initial insert message. Each node along the path verifies the data against its GUID, stores it, and creates a routing table entry that lists the data holder as the final node in this chain. As with requests, if the insert encounters a loop or a dead end, it backtracks to the second-nearest key, then the third-nearest, and so on, until it succeeds.

Data Encryption

For political or legal reasons, node operators might wish to remain ignorant of the contents of their data stores. To this end, we encourage publishers to encrypt all data before insertion. The network proper knows nothing about this level of encryption because it just ships already encrypted bits.

Data encryption keys are not used in routing or included in network messages. Inserters distribute them directly to end users at the same time as the corresponding GUIDs. Thus, node operators cannot read their own files, but users can decrypt them after retrieval. Node operators cannot gain any information by looking at GUIDs, either, because the hashes used to generate them scramble any identifying characteristics. From a node operator’s point of view, the data store consists only of random GUIDs attached to opaque data.

Network Evolution

The network evolves over time as new nodes join and existing nodes create new connections after handling queries. As more requests are handled, local knowledge about other nodes in the network improves, and routes adapt to become more accurate without requiring global directories.

Adding Nodes

To join the network, a new node first generates a public-private key pair for itself. This pair serves to logically identify the node and is used to sign a physical address reference. Note that public keys are not certified. We don’t need to link them to real-world identities because the node’s public key is its identity, even if it changes physical addresses. Certification might be useful in the future for deciding whether to trust a new node, but for now Freenet uses no trust mechanism.

Next, the node sends an announcement message including the public key and physical address to an existing node, located through some out-of-band means such as personal communication or lists of nodes posted on the Web, with a user-specified TTL. The receiving node notes the new node’s identifying information and forwards the announcement to another node chosen randomly from its routing table. The announcement continues to propagate until its TTL runs out. At that point, the nodes in the chain collectively assign the new node a random GUID in the keyspace using a cryptographic protocol for shared random number generation that prevents any participant from biasing the result. This procedure assigns the new node responsibility for a region of keyspace that all participants agree on while guaranteeing that a malicious node cannot influence the assignment for a specific key that it might want to attack.

Training Routes

As more requests are processed, the network’s routing should become better trained. Nodes’ routing tables should specialize in handling clusters of similar keys because each node will mostly receive requests for keys that are similar to the keys it is associated with in other nodes’ routing tables. When those requests succeed, the node learns about previously unknown nodes that can supply such keys and creates new routing entries for them. As the node gains more experience in handling queries for those keys, it will successfully answer them more often and, in a positive feedback loop, get asked about them more often.

Nodes’ data stores should also specialize in storing clusters of files with similar keys. Because inserts follow the same paths as requests, similar keys tend to cluster in the nodes along those paths. Nodes should similarly cluster files cached after requests because most requests will be for similar keys.

Taken together, the twin effects of clustering in routing tables and data stores should improve the effectiveness of future queries in a self-reinforcing cycle. While we do not yet have a good mathematical model to analyze the training and convergence of the Freenet algorithm, the simulations described later show that the network can, in practice, locate files quickly, with a median path length of just 8 hops in a 10,000-node network.

Key Clustering

Because GUID keys are derived from hashes, the closeness of keys in a data store is unrelated to the corresponding files’ contents. This lack of semantic closeness is unimportant, however, because the routing algorithm is based on the locations of particular keys, rather than particular topics.

Suppose, for example, a descriptive string such as politics/us/pentagon-papers yields the key AF5EC2. Requests for this file could be satisfied by creating clusters containing the keys AF5EC1, AF5EC2, and AF5EC3, rather than clusters containing works about U.S. politics. In fact, hashes are useful because they ensure that similar works will be scattered throughout the network, lessening the chances that a single node’s failure will make an entire category of files unavailable. Similarly, the contents of any given subspace will be scattered across different nodes, which increases robustness.

Searching

One open issue is how users can search the network for relevant keys. This is similar to the problem of searching the Web, and similar solutions are possible: Freenet can be spidered, or individuals can publish lists of bookmarks. However, these approaches are not entirely satisfactory in terms of Freenet’s design goals.

One simple approach for a true Freenet search would be to create a special public subspace for indirect keyword files. When authors insert files, they could also insert several indirect files corresponding to search keywords for the original file. The “Pentagon Papers” file might have indirect files named keyword:politics and keyword:united-states pointing to it, for example.

The system would allow multiple keyword files with the same key to coexist (unlike with normal files), and requests for such keys could return multiple matches. Thus, a search for “politics” might return a pointer to the Tiananmen Papers as well as one to the Pentagon Papers.

Managing a large number of indirect files for common keywords would be difficult, however, because all the files with the same name would be attracted to the same nodes. A more sophisticated approach might use some type of distributed search over detailed metadata descriptors inserted along with the original files, but we have not yet devised a way to route such a search efficiently.

Managing Storage

To encourage participation, Freenet does not require payment for inserts or impose restrictions on the amount of data that publishers can insert. Given finite disk space, however, the system must sometimes decide which files to keep. It currently prioritizes space allocation by popularity, as measured by the frequency of requests per file. Each node orders the files in its data store by time of last request, and when a new file arrives that cannot fit in the space available, the node deletes the least recently requested files until there is room.

Because routing table entries are smaller, they can be kept around longer than files. Evicted files don’t necessarily disappear right away because the node can respond to a later request for the file by using its routing table to contact the original data holder, which might be able to supply another copy. Why would the original holder be more likely to have the file? Freenet’s data holder pointers have a treelike structure. Nodes at the leaves might see only a few local requests for a file, but those higher up the tree receive requests from a larger part of the network, which makes their copies more popular.

46 JANUARY • FEBRUARY 2002 http://computer.org/internet/ IEEE INTERNET COMPUTING Freenet

File distribution is therefore determined by two competing forces: tree growth and pruning. The query-routing mechanism automatically creates more copies in an area of the network where a file is requested, and the tree grows in that direction. This improves response time and prevents overloading when the popularity of a file increases suddenly. Files that go unrequested in another part of the network are subject to deletion. As that part of the tree shrinks, space is freed up for other files. The net effect is that the number and location of copies adjust to the demand for each file.

Performance Analysis

We have tested Freenet’s performance using simulations. We have described more extended results elsewhere,4 but we will summarize the most important results here. Freenet demonstrates good scalability and fault-tolerance characteristics that can be explained in terms of a small-world network model.5 Small-world networks are characterized by a power-law distribution of graph degree (here, the number of routing table entries) of the general form $p(x) \sim x^{-t}$, where $t$ is a constant, $x$ is the graph degree, and $p(x)$ is the probability that a node has degree $x$. In such a distribution, the majority of nodes have relatively few local connections to other nodes, but a small yet significant number of nodes have large, wide-ranging sets of connections. Even in very large networks, the small-world topology enables efficient short paths because these well-connected nodes provide shortcuts.

Figure 2 shows the graph degree distribution in a simulation of a 10,000-node trained network. The distribution closely approximates a power law with $t = 1.5$, except for an outlier resulting from the maximum routing table size (250 in this simulation). This is not surprising, as power-law distributions tend to arise naturally when networks grow by preferential attachment (that is, new nodes prefer to connect to nodes that already have many links).6 The new-node announcement protocol initially creates a preferential attachment effect because following random links gives a higher probability of arriving at nodes that have more links. During normal operation, the effect continues because well-known nodes tend to see more requests and become even better connected (“the rich get richer”).

Figure 2. Degree distribution among Freenet nodes. The network shows a close fit to a power-law distribution. [Plot: fraction of nodes versus number of links, log-log axes.]

Figure 3. Request path length versus network size. The median path length in the network scales as $N^{0.28}$. [Plot: first quartile, median, and third quartile of request path length (hops) versus network size (nodes), log-log axes.]

Scalability

To test Freenet’s scalability, we created a simulated network of 20 nodes initially connected in a ring topology. We sent inserts of randomly generated files to random nodes in the network, interspersed with random requests for files that had already been inserted (all with TTL = 20). After every five inserts and requests, we created a new node, which announced itself to a random existing node with TTL = 10. We measured the network’s performance after every hundred inserts and requests by issuing a set of test requests for previously inserted files and recording the resulting path length distribution (the number of hops actually required to find the data). This continued until the network reached 200,000 nodes.

Figure 3 shows the evolution of the first, second, and third quartiles of the request path length versus network size, averaged over 10 trials. We
can see that the median path length scales sublinearly with network size as $N^{0.28}$, which agrees with recent results in mathematical modeling of peer-to-peer networks.7 By extrapolation, it appears that Freenet should be capable of scaling to one million nodes with a median path length of just 30.

Fault Tolerance

After repeating the previous training procedure to 10,000 nodes, we progressively removed random nodes from the network to simulate node failures. Figure 4 shows the resulting evolution of the request path length, averaged over 10 trials, which shows that the network is surprisingly robust against quite large failures. The median path length remained below 20 even when up to 30 percent of nodes failed. (Note that requests were capped at 500 hops before giving up.)

Figure 4. Request path length under random failure. Performance remained reasonable even up to a 30 percent failure rate in our simulation. [Plot: first quartile, median, and third quartile of request path length (hops) versus fraction of nodes failing.]

The power-law distribution gives small-world networks a high degree of fault tolerance6 because random failures are most likely to eliminate nodes from the poorly connected majority. Routing performance is noticeably affected only after there are enough failures to knock out a significant number of well-connected nodes. A small-world network falls apart much more quickly, however, if the well-connected nodes are targeted first. This is evident in Figure 5, which shows the size of the largest connected component in a 10,000-node network as nodes were removed, both randomly and in order from most connected to least connected. Under random failure, the vast majority of the network remained connected until almost the very end. Under targeted attack, the network underwent a “percolation transition” near 60 percent removal, at which point it abruptly broke into disconnected fragments.

Figure 5. Connectivity under random failure and targeted attack. The network falls apart quickly when the well-connected nodes are targeted first. [Plot: size of largest connected component versus fraction of nodes removed, for random failure and targeted attack.]

Future Work

Initial beta deployment of Freenet is under way, and users have downloaded hundreds of thousands of copies of the software so far. The system’s anonymous nature makes it impossible to tell exactly how many users there are or how well inserts and requests are working, but anecdotal evidence is positive. We are working on a simulation and visualization suite to enable more rigorous tests of the protocol and routing algorithm. More realistic simulation and formal modeling are needed to explore the effects of nodes joining and leaving, variations in node capacity and bandwidth, and larger network sizes.

We still need to develop search mechanisms and provide more protection against denial-of-service attacks that flood the system with junk data. Although the eviction mechanism works to eliminate files that are never requested, important files could be pushed out if it did not act quickly enough under attack. On the other hand, reducing the priority of new data could result in files being deleted before they have had a chance to be requested. We are exploring various modifications to the caching policy, such as caching less aggressively farther down the data holder pointer tree, to balance these considerations.
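The contrast between random failure and targeted attack described under Fault Tolerance can be explored qualitatively with a small experiment. The Python sketch below is an illustration under stated assumptions, not the simulator used for Figures 4 and 5: it grows a graph by degree-proportional preferential attachment (a Barabási-Albert-style model standing in for Freenet’s announcement protocol), removes a fraction of nodes either at random or from most to least connected, and reports the size of the largest connected component. The node count, the number of links per new node (`m`), the seeds, and all function names are arbitrary, hypothetical choices.

```python
from __future__ import annotations

import random
from collections import deque


def preferential_attachment_graph(n: int, m: int, seed: int = 0) -> dict[int, set[int]]:
    """Grow an undirected graph in which new nodes favor well-linked nodes.

    Barabasi-Albert-style construction (an assumption, not Freenet's
    announcement protocol). Each new node links to m distinct existing nodes.
    """
    rng = random.Random(seed)
    adj: dict[int, set[int]] = {v: set() for v in range(n)}
    # Fully connected core of m + 1 nodes so every new node can find m targets.
    for i in range(m + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears once per incident edge, so sampling uniformly from
    # this list picks nodes with probability proportional to their degree.
    endpoints = [v for v in range(m + 1) for _ in adj[v]]
    for new in range(m + 1, n):
        targets: set[int] = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
    return adj


def largest_component(adj: dict[int, set[int]], removed: set[int]) -> int:
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:  # breadth-first search over surviving nodes
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def giant_after_removal(adj: dict[int, set[int]], fraction: float,
                        targeted: bool, seed: int = 0) -> int:
    """Remove a fraction of nodes, either at random or highest degree first."""
    order = list(adj)
    if targeted:
        # Attack: remove from most connected to least connected (initial degrees).
        order.sort(key=lambda v: len(adj[v]), reverse=True)
    else:
        random.Random(seed).shuffle(order)  # random failure
    k = int(fraction * len(order))
    return largest_component(adj, set(order[:k]))


if __name__ == "__main__":
    graph = preferential_attachment_graph(n=2000, m=2, seed=1)
    for f in (0.0, 0.1, 0.3, 0.5, 0.7):
        print(f"{f:.1f} removed: random -> {giant_after_removal(graph, f, False)}, "
              f"targeted -> {giant_after_removal(graph, f, True)}")
```

Because degree-ordered removal deletes the hubs that hold the graph together, the targeted runs should shrink the largest component much faster than the random runs. The exact sizes depend on the seed and the much smaller graph, so they are not directly comparable with the 10,000-node results reported above.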


Acknowledgments

The third author thanks the Marshall Aid Commemoration Commission for their support. This material is partly based on work supported under a U.S. National Science Foundation graduate research fellowship.

References

1. J. Rosen, The Unwanted Gaze: The Destruction of Privacy in America, Vintage Books, New York, 2001.
2. I. Clarke et al., “Freenet: A Distributed Anonymous Information Storage and Retrieval System,” in Designing Privacy Enhancing Technologies, Lecture Notes in Computer Science 2009, H. Federrath, ed., Springer-Verlag, Berlin, 2001, pp. 46-66.
3. D.L. Chaum, “Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms,” Comm. ACM, vol. 24, no. 2, 1981, pp. 84-90.
4. T. Hong, “Performance,” in Peer-to-Peer: Harnessing the Power of Disruptive Technologies, A. Oram, ed., O’Reilly and Assoc., Sebastopol, Calif., 2001, pp. 203-241.
5. D. Watts and S. Strogatz, “Collective Dynamics of ‘Small-World’ Networks,” Nature, vol. 393, June 1998, pp. 440-442.
6. R. Albert, H. Jeong, and A. Barabási, “Error and Attack Tolerance of Complex Networks,” Nature, vol. 406, July 2000, pp. 378-381.
7. L.A. Adamic et al., “Search in Power-Law Networks,” Physical Rev. E, vol. 64, no. 4, 2001, article no. 046135.

Ian Clarke is vice president and chief technology officer of Uprizer Inc. He received a BS with honors in computer science and artificial intelligence from the University of Edinburgh, Scotland. He is the original architect and coordinator of the Freenet Project.

Theodore W. Hong is a PhD student at Imperial College, London. His research interests include the dynamics of complex networks, multiagent systems and game theory, and grammatical inference for information extraction. He has appeared as a BBC commentator on digital rights issues. He received an AB in chemistry and physics and mathematics from Harvard, an MSc in microwaves and optoelectronics from University College, London, and was a 1995 Marshall scholar.

Scott G. Miller is a senior software engineer for Uprizer. He received a BS with honors in computer science from Indiana University. His research interests include cryptography and self-organizing systems, which converged in the Freenet Project.

Oskar Sandberg is a core programmer who has been with Freenet since 1999. He is on the board of Freenet Project, Inc. Sandberg is in the final year of his master’s degree in mathematics at the University of Stockholm, Sweden.
Readers can contact Clarke at [email protected].