<<

This project will explore the security issues raised by P2P networks by studying an Analysis of Peer-to- example on an extreme point on the design spectrum: [Gnu]. Our primary Peer Network focus will be the Gnutella network, which is Security using the de facto standard for large, loosely structured, P2P networks. The Grid is a very Gnutella large-scale , heterogeneous, formally structured P2P network that spans many organizations to make “virtual systems” of Dimitri DeFigueiredo±, Antonio resources. [OGSA] It is quickly becoming Garcia+, and Bill Kramer* the standard for distributed resource allocation for high-end computer and ±Department of Computer Science, University of instrumentation systems. This paper California at Davis: [email protected] demonstrates that for P2P networks with ad +Department of Physics, University of California at Berkeley: [email protected] hoc structure, significant security concerns *Department of Computing Sciences, University of persist. California at Berkeley and the National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory: What are Peer-to-peer [email protected] Networks “Peer-to-peer computing is the of computer resources and services by direct exchange between systems. These resources and services include the exchange of Abstract information, processing cycles, cache Peer-to-peer (P2P) networks have emerged storage, and disk storage for files.”[P2PWG] over the past several years as new and A broad definition of P2P includes the effective ways for distributed resources to server mode of computing, as well as communicate and cooperate. "Peer-to- directly amongst clients or computing is the sharing of computer amongst servers. However, P2P now also resources and services by direct exchange is used to describe some new uses of between systems. These resources and computers and networking. In particular, it services include the exchange of is becoming more common for systems to information, processing cycles, cache play both the server and client roles storage, and disk storage for files." P2P simultaneously. P2P networking now is networking has the potential to greatly being used to present new services and expand the usefulness of the network - be it functions. P2P is more than just the for sharing music and video, privately universal file -sharing model popularized by contracting for services or for coordinating . According to the Peer-to-peer the use of expensive scientific instruments working group, business applications for and computers. Some of the networks, such P2P computing fall into a handful of as Napster and Gnutella are created in an ad scenarios. hoc manner with little or no centralized control. Other P2P networks such as · Collaboration: Geographic computational and data grids are being distributed individuals and teams designed and implemented in a very create and manage real-time and structured manner. P2P networks are off-line collaboration areas in a presenting new challenges to computer variety of ways. The goals are security and privacy in a number of ways. typically increased productivity and The focus on decentralization represents a decreased costs. current trend in P2P systems and many see it · Edge services. Edge services move as the stepping-stone to the extended data closer to the point at which it is functionality these systems may provide in actually consumed acting as a the future. This is one of the main reasons network caching mechanism. This this work focuses on the Gnutella network, helps deliver services and Gnutella’s distributed non-homogeneous capabilities more efficiently across architecture make it a suitable test bed to diverse geographic boundaries. A observe how new ideas may affect future current example is Akamai for an P2P networks. However, to put Gnutella’s enterprise architecture into perspective we must be · Distributed computing and aware of what other approaches have been resources. Using networks and taken. computers, P2P technology can use idle CPU power and disk space, allowing businesses to distribute The Grid large computational tasks and data across multiple computers. Results can be shared directly between There are many networks emerging for e- participating peers. Prioritized use commerce and scientific efforts that have a of the resources, even if they are not more formal structure. One excellent idle, is possible. Examples here example of this type of P2P network is what range from the seti@home to the is commonly called “the Grid”. Grid Distributed Teragrid. technology is a collection of tools and · Intelligent agents. Provides ways for services that facilitate the building and computing networks to dynamically managing of “virtual” systems that integrate work together using intelligent distributed, heterogeneous, multi- agents. Agents reside on peer organizational resources on demand. [Grid] computers and communicate various These resources might include the different kinds of information back and forth. computing and data systems operated by a Agents may also initiate tasks on supercomputer center like NERSC, as well behalf of other peer systems. as a diverse collection of user-controlled computing and data systems and scientific Formally and Loosely instruments. The "Grid" is a research effort Structured Peer-to-peer (~10 years in all) whose principle initiators are Ian Foster (ANL) and Carl Kesselman Networks (Cal Tech). The initial implementations centered on Globus and did demonstrations The most commonly known P2P networks (1995-96) of single applications running on are those associated with music sharing. geographically distributed, large parallel First made popular by Napster, a centrally machines– essentially co-scheduling CPUs managed P2P network, and now represented by human agreement and management. The by Gnutella and , these P2P networks concepts and have evolved are designed to be loose structures and dramatically since and has expanded to highly dynamic. Gnutella, Kazaa and others supported large scale data movement, are designed intentionally to have no central collaborative work tools, and much more. control or authority so they are entirely self- There are several "Grids" moving for organizing. The loose federation of experimental to "production" status as other continuing dynamic organization of these projects now use grid tools as reliable networks presents very challenging security infrastructure for their science and issues, especially as the network expands. engineering. Examples of production or that might run on remote resources execute near production grids are the NSF Teragrid, once and only once. Both Globus and NASA Information Power Grid, DOE Condor services provide for communicating Science Grid, Grids for Physics and the with remote jobs, etc. EuroGr id. There is an organization that is like the IETF for setting grid standards A sophisticated set of Grid data services is called the Global Grid Forum. GGF has being developed by the NSF GriPhyN and brought corporate and research communities EU DataGrid projects for managing massive together to work on Grid implementations. data sets in support of the global high energy This effort was given a major critical mass physics community. Over the next few when IBM, ANL and CalTech jointly years these will provide for cataloging, proposed the Open Grid Service querying, accessing, and managing Architecture (OGSA) about 8 months ago. replication, location, and movement of very The joint effort (with about 50 developers) large data sets from a worldwide collection is aimed at integrating the best features of of data sources. Grid portal work at half a Globus (and associated tools) with IBM's dozen institutions is defining and building Websphere technology and is turning the the Web services and primitives that will grid effort from one of creating virtual organizations with resources to one of provide all Grid services though the user’s creating a distributed service system that is Web browser. Advanced services that will for “modern enterprise and provide for brokering, co-scheduling, interorganizational computing advance reservation of CPU capacity, environments” [OGSA]. The end goal for network bandwidth, and tertiary stored data industry is to create move to providing “on- availability are currently being developed. demand computing” services rather than Collaboration services are also being computing hardware. Sam Palmisano – developed that provide for secure distributed CEO of IBM – is quoted as saying grid collaboration group management, computing is the “is the most important messaging, versioning and authoring, and imitative IBM has undertaken since the the definition and management of “virtual Internet”[BW]. organizations.” These services are being integrated with the basic Grid services, Current Grids are mostly based on services frequently through Web Grid services. from the Globus and Condor software packages, and emerging data Grid and Web Gnutella based portals. Globus services provide a standard way to define and submit jobs, manage the code and data associated with Overview of the protocol and those jobs, and locate and monitor the architecture available resources across geographically and organizationally dispersed sites. They The Gnutella protocol is a peer-to-peer also provide a consistent set of security (P2P) designed for resource services based on X.509, PKI, or Kerberos sharing across the global Internet. The authentication, proxy certificates to carry the network is built completely at the user authentication to remote resources, and application layer, and nodes interact via a set of secure communication primitives client programs running on their local based on the IETF GSS-API: secure telnet, machines, irrespective of the underlying remote shell, and secure ftp are provided by physical network. As originally conceived, using these services. Condor-G provides job connectivity, routing, and resource searching management on top of the Globus services, are handled in a wholly distributed way, ensuring that one or more associated jobs with every node nominally equal to every other (recent upgrades changed this slightly). Any differences in a node’s ability A ®B: GNUTELLA CONNECT/ 0.4 stem solely from their own computational, B® A: GNUTELLA OK memory, or network bandwidth relative to other nodes. A and B then exchange Gnutella protocol messages. Each protocol message contains a The Gnutella protocol was originally a very 23- descriptor of the form {id, type, simple protocol, completely specified in a TTL, hops, payload length}. The first field is sparse 6-page document [GNU]. The a 16-byte descriptor number (roughly) protocol eventually grew in complexity as unique on the network. TTL is the time-to- the popularity and function exceeded its live of the packet on the overlay network, very simple initial design. Currently, and hops is the number of hops thus far the Gnutella developers refer to the original message has traveled. Each time the protocol (with minor modifications) as message transits a node on the network the version 0.4, and the next generation hop count is changed. The payload length protocol, which has absorbed a slew of new describes the actual data in the Gnutella features and even wholesale protocol packet, if any. This number is crucial, as additions, as version 0.6 more than one Gnutella packet can fit in one IP datagram, and there are no breaks in the The sections that follow briefly describe Gnutella datastream. Lastly, the type Gnutella in its original incarnation, followed describes which of five message types is in by a description of the relevant parts of the the packet, either Ping, Pong, Query, later version. They are necessarily brief but QueryHit, and Push. more detail can be found at [Gnu] and [GDF]. Ping messages carry no payload, and are used to explore the network for more neighbors. Upon receiving a ping, a node Gnutella v0.4 Protocol will decrement (increment) the TTL (hops) field appropriately and pass along the ping to all neighbors except the originating node. The Gnutella protocol consists of five types That node will also return a pong message of messages: ping, pong, query, query hit containing as payload a 13-byte descriptor (the reply to a query message), and push. A {port, IP address, number of files shared, ping message is used to discover new nodes kBs shared}. Note, pongs are sent back on the network. A pong message is sent as a along the network overlay back to the reply to a ping and provides information originating node, not directly. Hence, a about a network node, including IP address, pinging node ostensibly learns of the port number, and number of files shared. A existence of all nodes within a radius of one query message is used to search for files TTL. shared by other nodes on the network. It contains a query string and a minimum When a node decides to find a file or requested link speed. A query-hit message resource, it sends a Gnutella header along contains a list of one or more files which with a descriptor {minimum speed, search match a given query, the size of each file, criteria}. The first field is a two-byte and the link speed of the responding node number that lists the minimal connection (in Push is used to upload a file to clients kb/s) of nodes that should respond, followed behind a who cannot files by the specified search criteria. Nodes who themselves. match the criteria respond with a query-hit

message of the form {number of hits, port, A node initiates a connection to another via IP address, speed, result [1…n], host ID}. a two-way handshake: Each result is a tuple {File Index, File Size, File Name}, with File Index a unique ID original protocol soon displayed its issued by the responding node, and File inadequacies. Early measurement studies Name some human-readable tag to display showed that as much as 50% of Gnutella on a hit list. The descriptor ID number on traffic consisted of superfluous pongs the 23-byte Gnutella header must match that flooding the network. Since nodes were of the query packet's header ID number. regularly looking for neighbors, as well as This allows matching by both the query sending 'keep-alive' pings that signaled their launcher and all intervening nodes. Hits are continuing existence to current neighbors, sent back via the overlay network to the cascades of redundant pongs were choking originating node. connections, and impeding searches. In addition, in the early days of custom-written Finally, to access a resource (e.g. a file), the clients, some Gnutella nodes were engaging querying node establishes a TCP connection in anti-social behavior like hammering with the responding serving node, and sends neighbors with pings or queries, injecting an HTTP GET request of the form: packets with large TTL values, or continuing to forward packets they had GET /get///HTTP/1.0 \r\n developers of clients, plus open source Connection: Keep-Alive \r\n participants, added a series of heuristic Range: =0-x \r\n modifications to the original protocol, as User-Agent: Gnutella \r\n well as some fairly fundamental changes that are collectively called 'v0.6' (for The range field allows for continuing a somewhat obscure reasons). The most disrupted download, or parallel significant change concerned network from several nodes. Note, the download is topology is described below. outside of the overlay network, and direct between the serving and the downloading In an attempt to make Gnutella scalable for nodes. mass usage, developers imagined establishing a minimal hierarchy in the Finally, in addition to the above four basic Gnutella network. So-called 'supernodes' or message, the protocol supports a PUSH 'ultrapeers' would be able to leverage the message to allow downloads from firewalle d superior bandwidth of their hardware and hosts. The query originator, in addition to handle a larger of search routing and the Gnutella header, sends {node ID, File connectivity, while keeping low-bandwidth Index, IP address, port} to the firewalled nodes from choking traffic via their slow node via the network. The node ID is a connections. Functionally, each supernode random unique 16-byte string, and the keeps connections open to a set of leaf address and port refer to that of the query nodes, and to a number of other supernodes. origination node. The serving node then Leaf nodes themselves keep connections starts a TCP connection back to the querying only to their supernode, which handled its node, along with a string indicating the file traffic to the rest of the network. Hence, the in question. With just a standard the TCP set of high-bandwidth supernodes form a connection established (which is allowed by data bus used by the larger network. almost every firewall), the querying node then sends the HTTP request and everything By using a header in the handshake proceeds as before. messages passed on connection initiation, nodes then negotiate connectivity based on Gnutella v0.6 Protocol their status: leaf nodes subsume their existence to a supernode, and supernodes With Gnutella's meteoric rise in popularity collect leaves and inform other supernodes following the disbanding of Napster, the of their existence. In more sophisticated implementations of the idea, leaf nodes pass and discussed at length. The author also a file index to their supernode, who then implements an attack on his own webserver answer all incoming search queries on its through the generation of fake query-hit le aves' behalf. This reduces supernode-leaf messages. The work presented in [Das] also traffic, and shields the leaf nodes from all considers the security aspects of the traffic other than direct download requests, Gnutella network, this time problems with and whatever traffic they themselves the Gnutella protocol itself independently of generate. any underlying protocols are considered. However, the main focus is the development of a traffic model to deal specifically with Threat Categories query-flood DoS attacks.

A paradigm shift is needed when The simplicity of version 0.4 of the Gnutella considering the security of peer-2-peer protocol leads to some inefficiencies. networks and the associated threat models. Perhaps, the most notable of which is the In the standard client-server architecture, need to broadcast query messages when services are provided by a particular host (or trying to locate resources. This causes an a small group of hosts). Thus, by attacking a exponential growth in the number of specific machine an attacker can subvert, messages in the network. Version 0.6 of the modify or make a service become Gnutella protocol addresses many of these unavailable. For example, if Ebay's website issues, but it also introduces another level of is successfully attacked, no one will be able complexity when compared to the very basic to have that service (i.e. take part in 4 messages seen in version 0.4. One of the auctions) they will have to use some other most significant changes is the introduction website. In these cases, services are linked of a two-tier system where high-bandwidth to hosts, attacking a service means attacking nodes (supernodes) help decrease traffic by a host. With decentralized P2P networks caching query routing information. that is no longer true. One can still attack However, query communication between specific hosts in the network, but because supernodes in version 0.6 is essentially the the services provided are not (usually) same as in version 0.4, i.e. queries are made provided a small number of hosts, it is not by broadcast on the neighboring network clear what that would achieve. For example, graph. by attacking a single supernode in the Gnutella network one would not be able to Both the connectivity (ping/pong) and the make any single file become unavailable. querying (query/query-hit) functionalities of On the other hand, attacks mounted against the Gnutella protocol lay the burden of the whole network, may try to disrupt a multiplexing/demutiplexing messages on single service while leaving others intervening nodes and hence assume implicit unaffected. Decentralized P2P networks faith in third parties. Nodes are assumed to decouple services from hosts. A similar be well behaved. The design focus is decoupling is needed when analyzing the primarily functionality and efficiency, as security of such systems and different threat supposed to security. There has been no models emerge. Threat models are the focus attempt to establish in reality how here. trustworthy these intermediate third parties really are. There are no protocol Flooding mechanisms to establish or estimate this and we have been unable to identify any open In [Yaz], many flooding attacks that rely on the interactions of the Gnutella protocol with the TCP/IP protocol stack are considered proposals to use the underlying protocol However, there is absolutely no guarantee infrastructure to obtain such information1. that the protocol has to be followed and thus is open to abuse. For example, a malicious Attacks consisting of flooding a single type vendor could sell a Gnutella client that of the 4 basic messages in Gnutella have relays “difficult but unyielding” queries only been considered in the literature. Flooding to its competitors, keeping its own clients with reply messages (i.e. pong and query- from that load. In the abuse hit) is thought to be unfruitful as replies are is simply to cause too much work, another dropped unless a previous matching ping or possibility would be supply erroneous query was sent over the same network replies to all queries. However, if the connection previously. Any malicious nodes network is to provide any functionality the immediate neighbors would, just by query mechanism must be able to request following of the protocol, curtail the service. This is an indication that some efficiency of any such attack. The situation feedback control should be present on any was very different with ping messages mechanism that can cause work to be because these were propagated through the performed. But is not present in the Gnutella network. Precisely due to the large amount protocol. Some proposals to achieve this of traffic ping messages can generate the have been presented (such as hash-cash) and new ping caching techniques introduced in will be discussed later. version 0.6 largely make ping flooding a thing of the past, by seriously limiting if not Content Authentication completely preventing ping messages from propagating beyond immediate neighbors. Whenever a file is downloaded, there needs Query flooding still presents a major threat to be confidence that the contents of the file and is addressed in depth in [Das] and can are what is expected and advertised. In the only be minimized – not prevented – by load best case, not only is the file what is balancing. Attacks consisting of a mix of expected, but also it is only what is messages have not to our knowledge been expected. In other words, there is no considered in depth in the open literature. additional information or capabilities in the Whether or not such attacks can be more file that the user does not know about and effective than the simple ones already probably does not want. Current practice in considered is still unclear. cooperative P2P networks such as Gnutella

relies heavily on good faith trust is trust We suggest that the reason query-flooding between the consumer and the provider that attacks still present a major concern is a the file label corresponds to the unaltered fundamental one. The main functionality file content. In other words, if you request a provided by the Gnutella protocol is particular song title, what is transferred is distributed file searching. In a distributed that exact song and only the song. file search the work is spread so that many Unfortunately, there is nothing but good will hosts look for the same files. Thus, the query assuring that “what you see is what you message requests a service from the network get”. and requires a certain amount of work to be performed. When correctly followed the Currently the only way to tell if the file Gnutella protocol ensures that the load is delivered is the file expected is to listen to it collaboratively spread amongst the nodes. and see if it sounds correct. There is little overhead for a user listening to a file, and 1 Perhaps a framework on the lines of [GN2] discard it if it is not what is expected. would be useful here. Extra book keeping is However, this is not guarantee that the file is arguably the major ingredient that is required to intact or unaltered. Besides substituting the produce an estimate of a node’s reliability, the expected content with other content, but network overhead maybe minimal. even more sinister things can occur. Content to the unbounded searches and anarchic may be added in ways that are not detectable topology of networks like Gnutella and through listening. An alteration might be Napster. Almost all such networks depend harmless, but it may also be able to on a global hash function to map node and introduce virus code, messages, and other file ID's to some logical space, with subterfuge. accompanying connectivity and search procedures to channel queries, provide Hijacking queries bounds on queries, deal with node failures, etc.

Because of its trust in intermediate third One of the more intuitive proposals is that of parties, Gnutella is highly susceptible to Content Addressable Networks (CAN). The malicious behavior, as has been global hash maps to a d-dimensional demonstrated by the numerous attacks hypercube, with each node being responsible described in [Yaz]. However, if one for some chunk of the hash space. With d=2, considers the new paradigm of attacks where our space becomes a sheet and, after several services are targeted as opposed to hosts nodes have joined, the network resembles a there has been little work done. In the rough checkerboard, with each square Gnutella protocol, intermediate parties are controlled by one node that stores all files able to see a significant share of the queries whose key (e.g. title or metadata) hashes to from all servers within their local subgraph. its square. The routing information goes 1/d only as O(d), but lookup cost is O(d N ) What damage could be inflicted upon the where N is the number of nodes in the network if a node misbehaves and uses network. In addition, routing is not very query information? Every supernode has the robust to node failure, as the ungraceful ability to see a large proportion of the departure of neighboring nodes leaves a queries. Gnutella has a 7 hop query protocol querying node clueless to its surroundings, so the query will proceed through as many and unable to complete it's query. To date, as 6 supernode hops. If every supernode CAN implementations have not made it all knows of 4 others, then it is conceivable that the way into a real-world system. almost 1,300 supernodes will see the query.

So, essentially, the Gnutella is the logical The Chord project “aims to build scalable, equivalent of a broadcast Ethernet with more robust distributed systems using peer-to-peer than 1,300 nodes able to listen to and ideas. The basis for much of our work is the respond any query. It may be unclear how Chord distributed hash lookup primitive.” much damage can be inflicted on the [CHORD] Chord arranges its nodes and files network by a node’s ability to listen to much on a modular ring, with each node of the query traffic and to response as it maintaining O(log N) neighbor information wants, but it is certainly clear that on a network of size N. For instance, if m is anonymity is compromised. Furthermore, the number of bits in the node/file even in the best case, this opens the P2P identifiers, then the ring extends from 0 to network to other attacks. m-1 2 . Each node maintains a table of th pointers, where the i entry contains the Threat Solutions identity of the node at least 2i-1 away on the hash ring. Basically, each node possesses a Distributed Hash Tables pointer (containing a real IP address) to nodes roughly increasing in powers of 2 away. Individual nodes are responsible for storing files between themselves and the Recently, a number of DHT-based P2P previous node on the ring. Hence, a query networks have been proposed as alternatives begins by consulting its pointer table and tries to find the successor node for the dependence of failure on the queried identifier on the ring. If no such arrival/departure rate. Specifically, with a pointer exists, the query forwards his query large node fail/join rate of 10% per second, to the node closest in the hash space to the only 7% of queries went unanswered, identifier. That node will likely have more indicating considerable robustness even in information concerning nodes in its area, the face of dramatic simultaneous failure. and will either return a correct pointer to the querying node, or forward in turn the query DHT and Gnutella to a closer node. In either case, the distance between the query and the sought-for file or While DHT networks can provide an node always decreases by at least a power of impressive set of features wholly absent two, giving a bound of ogl N overall for from more ad hoc P2P networks, we would searches. not suggest that a DHT-style network should

replace Gnutella . It would be unfeasible. Such a network, like all P2P networks, is The hash assigns file storage based on a susceptible to the effects of node failure on random function, not based on what files performance. In the case, of Chord, even users already possess. In networks like fairly catastrophic node failure results in a Napster and Gnutella users share what they functioning network, if with a diminished have, and it would be impractical to imagine O(n) lookup. Nodes joining the network users storing files other than their own. This generate O(log N) traffic to construct their mean s the host with the hash of the content pointer table, as well as a similar also has to have the content or be able to get communication complexity to update other it efficiently. Rather, we would suggest nodes' tables. Only one transfer and re- running a DHT-style lookup service shuffling of files occurs between the alongside the going Gnutella v0.6 entering node and the former successor node architecture. Such a scheme would imbue for that chunk of hash space. In the Gnutella with two important properties that background, nodes constantly run a it has so far lacked: file permanence and stabilization algorithm that keep them guaranteed lookup. current on routing information and make sure pointers are fresh. To realize the importance of these two

features, pause and think of security Chord's designers have run simulations protocols in common computer networks. modeling its behavior under a variety of Most systems either depend on a trusted conditions. The number of hops required to computing base (e.g. a typical host-clie nt resolve a query was indeed shown to go as password authentication with an a priori log N, with a mean of 4.5 hops for a network trusted user list), or the ability to read from of 1000 nodes (here a hop refers to the or write to a global information repository number of nodes traversed in searching for (for example, reputation scores in ebay, or an identifier). A simulation of node failure, entries in a PKI), as well as the ability to with nodes failing randomly while queries access such memory when needed. Gnutella, are underway, showed essentially no as it exists now, possesses none of these network lookup failure. In other words, the building blocks; no nodes are fundamentally percent failed lookups was almost exactly trusted, files come and go on the whim of the percent failed nodes, indicating that their storing nodes, and findability is searches failed due to keys disappearing off determined purely as a tenuous function of the system along with their hosting nodes, topology and network activity at the time of and not due to a system failure. query. Such lack of functionality limits any Additionally, looking at query failures as a security measure to stopgap hacks like varying function of the rate of node join (and departure), the testers saw a linear HashCash or an inefficient (and probably Authentication, Encryption ineffective) indirection. and FFTs With a Chord ring at the supernode level in Gnutella, to take an example, one can It is not clear what benefits could be imagine le veraging the collective storage to achieved by the use of authentication or establish a reputation system to rate encryption in the current Gnutella participating nodes. Thus, nodes known to framework. Both authentication and distribute false or misleading files can be encryption primitives assume “trusted” deservedly reported. Search results would endpoints to establish communication then be accompanied with a report on the channels, these are simply not present in supplier's reputation, allowing downloader's Gnutella right now. A querying host has a to avoid disseminators of false files. priori no idea of who might answer the Likewise, reputation would allow regulation query. Without the existence of an of 'free-loading' on the network, whereby underlying “trust or reliability framework”, nodes burden the network with searches and other mechanisms (like sampling a few downloads, and yet provide no files possibilities) may be more fruitful. themselves. Uploaders could preferentially serve users that provide clean reputations, A simple example is to imagine a incentivising even casual users to share what modification to the Gnutella protocol to try they have. to stop censoring of query replies in the Gnutella network. We do this by having Such an add-on would place an acceptable replies be sent straight to the IP address of burden on the supernode layer. Currently, the querying host end we also establish end- some 90% of supernode traffic stems from to-end encryption/authentication. Obviously, queries and query hits, with supernodes there is no mutual authentication, as it is having to field their leaf nodes' traffic as impossible to know who will reply. Thus, well as cross-network traffic for which they any malicious intermediate node can still are the bridges. Chord lookups on a network censor the reply by mounting a man-in-the- with roughly a thousand members (about the middle attack. The use of cryptographic size of the supernode graph on Gnutella) primitives relies on the existence of an take anywhere from 2 to 8 hops. The TTL underlying “trust infrastructure” such as a on most legitimate Gnutella traffic is 7, private key infrastructure. placing a Chord lookup within the envelope of most going traffic. One way to prevent content alteration is to have a trusted third party verify the content Recall also that 7-hop Gnutella queries through mechanisms such as digital expand in an exponential flood, while Chord signatures. This is a proven mechanism and lookups visit only one node per hop, further can be implemented in a straightforward diminishing Chord's effect with respect to manner. However, there are limitations to the already existing bandwidth burden on this approach. First, this is a much more supernodes. Additionally, the increasing centralized approach. Every resource or efficiency and precision of the Gnutella item of content would have to have a digital search architecture following the adoption of signature, meaning the overhead for a peer yet more v0.6 features, such as the Query to provide a resource becomes higher. Every Routing Protocol and GUESS (a UDP-based down load would have to contact the central iterative search add-on), will diminish authority to get the key for the signature. further the rather redundant query traffic at This approach may be useful for some P2P the supernode level, opening up more applications, particularly ones where there is bandwidth for interactive reputation or either a high cost associated with the risk of authentication protocols. an altered resource. Another instance where digital signatures may be acceptable is when framework discussed above to either the P2P network is relative modest or enhance or detract from a node’s reputation. limited in scale, for example when a P2P network is applied to business-to-business A more fundamental question is how to transactions. But, for very large, dynamic compare files in order to have some networks like Gnutella that have tremendous substance to select from. In some cases, a numbers of participants, it may not be straight checksum may be viable. A feasible nor desirable to have any part of checksum requires two issues that possibly such a network rely on a centralized are non-optimal. The first issue is that K resource. (with K> 1) copies of the file have to be transferred to the receiver, using multiple Another more generalized approach is to times the bandwidth and storage. Either the acquire several copies of a file from entire file has to be transferred K times, or a different sources. Gnutella already provides large part of it. If only parts of the file are multiple sources for the requestor to choose used, more problems come into play in at from if more than one source has the file. least the Gnutella case. Consider two music Once multiple copies are acquired, various files from two different sources. While types of checks can be made to determine if containing effectively the same music, it is they are the same. Once there is the ability highly unlikely they are bit-wise identical to compare the contents of multiple files, it since they were created on two different becomes possible to implement a voting or CDs, maybe at two different times, from two selection scheme. Depending on the different manufactures and with two confidence needed to assure altered files is difference manufacturing processes. All this detected; a requestor can use well known will cause checksums to be different. The methods such as straight voting. For CDs may have different errors and timing. example, if two out of three files are judged There may be different lengths of idle time to be equivalent, then the minority file is before the music or after the music. Thus to considered bogus. A more robust selection use partial file checksums, firm conventions method could be implemented by following would have to be put into place for all files an algorithm such as described as the that may require the cooperation of the “Byzantine Generals Problem” [Lam] which originators of the file. The second and more will guarantee the correct outcome. The critical issue is in order for checksums to be basic algorithm works if there is a small the same; the files have to be bit for bit number of treacherous nodes relative to the identical. Some P2P applications do want overall number of nodes. If the number of absolutely identical files bit for bit. treacherous nodes might be large, then the However, a number, including Gnutella query originator would have to have would not be able to use a checksum multiple rounds of communication with all because equivalent files are not bit for bit the participants to determine which are identical. A more flexible solution is to use truthful. There are multiple implementations Fourier Transforms to compare the files. to solve the problem, each with increasing The approach is to take any two files X and cost (in both messages and computation) Y out of the set of M files, and perform a based on the needs of the query originator. Fourier Transform on each. Given the In either case, if the files are not identical, Fourier Transform F, one can calculate then the consumer has options ranging from F(X), and F(Y’), where Y’ is the file Y, rejecting the dissimilar file to ignoring all written in reverse order. Then taking the other responses from the resource that sent inverse Fourier transform of the product of the bad file. Once it is possible to determine F(X) and F(Y’) provides a signal profile truthful and untruthful hosts, it is also whose characteristics are like Figure 1. The possible to send a message to the reputation sharpness and height of the peak of the signal will indicate how likely it is the files agree. The location of the peak along the x- function and the size of the hash space, two axis indicates when the files start agreeing. files whose value is identical may The sooner the peak appears, the earlier the beconsidered equivalent. files agree. For equivalent files, a very steep, high peak near the x origin would be Thus, for a cost of bandwidth and/or expected. The concept is to perform pair- computation, it is possible to verify the wise comparisons on N copies of the file, authenticity of a file's contents and to detect where N will vary based on the needs of the and eliminate untrustworthy peers to an consumer. arbitrary degree. While not foolproof – an attacker can in theory launch so many responses that it would have high probability the consumer would select more bogus files Figure 1 – A schematic diagram of using than correct files, this is unlikely. Fourier Transforms and a selection mechanism to decide if a files has been altered even if the files are not bit wise identical.

Proposal for an

experiment File File File X Y Z F -1 [F(X) * F(Y’)] F - 1 [F(Y) * F(Z’)] In the following section, we propose a measurement experiment on Gnutella network. Since such an experiment, and the a a attack it hopes to emulate, depend intimately Same? Same? Yes/No Yes/No on graph of the Gnutella network, we spend Vote? the next sections explaining Gnutella

Decide if there are any bad file providers? network structure. A description of the experiment itself follows. The FFT scheme has several advantages. First, it can be used in all cases – whether the files are bit wise identical or not. Gnutella topology Second, the entire file does not have to be present. FFTs can be blocked – operating It has recently been observed that a on segments of the file – and then blocks multitude of networks in both the natural combined if it is necessary to test the entire and human worlds follow a power-law file. It is likely for a network like Gnutella distribution, where the probability of a node -a only the first parts of the file are needed. having degree k goes as P(k)~k . Here, a Using part of the file has the dual advantage generally ranges from 1.2-4, and is a global of saving bandwidth and computation. That parameter describing the network. Such savings in bandwidth could be avoided all disparate sets as World Wide Web pages, together or used to acquire more copies of authors in science publications, Hollywood the file to improve the accuracy of the movie actors, and human sexual contacts are checking and to prevent an attacker from described by power-law networks of varying overwhelming the number of good files by a. Gnutella, with its ad hoc and chaotic sending many copies of bad files. These connection architecture, also possesses such choices could be tuned based on the network a structure (see below figure). characteristics of the receiver. A variant of the above scheme is to take the file content and pass it through a hash function to provide a value. Then, based on the hash such general scaling stands some enquiry. As Albert and Barabasi [Bara1] showed, large-scale structure is determined by how entering nodes attach to the growing network. Specifically, if nodes attach to a node i with probability proportional to

ki/SIki, (where ki is the degree of the ith node) a power-law graph results. The phenomenon is summarized by the phrase “the rich get richer”, reflecting how the time rate of change of connectivity is proportional to the instantaneous

connectivity, allowing nodes that get slightly ahead of the game to rapidly outpace their less well-connected colleagues.

In Gnutella, nodes join by contacting well-known nodes that in turn provide a new node with the contact information for yet other nodes. While the technical details of node joining are slightly complex (as well as obscure where the protocol specification is ambiguous), it seems safe to say that well- connected nodes, due to their existence being known to many nodes, are probably

disproportionately represented in node-join Figure 2 Two crawls of the Gnutella handoffs. This fact, combined with a base network, from [Rip]. The first above was of users that occupy a spectrum from taken November 2000, the second below occasional users to hardcore Gnutella March 2001.As can be seen, the topology of stalwarts, leads to the variation in Gnutella is in rapid transition. In between connectivity indicated in previous figures. It the two crawls, LimeWire and , seems safe to say that Gnutella is now, and two leading Gnutella clients, instituted the will remain for the near future, a power-law two-tier hierarchy of nodes, as described in network of possibly varying a parameter. previous sections. The sudden change around link number 10 reflects the existence of two communities of nodes: leaf nodes Graph properties connected to one or few supernodes, and the supernodes connected to a brood of leaf Such networks demonstrate many interesting properties, most notably a small nodes as well as any number of other supernodes. Essentially, the Gnutella network diameter, or short average path network is splitting into two superimposed length between nodes. This leads to the networks, with the supernode graph well-known ‘small-worlds’ effect, whereby retaining power-law properties but at a any two nodes in even large networks are connected by a relatively small number of different a constant. hops. As previously explained, searches on Gnutella are propagated via undirected The reasoning behind why such floods on a network with uncertain topology uncoordinated phenomenon as web page and constantly changing node population. authoring and human sexual behavior show Despite these harsh conditions, searches in Gnutella are remarkably successful, precisely due to this short-range property of the graph. Whether by accident or design, the power-law structure has saved Gnutella from total breakdown, even when the network outgrew its humble beginnings to become a global network moving terabytes of data over tens of thousands of machines. Any proposed changes or improvements to the protocol should preserve the fundamental network properties that have made Gnutella so long-lived.

Another large-scale property of power-law networks is that of robustness under node failure. As modeled by [Bara2], power-law networks exhibit different behavior under the effects of either random or targeted node loss, particularly compared to random graphs. Cohen et. al. [Coh] found that the proportion of nodes p that could fail, without splitting a power-law graph of parameter a into disconnected components was limited by: Figure 3 Two network plots of the Gnutella a - 2 network , from February 16, 2001, spanning p £1+ (1- ma -2K 3-a ) -1 3-a 1771 nodes [Sar]. The first contains all nodes and their accompanying connections. In the second, the top 4% connected nodes have been Here is p is the proportion of failing nodes, removed. The resulting network is fragmented m is the lowest possible degree, and K the into multiple pieces. maximum possible degree. Taking a=2.4 from earlier (and perhaps now dated) measurement studies, and setting A final interesting property of power-law m=1, K~30, we arrive at a critical proportion networks refers to their epidemic properties. Networks are often used as models of pc of about .7. Hence, upwards of 70% of Gnutella nodes can fail and the graph disease-propagation among a community of remains topologically connected. carriers, with nodes representing potential This tremendous robustness is necessary in infected organisms and edges modes of the face of occasional physical network transmission. Quantitative epidemiology has failures, and far more frequent node developed a classic SIR model (for disappearance, where users simply turn off susceptible/infected/removed, the three after downloading. Statistics show [Sar] that states nodes can be in) describing the extent 50% of Gnutella nodes spend no more than and speed of outbreaks given certain disease 60 minutes online at any given time, transmissibility. Recently several indicating a constant churn of online hosts. researchers have extended the model, Against more targeted node failure however, examining the effect of underlying topology a power-law network shows considerable on outbreak strength. Pastor-Satorras and disadvantages. Since a minority of nodes Vespignani [Ves] have claimed to have constitutes a majority of the connectivity, analytically proven that scale -free power- removing those few nodes has dreadful law networks allow diseases to reach consequences on the network. epidemic proportions even with low transmission rates between nodes. Newman [New], using both analysis and simulations, machine and had no way of taking over the shows that power-law graphs of certain a Gnutella client itself. However, such a exhibit epidemic behavior no matter how shortcoming will only last until the first low the transmissibility between nodes. The security hole in a well-distributed Gnutella arguments of both authors are lengthy and client is discovered. complex, but an intuitive insight can be had by analogy to searches on the graph. Power- Assuming for the moment clients are safe, law searches work so well as queries very one can still imagine virus-like files that quickly reach a well-connected node, which would be of interest to the Gnutella hacker. in turn propagates the query far and wide. For example, major copyright holders like Likewise, even one infection on a power- the Motion Picture Association of America law graph likely reaches a well-connected (MPAA) or the Recording Industry node, which spreads it to everyone else. Association of America (RIAA) could collude with major makers of mp3 players, like Microsoft’ s Media Player or Experimental design RealMedia’s RealPlayer, such that players would only play specially marked mp3’s a limited number of times (such proposals

have already been made in the context of It was our intention to design a measurement digital-rights management). Copyright experiment that simulated a denial-of- holders (and their hired IT guns) would then service (DOS) attack against the Gnutella exploit Gnutella to spread such specially network. As seen above, the topology of the tagged files, making sure any downloaded Gnutella network imbues the graph with file was of only very limited utility. several interesting properties, some of which might be weapons in the hands of attackers. While this seems counterproductive to the cause of the MPAA/RIAA, in that the For example, the susceptibility of power-law aggrieved party is itself distributing content, graphs to targeted node attack would open a more circumspect analysis reveals its the door for a determined adversary to power. Viruses are successful if they infect shutdown the backbone of high-degree users many hosts. Infectious agents that are too that make Gnutella work. Such an adversary virulent inevitably cause outbursts that burn could employ network-level attacks against themselves out (e.g. the Ebola virus, that such hosts and cut off their network access, incapacitates victims within days and kills or simply emulate such nodes himself, them within a week). Viruses like HIV, for pulling out at the last second after achieving example, are much more effective, as they a massive connectivity. are just as inexorably terminal (essentially

all people infected with HIV develop AIDS It would be difficult to perform such attacks and die), but they allow their hosts a good 5- in an academic context, as they may be 10 years (or more, with the use of modern called morally (if not legally) into question. anti-retrovirals) to spread the virus to others. A better strategy may well be to use the Likewise, the current strategy of the epidemic qualities of power-law graphs to (modest) efforts of anti-piracy agents to distribute unwanted content, and that way spread garbage files when queried for undermine Gnutella’s fantastic file -sharing copyrighted content seems ineffective by potential. Such unwanted content could take comparison. Any given Gnutella search the form of random or garbage files, or more returns dozens of hits, and if a download pointed content like a virus. Virus spreading fails due to a false file, the user just picks on Gnutella has already been documented another uploader. Such tactics will only [Vir], although the virus in question only dissuade the most casual and finicky P2P caused damage when run from the local user. Hence, the file’s ebb and flow in the Much more effective would be to drown the network should be quite independent of entire namespace of interest (e.g. all queries whatever is written in its tag, but the tag containing ‘The Beatles’) with marked and should survive along with the file’s contents incapacitated files. The user who downloads as the file circulates. such a file will not know of its infected state at first, as the file plays normally, until it Our experiment thus involves two stages: in reaches the play limit imposed by its the first, we inject files into the network creators and enforced by the media player. with music files marked with a unique In the meanwhile, other users have random nonce as their mp3 tag. In the downloaded the file themselves, who in turn second, we probe the network, and see to propagate it, and quickly every file on the what extent our marked file permeated. Our system is incapacitated in this way. It seems second probing step will be repeated at much more irritating if what is at first a regular intervals after, and perhaps even normal and impressive mp3 collection during, the inje ction phase. suddenly refuses to play overnight, requiring the slow process of rebuilding it, then Our hope is to use this ‘release-and-catch’ simply having to try one or two downloads methodology to quantify the speed and at first to find a copy of a certain file. extent of content distribution on Gnutella, The topology of Gnutella will certainly aid something until now unexplored. In in such an attack, as discussed above. A addition, this serves as an excellent model of potential attacker need only spread the file the DOS attack detailed above. Whether to a relatively small collection of high- Gnutella’s future attackers intend to simply connectivity nodes, who will in turn distribute false content, or employ the more disseminate it to the rest of the network. If subtle strategy described here, such an wisely done, such an attacker would not experiment will reveal the magnitude of even need vast resources to complete his success one could achieve with a slow- task. burning virus that gives its hosts plenty of time to spread.

The Experiment Operationally, this experiment will involve custom coding a Gnutella client to act as tagger and distributor. Such a client will Fortunately, the technology of music have to achieve high connectivity with the compression allows us to model such an supernode level of Gnutella hosts, getting a attack without need to disrupt network rapid picture of the network’s most well service. The mp3 standard allows an connected nodes. Prior to client arbitrary amount of data to be appended initialization, we will choose a piece of the before the first compressed block of a music file namespace as our virus vector. Intuition file. Such data is typically parsed by mp3 indicates that the best piece of namespace players to provide file metadata like artist would be for files in high demand, with name, recording, etc. to the user looking at large numbers of hosts repeatedly making the player’s GUI. Such a data field can be searches. This way, we will be able to modified at will, and clients uploading and quickly hijack the namespace, pre-empting downloading the file (generally) maintain non-marked files’ entry to the network with the integrity of prepended tag. To our our own marked version. As a guide, one knowledge, however, no client uses the tag can use the ‘Top Sellers’ list on information to perform matches on Amazon.com to determine a well-frequented incoming queries, and the Gnutella protocol part of the namespace on a daily basis. provides no direct way of advertising the contents of the tag to outside queriers. Hence, once situated on the network, and with a target namespace in mind, our client For the past month, development has been selectively responds to queries emanating underway for just such a custom client. Our from selected ‘elite’ high-connectivity initial code base is the Jtella API, a set of supernodes. The cut-off for inclusion in the libraries for network applications on elite category can be made dynamically, or Gnutella. Jtella numbers some 5000 lines, simply as a result of an ongoing crawl (e.g. but we have been obliged to change or add within the current top 5% of connected another 2000 lines for even basic supernodes). The goal is to limit our upload functionality. Jtella was written almost two of files only to a subset of the supernodes, years ago, before the introduction of v0.6, and hence that way parameterizes our later and contains the basic, low-level code for results on file dissemination. That is, on Gnutella connectivity in the form of each inject-and-probe cycle we chose to ping/pong messages, persistent TCP distribute to a greater or lesser fraction of connections, etc. The full-blown supernodes, and measure the effect of experimental client will have to contain greater or lesser file injection on the file’s support for v0.6 features (already done), as circulation. well as the high-level algorithm for maintaining placement in the Gnutella In the probe phase, one can adopt two graph. This latter feature may prove to be philosophies. One is that of an omnipotent the most difficult. Initial attempts at and omniscient network god, trying to connectivity reveal the Gnutella network to discover the presence of every relevant file be a highly dynamic and changing and whether it has been tagged or not. environment. Accurate global knowledge of Essentially, probing everyone at the the instantaneous topology may prove supernode level, downloading upon elusive, if not impossible, and our file receiving a hit, and tallying the percentage injection may occur in an opaque fog of permeation of tagged files versus non- network information. tagged. In addition, certain properties of While such a tack might be ideal, given the commercially written clients tend to produce tremendous bandwidth use that Gnutella fragmentation in an otherwise uniform produces, it may well be unfeasible, even for Gnutella network. For example, it is known fast, high-bandwidth academic machines that BearShare allows its supernodes to (see below problems section). accept only other BearShare leaf nodes (recall that the client vendor is passed along A smarter measurement may be to simply with everything else in the v0.6 handshake), emulate a normal Gnutella user and gauge producing a so-called ‘BearShare cloud’ in what effect the tagging ‘hack’ has on what the network. Our initial connectivity such a user sees. Thus, following injection, experiments indeed showed BearShare we would re-connect as lowly leaf nodes nodes only responding to handshakes with and search repeatedly for files in the ‘BearShare’ in the vendor field. Depending namespace. Downloads on those search hits on whether this behavior continues would reveal their tag, and would allow us (BearShare claims it will change it) our to quantify success as degree of permeation client may well have to spoof a series of on normal user downloads, rather than on vendor names to achieve connectivity over the network as a whole. the entire graph.

Progress and Problems Administrative Issues very few nodes have most of the control and indeed most of the content. While leaving In order to run the experiment as designed, it the network unstructured, v0.6 has the affect was necessary to investigate the regulations of making the query structure more relating to the use of copyrighted material hierarchical. Thus attacks focused on the and other issues regarding Gnutella. The relatively few supernodes make the Gnutella University of California has at least two network much more susceptible to governing documents that apply to this. compromise. Copyright policy is related to the 1998 Digital Millennium Copyright Act. The Gnutella is basically not securable in its university document is “Guidelines for current implementation. Several solutions Compliance with the Online Service were investigated to improve security on the Provider Provisions of the Digital network, such as point-to-point encryption, Millennium Copyright Act” [DMCA]. This authentication and Distributed Hash Tables. policy defines what is possible to do with Some of these are shown to be untenable in copyrighted material in the university the current Gnutella implementations. It context, include what UC considered fare needs additional structure and capability in use. The second policy is the UC Berkeley order to withstand even a modest level of IT policy, which controls what activities are attack. This is important since it is clear the appropriate when using campus computers music industry is planning concerted efforts and networks. The IT policy refers to the to disrupt Gnutella services. DCMA guidelines for how copyrighted material should be handled. This paper introduces the idea that security can be view not just from a host or network The project team approached UC Office of perspective but also from a “service” the President’s General Counsel and perspective. In the Gnutella case, this means described the experiment and its issues. We looking at attacks on the namespace and file also explained why we thought it was content rather on host per se. appropriate and useful to carry out the experiment. Further issues were discussed We have introduced several solutions that regarding the privacy of network users and may be used to increase security and the fact there would be no obvious impact privacy. The use of Fourier Transforms to on the other users of the network. check content in file that are not bit-wise identical enables would enable the query After approximately 6 weeks of discussions, originator to determine if the content in a the University gave approval for the file is what is expected. Coupled with a experiment. We do plan to pursue the selection mechanism, it becomes feasible to experiment at a later date. resist flooding and query interception attacks.

Findings Finally, an experiment was designed and coded to investigate the actual performance Ad hoc peer-to-peer networks, represented of a network and to explore the feasibility of in this study by Gnutella, have significant some of the exploits discussed. This security issues. Previous work focused on experiment could not be performed until v0.4 of the network document many ways legal and administrative issues were the network can be compromised. The resolved within the university pertaining to changed of v0.6, which are intended to the investigation of P2P networks that allow growth to a very large scale introduce exchanged copyrighted material. We more vulnerabilities, because v0.6 turned the received approval for the experiment, but network into a “power law” system where only two days before the submission deadline. However, the effort to gain the approval will not be wasted since it will References allow the experiment to go forward and be [Bara1] Emergence of Scaling in discussed at a later time, and possibly allow Random Networks, Albert Barabasi and other groups to investigate P2P networks Reka Albert, Nature, 286, 509. with their own experiments. [Bara2] Error and attack tolerance of Summary complex networks, R. Albert, H. Jeong, and A.L. Barabasi. Nature, 406, 378.

Despite the apparent need to make Gnutella [BW] Ante, Spencer, “Computing Power more secure for users, the feeling among its sold like electricity”, Businessweek, open-source developers seems to be one of indifference. What 'state-sponsored' hacking November 11, 2002. there has been on Gnutella is relatively small, with the network retaining most of its [CHORD] functionality. Also, the pursuit of Gnutella http://www.pdos.lcs.mit.edu/chord/ users by the MPAA or RIAA has been [Coh] R> Cohen, K. Erez, D. ben- limited to relatively high-profile cases of Avraham, S. Havlin, Resilience of the people either sharing enormous numbers of Internet to Random Breakdowns. files or sharing clearly pirated copies of Phys.Rev.Lett. 85, 4626. copyrighted (and perhaps then unreleased) content. The average Gnutella user sharing [Das] N. Daswani, H. Garcia-Molina and downloading a handful of files has for “Query-Flood DoS Attacks in Gnutella”, the moment little to fear from the large lobby groups and their blackmailed ISPs. If 9th ACM Conference on Computer and the day arrives that file -sharing becomes Communications Security, Nov 18-22 impossible due to a of false content, 2002, Washington DC, USA. or hijacked queries, or when users receive immediate cease-and-desist emails the [DCMA] Guidelines for Compliance moment their Gnutella clients blink on, then with the Online Service Provider perhaps Gnutella developers, some of whom Provisions of the Digital Millennium have commercial interests in the survival of Copyright Act, November 17, 2000 – the network, will start to pay security serious UCOP General Counsel’s Office attention. For the moment, most developers seem content with improving Gnutella to the [GDF] point of competing with currently more successful networks like Kazaa, which http://groups.yahoo.com/group/the_gdf/ enjoys an order of magnitude greater usage [GNU] several websites have statistics than Gnutella. introductory information, e.g. www.bearshare.com, Finally, it needs to be noted that many of the www..com. issues discussed in this document are specific to Gnutella, Kazaa and other very [Grid] http://www.gridforum.org, or, loosely formed networks. Other P2P The Grid: Blueprint for a New networks, including the Grid are designed Computing Infrastructure by Ian Foster with much more emphasis on security and (Editor), Carl Kesselman (Editor) will in all probability have few, and certainly different, issues. [Gro] C. Grothoff “GNUnet – An Excess Based Economy” available at: loper. Further documentation of new and http://www.gnu.org/software/GNUnet/ proposed features is at [GDF].

[Lam] The Byzantine General Problem, [Vir]http://www.commandsoftware.com/ Leslie Lamport, Robert Shostak and virus/gnutella.html Marshall Pease, ACM Tansactions on Programming Languages and Systems, [Yaz] D. Zeinalipour-Yazti “Exploiting Vol. 4, No. 3, July 1982, Pages 382-401. the Security Weaknesses of the Gnutella Protocol” Course project for “Seminar in [NERSC] Computer Security” at the University of http://www.nersc.gov/aboutnersc/pubs/S California –Riverside, Computer trategic_Proposal_final.pdf Science, March, 2002.

[New] The spread of epidemic disease on networks, M.E.J. Newman. Sante Fe Institute working paper. http://www.santafe.edu/sfi/publications/ Working-Papers/02-04-020.pdf

[P2PWG] http://www.peer-to- peerwg.org/whatis/index.html [OGSA] http://www.gridforum.org/ogsi- wg/drafts/ogsa_draft2.9_2002-06-22.pdf [Sari] A Measurement Study of Peer-to- Peer Systems, Stefan Saroiu, P. Krishna Gummadi and Steven D. Gribble. Appeared in MMCN 2002.

[Rip] Mapping the Gnutella Network: Macroscopic Properties of Large Peer- to-peer Systems. Matei Ripeanu, Ian Foster, and Adriana Iamnitchi. IEEE Internet Computing, vol. 6, no. 1, Jan- Feb 2002.

[Sar] A Measurement Study of Peer-to-Peer File Sharing Systems, Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble. In Proceedings of the Multimedia Computing and Networking (MMCN), San Jose, January, 2002.

[v4] Documentation on v0.4 and the major parts of v0.6 can be found at http://www.limewire.com/index.jsp/deve