The Gnutella Protocol

The Gnutella Protocol Gayatri March 16, 2007 1 Abstract Peer to peer technology has certainly improved mechanisms of downloading files at a very high rate. Statistics and research show that as more number of peer to peer protocols developed, the number of nodes in the entire network increased. In this paper, we observe the structure and operation of Gnutella, a peer to peer protocol. The paper also focuses on some issues of this protocol and what improvements were further made to improve its scalability. We also have a look at some of the security issues of this protocol. Some statistics in the paper also reflect important changes that the protocol inflicted onto the entire net. 1 2 Introduction The Gnutella protocol is an open, decentralized group membership and search protocol, mainly used for file searching and sharing. The term Gnutella also designates the virtual network of Internet acces- sible hosts running Gnutella-speaking applications. Gnutella nodes, called servents by developers, perform tasks normally associated with both servers and clients. They provide client-side interfaces through which users can issue queries and view search results, accept queries from other servents, check for matches against their local data set, and respond with corresponding results. These nodes are also responsible for managing the background traffic that spreads the information used to maintain network integrity. Queries are not sent to a central site, but are instead distributed among the peers. Gnutella, the first of such systems, uses an unstructured overlay network in that the topology of the overlay network and placement of files within it is largely unconstrained. It floods each query across this overlay with a limited scope. Upon receiving a query, each peer sends a list of all content matching the query to the originating node. Because the load on each node grows linearly with the total number of queries, which in turn grows with system size, this approach is clearly not scalable. 2 3 Design goals of Gnutella Like most P2P file sharing applications, Gnutella was designed to meet the following goals: o Ability to operate in a dynamic environment. P2P applications operate in dynamic environments, where hosts may join or leave the network frequently. They must achieve flexibility in order to keep operating transparently despite a constantly changing set of resources. o Performance and Scalability. The P2P paradigm shows its full potential only on large-scale deploy- ments where the limits of the traditional client/server paradigm become ob- vious. Moreover, scalability is important as P2P applications exhibit what economists call the ”network effect” the value of a network to an individual user scales with the total number of participants. Ideally, when increas- ing the number of nodes, aggregate storage space and file availability should grow linearly, response time should remain constant, while search throughput should remain high or grow. o Reliability. External attacks should not cause significant data or performance loss. o Anonymity. Anonymity is valued as a means of protecting the privacy of people seeking or providing unpopular information. 3 4 Gnutella Protocol Description 4.1 Gnutella at a glance Gnutella’s distributed search protocol allows a set of peers, servents or clients, to perform filename searches over other clients without the need of an inter- mediate Index Server. The Gnutella network topology is a pure Ad-Hoc topology where Clients may join or leave the network at any time without affecting the rest of the topology in any sense. The Protocol hence, is designed in a highly fault- tolerant fashion with a quite big overhead of synchronization messages trav- elling through its network. All searches are performed over the Gnutella network while all file downloads are done offline. In this way every servent that needs to serve a file launches a mini HTTP 1.1 web server and commu- nicates with the interested servents with HTTP commands. The core of the protocol consists of a set of descriptors which are used for communication between servents and also sets rules for the inter-servents descriptors exchange. These descriptors are Ping, Pong, Query, Query Hit and Push. Ping messages are sent by a servent that needs to discover hosts that are currently active on the network. Pong messages on the other hand are responses to Ping requests every time a host wants to come in communication with the peer that requests to join the network (Figure 1, step 3). Although the protocol doesn’t set any bound on the amount of incoming or outcoming connections from a particular host, most Gnutella clients come with a default value for these parameters, usually 10, but which are adjustable by the user. A client joining the Gnutella network for the first time may of course not have any clue regarding the current topology or it’s neighbouring peers making the connection to the network impossible. For this reason most client vendors have setup Host-Caches Servers which serve clients with IP addresses of servents currently connected to the network (step 1). Another mechanism which is widely deployed, but is not part of the protocol, is the use of local caches of IP addresses from previous connections. In this way a servent doesn’t need to connect to an IP acquisition server (i.e. a host-cache) but can rather try to establish a connection with one of its past peers. Right after a peer has obtained a valid IP address and socket port of another servent it may perform several queries by sending Query descriptors and receive asynchronously results in Query Hit descriptors (step 4). Step 6 shows how a servent p1 may download a file from another servent p2. This procedure is done offline with the HTTP protocol. If the servent p2 is fire walled, then p1 may request from p2, with a Push descriptor (step 5), to ”push” the file. 4 4.2 Types of packets used for communication DESCRIPTOR HEADERS Once a servent successfully connects to the Gnutella network, it commu- nicates with other servents by sending a Gnutella protocol descriptor. PING Ping descriptors have no associated payload and are of zero length. A Ping is simply represented by a Descriptor Header whose PayloadDescriptorfieldis0x00. A servent uses Ping descriptors to actively probe the network for other servents. A servent receiving a Ping descriptor may elect to respond with a Pong descriptor, which contains the address of an active Gnutella servent (possibly the one sending the Pong descriptor) and the amount of data it’s sharing on the network. This specification makes no recommendations as to the frequency at which a servent should send Ping descriptors, although servent implementers should make every attempt to minimize Ping traffic on the network. PONG Pong descriptors are only sent in response to an incoming Ping descriptor. It is valid for more than one Pong descriptor to be sent in response to a single Ping descriptor. This enables host caches to send cached servent address information in response to a Ping request. QUERY The primary mechanism for searching the distributed network. A servent receiving a Query descriptor will respond with a Query Hit if a match is found against its local data set. QUERY HIT The response to a Query. This descriptor provides the recipient with enough information to acquire the data matching the corresponding Query PUSH Used when the servent containing the file is behind a firewall. A servent may send a Push descriptor if it receives a Query Hit descriptor from a servent that doesn’t support incoming connections. This might occur when the servent sending the Query Hit descriptor is behind a firewall. When a servent receives a Push descriptor, it may act upon the push request if and only if the Servent Identifier field contains the value of its servent identifier. The DescriptorId field in the Descriptor Header of the Push descriptor should not contain the same value as that of the associated Query Hit descriptor, but should contain a new value generated by the servent’s DescriptorId generation algorithm. 5 4.3 Descriptor Routing The peer-to-peer nature of the Gnutella network requires servents to route network traffic (queries, query replies, push requests, etc) appropriately. A well-behaved Gnutella servent will route protocol descriptors according to the following rules: 1. Ping Pong routing Pong descriptors may only be sent along the same path that carried the incoming Ping descriptor. This ensures that only those servents that routed the Ping descriptor will see the Pong descriptor in response. A servent that receives a Pong descriptor with Descriptor ID = n, but has not seen a Ping descriptor with Descriptor ID = n should remove the Pong descriptor from the network. 2. Query - Query hit- Push Routing Query Hit descriptors may only be sent along the same path that carried the incoming Query descriptor. This ensures that only those servents that routed the Query descriptor will see the Query Hit descriptor in response. A servent that receives a Query Hit descriptor with Descriptor ID = n, but has not seen a Query descriptor with Descriptor ID = n should remove the Query Hit descriptor from the network. Push descriptors may only be sent along the same path that carried the incoming Query Hit descriptor. This ensures that only those servents that routed the Query Hit descriptor will see the Push descriptor. A servent that receives a Push descriptor with ServentIdentifier = n, but has not seen a Query Hit descriptor with Servent Identifier = n should remove the Push descriptor from the network. Push descriptors are routed by ServentIdentifier, not by DescriptorId. 6 5 Mechanisms 5.1 Functioning mechanisms How does a servent join the Gnutella network? A Gnutella servent connects itself to the network by establishing a connection with another servent currently on the network.

The Gnutella Protocol

Diapositiva 1

IFIP AICT 306, Pp

Instructions for Using Your PC ǍʻĒˊ Ƽ͔ūś

Mobile Peer Model a Mobile Peer-To-Peer Communication and Coordination Framework - with Focus on Scalability and Security

Copyright, an Incentive Or a Burden?

Future Internet (FI) and Innovative Internet Technologies and Mobile Communication (IITM)

Gnutella Protocol

Evidence Collection for Forensic Investigation in Peer to Peer Systems Sai Giri Teja Myneedu Iowa State University

Unstructured / Structured Unstructured / Structured

Evidence Collection in Peer-To-Peer Network Investigations Teja Myneedu, Yong Guan

Gnutella and Freenet

Resource Discovery Mechanisms in Grid Systems: a Survey