On the Long-Term Evolution of the Gnutella Network
Total Page:16
File Type:pdf, Size:1020Kb
On the Long-term Evolution of the Gnutella Network Amir H. Rasti, Daniel Stutzbach, Reza Rejaie University of Oregon Abstract the impact of a newly introduced feature de- pends on two factors: (i) the coherency (or com- The popularity of Peer-to-Peer (P2P) file sharing patibility) of features across different brands of applications has significantly increased during client software and (ii) the penetration rate and the past few years. To accommodate the increas- availability of a feature throughout the system. ing user population, developers gradually evolve As new features become available, users upgrade their client software by introducing new features, their client software and the new features grad- such as a two-tier overlay topology. The evolu- ually spread throughout the system. Therefore, tion of these P2P systems primarily depends on the penetration rate of new features primarily (i) coherency across different implementations depends on which brands implement the feature and (ii) the availability of these feature among and the ability and willingness of users to up- clients throughout the system. grade. This paper sheds some light on the evolution In this paper, we examine the evolution of of client and overlay characteristics in the Gnu- the Gnutella network during the past year over tella network during a one year period over which which its overall population has tripled to more it tripled in size, exceeding two million concur- than two million simultaneous peers. Toward rent peers. Our results show several interesting this end, we explore the evolution of the Gnu- trends including (i) most users quickly upgrade tella network across two related angles: their software which provides great flexibility for • developers to react to changes and (ii) as the sys- Client Characteristics that include changes tem has grown in size the top-level overlay has in the overall population, the distribution remained properly connected but there is grow- across different geographic regions, and the ing imbalance between the two levels. popularity of brands and versions. • Overlay Characteristics that represent 1 Introduction changes in various properties of the two-tier Contrary to the common assumption about the overlay topology including balance between limited scalability of unstructured Peer-to-Peer the populations at each tier, different (P2P) file-sharing applications, major P2P file angles of node degree, intra-regional bias sharing applications (i.e., FastTrack (or Kazaa), in peer connectivity, and resiliency to node Gnutella, and eDonkey) have become signifi- removal. cantly more popular during the past few years Modern Gnutella has incorporated a two-tier and now contribute a significant portion of total architecture where a small subset of peers, called Internet traffic [1,5]. For example, over the past ultrapeers, form a top-level overlay while other 12 months, the Gnutella network has grown in peers, called leaf peers, are connected to the top- size by a factor of three to more than 2 million level overlay through one or multiple ultrapeers simultaneous users. To scale with this growth (Figure 1(a)). When a leaf peer cannot locate in user population, different group of applica- a sufficient number of parents in the top-level tion developers have introduced a two-tier over- overlay, it will promote itself and become an ul- lay topology coupled with more efficient search trapeer if it believes that it is not firewalled. mechanisms (e.g., Dynamic Querying in Gnu- The goal of this mechanism is to maintain a tella) in their client software. Given the P2P proper balance between ultrapeers and leaves (and often open-source) nature of these systems, (15%–17%). 1 Our main findings are summarized in the fol- system. Sections 3 and 4 present the evolution lowing list. Due to the limited space, we only of client characteristics and overlay characteris- present a subset of our results. Further details tics in the Gnutella network, respectively. Fi- can be found in the related technical report [6]. nally, we conclude the paper and sketch our fu- • Gnutella is North America-centric, with a ture plans in Section 5. moderate number of European users. 2 Data Collection • There is a strong bias towards intra- To accurately characterize P2P overlay topolo- continent connectivity, especially in conti- gies, we need to capture complete and accurate nents with smaller population. snapshots. By “snapshot”, we mean a graph that • Most users upgrade their client software captures all participating peers (as nodes) and within 2 months, which is much faster than the connections between them (as edges) at a users upgrade operating systems and other single instance in time. The most common ap- types of software. proach to capture a snapshot is to crawl the over- • Peers in the top-level overlay can effectively lay. In practice, capturing accurate snapshots is locate the maximum number of neighbors challenging due to the large size and the dynamic and form a well-connected top-level overlay. nature of P2P systems. Because overlays change • The ultrapeers-to-leaf ratio has unnecessar- as the crawler operates, captured snapshots are ily increased, degrading performance of the inherently distorted where the degree of distor- system. This also implies that leaves are tion is proportional to the crawling duration [11]. unable to locate available open slots among We have developed a set of measurement ultrapeers, and thus unnecessarily promote techniques into a Gnutella crawler, called themselves to become ultrapeers. Cruiser [10]. Cruiser improves the accuracy of Several previous studies have examined char- captured snapshots by significantly increasing acteristics of overlay topologies [3, 4, 7, 12] and the crawling speed primarily through two mecha- participating peers [8, 9] in large scale P2P sys- nisms. First, it leverages the two-tier structure of tems. However, there are at least three impor- the overlay by contacting only ultrapeers. Since tant differences between our work and these prior leaf peers connect only to ultrapeers, all of their studies as follows: First, all the previous stud- topological information can be captured with- ies (except our earlier work [12]) were conducted out contacting them directly. Second, Cruiser more than three years ago on much smaller user significantly increases the degree of concurrency populations and before the introduction of the in crawling by running on several machines and two-tier overlay topology. Second, and more im- opening hundreds of simultaneous connections portantly, most of these studies focused on char- from each machine. acteristics of P2P systems over a relatively short Cruiser can capture the Gnutella network period (e.g., a few days or weeks) and did not with 2.2 million peers in around 8 minutes, or capture any major growth in population. Fi- around 275 Kpeer/minute (by directly contact- nally, previous studies have used either partial ing 22 Kpeer/minute). This is orders of magni- or distorted snapshots of the system that could tude faster than the fastest previously reported significantly affect the accuracy of their results crawlers (i.e., 2.5Kpeers/minute in [8])1. Cruiser [10]. To our knowledge, our work is the only captures the following information from each study that examines the evolution of a P2P file- peer it successfully contacts: (i) peer type (ul- sharing system over a one year period. This pa- trapeer or leaf), (ii) brand and version, (iii) a per builds on our recent work of characterizing list of the peer’s neighbors, and (iv) a list of an graph-related properties of the Gnutella overlay ultrapeer’s leaf nodes. Since the crawler does and its short-term dynamics [12], by exploring not directly contact leaf peers, we do not have long-term trends and effects. information about their brand and versions. The rest of this paper is organized as follows: 1Capturing snapshot of a large scale P2P system with In Section 2, we briefly present our data collec- a slow crawler might be infeasible since the crawler may tion methodology and tools, and explain the im- never terminate due to ongoing arrival of new peers in the portance of capturing accurate snapshots of P2P system. 2 3 100 Total 3 . 3 3 3 3 3 2 5 UltraPeers + 33 80 3 3 33 Leaves 2 3 Legacy Peer 2 22 2 60 NA 3 AS 2 Ultra Peer . 33 × Leaf Peer 1 5 3 22 EU + SA 2 40 1 3 3 32 22 0.5 20 + + ++ + + + + ++ + ++ + + + + + ×2 ×2 ×2 ×2 ××22 ×2 ××22 Population (million) 0 0 Percentage of UltraPeers Core Plane of the Gnutella Topology O N D J F M A M J J A S O O N D J F M A M J J A S O Date (Month) Date (Month) (a) Gnutella’s Two-Tier (b) Growth of Gnutella population (c) Breakdown of ultrapeers by Topology between Oct. 2004, Oct. 2005 continent Figure 1 We have captured more than 80,000 snapshots region in the next section. of the Gnutella network with Cruiser between The distribution of user populations across dif- Oct. 2004 and Oct. 2005. To examine the long- ferent countries has also grown proportionally, term trends in the Gnutella network, we selected except for China where client population has nine snapshots roughly evenly spaced during this dropped significantly (94%). 2 period . To minimize any possible error due to Clients in US, Canada, and UK make up 65%, the time-of-day or day-of-week effects, our candi- 14%, and 5% of the total population, respec- date snapshots were all taken around 3pm PDT tively3. The remaining countries made up less on weekdays. than 2% each, but make up 16% in total. Thus, 3 Client Characteristics while the Gnutella network is dominated by pre- dominately English-speaking countries, around In this section, we explore changes in several one-fifth is composed of users from other coun- characteristics of Gnutella peers during the past tries.