An Empirical Study of Seeders in BitTorrent
Justin Bieber, Michael Kenney, Nick Torre, and Landon P. Cox
Duke University, Durham, NC

Abstract

BitTorrent has attracted attention from researchers and the press for its wide deployment and explicit goal of eliminating free-riding, but many of the most important peers in BitTorrent operate outside of its incentive mechanisms. Altruistic seeders help bootstrap new peers and provide a significant fraction of the global upload bandwidth. We have taken an empirical approach to understanding seeders by studying 35 BitTorrent sites with nearly four million users at any moment over several weeks.

Our study focuses on two aspects of seeders. First, we looked at the relationship between the number of seeders and bandwidth utilization. A case study of a Linux distribution network showed that as bandwidth utilization increased, the rate of seeding decreased. Second, we looked at the relationship between site attributes and the number of seeders. A survey of 34 BitTorrent sites over two weeks found that the presence of niche-content (e.g. only anime, hip-hop, or Linux files), merchandise for sale (e.g. t-shirts with the site URL), and negative reinforcement (e.g. a posted list of the 10 least contributing peers) correlated positively with the rate of seeding.

1 Introduction

Free-riding in file-sharing networks such as Gnutella, Napster, and Kazaa is a well documented phenomenon [1, 6]. Because of this, the BitTorrent [3] content distribution system made robustness to free-riding an explicit design goal [4]. BitTorrent's attention to incentives as well as its responsibility for 35% of Internet traffic [11] has generated interest from both the research community [5] and popular press [12]. However, despite this attention, there have been no broad empirical studies of BitTorrent.

Several papers have analyzed the incentive mechanisms proposed by BitTorrent [7, 9], but we are aware of only one empirical study [8], by Izal. This is a useful start, but because it focuses on a single shared file the scope of its observations are limited. We hope to broaden the understanding of BitTorrent by presenting results from several weeks of logging 35 BitTorrent sites during February and March of 2006. These sites serve a wide range of content to nearly 4 million users at any moment.

As was the case in Izal's study, we collected data by passively monitoring BitTorrent use through the trackers that coordinate peers. This prevented us from directly observing how users respond to BitTorrent's incentive mechanisms, but provided us with the number of seeders (peers who are uploading, but not downloading) and leechers (peers who are downloading).

Seeders are particularly important for bootstrapping new peers and to network performance. Many sites explicitly encourage peers to continue uploading data even after their download has finished. These reminders are necessary because seeding is altruistic rather than self-interested; BitTorrent's incentive mechanisms cannot reward seeders for the upload bandwidth they contribute. This makes seeders difficult to model game theoretically and motivated our empirical study.

We have focused on two aspects of seeders. First, we examined the relationship between bandwidth utilization and the number of seeders. A case study of a BitTorrent site over 23 days showed that the ratio of seeders-to-leechers decreased dramatically when bandwidth became highly utilized. Second, we looked at the relationship between site attributes and altruism. We were interested to know which, if any, site characteristics correlate with a high seeders-to-leechers ratio. In a study of 34 different sites, we found that networks with niche-content (e.g. only anime, hip-hop, or Linux files), merchandise for sale (e.g. t-shirts with the site URL), and negative reinforcement (e.g. a posted list of the 10 least contributing peers) exhibited greater altruism than those without these features.

2 Background and Motivation

BitTorrent [3] is a popular peer-to-peer file sharing application that uses a “tit-for-tat” protocol [2] to transfer files between peers. BitTorrent files are broken into smaller fragments, which can be downloaded in parallel from multiple peers. Although not a “pure” tit-for-tat protocol [9], BitTorrent is designed to reward peers for uploading fragments to others with proportionally fast downloads.

[Figure 1: Hourly Behavior at elm-project.org. Series: Seeders, Leechers; x-axis: Hour; y-axis: Peers.]

[Figure 2: The Effect of LMP Server Downtime. x-axis: Pre-Offline, Offline, Post-Offline; y-axis: Seeders-to-Leechers Ratio.]

To download a file, peers must first have an associated .torrent file. .torrent files are usually obtained through BitTorrent web sites and contain important information about the download target, such as its length, its name, fragments' hashing information and the URL of a tracker. The tracker is a server that maintains information about where fragments can be downloaded from. A group of peers actively downloading and uploading fragments is a swarm.

Using the .torrent file, clients can query the tracker about the locations of their missing fragments within the swarm. Clients request fragments directly from other peers and can increase the priority of their requests by offering to upload the fragments still needed by those peers. This creates an incentive for clients to download the rarest fragments first and increases overall fragment availability.

Of course, when a client first enters the swarm, it has nothing to offer anyone else. Thus, clients must be able to download fragments “for free” from seeders. Seeders already have complete files and expect nothing in return for their uploading.

Seeders are also critical for BitTorrent performance. Izal's study found that over five months, seeders accounted for over two-thirds of the total upload bandwidth [8]. Their importance is reflected in the pejorative label, leechers, assigned to peers who are still downloading files. Peers can only reach the status of seeder by uploading without downloading. Importantly, leechers are not necessarily free-riders. Even peers that engage in tit-for-tat with the rest of the swarm are considered leechers until they have obtained all file fragments.

The role seeders play in bootstrapping BitTorrent and network performance makes understanding them an important data point for content-distribution network designers and administrators. Game theory has proven useful for understanding leechers' behavior [4, 9], but offers little hope of understanding seeders since altruism is not easily described by these models. Because of this, we hope that our empirical observations will provide insight into how seeders behave.

3 Study Results

We periodically logged the number of seeders and leechers within the swarms of 35 trackers over several weeks in February and March of 2006. Each entry was an aggregation across all swarms connected to each tracker. We obtained this information by cooperating with tracker administrators as well as monitoring the statistics posted on BitTorrent web sites. Depending on the tracker, we were able to log statistics on a daily or hourly basis. The first half of our study examines the relationship among available bandwidth, download rates, and the composition of the swarm in a Linux distribution network over three weeks.

All of the BitTorrent trackers we monitored had an associated web site; tracker URLs within the .torrent files hosted by a web site referred to servers controlled by web site administrators. This close relationship allowed us to associate web sites' attributes with the swarm statistics we logged. Some of the attributes we recorded include the nature of files’ content, whether sites provided message boards, or if registration was required to access .torrent files. The second half of our study looks at the relationship between these attributes and trackers' seeders-to-leechers ratios.

3.1 Case Study: The Linux Mirror Project

One of the most interesting sites in our study was the Linux Mirror Project [10] (LMP). As its name suggests, LMP uses BitTorrent to distribute Linux distributions, kernels, and other open-source software. We recorded the number of seeders and leechers tracked by LMP every hour for 23 days. These numbers are in Figure 1.

Under normal circumstances, LMP peers appear to be extremely generous, exhibiting a seeders-to-leechers ratio of over two in steady state. By comparison, Izal observed a peak ratio of approximately .6 [8]. However, between the 17th and 20th day of the trace, the swarm size crashed before nearly tripling normal levels. More interesting, during this period the ratio of seeders-to-leechers approached one and even dipped below one toward the end of this growth. Around the 20th day, the network suddenly returned to normal.

[Figure 3: Aggregate Seeders and Leechers. Bars: Seeders, Leechers; y-axis: Peers.]

[Figure 4: Average Daily Seeders and Leechers. x-axis: Seeders; y-axis: Leechers; both axes logarithmic.]

Understanding this behavior requires an explanation of LMP's architecture. LMP maintains three separate servers: one that acts as a web site and tracker and two that act as seeders. Early in the morning of the 17th day, the web site and tracker went offline because of a problem with LMP's ISP.

of altruism increased as load shifted from LMP's servers onto the swarm. Persistent seeders may have reacted