Characterizing Peer-Level Performance of Bittorrent
Total Page:16
File Type:pdf, Size:1020Kb
Characterizing Peer-level Performance of BitTorrent DRP Report Amir H. Rasti ABSTRACT 1. INTRODUCTION BitTorrent is one of the most popular Peer-to-Peer During the past few years, peer-to-peer appli- (P2P) content distribution applications over the cations have become very popular on the In- Internet that significantly contributes in network ternet. BitTorrent in one of the most popu- traffic. lar peer-to-peer applications providing scalable peer-to-peer content distribution over the In- In BitTorrent, a file is divided into segments ternet. Some recent studies [8] have shown that and participating peers contribute their outgoing BitTorrent is accountable for approximately bandwidth by providing their available segments 35% of the Internet traffic. BitTorrent is a to other peers while obtaining their missing peers scalable peer-to-peer content distribution sys- from others. Characterization of BitTorrent is use- tem that enables one-to-many distribution of ful in determining its performance bottlenecks as large files without requiring a large access link well as its impact on the network. bandwidth at the source. Similar to other peer- In this study, we try to address the following two to-peer systems, it uses resources of participat- key questions through measurement: (i) What are ing peers to increase the capacity of the system. the main factors that affect observed performance The main shared resource in BitTorrent is the by individual peers in BitTorrent?, and (ii) What up-link bandwidth of individual peers. The file are the contributions of these factors on the per- being distributed is divided into a large number formance of individual peers? To address these of segments. The source provides different seg- questions, first we examine the group-level and ments of the content to different peers. Partic- peer-level characteristics of BitTorrent using three ipating peers connect to each other to form an tracker logs from different sources. Second, we use overlay and peers exchange the available seg- statistical analysis (namely rank correlation and ments until they have downloaded the entire linear regression) to determine and quantify the content. potential effects of both peer-level and group-level properties on the performance of individual peers. This approach enables participating peers to contribute their outgoing bandwidth which We conclude that: (i) There is no single prop- makes BitTorrent a scalable system. Peers erty that has dominant effect on the observed per- may stay in the system once their download- formance by individual peers, (ii) Outgoing band- ing is complete acting as a seed. This be- width of each peer, average available content in havior whether intentionally or not, makes the the group and churn rate appear to have the most available resources in the system richer and im- notable effect on peer performance and (iii) The proves its robustness to system dynamics. behavior of the system in practice is rather com- plex due to the inherent dynamics in peer partic- BitTorrent is widely used for the distribu- ipation and content delivery as well as bandwidth tion of free software such as Linux as well as heterogeneity and asymmetry. copies of the latest movies and music albums. 1 It is believed to successfully utilize participat- (i) What are the dominant key peer or group ing peers' resources quickly and to be able properties that are more likely to determine the to handle flash crowds with acceptable perfor- observed performance by individual peers? (ii) mance [9](i.e., download time) despite the dy- What is the contribution of each property and namics of peer participation. It is also believed how we can quantify that? that BitTorrent's \tit-for-tat" is a successful in- To answer these questions, we examine the centive mechanism singling out free riders. observed performance by participating peers in The goal of individual peer is to maximize three popular torrents using BitTorrent tracker their download rates by finding high speed up- logs. We identify different challenges in (i) esti- loaders and keeping them satisfied by provid- mating peer and group properties from BitTor- ing them with a good upload rate. However, rent tracker logs, (ii) dealing with large data the extent to which each peer is successful in set and (iii) identifying and coping with vari- achieving this goal or its observed performance ous errors and bugs in a data set. We also ex- is the subject of this project. We believe that plain why assessing the observed performance the performance a peer will observe, potentially by individual peers is not trivial. depends on two sets of parameters: group prop- Using the derived peer and group properties erties and peer properties. The first set of pa- for three data sets, we conduct the following rameters capture the group status from differ- analysis. First, we present evolution of group ent aspects and are defined as a function of properties over time which illustrates existing time: (i) group population, (ii) peer arrival and dynamics in a torrent. The evolution usually departure rates (churn), (iii) average content includes an initial flash crowd pattern followed availability among participating peers and (iv) by a long time in which the system experiences percentage of seeds are the main group prop- rather stable conditions. Second, we show the erties that might affect a peer's performance. distribution of peer properties and observed The second set are the properties of the peer group properties during life time of individual itself regardless of the group. The peer's incom- peers across participating peers in a torrent. ing and outgoing bandwidth, geographical loca- These results illustrate that the observed per- tion, local configuration, ratio of contribution formance could spread over a wide range often (upload over download) are examples of peer with no clear modes or high density regions. properties that might potentially affect peer's performance. To capture peer properties, we Finally, we employ several statistical analy- record a signature of each peer's properties dur- sis techniques to identify key properties that ing its life time using statistical parameters. determine observed performance by each peer: (i) We investigate pairwise statistical correla- While BitTorrent enables a single peer to dis- tion between observed performance and prop- tribute content to a large number of interested erties and find out that the minimum outgoing receivers, it is unclear what factor(s), deter- bandwidth and average group content availabil- mine the observed performance, namely down- ity have the most significant correlation with load rate, by individual peers. the performance. (ii) Using linear regression In this paper, our goal is to empirically an- we show that there is no linear model accu- swer the following basic questions about ob- rately describing the observed performance us- served performance by individual peers in Bit- ing the peer and group properties. However, in Torrent: the generated linear model the median outgo- 2 ing bandwidth and average group content avail- within this group. Download requests may be ability have the most significant effects on the sent to a large list but uploading is controlled model. (iii) Using data mining decision tree by the incentive mechanism and is limited to a technique, we observed that the decision tree small set (5 peers) at any time. generated in all cases is very deep (10 fold) and At any given time, participating peers are large. This implies that there is no common divided into two groups; namely leechers and order of importance among the peer and group seeds. A seed is a peer who has already down- properties determining the peer performance. loaded the whole file and is only helping oth- In a nutshell, our results indicate that: (i) ers to download. Leechers are those still in the there is no single property that determines ob- downloading process. They may have some seg- served peer performance, (ii) peer's outgoing ments of the file downloaded but they are still bandwidth, available content in the group and downloading other segments while contribut- peer arrival/departure rates appear to have the ing their up-link bandwidth to other peers who most effect on peer performance, and (iii) in need the segments that they have. Normally a real world (actually deployed) system, there a peer joins the swarm as a leecher with no are many unexpected effects making the system segments available and then it gradually down- unpredictable and chaotic. loads all segments of the file and eventually be- This report continues in the following order. comes a seed. A seed keeps helping other peers In Section 2, we present an overview of Bit- until it leaves. In some cases, people voluntar- Torrent system explaining common terms and ily seed the content they have already down- describing mechanisms BitTorrent uses. Sec- loaded by staying in the torrent for a longer tion 3, categorizes the related work on BitTor- time or rejoining the torrent at a later time. rent characterization using modeling, simula- Before joining the swarm, the user has to tion and empirical studies. Our methodology, download a meta file called the torrent file. challenges of this project, possible approaches This file contains all the information needed and the specifications of the data set we have to join the swarm and download the content. used, are described in Section 4. In Section 5, The torrent file is usually downloaded from one we present our characterization results. Finally of the popular BitTorrent hosting and index- in Section 6, we provide an analysis of the re- ing web sites but it can also be distributed by sults. other means such as e-mail. Indexing web sites, provide a list of torrents matching a particular 2. BITTORRENT: AN search query which allows users to download OVERVIEW the torrent file associated with the desired con- tent.