Measuring the Bittorrent Ecosystem: Techniques, Tips, and Tricks
Total Page:16
File Type:pdf, Size:1020Kb
CUEVAS LAYOUT 8/23/11 9:38 AM Page 144 TOPICS IN NETWORK TESTING Measuring the BitTorrent Ecosystem: Techniques, Tips, and Tricks Michal Kryczka, University Carlos III Madrid and Institute IMDEA Networks Rubén Cuevas, Carmen Guerrero, and Arturo Azcorra, University Carlos III Madrid Angel Cuevas, Institut Telecom, Telecom SudParis ABSTRACT far; however, to the best of the authors’ knowl- edge there is no document that compiles, BitTorrent is the most successful peer-to-peer describes, and classifies these techniques. In this application. In the last years the research com- article we first describe the main aspects and munity has studied the BitTorrent ecosystem by functionality of the complete BitTorrent ecosys- collecting data from real BitTorrent swarms tem. Afterward, we present a survey of the exist- using different measurement techniques. In this ing BitTorrent measurement techniques. Finally, article we present the first survey of these tech- we describe the main challenges these tech- niques that constitutes a first step in the design niques face and possible solutions to them before of future measurement techniques and tools for concluding the article. analyzing large-scale systems. The techniques are classified into macroscopic, microscopic, and IT ORRENT COSYSTEM complementary. Macroscopic techniques allow B T E us to collect aggregated information of torrents BitTorrent [5] is the name used by Brian Cohen and present very high scalability, able to monitor to define the peer-to-peer file sharing protocol up to hundreds of thousands of torrents in short he designed a decade ago. Due to the great suc- periods of time. Microscopic techniques operate cess of this protocol, a complex system was cre- at the peer level and focus on understanding ated around it. In this article we adopt the performance aspects such as the peers’ down- terminology used by Zhang et al. [16] to refer to load rates. They offer higher granularity but do this complex system as the BitTorrent ecosystem. not scale as well as macroscopic techniques. In this section we describe the main players of Finally, complementary techniques utilize recent this ecosystem as well as its functionality. This is extensions to the BitTorrent protocol in order to summarized in Fig. 1. obtain both aggregated and peer-level informa- tion. The article also summarizes the main chal- DESCRIPTION OF lenges faced by the research community to BITTORRENT FUNCTIONAL ELEMENTS accurately measure the BitTorrent ecosystem such as accurately identifying peers and estimat- •A BitTorrent portal is a server into which ing peers’ upload rates. Furthermore, we provide content publishers upload .torrent files and Bit- possible solutions to address the described chal- Torrent clients download those .torrent files. lenges. •A BitTorrent swarm is formed by a set of peers downloading a given content using the Bit- INTRODUCTION Torrent protocol. •A BitTorrent tracker is a server that main- BitTorrent [5] is the most successful peer-to- tains a list of clients forming the BitTorrent peer file sharing application. Indeed, it is respon- swarm associated with given content. Further- sible for a major portion of the Internet traffic more, the tracker is aware of the download share [10] and is used daily by dozens of millions progress of each peer within the swarm. of users. This has attracted the interest of the •A BitTorrent client or peer is an entity that research community, which has thoroughly eval- participates in a BitTorrent swarm by download- uated the performance and demographic aspects ing and/or uploading pieces of the content. Two of BitTorrent. Due to the complexity of the sys- categories of peers may be distinguished: A seed- tem, the most relevant studies have tried to er is a client that has a complete copy of the con- understand different aspects by performing real tent, and thus only uploads pieces to other peers. measurements of BitTorrent swarms in the wild, A leecher is a client that does not have a com- that is, inferring information from real swarms in plete copy of the content, and thus uploads and real time. downloads pieces to and from other peers, Several techniques have been used in order respectively. to measure different aspects of BitTorrent so •A .torrent file is a meta-information file asso- 144 0163-6804/11/$25.00 © 2011 IEEE IEEE Communications Magazine • September 2011 CUEVAS LAYOUT 8/23/11 9:38 AM Page 145 ciated with content shared through BitTorrent. The .torrent file includes the following informa- User connects to tracker using hash tion: content name, file size, number and size of Public bit torrent of the swarm from .torrent file. Tracker identifies swarm and responds the pieces that form the content (named chunks), with random pool of IP’s which torrent infohash (an identifier that uniquely Tracker participate in swarm. identifies the swarm associated to the .torrent file) and IP address(es) of the Tracker(s) man- aging the swarm associated to the file. Swarm UBLISHING ONTENT IN IT ORRENT Seeder Leecher [82%] Leecher [21%] Leecher [0%] P C B T [100%] In order to make available content C in Bit- Torrent, the content publisher creates a .tor- C User connects with rent file associated with . After creating the peers and downloads/ .torrent file, the content publisher uploads it to uploads chunks User User downloads Bit torrent a BitTorrent portal. A detailed analysis of the .torrent file portal BitTorrent content publishing phenomenon connected with content he is can be found at [6]. There are a few BitTorrent interested in. portals such as The Pirate Bay1 indexing mil- lions of torrents and receiving millions of daily Figure 1. BitTorrent Ecosystem basic functionality: i) the BitTorrent client con- visits. These portals are critical for the BitTor- tacts a BitTorrent portal to download the .torrent file associated to the desired rent ecosystem as demonstrated by Zhang et al. content (the .torrent file includes the IP address of the Tracker managing the [16]. They offer detailed information regarding swarm associated to the desired content); ii) the BitTorrent client contacts the each indexed torrent. This information slightly Tracker that provides the IP addresses of a set of peers within the swarm; and varies from one portal to another, but in gen- iii) the BitTorrent client connects to these peers for downloading the content. eral it includes category of the content, num- ber of associated files, size of the whole content in the torrent, complete name of the file, BITTORRENT DELIVERY PROCEDURE uploading date, username who uploaded the torrent, number of seeders and leechers partic- In BitTorrent two peers communicate using the ipating in the torrent swarm (this data is updat- peer wire protocol. Every communication starts ed every few minutes), and a description text with an initial handshake. Just after the handshake giving more detailed information regarding the sequence is completed (and before any other mes- content. Figure 2 shows an example of a tor- sages are sent), the peers exchange bitfields by rent web page from the Pirate Bay portal. using a BITFIELD message. The bitfield indicates Finally, it is worth noting that some of these which chunks of the file a peer has already down- major portals offer a Really Simple Syndica- loaded. Furthermore, every time a peer gets a new tion (RSS) feed to announce new published chunk, it informs its neighbors by using a HAVE torrents. message. Hence, every peer is aware of the chunks each neighbor has at any moment. JOINING A BITTORRENT SWARM AND BitTorrent uses Tit-for-Tat as the incentive DISCOVERING PEERS model for the delivery mechanism; basically, each leecher uploads chunks to those other leechers When a BitTorrent user wants to download a from whom it is downloading more chunks. The given content C, it looks for the .torrent file Choking Algorithm is responsible for providing associated with C in a BitTorrent portal and this behavior. It is a periodical operation where downloads it. The .torrent file can be opened every 10 s a leecher selects (unchokes) n other with any of the existing BitTorrent clients.2 leechers from its neighborhood to upload chunks Upon opening the .torrent file, the BitTorrent to. These n (typically 4) unchoked leechers are client connects to one of the trackers included in those from whom the peer downloaded more this one. A new peer first contacts the tracker chunks during the last 20 s. The rest of the using an announce started request that is neighbors are blocked (choked). In the case of a answered by the tracker with the number of seeder, it unchokes n (typically 4) leechers to seeders and leechers participating in the swarm whom more chunks it uploaded in the last 20 s along with the IP addresses of N (between 40 (i.e., those with higher download rate).4 In addi- 1 This is currently the and 200) randomly selected peers. These N tion to the regular unchoke operation, BitTor- largest BitTorrent portal peers form the initial neighborhood of the new rent implements the optimistic unchoke based on Alexa ranking. node. Furthermore, if a peer’s neighborhood operation. Every 30 s (i.e., every 3 regular size falls below a given threshold (typically 20), it unchoke operations) both leechers and seeders 2 http://en.wikipedia.org/ again sends an announce started request to the randomly select one choked neighbor to upload wiki/Comparison_of_ tracker in order to get new neighbors. Finally, chunks to. Finally, when a leecher is unchoked BitTorrent_clients when a peer leaves the swarm, it sends an by a neighbor, it applies the Rarest First Policy in announce stopped request to the tracker that order to choose which chunk to request of this 3 http://www.openbittor- removes this peer from the list of participants in neighbor.