Measuring Bittorrent Ecosystem: Techniques, Tips and Tricks
Total Page:16
File Type:pdf, Size:1020Kb
Measuring BitTorrent Ecosystem: Techniques, Tips and Tricks Michal Kryczka∗y, Ruben´ Cuevas∗, Angel´ Cuevas∗, Carmen Guerrero∗ and Arturo Azcorra∗ ∗University Carlos III Madrid yInstitute IMDEA Networks Abstract—BitTorrent is the most successful peer-to-peer ap- ago. Due to the great success of this protocol a complex system plication. In the last years the research community has studied was created around it. In this paper we adopt the terminology the BitTorrent ecosystem by collecting data from real BitTorrent used by Zhang et al. [16] to refer to this complex system as swarms using different measurement techniques. In this paper we present the first survey of these techniques that constitutes a first the BitTorrent Ecosystem. In this section we describe the main step in the design of future measurement techniques and tools players of this ecosystem as well as its functionality. This is for analyzing large scale systems. The techniques are classified summarized in Fig. 1. into Macroscopic, Microscopic and Complementary. Macroscopic techniques allow to collect aggregated information of torrents A. Description of BitTorrent Functional Elements and present a very high scalability being able to monitor up to hundreds of thousands of torrents in short periods of time. • A BitTorrent Portal is a server into which content Rather, Microscopic techniques operate at the peer level and publishers upload .torrent files and BitTorrent clients focus on understanding performance aspects such as the peers’ download those .torrent files. download rates. They offer a higher granularity but do not scale • A BitTorrent Swarm is formed by a set of peers down- as well as the Macroscopic techniques. Finally, Complementary loading a given content using the BitTorrent protocol. techniques utilize recent extensions to the BitTorrent protocol in order to obtain both aggregated and peer level information. The • A BitTorrent Tracker is a server that maintains a list paper also summarizes the main challenges faced by the research of clients forming the BitTorrent swarm associated to a community to accurately measure the BitTorrent ecosystem given content. Furthermore the Tracker is aware of the such as accurately identifying peers or estimating peers’ upload download progress of each peer within the swarm. rates. Furthermore, we provide possible solutions to address the • A BitTorrent Client or Peer is an entity that participates described challenges. in a BitTorrent swarm by downloading and/or uploading I. INTRODUCTION pieces of the content. Two categories of peers may be BitTorrent [5] is the most successful peer-to-peer file- distinguished: A seeder is a client that has a complete sharing application. Indeed, it is responsible for a major copy of the content, thus only uploads pieces to other portion of the Internet traffic share [10] and is daily used by peers. A leecher is a client that does not have a complete dozens of millions of users. This has attracted the interest copy of the content, thus uploads and downloads pieces of the research community that has thoroughly evaluated the to and from other peers respectively. performance and the demographic aspects of BitTorrent. Due • A .torrent file is a meta-information file associated to to the complexity of the system, the most relevant studies have a content shared through BitTorrent. The .torrent file tried to understand different aspects by performing real mea- includes the following information: content name, file surements of BitTorrent swarms in the wild, this is inferring size, number and size of the pieces that form the con- information from real swarms in real time. tent named chunks, torrent infohash (an identifier that Several techniques have been used in order to measure uniquely identifies the swarm associated to the .torrent different aspects of BitTorrent so far, however to the best of file) and IP address(es) of the Tracker(s) managing the the authors knowledge there is no document that compiles, swarm associated to the file. describes and classifies these techniques. In this paper we firstly describe the main aspects and functionality of the com- B. Publishing Content in BitTorrent plete BitTorrent Ecosystem in Section II. Afterwards, Section In order to make available a content C in BitTorrent, the III presents a survey of the existing BitTorrent measurement content publisher creates a .torrent file associated to C. After techniques. Finally, we describe the main challenges that these creating the .torrent file, the content publisher uploads it to a techniques face and the possible solution to them in Section BitTorrent portal. A detailed analysis of the BitTorrent content IV before concluding the paper in Section V. publishing phenomenon can be found at [6]. There are a few BitTorrent portals such as The Pirate Bay1 indexing millions II. BITTORRENT ECOSYSTEM of torrents and receiving millions of daily visits. These portals BitTorrent [5] is the name used by Brian Cohen to define the peer-to-peer file-sharing protocol that he designed one decade 1This is the current largest BitTorrent portal based on Alexa Ranking. 2 Fig. 1: BitTorrent Ecosystem basic functionality: (i) The BitTorrent client contacts a BitTorrent portal to download the .torrent file associated to the desired content (the .torrent file includes the IP address of the Tracker managing the swarm associated to the desired content); (ii) The BitTorrent client contacts the Tracker that provides the IP addresses of a set of peers within the swarm; (iii) The BitTorrent client connects to these peers for downloading the content. are critical for the BitTorrent ecosystem as demonstrated by the Tracker that removes this peer from the list of participants Zhang et al. [16]. They offer detailed information regarding in the swarm. each indexed torrent. This information slightly varies from one It is worth noting that, in practice, the BitTorrent Ecosystem portal to another, but in general it includes: category of the relies in few Trackers that manage a large number (up to a content, number of associated files, size of the whole content few millions) of torrents in parallel. The OpenBitTorrent and in the torrent, complete name of the file, uploading date, PublicBitTorrent Trackers3 are currently the most important username who uploaded the torrent, number of seeders and ones. leechers participating in the torrent swarm (this data is updated D. BitTorrent Delivery Procedure every few minutes) and a description text giving more detailed information regarding the content. Fig. 2 shows an example In BitTorrent two peers communicate using the peer wire of a torrent web-page from the Pirate Bay portal. Finally, it is protocol. Every communication starts with an initial hand- worth noting that some of these major portals offer a Really shake. Just after the handshaking sequence is completed (and Simple Syndication (RSS) feed to announce the new published before any other messages are sent) the peers exchange the torrents. bitfields by using a BITFIELD message. The bitfield indicates which chunks of the file a peer has already downloaded. Furthermore, every time a peer gets a new chunk, it informs to C. Joining a BitTorrent Swarm and Discovering Peers its neighbours by using a HAVE message. Hence, every peer is When a BitTorrent user wants to download a given content aware of the chunks that each neighbour has at any moment. C, it looks for the .torrent file associated to C in a BitTorrent BitTorrent uses the Tit-for-Tat as incentive model for the portal and downloads it. The .torrent file can be opened delivery mechanism; basically each leecher uploads chunks with any of the existing BitTorrent clients2. Upon opening to those other leechers from whom it is downloading more the .torrent file, the BitTorrent client connects to one of the chunks. The Choking Algorithm is responsible for providing Trackers included in this one. A new peer first contacts the this behaviour. It is a periodical operation where every 10 Tracker using an announce started request that is answered seconds a leecher selects (unchoke) n other leechers from by the Tracker with the number of seeders and leechers its neighbourhood to upload chunks to. These n (typically 4) participating in the swarm along with the IP addresses of N unchoked leechers are those from whom the peer downloaded (between 40 and 200) randomly selected peers. These N peers more chunks during the last 20 seconds. The rest of the form the initial neighbourhood of the new node. Furthermore, neighbours are blocked (choked). In the case of a seeder, if a peer’s neighbourhood size falls below a given threshold it unchokes n (typically 4) leechers to whom more chunks (typically 20), it sends again an announce started request to it uploaded in the last 20 seconds (i.e., those with higher the Tracker in order to get new neighbours. Finally, when a download rate)4. In addition to the regular unchoke operation, peer leaves the swarm, it sends an announce stopped request to 3www.openbittorrent.org and www.publicbt.org 4Note that different BitTorrent clients may implement different variations 2http://en.wikipedia.org/wiki/Comparison of BitTorrent clients of the explained unchoke algorithms. 3 Property Portal Crawling Tracker Crawling Peers Crawler Own client/plugin Category Macroscopic Macroscopic Microscopic Microscopic Type of Information Torrents Demographics and High level performance Peer Level Performance Peer Level Performance Cost of Crawler preparation Low Medium High High Scalability Very High High Medium Medium-Low Obtained details Basic Medium Advanced Very Advanced Completeness of Torrent Population - High Very High Low TABLE I: Comparison of main BitTorrent measurement techniques BitTorrent implements the optimistic unchoke operation. Every We classify the Macroscopic techniques into two subcat- 30 seconds (this is, every 3 regular unchoke operations) both egories: BitTorrent portals crawling and BitTorrent Trackers leechers and seeders randomly select one choked neighbour to crawling. upload chunks to. Finally, when a leecher is unchoked by a neighbor, it applies the Rarest First Policy in order to choose -BitTorrent portals crawling: which chunk to request to this neighbor.