Monitoring BitTorrent Swarms

António Manuel Rebelo Alves Homem Ferreira

Dissertation submitted to obtain the degree of Master in Communication Networks Engineering

Examination Committee
Chair: Prof. Doutor Paulo Jorge Pires Ferreira
Supervisor: Prof. Doutor Ricardo Jorge Feliciano Lopes Pereira
Co-supervisor: Prof. Doutor Fernando Henrique Corte Real Mira da Silva
Member: Prof. Doutor João Coelho Garcia

September 2011

Acknowledgments

To all those who helped and supported me throughout this journey, from teachers to colleagues and from friends to family, I thank you all.

Resumo

Peer-to-Peer protocols, especially BitTorrent, are responsible for a large majority of the traffic generated on the Internet, with a great impact on inter-ISP traffic and, consequently, on ISPs’ peering costs. By monitoring more than 3200 real swarms in an Internet environment, we found that there is a large amount of locality that can be exploited and used to reduce inter-ISP traffic. We discuss the relationship between swarm size, content popularity and the existing locality, and show that even small swarms have locality properties. We also observed swarms that share region-specific content and therefore exhibit high locality. During the experiment we also found that a quantity of repeated content is being shared on the network. Several peers tend to publish the same content through different torrent files, creating several independent swarms that end up sharing a large number of common pieces. This redundancy can be exploited to increase data availability and source diversity, as well as the existing locality.
To exploit this redundancy, we propose a new technique named Partial Swarm Merger, which adds a new component to the BitTorrent infrastructure, allowing peers to discover other swarms that share common content. With this information, peers can participate in the different swarms, announcing to and requesting from each swarm the pieces in common with their download. In this way, the availability of the pieces common to the several swarms will increase.

Keywords: BitTorrent, Peer-to-Peer, Monitoring, Locality, Repeated content, Availability

Abstract

Peer-to-Peer protocols, especially BitTorrent, account for most traffic generated on the Internet, having a great impact on inter-ISP traffic and thus on ISPs’ peering costs. However, through locality mechanisms, P2P traffic can be kept close by in the network, or even within the same ISP, decreasing inter-ISP traffic. Through the monitoring of over 3200 live Internet swarms, we found that there is a large amount of locality that can be exploited and used to decrease inter-ISP traffic. We discuss the relationship between swarm size, content popularity and the existing locality, and find that even small swarms have some locality properties. We also observed swarms sharing content specific to a region and thus showing a great amount of locality. During the experiment we also discovered that there is a significant amount of repeated content being shared. Various publishers tend to publish the same content through different torrent files, creating independent swarms that end up sharing a large number of common pieces. This redundancy can be exploited in order to increase data availability and source diversity, as well as the existing locality. To exploit this redundancy, we propose a novel technique, called Partial Swarm Merger, which adds a new component to the BitTorrent infrastructure, allowing peers to learn about swarms with common content.
With this information, peers could combine the different swarms, announcing to and requesting from each swarm the pieces in common with their download. This will increase the availability of the pieces that are common to the several swarms.

Keywords: BitTorrent, Peer-to-Peer, Monitoring, Measurements, Locality awareness, Repeated content, Availability

Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
1 Introduction
  1.1 The Problem
  1.2 Work description
  1.3 Structure of this thesis
  1.4 Publications
2 BitTorrent Protocol
3 State of the art
  3.1 Locality Solutions
    3.1.1 Locality through Client
    3.1.2 Locality through peer and ISP cooperation
    3.1.3 Locality through ISP alone
    3.1.4 Comparison between solutions
  3.2 Locality studies
    3.2.1 Studies of BitTorrent’s locality
    3.2.2 Comparison between related studies
  3.3 Content availability
4 Methodology for gathering and analysing data
  4.1 System architecture
  4.2 Data analysis methodology
5 Results
  5.1 Content Analysis
    5.1.1 Content pollution
    5.1.2 Content repetition
  5.2 Locality Analysis
    5.2.1 Repeated content
    5.2.2 All content
    5.2.3 Regional content
    5.2.4 Large swarms
    5.2.5 Two-hour period
  5.3 Peer and Tracker behavior
    5.3.1 Tracker behavior
    5.3.2 Peer behavior
  5.4 Summary
6 Partial Swarm Merger
  6.1 PSM
  6.2 Use Case
7 Conclusions and future work
  7.1 Conclusions
  7.2 Future Work
Bibliography

List of Tables

3.1 Comparison between some solutions already implemented
3.2 Comparison between studies to the locality potential in BitTorrent
5.1 Torrent aggregation benefits

List of Figures

2.1 (1) Obtaining peers from tracker to be able to join the swarm, (2) exchanging data with other peers and (3) obtaining more active peers on the swarm
4.1 Work flow of the system
5.1 CDF with the percentage of unique pieces for each torrent file
5.2 Maximum swarm size and number of seeders for all swarms representing pollution with a maximum swarm size value above 50 peers
5.3 Time, in hours, swarms sharing polluted content take to drop to 20% of their maximum size
5.4 CDF with the shared percentage of pieces per number of torrent pairs with at least one common piece
5.5 CDF with the shared MegaBytes per number of torrent pairs with at least one common piece
5.6 Number of torrents published by team
5.7 Histogram of the content repetition frequency
5.8 Average number of peers per content
5.9 Average number of seeders per content
5.10 Number of peers per country for similar content
5.11 Number of peers per ISP for similar content
5.12 Increase in swarm size at each ISP by aggregating similar content
5.13 CDF with the content size for all content considered as being pollution
5.14 Average number of peers obtained and average number of peers per country for the one-day data aggregation period
5.15 Average number of peers obtained and average number of peers per ISP for the one-day data aggregation period
5.16 Distribution of the percentage of the median number of peers that belong to the same country or ISP for the one-day data aggregation period
5.17 Countries with an average above 30 peers per day for a regional torrent
5.18 ISPs with an average above 10 daily peers for a regional torrent
5.19 Regional torrents with 60% of all peers belonging to the same country, at least 75% of the times
5.20 Regional torrents with 30% of all peers belonging to the same ISP, at least 75% of the times
5.21 Torrent size and maximum number of seeders for swarms with a maximum number of seeders above 5000