Deployment of NAT vs. IPv6 in BitTorrent

Simon Müller
Elgg, Switzerland
Student ID: 12-715-389

Communication Systems Group, Prof. Dr. Burkhard Stiller

Supervisor: Andri Lareida, Thomas Bocek

Date of Submission: December 5, 2016

Bachelor Thesis
University of Zurich, Department of Informatics (IFI), Communication Systems Group (CSG), Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
URL: http://www.csg.uzh.ch/

Abstract


The global IPv4 address space is almost completely exhausted, which poses a challenge for Internet Service Providers (ISPs). Its successor, IPv6, was designed as the definitive solution to this problem. Despite that, many ISPs still offer only IPv4 connections and rely on transitional solutions such as Network Address Translation (NAT) to circumvent the depletion of the address pool. In this thesis, VIOLA, a measurement tool for overlay networks such as BitTorrent, was extended with IPv6 support in order to conduct a one-week measurement of the BitTorrent IPv6 network. Although media streaming accounts for most Internet traffic today, peer-to-peer traffic still makes up a sizeable portion of the total. The gathered data was used to answer the question of whether there is a correlation between the amount of NAT used in an area and the number of IPv6 addresses deployed. The results suggest that there is no such correlation. Additionally, the gathered data is visualized and a comprehensive overview of the state of IPv6 and its evolution in BitTorrent is given.

Acknowledgments

First, I want to thank Prof. Dr. Burkhard Stiller and my supervisors Andri Lareida and Thomas Bocek for the opportunity to write a bachelor thesis in the very interesting field of peer-to-peer networks. I am very thankful for Andri Lareida’s continued support while writing this thesis. I would also like to thank my family, who helped proofread this thesis despite its technical character.

Contents

Abstract
Acknowledgments

1 Introduction
  1.1 Motivation
  1.2 Description of Work
  1.3 Thesis Outline

2 Background & Related Work
  2.1 Background
    2.1.1 IPv4 Address Exhaustion
    2.1.2 IPv6
    2.1.3 Transitional Technologies
    2.1.4 BitTorrent
    2.1.5 Ttorrent
  2.2 Related Work
    2.2.1 VIOLA
    2.2.2 Other Measurement Studies

3 VIOLA Extension
  3.1 Adding IPv6 Support to the Ttorrent Client
    3.1.1 HTTP Tracker Support
    3.1.2 UDP Tracker Support
  3.2 VIOLA
    3.2.1 General Extensions
    3.2.2 Slave Extensions
    3.2.3 Master Extensions

4 Measurement
  4.1 Preparation
    4.1.1 Master Server
    4.1.2 Slave Servers
  4.2 Measurement Process
  4.3 Data Extraction
    4.3.1 NAT Approximation
    4.3.2 Scatterplot Data Preparation
    4.3.3 Choropleth Map Data Preparation

5 Analysis Results
  5.1 General Measurement Data
  5.2 Global IPv6 Distribution
    5.2.1 Evolution of IPv6
  5.3 Global NAT Distribution
  5.4 Correlation of NAT and IPv6
  5.5 Data Limitations

6 Conclusion & Future Work
  6.1 Future Work

Bibliography
Abbreviations
List of Figures
List of Tables

A VIOLA Configuration Example
B Contents of the CD

Chapter 1

Introduction

1.1 Motivation

Video streaming services such as Netflix account for the largest share of Internet traffic today, and the proportion of Peer-to-Peer (P2P) traffic caused by file sharing applications keeps shrinking. Nevertheless, P2P traffic still accounts for a sizeable portion of the total Internet traffic and will continue to do so for the next few years [10]. Internet Service Providers (ISPs) face multiple challenges, including network congestion during peak traffic hours and the exhaustion of the Internet Protocol Version 4 (IPv4) address pool. Overlay networks used by file sharing applications, such as BitTorrent (BT) [4], add to the network congestion because they are typically not aware of the underlying physical network. The VIOLA project [36] extracts new data to help develop and evaluate already proposed solutions.

The rapidly approaching depletion of the IPv4 address space has caused ISPs to deploy transitional solutions such as Carrier Grade NAT (CGN) in their networks. The introduction of CGNs causes problems for a multitude of applications, including P2P applications such as BT. Therefore, users affected by NAT have an incentive to use Internet Protocol Version 6 (IPv6), which has been introduced as the definitive solution to the problem of IPv4 address exhaustion. Still, IPv6 has not yet been rolled out on a global scale. ISPs have only just started offering native IPv6 connections coupled with CGN for legacy connectivity to the IPv4 Internet, as the two protocols are not compatible with each other. Because CGN is a technology used by ISPs that are in danger of running out of their assigned address space, these ISPs have an incentive to migrate their networks to IPv6 as fast as possible.

This thesis tries to establish whether there is a correlation between the number of NATed addresses and the number of IPv6 addresses in the BT network. Additionally, the data resulting from such a measurement gives a comprehensive overview of the global state of IPv6 in the BT network.

1.2 Description of Work

In this thesis, VIOLA is extended to support IPv6 connections to trackers in order to be able to gather data. As part of this work, an efficient storage format for IPv6 addresses is defined and an IPv6 geolocation database is integrated. A measurement of IPv4 and IPv6 peers in the BT network is conducted for a week to cover fluctuations between the different days of the week. Based on this new data and on older VIOLA data, it is investigated how IPv4 NAT presence and IPv6 usage correlate. Additionally, the status and evolution of IPv6 in the BT network is illustrated and the measurement results are visualized on a world map.

1.3 Thesis Outline

The thesis is structured as follows.

Chapter 2 gives a comprehensive background of the technologies and protocols used in this thesis, which is needed to understand the extensions and the measurement that follow. Additionally, an overview of the related work is given.

Chapter 3 presents the extensions that have been made to VIOLA in order to support IPv6 and describes the challenges that arose during the implementation and the subsequent decisions that have been made.

Chapter 4 shows how the VIOLA IPv6 measurement was conducted. Additionally, it describes how the data was stored and extracted to prepare for the analysis.

Chapter 5 presents the results of the analysis, from the global IPv6 and NAT distribution to the answer to the question of a correlation between NAT and IPv6. Additionally, the limitations of the data are discussed.

Appendix A describes how the additionally implemented VIOLA configuration option to add custom trackers should be used.

Chapter 2

Background & Related Work

This chapter presents the background needed to understand this work and gives an overview of the related work. The Background section explains IPv4 and IPv6 along with the transitional technologies that arose from the IPv4 address exhaustion. Then, an overview of BitTorrent is given. The Related Work section covers VIOLA, the basis for this thesis, and introduces various measurements related to this work.

2.1 Background

This section covers the technical background that helps to understand specific terms used in this thesis and in the resulting measurements.

2.1.1 IPv4 Address Exhaustion

IPv4 is one of the core protocols used for accessing the Internet. IPv4 was first standardized in 1981 [50] and was the first Internet Protocol (IP) version that was widely used. Today, it is still the most widely used protocol to route Internet traffic [26] despite the existence of IPv6, the successor to IPv4.

The global distribution of IP addresses is overseen by the Internet Assigned Numbers Authority (IANA). The IANA assigns address spaces to the five Regional Internet Registries (RIRs), i.e. AFRINIC in Africa, APNIC in the Asia-Pacific region, ARIN in North America, RIPE NCC in Europe, Russia, the Middle East and Central Asia, and LACNIC in Latin America. These RIRs in turn distribute their address blocks to Internet service providers in their respective regions [47].

IPv4 uses 32 bits to represent addresses and can therefore provide 2^32, or approximately 4 × 10^9, addresses. The depletion of this address space has long been anticipated [20] and multiple technologies, such as Network Address Translation (NAT), have been created to address this problem. However, those solutions only delayed the inevitable address exhaustion. The only permanent solution is to move to the successor protocol IPv6.


Figure 2.1 shows the IPv4 Address Run-Down Model. It illustrates the global exhaustion of the IPv4 address space. Exhaustion is defined here as the time when the pool of available addresses in each RIR reaches the threshold of no more general use allocations of IPv4 addresses. The Run-Down model shows that the only RIR that has not yet released its last address block is AFRINIC. The first RIR to release its last block of addresses was APNIC in April 2011. APNIC, RIPE NCC and LACNIC have special distribution criteria in place for their last address blocks. The projected exhaustion date for AFRINIC with no additional measures is May 2018.

Figure 2.1: IPv4 Address Run-Down Model illustrating the global exhaustion of the IPv4 address space [27].

2.1.2 IPv6

IPv4 address exhaustion (cf. Section 2.1.1) led to the creation of version six of the Internet protocol. IPv6 solves the problem of the ever-growing number of clients accessing the Internet by introducing an address space that is 2^96 times bigger than the IPv4 address space [8]. This is realised by using 128 bits for each address instead of the 32 bits of an IPv4 address. This means that IPv6 is able to provide approximately 340 × 10^36 addresses.

Although IPv6 was standardized by the Internet Engineering Task Force (IETF) in 1998 [14], its adoption has been very slow despite the dwindling IPv4 address space. Figure 2.2 illustrates the share of users accessing Google via a native IPv6 connection. The figure clearly shows that ISPs only started to deploy IPv6 on a bigger scale around the start of 2011. The figure also shows that IPv6 has, as of November 2016, reached a worldwide adoption rate of around 12% to 15%, depending on the day of the measurement.

The introduction of techniques that allowed ISPs to use their assigned IPv4 address space more efficiently might have been a contributing factor to the slow adoption. These techniques are discussed in Section 2.1.3.
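The address space figures quoted above can be checked with a few lines of arbitrary-precision arithmetic. The following is only an illustrative sketch; the class name `AddressSpace` and its `size` method are hypothetical and not part of VIOLA or Ttorrent.

```java
import java.math.BigInteger;

// Hypothetical helper, used only to illustrate the address space arithmetic.
final class AddressSpace {

    // Number of distinct addresses that fit into the given number of bits.
    static BigInteger size(int bits) {
        return BigInteger.ONE.shiftLeft(bits);
    }

    public static void main(String[] args) {
        BigInteger ipv4 = size(32);   // 32-bit IPv4 addresses
        BigInteger ipv6 = size(128);  // 128-bit IPv6 addresses

        System.out.println("IPv4 addresses: " + ipv4);
        System.out.println("IPv6 addresses: " + ipv6);
        // The ratio of the two address spaces is exactly 2^96.
        System.out.println("Ratio equals 2^96: " + ipv6.divide(ipv4).equals(size(96)));
    }
}
```

The IPv4 value is 4,294,967,296, which matches the approximation of 4 × 10^9 used in Section 2.1.1, and the IPv6 value has 39 decimal digits, in line with the approximation of 340 × 10^36.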

Figure 2.2: Measurement of users opening Google via native IPv6 connections [29].

2.1.3 Transitional Technologies

Transitional technologies are in use either because the technology makes the use of IPv4 addresses more efficient or because it provides a means of keeping the end user connected to the IPv4 and the IPv6 Internet at the same time. In the following sections, the transitional technologies relevant to this work are discussed.

CIDR

To understand Classless Inter-Domain Routing (CIDR), one must know about the history of the Internet. The IPv4 address space was originally divided into three network classes, Class A, B and C [20]. These classes were each meant to provide a different number of addresses (2^24, 2^16 and 2^8, respectively) and represented networks of three different sizes. An organization or company requesting address space would be classified into one of these three classes depending on its size. However, this class system did not scale well and the IETF soon realised that it would run out of addresses in the Class B network space. In practice, most organizations were assigned a Class B network as they needed more addresses than a Class C network provided [20]. Many addresses in these Class B networks remained unused, as a Class B network provided far more addresses than most organizations needed.

The IETF proposed CIDR as a short-term solution to this problem. The network classes were abolished and replaced by an address assignment system using hierarchical blocks of IP addresses ("prefixes or network numbers"). To make clear which part of the IP address should be interpreted as the network number, a suffix denoting the number of significant bits is added to the end of the address in CIDR notation. For example, 192.168.2.0/24 identifies the subnet containing all addresses from 192.168.2.0 to 192.168.2.255. CIDR was intended to last around three to five years while a "more permanent addressing and routing architecture would be designed and implemented" [20]. However, CIDR has proven to be far more robust than originally thought and is still in use today.
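How a CIDR suffix selects a block of addresses can be illustrated with a small calculation. The sketch below is not taken from VIOLA; the class `CidrRange` and its methods are hypothetical names, and it handles only well-formed IPv4 input.

```java
// Hypothetical helper that computes the first and last address of an IPv4 CIDR block.
final class CidrRange {

    // Converts a dotted-quad IPv4 address into a 32-bit value stored in a long.
    static long toLong(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not a dotted-quad address: " + dotted);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Converts a 32-bit value back into dotted-quad notation.
    static String toDotted(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    // Returns {first address, last address} of a block such as "192.168.2.0/24".
    static String[] range(String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        // The mask has 'prefix' leading one bits; these bits form the network number.
        long mask = prefix == 0 ? 0 : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        long first = toLong(parts[0]) & mask;
        long last = first | (~mask & 0xFFFFFFFFL);
        return new String[] { toDotted(first), toDotted(last) };
    }

    public static void main(String[] args) {
        String[] r = range("192.168.2.0/24");
        System.out.println(r[0] + " to " + r[1]);
    }
}
```

For the /24 example, the 24 leading bits are fixed and the remaining 8 bits vary, so the block spans 256 addresses.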

Network Address Translation (NAT)

As is the case with CIDR, NAT was also designed as a short-term solution [17] and is still in wide use within the IPv4 Internet. NAT combats the shortage of IPv4 addresses by allowing IP addresses to be shared. This is achieved by translating the source address attached to a packet to a globally unique address when it is sent through the router [17]. The address translation is completely transparent. As far as the client sending the packet is concerned, the packet reaches the Internet with the originally specified source address. NAT had the advantage that it could be implemented relatively quickly, as it did not require any changes to hosts or routers. However, NAT introduces various other problems into network connections as it is "taking away the end-to-end significance of an IP address" [17]. There are many different scenarios involving NAT; the most common ones are explained here.

Static NAT Static NAT is used to describe the situation when one private IP address is mapped to one public IP address. The public address is static, which means that Server A in Figure 2.3 would always be accessible with the address 31.10.2 as long as the NAT settings are not changed.

Figure 2.3: Illustration of static NAT (adapted from [46]).

Dynamic NAT A dynamic NAT also maps one private IP address to one public IP address. In contrast to the static NAT, the public IP address might not always be the same each time the connection is started as the public address gets assigned from a dynamic pool of available addresses. This means that in the example in Figure 2.4, Host A might not always have the public address 31.10.1.

Figure 2.4: Illustration of dynamic NAT (adapted from [46]).

NAT overloading NAT overloading is also known as Port Address Translation (PAT). NAT overloading is the most efficient use of NAT in terms of saving IP addresses. In this scenario, multiple private IP addresses are mapped to one single public IP only differentiated by their port number. By having different ports, all hosts can still access the Internet despite having the same IP address.
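The port-based mapping can be pictured as a translation table that is filled when a private host first sends traffic. The following sketch is purely illustrative: the class `PortAddressTranslator`, the sequential port allocation and the example addresses are assumptions and do not describe the behaviour of any particular router.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of a NAT overloading (PAT) table.
final class PortAddressTranslator {
    private final String publicIp;
    private int nextPort;
    // Maps "privateIp:privatePort" to the public port allocated for it.
    private final Map<String, Integer> outbound = new HashMap<>();
    // Reverse mapping used for replies arriving at the public address.
    private final Map<Integer, String> inbound = new HashMap<>();

    PortAddressTranslator(String publicIp, int firstPort) {
        this.publicIp = publicIp;
        this.nextPort = firstPort;
    }

    // Returns the public endpoint for a private endpoint, allocating a port on first use.
    String translateOutbound(String privateIp, int privatePort) {
        String key = privateIp + ":" + privatePort;
        Integer port = outbound.get(key);
        if (port == null) {
            port = nextPort++;
            outbound.put(key, port);
            inbound.put(port, key);
        }
        return publicIp + ":" + port;
    }

    // Returns the private endpoint behind a public port, or null if no mapping exists.
    String translateInbound(int publicPort) {
        return inbound.get(publicPort);
    }
}
```

Two hosts that use the same private port end up behind the same public IP address but with different public ports. A packet arriving at a public port without a mapping has no private destination, which is one reason why unsolicited incoming connections, as used between BT peers, are problematic behind NAT.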

Figure 2.5: Illustration of NAT overloading (adapted from [46]).

Carrier-grade NAT A NAT is called Carrier-Grade NAT (CGN) or Large-Scale NAT (LSN) if NAT overloading is used on an ISP level to connect multiple customers to the Internet with the same IP address. In Figure 2.6 for example, the router in Home A translates the private IP addresses to an address in the shared address space. The shared address space consists of the address block 100.64.0.0/10 which is a space that is not publicly used [55] and is only available in the network between the ISP and the routers of its customers. The same process happens in Home B. The ISP then does a second address translation to connect both those addresses to the Internet using only one single public IP address. This makes it possible to connect a large number of clients to the Internet with a relatively small pool of IP addresses.

Figure 2.6: Simplified illustration of a CGN setup (adapted from [39]).

2.1.4 BitTorrent

BT is a protocol for distributing files over the Internet. The main advantage of using BT instead of a plain HTTP download is that if the same file is downloaded at multiple endpoints at the same time, the downloaders upload to each other. This makes the system scalable: it can support a very large number of downloaders with only a modest increase in the load on the file source [13]. Note that most of the official documentation of the BT protocol has been written by the BT community in the form of BitTorrent Enhancement Proposals (BEPs), which is where most of the information used in this section comes from.

Bencode

BitTorrent uses a special encoding scheme for storing and transmitting some of its data called Bencode [13]. The scheme that bencoding uses is relatively simple and it is able to encode the following data types.

String Strings start with a number prefix indicating the length of the following string, followed by a colon. Example: 5:hello = ’hello’

Integers Integers are indicated by an i, followed by the number and an e at the end. Example: i6e = 6

List A list starts with l, followed by the content of the list (which is also bencoded) and ends with an e. Example: l7:tracker7:addresse = [’tracker’, ’address’]

Dictionary A dictionary starts with d and is followed by its bencoded content while keys and corresponding values are alternating. The end of the dictionary is signified with an e. Example: d4:name3:tom4:cityl4:york3:wilee = {’name’:’tom’,’city’: [’york’, ’wil’]}
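The four data types above are enough to write a small recursive decoder. The sketch below is an illustration only and is not the decoder used by Ttorrent; the class name `BencodeDecoder` is hypothetical, and error handling for truncated input is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal Bencode decoder covering strings, integers, lists and dictionaries.
final class BencodeDecoder {
    private final byte[] data;
    private int pos = 0;

    private BencodeDecoder(byte[] data) {
        this.data = data;
    }

    // Decodes one bencoded value into a String, Long, List or Map.
    static Object decode(String bencoded) {
        // ISO-8859-1 maps every byte to exactly one char, so binary strings survive.
        return new BencodeDecoder(bencoded.getBytes(StandardCharsets.ISO_8859_1)).next();
    }

    private Object next() {
        char c = (char) data[pos];
        if (c == 'i') {                       // integer: i<number>e
            pos++;
            return readNumber('e');
        }
        if (c == 'l') {                       // list: l<values>e
            pos++;
            List<Object> list = new ArrayList<>();
            while (data[pos] != 'e') {
                list.add(next());
            }
            pos++;
            return list;
        }
        if (c == 'd') {                       // dictionary: d<key><value>...e
            pos++;
            Map<String, Object> dict = new LinkedHashMap<>();
            while (data[pos] != 'e') {
                String key = (String) next();
                dict.put(key, next());
            }
            pos++;
            return dict;
        }
        if (Character.isDigit(c)) {           // string: <length>:<bytes>
            int length = (int) readNumber(':');
            String s = new String(data, pos, length, StandardCharsets.ISO_8859_1);
            pos += length;
            return s;
        }
        throw new IllegalArgumentException("Unexpected byte at offset " + pos);
    }

    // Reads a decimal number up to the terminator and consumes the terminator.
    private long readNumber(char terminator) {
        int start = pos;
        while (data[pos] != terminator) {
            pos++;
        }
        String digits = new String(data, start, pos - start, StandardCharsets.ISO_8859_1);
        pos++;
        return Long.parseLong(digits);
    }

    public static void main(String[] args) {
        System.out.println(decode("d4:name3:tom4:cityl4:york3:wilee"));
    }
}
```

Applied to the dictionary example above, the decoder returns a map with the keys name and city, where city holds a list of two strings.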

Metainfo File

The metainfo file contains the information a BT client needs to download the files associated with the torrent. It is a small static file that is stored as a .torrent file. The metainfo file is bencoded and consists of two keys. The first is the announce URL pointing to one or multiple trackers, the second is a dictionary containing the information about the exact files that make up the torrent [13]. Sometimes, instead of a metainfo file, a magnet link is used. A magnet link is an alternative way to get to the metainfo. Magnet links contain only an info hash (a hash of the info dictionary included in the .torrent file) and the tracker URLs [25]. With this information, the client is able to connect to the specified trackers and download the file information from other peers. Thus, there is no need to download an intermediate file if magnet links are used.

Tracker

The tracker is a service running on a web server responding to requests made by the BT clients. It does not concern itself with the actual transfer of the files, instead it informs the announcing peers which other peers are downloading or have already completed downloading the requested file. As no actual files are downloaded from the tracker, the bandwidth needed by this client-tracker communication is minimal and a single tracker is able to serve thousands of peers [16]. Clients can announce to the tracker over multiple protocols. Most common are the UDP tracker protocol and the HTTP tracker protocol [13].

Peer Wire Protocol

If a peer wants to connect to a different peer and download data, it has to do so over the Peer Wire Protocol, which defines how peers communicate with each other and exchange data. It does not specify which pieces to request or how to choose peers to download from. The protocol operates over the Transmission Control Protocol (TCP) or the uTorrent Transport Protocol (uTP) [16].

Other Peer Discovery Methods

Besides getting peer information from the tracker, there are also other peer discovery methods available to a BT client. This takes off some of the load on the tracker as the client can already build a huge peer list with only a few initially discovered peers. Some of the methods could even make the tracker itself superfluous. In the following, the most widely used methods are described.

Peer Exchange (PEX) PEX is a peer discovery method that can only be used once an initial list of peers has been gathered via other methods [52]. Neighbouring peers are periodically provided with a list containing newly acquired and recently dropped connections. ”It provides a more up-to-date view of the swarm than most other sources and also reduces the need to query other sources frequently” [52].

Distributed Hash Table (DHT) As the name implies, a DHT is used to store peer information in a hash table that is distributed over a network. The BT hash table implementation is based on the Kademlia DHT [40]. The DHT serves as an alternative peer finding mechanism and even allows for trackerless torrents, where peers are only gathered with the help of the DHT and PEX. The DHT consists of nodes storing peer information, and each BT client uses a routing table to store the location of known nodes, which persists through client restarts. However, if a client’s routing table is empty, the table has to be filled by connecting to a bootstrapping node. A trackerless torrent would store the address of the bootstrapping node in place of the announce URL of a tracker [38]. Alternatively, the routing table could also get filled automatically while simply downloading a normal torrent and announcing at a tracker. Once there are entries in the table, no additional means of connecting are needed, as new nodes can be discovered from within the DHT.

Local Service Discovery Local Service Discovery [53] provides a means to find and connect to peers in the same local area network. ”It can be used either as primary peer source for local transfers or to complement other sources which only operate on global unicast addresses” [53].

2.1.5 Ttorrent

Ttorrent [49] is an implementation of the BT protocol in Java and consists of a client, a tracker and a command line tool to interface with the Ttorrent library itself. Ttorrent is primarily designed for easy integration into bigger projects but can also be used as a standalone BT client or tracker via the command line tool. Ttorrent is extensively used and modified in this thesis.

2.2 Related Work

This section gives an overview of the academic work on which this thesis is based. Additionally, measurements that have already been conducted and are similar to the one in this thesis are described.

2.2.1 VIOLA

VIOLA [36] is a "[...] system to continuously monitor Video Consumption in Overlay Networks (VIOLA)" [36]. It makes it possible to gather detailed data about the BitTorrent network over time. This is achieved by "[...] measuring the network in a distributed manner by monitoring a large number of swarms over an extended period of time" [36].

VIOLA consists of a master and a slave module. The master module is only deployed once, whereas as many slave modules as needed can be deployed. The master module collects torrents from various feeds or websites and instructs the slave modules on which torrents to measure. The list of torrents can also be provided manually as an alternative to downloading new torrents automatically.

The slave modules connect to the trackers included in the torrents provided by the master, gather all possible data about the peers in the swarm and transmit this data back to the master module. The master then uses the MaxMind database to find the geolocation of each peer IP address and saves it in an Apache Avro database, from where it can be analysed further. Figure 2.7 gives a simplified overview of VIOLA. VIOLA is the main software this work is based on, as it is extended and used to gather all relevant data.

Figure 2.7: Simplified overview of VIOLA (adapted from [36]).

2.2.2 Other Measurement Studies

Measurements of the state of IPv6 have been made continually since its inception and are available online [43]. These measurements, however, are very general and do not give insights into specific parts of the Internet such as overlay networks like BT. Many of the studies that have been conducted within the BT network did not specifically measure IPv6 traffic [36]. Therefore, only a few measurement studies focusing on IPv6 within the BT network are available.

Study [37] presents a comparative analysis of the traffic flow behaviours of TCP and uTP in BT. However, uTP is not used in this thesis, and the measurement in study [37] was not done from within the BT network. The data was obtained from the outside by inspecting packets sent through the measuring system. The data showed that the average flow size over uTP is higher than that over TCP and that the "successful connecting rate of BitTorrent flows over TCP is lower than that over uTP" [37].

Study [7] measured BitTorrent IPv4 and IPv6 traffic to compare packet traffic features of the two protocols. The measurement took place during approximately one month in 2008. The results indicated that the packet traffic features were similar and "that the network management of the two traffic types could be handled in the same fashion" [7].

Study [2] conducted a long-term general measurement of IPv6 traffic restricted to a single private BT system deployed over a university campus. The IPv6 user performance was investigated in a measurement over a 10-month period in 2009 and 2010. The study found a performance gap between users of the private BT network. However, it determined that the gap stems from differences in the users’ bandwidth and the tracker’s operating mechanism, and found the difference between the IP protocols to be insignificant for the performance.

Measurements by M. Defeche and E. Vyncke ([15] and [16]) resulted in an effort to document the evolution of IPv6 within the BitTorrent network. Vyncke has since conducted multiple measurements concerning the state of IPv6 in BT. The measurements reveal some interesting statistical data, which has been summarised on an interactive world map [54]. In a 2012 measurement, only few native IPv6 addresses were found, while it was observed that in most European countries, at least one percent of the peers could use IPv6. Compared with the measurements resulting from this thesis, this data makes it possible to draw some conclusions about the evolution of IPv6 in the BT network over time. The first measurement by Defeche and Vyncke was conducted over the course of three months in July 2009. There have been many follow-up measurements by Vyncke, but each was only conducted over the course of one day. The last such measurement was done in June 2014.

Chapter 3

VIOLA Extension

Before any IPv6 measurements could be conducted in the BT network, VIOLA had to be extended to support connections to IPv6 trackers. This chapter describes how this support was implemented, first in the underlying library, Ttorrent, then in VIOLA itself. The source code is available via a fork of the original VIOLA repository.

3.1 Adding IPv6 Support to the Ttorrent Client

As of October 2016, the implementation of Ttorrent used in VIOLA did not support IPv6. As IPv6 BT support is needed to measure IPv6 peers in this thesis, it had to be implemented manually. VIOLA makes use of Ttorrent in its slave module and as such only uses the client functionality of Ttorrent. Thus, the extensions that have been made and are described in the following sections only modified the client part of the Ttorrent library. The tracker functionality of Ttorrent was not touched in any way.

Ttorrent is able to use two different protocols to contact BT trackers, HTTP and UDP. IPv6 support was added to both the HTTP and the UDP protocol. The following two sections describe the implementation in detail.

3.1.1 HTTP Tracker Support

BT officially supports IPv6 for HTTP connections. The IPv6 support for HTTP trackers is specified in BEP number seven [24]. As soon as the client starts announcing over HTTP, it sends HTTP requests to the tracker. An example of a general announce request to a tracker can be seen in Listing 3.1.

    GET /announce?peer_id=x&info_hash=y
        &port=1234&left=0&downloaded=0&uploaded=0&compact=1
        &ip=192.168.0.1

Listing 3.1: HTTP GET announce request IPv4 example.


To support IPv6, two new keys &ipv4 and &ipv6 were introduced. They can be used to specify the corresponding IP address and can even be used at the same time if the client supports both protocols. This can be seen in Listing 3.2. However, depending on the tracker, the addresses provided in the announce request are not honoured and the actual address with which the request was made is used.

    GET /announce?peer_id=x&info_hash=y
        &port=1234&left=0&downloaded=0&uploaded=0&compact=1
        &ipv4=192.168.0.1
        &ipv6=2001%3A%3A53Aa%3A64c%3A0%3A7f83%3Abc43%3Adec9

Listing 3.2: HTTP GET announce request IPv6 example.
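The colons of the IPv6 address in Listing 3.2 appear as %3A because they are percent-encoded in the query string. The following sketch shows one way such a request line could be assembled. It is not the Ttorrent implementation: the class `AnnounceUrl` and its methods are hypothetical, and the placeholder values x and y for peer_id and info_hash are copied from the listing rather than computed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds an announce query with optional ipv4 and ipv6 keys.
final class AnnounceUrl {

    // Percent-encodes a query value; ':' becomes %3A.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Builds the request path; ipv4 or ipv6 may be null if the client lacks that protocol.
    static String build(String base, int port, String ipv4, String ipv6) {
        StringBuilder url = new StringBuilder(base)
                .append("?peer_id=x&info_hash=y")   // placeholders as in the listing
                .append("&port=").append(port)
                .append("&left=0&downloaded=0&uploaded=0&compact=1");
        if (ipv4 != null) {
            url.append("&ipv4=").append(encode(ipv4));
        }
        if (ipv6 != null) {
            url.append("&ipv6=").append(encode(ipv6));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("/announce", 1234, "192.168.0.1",
                "2001::53Aa:64c:0:7f83:bc43:dec9"));
    }
}
```

In a real client, peer_id and info_hash are binary values that need their own byte-wise percent-encoding, which this sketch leaves out.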

If the client requests a non-compact response, no change is needed in the whole process, since the peer endpoints are returned as strings in their expanded form. However, nowadays compact responses are used almost exclusively. A compact response is much smaller than the standard response and helps to reduce network traffic. The response is smaller because the bencoded reply (cf. Section 2.1.4) carries the peers as a single packed byte string instead of a list of dictionaries [13]. A bencoded compact response contains the IPv4 peers in a key called peers. To support IPv6 and still remain backwards compatible, an additional key peers6 was specified, containing only IPv6 address-port pairs. A bencoded response might look like the one in Listing 3.3. This example response contains both a peers and a peers6 value with 6 and 18 bytes, respectively, for the IPv4 and IPv6 address and port (i marks an address byte, p a port byte).

    d8:intervali1800e5:peers6:iiiipp6:peers618:iiiiiiiiiiiiiiiippe

Listing 3.3: Bencoded tracker response returning IPv6 peers.
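Splitting the packed peers and peers6 values into endpoints only differs in the address length. The sketch below illustrates this under the layout described above (address bytes followed by a 2-byte port in network byte order). The class `CompactPeers` is a hypothetical name and this is not the parse() method of Ttorrent.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for the packed peers (IPv4) and peers6 (IPv6) values.
final class CompactPeers {

    // addressLength is 4 for the peers key and 16 for the peers6 key.
    static List<InetSocketAddress> decode(byte[] compact, int addressLength) {
        int entryLength = addressLength + 2;   // address plus 2-byte port
        if (compact.length % entryLength != 0) {
            throw new IllegalArgumentException("Length is not a multiple of " + entryLength);
        }
        ByteBuffer buffer = ByteBuffer.wrap(compact);   // big-endian by default
        List<InetSocketAddress> peers = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] ip = new byte[addressLength];
            buffer.get(ip);
            int port = buffer.getShort() & 0xFFFF;      // read the port as unsigned
            try {
                peers.add(new InetSocketAddress(InetAddress.getByAddress(ip), port));
            } catch (UnknownHostException e) {
                // Only thrown if addressLength is neither 4 nor 16.
                throw new IllegalArgumentException("Invalid address length", e);
            }
        }
        return peers;
    }
}
```

A client supporting both protocols would call `decode(value, 4)` on the peers value and `decode(value, 16)` on the peers6 value and merge the two lists.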

If a tracker response only contained a peers6 key, Ttorrent was not able to process the response. Listing 3.4 shows the modified code in the class parsing the tracker response. In the original code, Ttorrent only checked for the peers key. When confronted with an IPv6-only tracker response, Ttorrent was always throwing the "Unknown HTTP tracker message!" exception. To support IPv6, a check for the peers6 value was added. Additionally, the parse() method had to be extended to support and correctly decode the IPv6 peers.

    if (params.containsKey("info_hash")) {
        return HTTPAnnounceRequestMessage.parse(data);
    } else if (params.containsKey("peers") || params.containsKey("peers6")) {
        // Accept responses that contain IPv4 peers, IPv6 peers, or both.
        return HTTPAnnounceResponseMessage.parse(data);
    } else if (params.containsKey("failure reason")) {
        return HTTPTrackerErrorMessage.parse(data);
    }

    throw new Exception("Unknown HTTP tracker message!");

Listing 3.4: Modified code checking keys in tracker response.

3.1.2 UDP Tracker Support

In contrast to HTTP trackers, IPv6 is not officially supported for UDP trackers. There is no enhancement proposal or standardization for IPv6 UDP tracker support. In fact, the BEP describing the BT UDP tracker protocol [51] states:

"IPv6 is not supported at the moment. A simple way to support IPv6 would be to increase the size of all IP addresses to 128 bits when the request is done over IPv6. However, I think more experience with IPv6 and discussion is needed before including it." [51]

Despite this remark, there are IPv6 trackers offering UDP support (e.g. [30]). As there is no official documentation, either in a BEP or from the trackers supporting UDP over IPv6 themselves, UDP tracker support was implemented with the help of the proposed extension to the UDP tracker format from [18] and some trial and error. Table 3.1 shows the layout of the UDP IPv6 response that was worked out. The table shows how the UDP response has to be decoded by splitting the response at the specified byte offsets. This layout seems to be used by all IPv6 trackers supporting UDP that were found during this work.

Offset     Size            Name            Value
0          32-bit int      action          1
4          32-bit int      transaction id
8          32-bit int      interval
12         32-bit int      leechers
16         32-bit int      seeders
20+n*18    16-byte string  IPv6 address
36+n*18    16-bit int      TCP port

Table 3.1: Proposed UDP tracker format extension (adapted from [18]).

Table 3.1 only differs from the table in [18] in the action value. [18] proposed a value of 4 for action; however, this value is not used by any tracker tested during this work. They all use the same action value for IPv4 and IPv6, which is 1. As was the case with the HTTP tracker, the parse() method had to be extended to support the longer IPv6 addresses. Listing 3.5 shows the part of the code responsible for parsing the address. The main difference to the code parsing the IPv4 address is the byte array holding the IP address, which has to be 16 bytes wide for an IPv6 address instead of the 4 bytes for an IPv4 address.

// each peer entry is 18 bytes: 16-byte IPv6 address + 2-byte port
while (data.remaining() > 17) {
    byte[] ipBytes = new byte[16];
    data.get(ipBytes);
    InetAddress ip = InetAddress.getByAddress(ipBytes);
    // combine the two port bytes (network byte order) into an unsigned value
    int port = (0xFF & (int)data.get()) << 8 | (0xFF & (int)data.get());
    peers.add(new Peer(new InetSocketAddress(ip, port)));
}

Listing 3.5: Modified code parsing the IPv6 address in the UDP response.

After this modification of the IP address decoding, Ttorrent was able to successfully connect to IPv6 trackers over UDP.
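To make the layout of Table 3.1 concrete, the following sketch decodes a complete UDP announce response, including the 20-byte header that precedes the peer entries. It is only an illustration under the assumptions of Table 3.1: the class UdpAnnounceResponse6 and its field names are hypothetical and do not correspond to Ttorrent's actual message classes.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class following Table 3.1; not Ttorrent's message type.
final class UdpAnnounceResponse6 {
    static final int HEADER_SIZE = 20;    // five 32-bit integers
    static final int PEER_SIZE = 18;      // 16-byte address + 16-bit port
    static final int ACTION_ANNOUNCE = 1; // same value as for IPv4 responses

    final int transactionId;
    final int interval;
    final int leechers;
    final int seeders;
    final List<InetSocketAddress> peers = new ArrayList<>();

    private UdpAnnounceResponse6(int transactionId, int interval, int leechers, int seeders) {
        this.transactionId = transactionId;
        this.interval = interval;
        this.leechers = leechers;
        this.seeders = seeders;
    }

    static UdpAnnounceResponse6 parse(byte[] datagram) {
        if (datagram.length < HEADER_SIZE) {
            throw new IllegalArgumentException("response shorter than the 20-byte header");
        }
        ByteBuffer data = ByteBuffer.wrap(datagram); // big-endian, i.e. network byte order
        int action = data.getInt();
        if (action != ACTION_ANNOUNCE) {
            throw new IllegalArgumentException("unexpected action " + action);
        }
        int transactionId = data.getInt();
        int interval = data.getInt();
        int leechers = data.getInt();
        int seeders = data.getInt();
        UdpAnnounceResponse6 response =
                new UdpAnnounceResponse6(transactionId, interval, leechers, seeders);
        while (data.remaining() >= PEER_SIZE) {
            byte[] ip = new byte[16];
            data.get(ip);
            int port = data.getShort() & 0xFFFF;
            try {
                response.peers.add(new InetSocketAddress(InetAddress.getByAddress(ip), port));
            } catch (UnknownHostException e) {
                throw new IllegalStateException(e); // unreachable for 16-byte arrays
            }
        }
        return response;
    }
}
```

A caller would typically compare transactionId with the value sent in the request before trusting the peer list.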

3.2 VIOLA

After the IPv6 connection basics were working with the modified Ttorrent library, VIOLA itself had to be extended. The extensions to VIOLA have been split into general extensions, slave extensions and master extensions. General extensions are modifications that concern both the slave and the master module, while the latter two are specific to the respective module.

3.2.1 General Extensions

The modifications concerning both modules mainly consisted of changes to the data type holding the peer IP addresses and the encoding of the messages sent between the master and the slave module.

Data Types

The biggest challenge while extending VIOLA was caused by the size difference between an IPv4 and an IPv6 address. The 32 bits of an IPv4 address fit perfectly into an integer data type in Java. Thus, in the original VIOLA code, all IP addresses are stored and passed around as integers, which makes it very efficient. However, an IPv6 address does not fit into a single Java integer variable. In order to solve this problem, a replacement data type for storing IP addresses had to be found.

The following possibilities were considered:

BigInteger: BigInteger [11] is a Java class available since JDK1.1 and can be used to store arbitrary-precision integers. BigInteger's arithmetic methods mimic the semantics of Java's primitive integer operators, which means that those operations could be carried over without additional conversions.

Two long variables: The long data type in Java uses 64 bits. Two longs could be used to store a single 128-bit IPv6 address, with one long holding the first half of the address and the other long holding the second half. The long variables could be wrapped in a custom class to facilitate access to and operations on the address.

InetAddress: InetAddress [12] is a Java class available since JDK1.0. It represents an IP address and provides useful methods for working with IP addresses. In fact, InetAddress objects are already used in VIOLA most of the time when communicating with external APIs or libraries such as Ttorrent. InetAddress already has built-in support for handling IPv6 addresses.

Java byte array: Using a byte array is the simplest method to represent an IP address in Java. The byte array can store arbitrarily long values and as such is suitable for representing IPv6 addresses.

In the end, the method described last, byte array, was chosen for storing an IP address. This was done for multiple reasons. First, it leaves the smallest footprint, as a byte array has the smallest overhead of all the alternatives. This was a very important property, as the master already needs a lot of computing power under full load. Second, a byte array can easily be stored in an Apache Avro database without any explicit conversions [3]. And third, byte arrays can be converted into InetAddress objects with ease. As already mentioned, InetAddress objects are used in a lot of places in VIOLA. All classes in VIOLA were changed to use byte arrays instead of integers for storing IP addresses. Some methods had to be slightly changed due to the differences between byte arrays and integers.
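The differences between byte arrays and integers mentioned above mainly concern conversion and comparison. The following sketch illustrates both; the class IpBytes and its methods are hypothetical helpers and not taken from the VIOLA code. Note that two byte arrays with the same content are not equal under == or equals(), so a content-based comparison (or a content-based key for hash maps and sets) is needed wherever integers were previously compared directly.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper methods for IP addresses stored as byte arrays.
final class IpBytes {
    // Converts a 4-byte (IPv4) or 16-byte (IPv6) array into an InetAddress.
    static InetAddress toInetAddress(byte[] ip) {
        try {
            return InetAddress.getByAddress(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("IP must be 4 or 16 bytes long", e);
        }
    }

    static boolean isIPv6(byte[] ip) {
        return ip.length == 16;
    }

    // Arrays compare by identity with ==, so the content has to be compared explicitly.
    static boolean sameAddress(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    // ByteBuffer implements content-based equals() and hashCode(),
    // which makes it usable as a key in hash-based collections.
    static ByteBuffer asKey(byte[] ip) {
        return ByteBuffer.wrap(ip.clone());
    }
}
```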

Encoding

For the master and the slave to communicate successfully, data sent between the two parts has to be encoded and decoded correctly. With a byte array this is not as straightforward as with an integer. Thus, the AnnounceReplyMessage class had to be extended with an encoding and decoding ability for byte arrays. AnnounceReplyMessage is the message that gets sent between the master and the slave and stores all the gathered peers. For this, the Base64 [32] encoding scheme was chosen. Base64 encodes the byte array into an ASCII string that can be sent over the network as a standard Java string. Listing 3.6 and Listing 3.7 show the parts of AnnounceReplyMessage responsible for encoding and decoding the IP address, respectively. The address is stored in a special KrakenPeer object that is used throughout VIOLA. A KrakenPeer object now stores its IP address as a byte array, which is encoded to a Base64 string in the AnnounceReplyMessage.

for (KrakenPeer peer : this.getPeers()) {
    peers.put(
        Base64.encodeBase64String(peer.getIp()),
        peer.getPort()
    );
}

Listing 3.6: Base64 encoding in class AnnounceReplyMessage.

for (Entry entry : peerIPs.entrySet()) {
    KrakenPeer peer = new KrakenPeer(
        Base64.decodeBase64(entry.getKey()),
        entry.getValue().intValue()
    );
    this.getPeers().add(peer);
}

Listing 3.7: Base64 decoding in class AnnounceReplyMessage.
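The round trip performed in Listings 3.6 and 3.7 can be reproduced in isolation. The sketch below uses the java.util.Base64 class from the JDK (available since Java 8) instead of the Base64 helper used in the listings, so it is an equivalent illustration rather than the VIOLA code itself; the class name IpBase64 is hypothetical.

```java
import java.util.Base64;

// Hypothetical helper illustrating the Base64 round trip of an IP byte array.
final class IpBase64 {
    // Encodes the raw address bytes into an ASCII string.
    static String encode(byte[] ip) {
        return Base64.getEncoder().encodeToString(ip);
    }

    // Restores the raw address bytes from the Base64 string.
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }
}
```

With this encoding, a 4-byte IPv4 address becomes an 8-character string and a 16-byte IPv6 address becomes a 24-character string, so the transferred size per address stays small and predictable.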

3.2.2 Slave Extensions

A big part of the functionality of the slave module is already covered by Ttorrent. But besides the calls to Ttorrent, the slave uses methods to communicate with the master (covered in Section 3.2.1). This section describes the remaining work to enable IPv6 tracker support and why DHT support was not implemented.

Tracker Support

Apart from the already mentioned conversion of all the integers used for IP addresses to byte arrays, the main modification of the slave consisted of integrating the modified Ttorrent library into the VIOLA project. While integrating the new library, additional test cases were written or existing cases were modified to cover the new parts of the code. After this integration, the client was already able to connect to IPv6 trackers, announce correctly and send the collected BitTorrent peers back to the master without further modification.

DHT Support

The original DHT specification for the BitTorrent network was extended in 2009 to add support for IPv6 [6]. To add DHT support to the slaves, VIOLA uses an external library called JKad [33]. JKad is a Java implementation of the Kademlia [40] protocol and provides full access to the BitTorrent Mainline DHT. As JKad is implemented in Java, it allows for easy use and modification in VIOLA. In its current implementation, JKad only supports the IPv4 DHT, as it uses 4-byte arrays to store IP addresses. Due to time constraints, it was decided not to add IPv6 support to JKad manually. To keep the IPv4 and the IPv6 measurements comparable, DHT support was deactivated entirely for the measurement in this work.

3.2.3 Master Extensions

The master discovers and downloads new torrents, orchestrates its slaves, gathers all data from its slaves, gets location and Autonomous System (AS) information and stores everything continually in a database file. The main changes in the master module consisted of modifying the data storage to support the byte array IP addresses and updating the databases used for geolocating the IPs. Additionally, a few modifications had to be made to facilitate torrent discovery.

Torrent Discovery

In VIOLA, there are two ways of adding torrents to the master. The most convenient method is to let VIOLA discover the torrents by itself. There is a built-in general Rich Site Summary (RSS) reader that can be extended to support specific torrent sites. VIOLA currently supports the layout of feeds used by KickAssTorrents (KAT) (http://kat.am) and ThePirateBay (TPB) (https://thepiratebay.org). There was no need to add support for more torrent websites as VIOLA already covered the most widely used websites. In order to facilitate adding IPv6 trackers to all the torrents, a simple way of adding arbitrary tracker URLs to already downloaded torrents was implemented. Listing 3.8 shows how these trackers are appended to the end of the magnet link of the downloaded torrent. Internally, magnet links are used for every discovered torrent. The customTrackers list can be filled by adding the desired trackers to the VIOLA configuration file. This functionality had to be implemented because none of the torrents from either KAT or TPB included an IPv6 tracker in their metainfo file.

if (!customTrackers[0].isEmpty()) {
    String tempMagnetURI = item.getMagnetURI();
    for (String ct : customTrackers) {
        tempMagnetURI = tempMagnetURI.concat("&tr=" + ct);
    }
    item.setMagnetURI(tempMagnetURI);
}

Listing 3.8: Code adding custom trackers to downloaded torrent.

In hindsight, a better implementation of this functionality could have been achieved by adding the custom trackers only at the moment when the slave starts announcing to the trackers. With the current implementation, the already downloaded torrents get modified. Even though this does not change the info_hash of the torrent, it is a change in the file itself and would get lost if the same torrent were redownloaded from a torrent cache service.
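The alternative described above could be approached with a helper that derives the announce-time magnet link from the stored one without modifying it. The following sketch is only one possible shape of such a helper: the class MagnetTrackers and its method are hypothetical, and, unlike Listing 3.8, the sketch percent-encodes the tracker URLs, which is an assumption about how the tr parameter should be written.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

// Hypothetical helper; returns a new magnet link and leaves the stored one untouched.
final class MagnetTrackers {
    static String withTrackers(String magnetUri, List<String> trackers) {
        StringBuilder result = new StringBuilder(magnetUri);
        for (String tracker : trackers) {
            if (tracker == null || tracker.trim().isEmpty()) {
                continue; // skip empty configuration entries
            }
            result.append("&tr=").append(encode(tracker.trim()));
        }
        return result.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Because the stored magnet link is never changed, a torrent that is redownloaded from a cache service would receive the same custom trackers again at announce time.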

Geolocation Databases

VIOLA relies on data from MaxMind to geolocate the gathered IP addresses. MaxMind provides a geolocation database, GeoLite2 City [22], which can be freely downloaded. The original database included in VIOLA only contained IPv4 addresses. MaxMind also provides the GeoLite2 City database with IPv6 and IPv4 addresses together in the same file. This newer database was downloaded and replaced the older GeoLite2 database. Because this database contains addresses of both protocols, no code modification was necessary in this case. MaxMind also provides the AS data from which VIOLA queries the AS Number (ASN). In contrast to the GeoLite2 City database, the ASN database [21] is a legacy database and comes in two parts, one containing IPv4 data and one containing IPv6 data. Both are downloaded and stored in memory. As the ASN database is now split, an additional method had to be implemented to decide which database file to query depending on the type of IP address.
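The decision between the two ASN database files can be based on the length of the byte array holding the address. The sketch below shows this idea with a hypothetical AsnLookup interface standing in for the actual database readers; neither the interface nor the class AsnDatabaseRouter is part of VIOLA or of the MaxMind API.

```java
// Hypothetical router that picks the ASN database matching the address family.
final class AsnDatabaseRouter {
    // Stand-in for a database reader; assumed for illustration only.
    interface AsnLookup {
        int asnOf(byte[] ip);
    }

    private final AsnLookup ipv4Database;
    private final AsnLookup ipv6Database;

    AsnDatabaseRouter(AsnLookup ipv4Database, AsnLookup ipv6Database) {
        this.ipv4Database = ipv4Database;
        this.ipv6Database = ipv6Database;
    }

    int asnOf(byte[] ip) {
        switch (ip.length) {
            case 4:  // IPv4 addresses are 32 bits long
                return ipv4Database.asnOf(ip);
            case 16: // IPv6 addresses are 128 bits long
                return ipv6Database.asnOf(ip);
            default:
                throw new IllegalArgumentException("unexpected address length " + ip.length);
        }
    }
}
```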

Data Storage

Peers are stored in an Avro database file. In the original VIOLA code, the peer addresses are stored as integers. However, as the peer address is now a byte array, the corresponding Avro database schema had to be modified and recompiled. In Avro, the data type corresponding to a Java byte array is the bytes type, and the schema was changed accordingly. This way, the byte array from VIOLA can be written into the Avro file without any explicit conversions.

Chapter 4

Measurement

After finishing the extensions to VIOLA, a one week measurement was conducted in order to gather IPv4 and IPv6 peers. This chapter describes the measurement setup and gives a detailed account of how the measurement was conducted. In the end, a description of the data extraction that was needed in order to prepare for the analysis of the gathered data is given.

4.1 Preparation

Before the measurement could be started, all parts involved in the measurement had to be prepared accordingly. The following section describes how the master and the slave servers were configured and set up and which torrents were chosen to start with the measurement.

4.1.1 Master Server

In order to start the measurement, the VIOLA modules had to be set up on the computers of the University of Zurich. The server hosting the master needed sufficient space to store all the gathered data. The master server only had IPv4 access to the Internet, which was sufficient to download new torrents and communicate with the slaves.

Finding new popular torrents for use during the measurement proved to be a challenge. VIOLA already includes a script that reads feeds from KickassTorrents and PirateBay and downloads the corresponding torrents (cf. Section 3.2.3). However, KickassTorrents was shut down in July [19] by the U.S. government and could not be used to gather new torrents. The website still existed but no new torrents were available from its torrent feed. This left PirateBay as the only source for all torrents. Because the planned one week duration of the measurement was relatively short, the PirateBay top lists (https://thepiratebay.org/top/205, https://thepiratebay.org/top/201 and https://thepiratebay.org/top/48h200) for movies and TV shows were used to gather popular torrents, instead of a feed of all the recently added torrents starting with zero peers. This ensured that big swarms would be measured right from the start.


A further problem was that none of the trackers provided by the torrent files from PirateBay supported IPv6 peers. It was therefore decided to add two IPv6 trackers to every downloaded torrent via the method described in Section 3.2.3. The IPv6 trackers used were udp://explodie.org:6969 and http://tracker.nwps.ws:6969. There were various other trackers available that supported IPv6, but after some testing it was found that the two mentioned above were the only working IPv6 trackers where peers for PirateBay torrents could be found at the time of the measurement.

4.1.2 Slave Servers

In contrast to the master server, the slave servers all needed an IPv6 interface in order to connect to the IPv6 trackers. Thus, the servers were connected to the Internet with a dual stack connection allowing for both IPv4 and IPv6 access. Otherwise, there were no additional requirements for the servers, as the only thing the slaves need to store are the log files created during the measurement. After making sure the master and the slave would connect with each other correctly, the slave server was cloned. In total, 10 slaves were created that all gathered BitTorrent peers simultaneously during the measurement.

4.2 Measurement Process

The measurement was started on October 7, 2016 at 15:10 and was stopped one week later on October 14, 2016 exactly at midnight. Figure 4.1 shows a timeline with the incidents that happened during this period. During the measurement, the service used to cache torrents (http://thetorrent.org) went offline on October 11, 2016 at 04:32. In the end, this did not impact the measurement, as the service is only used to store already gathered torrents as a backup in case VIOLA needs a restart and the database storing the torrents is lost. At half past seven in the morning of the last day, one of the two IPv6 trackers (http://tracker.nwps.ws:6969) went offline and never came back online. This had an impact on the overall number of measured IPv6 peers during the last day, as only one working IPv6 tracker was left for the rest of the measurement.

Figure 4.1: Timeline of VIOLA measurement starting on October 7, 2016 and ending on October 14, 2016.

After the measurement was concluded, all the data was stored on the master server in Avro files separated by day. To facilitate analysing and extracting the data, it was loaded into an Impala [28] database. In order to be able to import the data to Impala, the files first had to be prepared accordingly. VIOLA includes a tool to scan Avro files and prepare the partitioning by copying the data into several new files. The files were separated by the hour at which the specific peers were gathered. During this process, an additional database column peeruid was created. This column is just the concatenation of the peer IP address and the corresponding port number. It was created to facilitate later analysis of the data. Additionally, the IP addresses stored as bytes were converted into strings to make them easily readable. In the end, the files were separated by day and hour. The prepared Avro files were then automatically uploaded to the database server by the script. To get the prepared data into the Impala database, a new table announce_ipv6 was created on the database server. This table only differed from the older measurement tables in the data type storing the peer IP addresses, which was now a string instead of an integer. Impala was pointed to the location of the partitioned data and the data was imported via an SQL query.

4.3 Data Extraction

After the measurement came to an end, approximately 26 GB of data had been gathered and stored in Impala. To make analysis and visualization of this data possible, corresponding data tables had to be created. To visualize the data, the Google Charts API [23] was chosen. Google Charts provides an easy to use interface to create charts and renders them as an interactive, coloured image in the web browser. The data for the visualizations is stored in a MySQL database. A PHP script is responsible for querying the MySQL database. The script finally converts the response to JSON, which is a format that can be used by Google Charts. Google Charts is able to build and cache its own internal data table within the browser, where it can be changed dynamically. The following sections first describe how the amount of NAT usage in the measurement was approximated and then show how the data tables for the NAT vs. IPv6 correlation and the global IPv6 distribution were prepared.

4.3.1 NAT Approximation

In order to obtain the amount of NAT usage in the measurement, the average number of ports per distinct IP was calculated over the whole week. This is an approximation of the real NAT value, because multiple ports for the same IP are not exclusively caused by CGN. For example, a BT client restart could also cause the client to connect to the network with a different port, or a user could be running two BT clients simultaneously. This issue could be mitigated by only averaging the ports over the data from one single day, which would in turn expose the data to daily fluctuations. Thus, the average was taken over the whole week, as non-NATed as well as NATed clients are affected by these issues.

Guaranteed NAT detection would only be possible by cross-checking IP/Port pairs on both sides of the NAT [34]. As VIOLA does not initiate a connection with the gathered peers, such a detection is not possible, and even then it could not be guaranteed if the ports were not forwarded. Average ports per IP is the best approximation that can be made with this data, as it is common behaviour for torrent clients to use the same port for every torrent in the same session [13]. If a single IP address has multiple ports, it is likely that there are multiple clients behind one NATed address. However, the real number of NATed addresses cannot be determined with this method and is very likely to be higher, because many addresses with only one port might also be behind a NAT.
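The approximation described in this section can be sketched in a few lines. The class NatApproximation below is a hypothetical, in-memory illustration of the calculation; in this work the actual numbers were obtained from the Impala database rather than from such a class. It counts the distinct ports seen per IP address and derives both the average ports per IP and the share of IPs seen with more than one port.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory version of the ports-per-IP approximation.
final class NatApproximation {
    private final Map<String, Set<Integer>> portsByIp = new HashMap<>();

    // Records one observed IP/port combination; duplicates are ignored.
    void observe(String ip, int port) {
        portsByIp.computeIfAbsent(ip, key -> new HashSet<>()).add(port);
    }

    // Average number of distinct ports per distinct IP address.
    double averagePortsPerIp() {
        if (portsByIp.isEmpty()) {
            return 0.0;
        }
        long totalPorts = 0;
        for (Set<Integer> ports : portsByIp.values()) {
            totalPorts += ports.size();
        }
        return (double) totalPorts / portsByIp.size();
    }

    // Share of IP addresses seen with more than one port (likely NATed).
    double multiPortShare() {
        if (portsByIp.isEmpty()) {
            return 0.0;
        }
        long multiPort = 0;
        for (Set<Integer> ports : portsByIp.values()) {
            if (ports.size() > 1) {
                multiPort++;
            }
        }
        return (double) multiPort / portsByIp.size();
    }
}
```

As discussed above, both values are only indicators: a client restart with a new port raises them without any NAT being involved, while NATed addresses seen with a single port are not counted.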

4.3.2 Scatterplot Data Preparation

With the IPv6 peers in the measurement and the approximated NAT usage numbers, the question whether there is any correlation between those two values can be answered. To compare the two values, two different scatterplots were created: one grouped by AS and a second one grouped by country, in order to have a more general overview.

For the scatterplot with AS data, the percentage of IPv6 BT users in the AS was approximated to obtain more comparable results. This was achieved by dividing the number of IPv6 addresses in the AS by the size of the IPv4 address space of the AS. The address space data was provided by http://ipinfo.io. The size of the IPv4 address space is a better indicator of the size of an AS than the size of the IPv6 address space, because RIRs originally assigned IPv6 address blocks by a 'one size fits all' principle, resulting in ASes that have huge unused address space not indicative of their actual size [45].

The pseudo SQL query in Listing 4.1 was used to extract the necessary data for the AS scatterplot from Impala. In the first part of the query, the AS numbers are grouped with the corresponding average port value and are then joined with the total IPv6 value for each AS in the second part. Because in the Impala table there was no flag to differentiate between IPv4 and IPv6 addresses, they had to be differentiated in the SQL query with regular expressions. This problem could be addressed in future refinements of the VIOLA IPv6 extension as this complicates every query.

SELECT nat.asnumber, nat.avg_ports, v6.ipv6 AS ipv6Total
FROM (SELECT asnumber, AVG(ports) AS avg_ports
      FROM (SELECT asnumber, ip, COUNT(DISTINCT port) AS ports
            FROM announce_ipv6
            WHERE ip RLIKE '^[0-9]*[.]'
            GROUP BY asnumber, ip) AS per_ip
      GROUP BY asnumber) AS nat
JOIN (SELECT asnumber, COUNT(DISTINCT ip) AS ipv6
      FROM announce_ipv6
      WHERE ip RLIKE '^[a-f0-9]*:'
      GROUP BY asnumber) AS v6
ON nat.asnumber = v6.asnumber;

Listing 4.1: Pseudo SQL query for scatterplot grouped by AS
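The two regular expressions used in the query to tell IPv4 and IPv6 addresses apart can be tried out in isolation. The sketch below applies the same patterns in Java; the class AddressFamilyByRegex is a hypothetical helper, and find() is used on the assumption that RLIKE matches the pattern anywhere in the string rather than requiring a full match.

```java
import java.util.regex.Pattern;

// Hypothetical helper applying the patterns from Listing 4.1 to address strings.
final class AddressFamilyByRegex {
    // Leading digits followed by a dot, e.g. "192." in "192.0.2.1".
    private static final Pattern IPV4 = Pattern.compile("^[0-9]*[.]");
    // Leading (possibly empty) lowercase hex group followed by a colon, e.g. "2001:".
    private static final Pattern IPV6 = Pattern.compile("^[a-f0-9]*:");

    static boolean looksLikeIPv4(String ip) {
        return IPV4.matcher(ip).find();
    }

    static boolean looksLikeIPv6(String ip) {
        return IPV6.matcher(ip).find();
    }
}
```

The patterns only inspect the beginning of the string, so they distinguish the two address families but do not validate that the string is a well-formed address.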

4.3.3 Choropleth Map Data Preparation

Because IP addresses are assigned to specific geographical regions, one interesting method to visualize this data is to use a choropleth map [5]. "Choropleth Maps display divided geographical areas or regions that are coloured, shaded or patterned in relation to a data variable" [5]. Such a coloured map allows for a quick and easy to understand overview of the gathered data. To prepare the data for a choropleth visualization, a new table had to be created from the gathered data. The pseudo SQL query seen in Listing 4.2 was used to fill the table. The necessary numbers for the total IP addresses and the IPv6 percentage could then be calculated from the extracted data.

SELECT country, COUNT(DISTINCT ipv4), COUNT(DISTINCT ipv6)
FROM (SELECT DISTINCT country,
             CASE WHEN peeruid RLIKE '^[0-9]*[.]' THEN peeruid END AS ipv4,
             CASE WHEN peeruid RLIKE '^[a-f0-9]*:' THEN peeruid END AS ipv6
      FROM announce_ipv6) AS peers
GROUP BY country;

Listing 4.2: SQL query for choropleth map data.

Chapter 5

Analysis Results

The data resulting from the measurement allowed for a comprehensive analysis of the status of IPv6 and NAT in BT. This chapter first gives an overview of the general measurement data, then the global IPv6 and NAT distributions are presented with the help of choropleth visualizations and the evolution of IPv6 in BT is illustrated. In the last two sections, the question of a correlation between NAT and IPv6 is answered and the limitations of the measurement data are discussed.

5.1 General Measurement Data

During the seven day measurement, 11’393’887 distinct IP/Port combinations (called peeruid in the measurement) were gathered in the BT network from 236 different countries. With 10’861’752 peeruids, the vast majority of those were IPv4 peeruids, while the number of gathered IPv6 peeruids amounted to only 532’135. This results in an IPv4 percentage of 95.3% and an IPv6 percentage of 4.7% in the measurement. Figure 5.1a shows the distribution of IPv4 and IPv6 addresses in a pie chart.
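The two percentages follow directly from the reported counts, as the short check below shows. The class PeerShares is a hypothetical helper written for this illustration only; the counts are the ones stated above.

```java
// Hypothetical helper reproducing the percentage calculation from the reported counts.
final class PeerShares {
    // Share of part in total, expressed in percent.
    static double percent(long part, long total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * part / total;
    }

    // Rounds to one decimal place, as used for the percentages in the text.
    static double roundToOneDecimal(double value) {
        return Math.round(value * 10.0) / 10.0;
    }
}
```

Calling PeerShares.percent(10_861_752L, 11_393_887L) and PeerShares.percent(532_135L, 11_393_887L) and rounding each result to one decimal place reproduces the 95.3% and 4.7% given in the text.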

[Pie charts: (a) IPv4 95.3%, IPv6 4.7%; (b) single-port IPs 88.5%, multi-port IPs 11.5%.]

(a) Distribution of IP versions. (b) Amount of NATed IPv4 addresses.

Figure 5.1: Pie charts showing the IP and NAT distributions in the measurement.

Of all the gathered IPv4 addresses, 908’237 distinct IP addresses were found with more than one port. This characteristic likely indicates that these addresses are behind a NAT. Note that this is only an approximation of NAT and not a guaranteed value (cf. Section 4.3.1). This amounts to a total of 11.5% of IPv4 addresses in the measurement that are likely to be NATed. Figure 5.1b shows this distribution in a pie chart.

5.2 Global IPv6 Distribution

The country with the highest absolute number of distinct IPv6 addresses captured in the measurement was the United States. This is not surprising considering the population size of the US. Surprisingly, a relatively small country like Belgium has the second most IPv6 peers in the measurement. Table 5.1 lists the top ten countries using BitTorrent via IPv6 by absolute numbers. As absolute numbers do not account for the population size of the countries, the IPv6 peers have been compared to the number of IPv4 peers measured from the same country. Table 5.2 shows the top ten countries by percentage. Note that the 40% adoption rate for Cuba is negligible, as only five addresses have been gathered from Cuba in total. The table shows that Belgium is in the lead and has the highest IPv6 adoption rate by a large margin. Then, there is a second level with France and Switzerland having an adoption rate of around 20%. Finally, starting at 10.51% with Luxembourg, the rates gradually decline. Out of the 236 countries represented in the data, no IPv6 addresses could be captured for 96 countries.

Country          Distinct IPv6 Peers
United States    116’230
Belgium          89’423
France           83’071
India            55’017
Great Britain    40’607
Greece           27’524
Canada           16’210
Brazil           15’892
Portugal         12’474
Switzerland      10’254

Table 5.1: Top ten IPv6 countries in the BitTorrent network by absolute numbers.

Country          IPv6 Percentage
(Cuba            40%)
Belgium          39.54%
France           21.51%
Switzerland      19.07%
Luxembourg       10.51%
Greece           10.19%
                 8.99%
                 8.43%
United States    8.24%
Estonia          6.33%

Table 5.2: Top ten IPv6 countries in the BitTorrent network by percentage (Cuba < 6 total addresses).

The choropleth map resulting from this data is shown in Figure 5.2a. Because most countries have an IPv6 adoption rate between 0% and 1%, the colour gradient was chosen to be at its darkest colour at an adoption rate of 10%. This way, more differences between countries with lower adoption rates can be seen. The map only shows countries where more than 100 peers have been measured in total. To be able to compare the IPv6 adoption rates, a map showing the relative distribution of the IPv4 peers in the measurement is shown in Figure 5.2b. To facilitate analysing specific countries, multiple versions of the map concentrating on different continents were additionally created and are available at [44].

(a) IPv6 adoption rates. (b) Relative IPv4 distribution.

Figure 5.2: Geochart renders of IP data from measurement. Interactive versions available at [44].

The IPv6 adoption map in Figure 5.2a shows that most countries with a relatively high percentage of IPv6 BT users are concentrated in Europe and North America, with a few exceptions in South America and the Asia-Pacific region. At first, one might think that this stems from the costs incurred by IPv6 deployment and that many countries may simply lack the funds for it. However, [48] indicates that the cost of IPv6 deployment might not be extremely high. ISPs are one of the driving forces in IPv6 adoption [9]. The IPv6 deployment costs that an ISP has to bear vary heavily with the time taken for full deployment [48]. Most of the costs are incurred by updating legacy equipment at the end user. If the deployment is spread over multiple years, the cost of deploying IPv6 may even be quite small, as the end user equipment update could be incorporated into the regular replacement cycle. This suggests that early IPv6 adoption would be beneficial for an ISP. However, as can be seen from the choropleth map, ISPs in many countries are hesitant to change to the newer protocol. A major reason for this is that most benefits of IPv6 adoption "will arrive in the future and are still uncertain" [48] today. Many ISPs seem to simply wait and see, because transitional technologies such as NAT have allowed them to operate even with an exhausted IPv4 address space. Additionally, the cost of adoption will only decrease with time, because knowledge of the new technology will become more widespread. This could explain why adoption rates vary widely between countries even in the leading continents, e.g. Spain with a measured adoption rate of 0.05%, located right between Portugal and France, both of which are represented in the top ten list (cf. Table 5.2).

If the IPv6 adoption rates are compared with the relative IPv4 distribution in Figure 5.2b, it can be observed that the four countries with the highest numbers of IPv4 peers in the measurement (United States, India, Great Britain and Brazil) also show a relatively high rate of IPv6 adoption.

The IPv6 distributions obtained from the measurement correlate with results from general IPv6 measurements, such as the statistics from Google [29] that are shown in Figure 5.3. However, the adoption rates obtained from the BT measurement are lower than the ones from Google for most countries. This is likely because IPv6 in BT is not yet as readily available to the end user. As could be seen from the measurement process, IPv6 trackers had to be added manually (cf. Section 4.1.1), while the Google website is reachable over native IPv6 without any extra work.

Figure 5.3: Google’s general per-country IPv6 adoption statistic [29]. Red and orange countries signify reliability or latency issues with IPv6.

5.2.1 Evolution of IPv6

By comparing the IPv6 distribution map in Figure 5.2a with the measurements from E. Vyncke (c.f. Section 2.2.2), a few observations about the evolution of IPv6 in the BT network can be made. Vyncke measured the distribution of IPv6 peers in the BT network periodically starting in 2009. For this comparison, Vyncke’s data from December 28, 2012 was chosen, as the next year, 2013, was the year when global IPv6 deployment just started to pick up according to Google’s data [29] and thus promised to show the most interesting differences. Figure 5.4a shows the GeoChart map of Vyncke’s measurement, while Figure 5.4b shows the map of the VIOLA measurement with the colour gradient matched to a maximum percentage of 1.93% in order to be able to compare the maps with each other.

(a) Vyncke data, December 28, 2012 [54]. (b) Own data, October 14, 2016.

Figure 5.4: Comparison of IPv6 distribution in BT from 2012 to 2016.

If the two maps are compared, it can be observed that Romania and France were leading IPv6 adoption in the BT network in 2012. While Romania’s adoption rate is only slightly higher in 2016, France is still one of the leading countries in IPv6 adoption. However, Belgium has caught up with France from a 0% adoption rate in 2012 and surpassed it by a large margin. Apart from being a small and dense country, an important reason for Belgium’s lead role in IPv6 adoption seems to be that there is a "memorandum of understanding between ISP[s] [...] to limit the sharing of 1 IPv4 address to a maximum of 16 subscribers" [42]. This restricts the usefulness of CGN and makes finding an alternative solution all the more important. In general, the number of countries having IPv6 peers in the BT network is much larger in 2016 than at the end of 2012. This observation correlates with the globally rising IPv6 adoption rates, and the trend is expected to continue in the future.

5.3 Global NAT Distribution

When analysing the global NAT distribution in the measurement, Vatican City had the highest average ports per IP by a large margin. In total, of the 236 measured countries there were only around 30 countries with an average of 2 ports per IP or higher. Many of those countries were African countries or small island states. Notable exceptions were South Korea (4.39 Ports/IP) and the Netherlands (2.50 Ports/IP). Table 5.3 shows the top five countries using NATed addresses.

Country        Ports/IP
Vatican City   16.84
Sierra Leone   6.43
Botswana       5.35
Benin          4.99
South Korea    4.39

Table 5.3: Top five countries using NATed addresses.
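The average number of ports per IP serves as the NAT indicator throughout this chapter. The following Python sketch illustrates how such a per-country average could be computed. It assumes that the metric counts the distinct ports observed for each IP address and then averages that count over all addresses of a country; the exact calculation used by VIOLA is not reproduced here. The function name, the tuple layout and the sample records (placeholder country codes and addresses from documentation ranges) are hypothetical.

```python
from collections import defaultdict


def average_ports_per_ip(observations):
    """Return {country: mean number of distinct ports seen per IP}.

    `observations` is an iterable of (country, ip, port) tuples.
    This layout is an assumption made for illustration only.
    """
    # Collect the set of distinct ports seen for every (country, ip) pair.
    ports_by_ip = defaultdict(set)
    for country, ip, port in observations:
        ports_by_ip[(country, ip)].add(port)

    # Sum port counts and IP counts per country.
    port_total = defaultdict(int)
    ip_total = defaultdict(int)
    for (country, _ip), ports in ports_by_ip.items():
        port_total[country] += len(ports)
        ip_total[country] += 1

    return {c: port_total[c] / ip_total[c] for c in ip_total}


# Hypothetical sample: one address in "XX" is seen with three ports,
# which hints at several peers sharing it behind a NAT.
sample = [
    ("XX", "192.0.2.1", 6881),
    ("XX", "192.0.2.1", 51413),
    ("XX", "192.0.2.1", 40000),
    ("XX", "192.0.2.2", 6881),
    ("YY", "198.51.100.7", 6881),
]
result = average_ports_per_ip(sample)
```

A value close to one suggests that most addresses are used by a single peer, while larger values hint at address sharing. As noted in Section 5.5, this remains an approximation and does not prove that NAT is in use.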

The choropleth map resulting from the NAT data is shown in Figure 5.5. Again, the colour gradient had to be adjusted because most values for the average port number were between one and two. Thus, an upper limit of five for the colour gradient was chosen.


Figure 5.5: GeoChart render of the global NAT distribution. Interactive version available at [44].

The NAT distribution map in Figure 5.5 shows that high NAT usage was mainly measured in African countries. This might seem unusual at first, as AFRINIC is the only RIR that still has free IPv4 address blocks to distribute (cf. Section 2.1.1). However, NAT usage has been widespread throughout the African continent since the early days of the Internet [41], and this habit seems to have continued over the years. Internet access in African countries is often shared between many people, and NAT was a convenient solution for local service providers [1]. AFRINIC officially discourages the use of NAT [31] and encourages the reservation of more address space, as it still has free resources, but many local service providers might not be aware of this or might simply be unwilling to change. The relatively high NAT usage observed in some Asian countries (especially South Korea) might be due to the fact that these countries have large populations and a relatively low IPv6 adoption rate.

5.4 Correlation of NAT and IPv6

The scatterplot used to analyse whether there is a correlation between NAT and IPv6 usage was created from two different data sets. The first data set compares IPv6 and NAT data from this measurement. However, this only compares data measured and averaged within the same time period. It is plausible that an ISP already deploying IPv6 addresses no longer has an immediate need for NAT, because both technologies are solutions to the same problem. Thus, a second data set was created with IPv4 data from an older VIOLA measurement, which covered one week from May 1, 2016 to May 8, 2016. The same average port calculation as used for the NAT distribution (cf. Section 5.3) was applied to this data, which was then visualised together with the IPv6 data from the new measurement.

[Scatterplot. x-axis: % of IPv6 BT users (0.000 to 0.020); y-axis: average ports per IP (0 to 200); legend: AS in old measurement, AS in new measurement.]

Figure 5.6: Scatterplot for ASes. Interactive version available at [44].

Figure 5.6 shows the scatterplot with both data sets plotted. The x-axis shows the percentage of IPv6 BT users in the AS, and the y-axis shows the average number of ports per IP address in that AS. The red dots represent ASes from the new measurement, while the blue dots represent ASes from the old measurement. The scatterplot has been zoomed to show more detail. While there are a few ASes with both relatively high IPv6 usage and NAT deployment, the data exhibits a distinct L-shape. This shape signifies that there is no correlation between the deployment of NAT and IPv6 addresses. On the contrary, it seems that most ASes using IPv6 do not use NAT and vice versa. The most noticeable difference between the two data sets is that the data from the older measurement shows remarkably higher average port numbers. This is because the older measurement included many more torrents and peers, partly because DHT was activated during that measurement.
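The relationship in this thesis was judged visually from the scatterplot, and no correlation coefficient is reported. As a complementary check, a Pearson coefficient could be computed over the same per-AS pairs. The Python sketch below shows one way to do this. The function name and the sample values are hypothetical and are not taken from the measurement; they only mimic an L-shaped distribution.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical L-shaped sample: ASes with many ports per IP have almost
# no IPv6 users, and ASes with IPv6 users have about one port per IP.
ipv6_share = [0.0, 0.0, 0.0, 0.001, 0.010, 0.015, 0.020]
ports_per_ip = [150.0, 80.0, 40.0, 1.5, 1.2, 1.1, 1.0]
r = pearson(ipv6_share, ports_per_ip)
```

A single coefficient summarises an L-shaped distribution only poorly, since the relationship is not linear. For such data the visual inspection used in this chapter remains the more informative tool, and a coefficient should be read with caution.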

A second, more general plot was created to concentrate on countries instead of ASes. It can be seen in Figure 5.7 and follows the same principles as the first plot. However, this scatterplot has not been zoomed and shows every country where IPv6 peers have been measured.

[Scatterplot. x-axis: % of IPv6 addresses (0.0 to 0.4); y-axis: average ports per IP (0 to 120); legend: Country in old measurement, Country in new measurement.]

Figure 5.7: Scatterplot for countries. Interactive version available at [44].

Figure 5.7 shows that there is also no correlation between NAT and IPv6 at the country level. Again, the data exhibits an L-shape, most noticeably in the data from the older measurement. The shape of the data from the new measurement is much less pronounced, which is caused by the overall lower average port numbers. The countries with extremely high average port numbers are exclusively African countries; possible causes were discussed in Section 5.3.

The results from this measurement have clearly shown that there is no correlation between the NAT usage and the IPv6 deployment in the BT network. The scatterplots even seem to point to an "either-or" relationship between those two values. The data also shows that DHT usage in a measurement has a high impact on the average number of ports that can be observed per IP.

5.5 Data Limitations

The data gathered during this measurement study and used for the visualizations has certain limitations that have to be considered. Although the scatterplots show that there is no correlation between NAT and IPv6 deployment, this can only be said with certainty for the period between the old and the new VIOLA measurement. This period might have been too short to observe ISPs replacing NAT with IPv6. To answer this question definitively, a long-term measurement seems necessary: NAT data from a measurement that is a few years old could then be compared with current IPv6 data, and a more precise conclusion could be drawn. A natural next step would be to repeat these measurements in a few years and compare the data again. Additionally, the NAT data could be distorted because the measurement cannot guarantee that an address is actually behind a NAT (cf. Section 4.3.1). For future work, a method could be devised to obtain clearer data on addresses behind NAT. This would require changing the behaviour of VIOLA, as a direct connection to peers would be needed in order to gather guaranteed NAT data.

Chapter 6

Conclusion & Future Work

During this thesis, a one-week measurement of the IPv6 peers in the BT network was conducted. VIOLA, a BT measurement system, was extended to support connections to IPv6 BT trackers. To this end, Ttorrent, a Java-based BT library on which VIOLA builds, was modified, and an unofficial UDP IPv6 tracker protocol, about which hardly any resources could be found, was tested and documented. The thesis gave an overview of the status of IPv6 adoption in the BT network and addressed the question of whether there is a correlation between NAT usage and IPv6 adoption. The measured data did not show any correlation between NAT usage and IPv6 adoption rates; on the contrary, an "either-or" relationship between those two values was observed. It is suspected that measurements taken multiple years apart could give a more certain answer to this question. Nonetheless, the new IPv6 measurement delivers valuable data about IPv6 adoption rates in the BT network. Approximately 5% of the measured IP addresses belonged to IPv6 peers. The leading countries with IPv6 peers in the measurement were mostly concentrated in Europe and North America, with Belgium leading the global IPv6 adoption by a large margin. While African countries did not have high IPv6 adoption rates, they exhibited the highest NAT usage. The IPv6 adoption rates in BT are consistent with general IPv6 measurements and are on the rise globally.

6.1 Future Work

DHT for IPv6 is a feature that could have a big impact on the measured numbers; it was not activated during this measurement due to lack of support in VIOLA. If DHT support in VIOLA were extended to IPv6, many more peers, both IPv4 and IPv6, could most probably be gathered. It would be interesting to see how this would change the dynamics of the measurement data. The feature for adding a custom tracker, which was needed to add IPv6 trackers to torrents, could be implemented in a less disruptive way. Instead of implementing it directly at the master while saving a cached torrent, it could be implemented in the slave at the time of announcing.


Additionally, IPv4 and IPv6 addresses are saved in the same column without an easy way to differentiate between the two protocols. To differentiate them nonetheless, regular expressions were used in the SQL queries. This unnecessarily complicates every query made while analysing the data. An easier way of differentiating between the two types could be implemented, such as a flag or boolean value, or the two types could even be stored separately.
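One way to obtain such a flag is to classify each address once, at the time it is stored, instead of matching it with a regular expression in every query. The Python sketch below illustrates the idea using the standard `ipaddress` module. VIOLA itself is Java based, so this is not its implementation; the function name and the idea of a separate version column are assumptions made for illustration.

```python
import ipaddress


def ip_version(address):
    """Return 4 or 6 for a valid textual IP address, otherwise None."""
    try:
        return ipaddress.ip_address(address).version
    except ValueError:
        # Not a parsable IPv4 or IPv6 address.
        return None


# Hypothetical rows as they might be written to the peer table,
# now carrying an explicit version value next to the address.
rows = [(addr, ip_version(addr)) for addr in ("192.0.2.1", "2001:db8::1")]
```

With the version stored alongside the address, a query could filter on a simple integer comparison rather than on a pattern match over the address string.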

In order to draw a more precise conclusion about the correlation of NAT and IPv6, time should be a bigger dimension in the measurement. If IPv6 deployment data could be compared to NAT deployment data that dates back several years, changes in the network would be captured better. A next step would be to repeat these measurements every year and compare the data again. The numbers used for NAT deployment might also be flawed: they only roughly approximate the real value and cannot guarantee NAT deployment. To improve the quality of the measurement, a method that gives more certainty about NATed addresses would have to be implemented. One such approach would be to actively advertise to peers in the swarm: if a peer does not respond, it is likely to be behind a NAT or a firewall [35]. Although this does not guarantee NATed addresses either, the approximation would also include NATed peers that are represented with only one port in the data.

Bibliography

[1] A. Akplogan. "IP Addresses & development: Situation in Africa [Powerpoint Presentation]". OECD/World Bank workshop. Paris, Nov. 10, 2009. url: https://www.oecd.org/ict/4d/43759820.pdf (visited on Nov. 27, 2016).
[2] N. Ao and C. Chen. "Understanding IPv6 user performance on private BT system". In: 4th IET International Conference on Wireless, Mobile Multimedia Networks (ICWMMN 2011). Nov. 2011, pp. 288–293.
[3] Apache Avro 1.8.1 Specification. May 22, 2016. url: http://avro.apache.org/docs/1.8.1/spec.html (visited on Dec. 3, 2016).
[4] BitTorrent. url: http://www.bittorrent.com (visited on Dec. 3, 2016).
[5] Choropleth Map. url: http://www.datavizcatalogue.com/methods/choropleth.html (visited on Nov. 24, 2016).
[6] J. Chroboczek. BEP32: BitTorrent DHT Extensions for IPv6. Oct. 14, 2009. url: http://www.bittorrent.org/beps/bep_0032.html (visited on Nov. 19, 2016).
[7] C. Çiflikli, A. Gezer, A. Tuncay Özşahin, and Ö. Özkasap. "BitTorrent packet traffic features over IPv6 and IPv4". In: Simulation Modelling Practice and Theory 18.9 (Oct. 2010), pp. 1214–1224.
[8] Cisco Inc. IPv6 Addressing White Paper. 2008. url: http://www.cisco.com/c/dam/en_us/solutions/industries/docs/gov/IPv6_WP.pdf (visited on Nov. 2, 2016).
[9] Cisco Inc. IPv6 for the Enterprise in 2015. Oct. 8, 2015. url: http://www.cisco.com/c/en/us/products/collateral/ios-nx-os-software/enterprise-ipv6-solution/whitepaper_c11-586154.html (visited on Nov. 26, 2016).
[10] Cisco Inc. White paper: Cisco VNI Forecast and Methodology, 2015-2020. June 1, 2016. url: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/complete-white-paper-c11-481360.html (visited on Sept. 26, 2016).
[11] Class BigInteger. url: https://docs.oracle.com/javase/7/docs/api/java/math/BigInteger.html (visited on Nov. 20, 2016).
[12] Class InetAddress. url: http://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html (visited on Nov. 20, 2016).
[13] B. Cohen. BEP03: The BitTorrent Protocol Specification. Jan. 10, 2008. url: http://www.bittorrent.org/beps/bep_0003.html (visited on Oct. 16, 2016).
[14] S. E. Deering and R. M. Hinden. Internet Protocol, Version 6 (IPv6) Specification. RFC 2460. RFC Editor, Dec. 1998. url: http://www.rfc-editor.org/rfc/rfc2460.txt (visited on Nov. 29, 2016).
[15] M. Defeche and E. Vyncke. Measuring IPv6 Traffic in BitTorrent Networks. Internet-Draft. IETF Secretariat, Oct. 2009. url: https://tools.ietf.org/html/draft-defeche-ipv6-traffic-in-p2p-networks-00 (visited on Nov. 29, 2016).
[16] M. Defeche and E. Vyncke. Measuring IPv6 Traffic in BitTorrent Networks. Internet-Draft. IETF Secretariat, Mar. 2012. url: https://tools.ietf.org/html/draft-vyncke-ipv6-traffic-in-p2p-networks-01 (visited on Nov. 29, 2016).
[17] K. B. Egevang and P. Francis. The IP Network Address Translator (NAT). RFC 1631. RFC Editor, May 1994. url: http://www.rfc-editor.org/rfc/rfc1631.txt (visited on Nov. 29, 2016).
[18] erdgeist. The IPv6 Situation. Dec. 28, 2007. url: http://opentracker.blog.h3q.com/2007/12/28/the-ipv6-situation/ (visited on Oct. 12, 2016).
[19] Ernesto. Feds Seize KickAssTorrents Domains, Arrest Alleged Owner. July 20, 2016. url: https://.com/feds-seize-kickasstorrents-domains-charge-owner-160720/ (visited on Nov. 19, 2016).
[20] V. Fuller and T. Li. Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan. BCP 122. RFC Editor, Aug. 2006. url: http://www.rfc-editor.org/rfc/rfc4632.txt (visited on Nov. 29, 2016).
[21] GeoLite Legacy Downloadable Databases. url: https://dev.maxmind.com/geoip/legacy/geolite/ (visited on Nov. 20, 2016).
[22] GeoLite2 Free Downloadable Databases. url: http://dev.maxmind.com/geoip/geoip2/geolite2/ (visited on Nov. 20, 2016).
[23] Google Charts. url: https://developers.google.com/chart/ (visited on Dec. 1, 2016).
[24] G. Hazel and A. Norberg. BEP07: IPv6 Tracker Extension. Jan. 31, 2008. url: http://www.bittorrent.org/beps/bep_0007.html (visited on Oct. 10, 2016).
[25] G. Hazel and A. Norberg. BEP09: Extension for Peers to Send Metadata Files. Jan. 31, 2008. url: http://www.bittorrent.org/beps/bep_0009.html (visited on Nov. 23, 2016).
[26] G. Huston. BGP Analysis Reports. url: http://bgp.potaroo.net/index-bgp.html (visited on Nov. 2, 2016).
[27] G. Huston. IPv4 Address Report. Nov. 2, 2016. url: http://www.potaroo.net/tools/ipv4/index.html (visited on Nov. 2, 2016).
[28] Impala. url: http://impala.apache.org/ (visited on Nov. 23, 2016).
[29] IPv6 Statistics. Nov. 8, 2016. url: https://www.google.com/intl/en/ipv6/statistics.html (visited on Nov. 8, 2016).
[30] IPv6 Tracker - An IPv6 open tracker project. url: https://www.ipv6tracker.org (visited on Nov. 20, 2016).
[31] ISP/LIR Guidelines. url: https://afrinic.net/fr/community/policy-development/policy-support-guides/783-isplir-guidelines (visited on Dec. 3, 2016).
[32] S. Josefsson. The Base16, Base32, and Base64 Data Encodings. RFC 4648. RFC Editor, Oct. 2006. url: http://www.rfc-editor.org/rfc/rfc4648.txt (visited on Nov. 29, 2016).
[33] K. Jünemann and P. Andelfinger. JKad - a Java implementation of the Kademlia protocol, designed for academic research. url: https://dsn.tm.kit.edu/english/misc_3148.php (visited on Nov. 19, 2016).
[34] T. Kivinen, B. Swander, A. Huttunen, and V. Volpe. Negotiation of NAT-Traversal in the IKE. RFC 3947. RFC Editor, Jan. 2005. url: http://www.rfc-editor.org/rfc/rfc3947.txt (visited on Nov. 29, 2016).
[35] M. Kryczka, R. Cuevas, C. Guerrero, A. Azcorra, and A. Cuevas. "Measuring the BitTorrent Ecosystem: Techniques, Tips and Tricks". In: IEEE Communications Magazine 49.9 (Sept. 2011), pp. 144–152.
[36] A. Lareida, S. Schrepfer, T. Bocek, and B. Stiller. "Overlay network measurements with distribution evolution and geographical visualization". In: NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium. Apr. 2016, pp. 222–230.
[37] Z. Lin, W. Hao, and Z. Shibing. "A measurement study on BitTorrent traffic behaviors over IPv6". In: 2012 IEEE International Conference on Computer Science and Automation Engineering (CSAE). Vol. 3. IEEE, 2012, pp. 354–357.
[38] A. Loewenstern and A. Norberg. BEP05: DHT Protocol. Jan. 31, 2008. url: http://www.bittorrent.org/beps/bep_0005.html (visited on Oct. 17, 2016).
[39] C. Mason. The long road to IPv6. Sept. 22, 2013. url: http://netnix.org/2013/09/22/the-long-road-to-ipv6/ (visited on Nov. 17, 2016).
[40] P. Maymounkov and D. Mazières. "Kademlia: A Peer-to-Peer Information System Based on the XOR Metric". In: Peer-to-Peer Systems. Ed. by P. Druschel, F. Kaashoek, and A. Rowstron. Lecture Notes in Computer Science 2429. Springer Berlin Heidelberg, Mar. 7, 2002, pp. 53–65.
[41] V. Mbarika, M. Jensen, and P. Meso. "Cyberspace Across sub-Saharan Africa". In: Commun. ACM 45.12 (Dec. 2002), pp. 17–21.
[42] P. McNamara. Why Belgium leads the world in IPv6 adoption. July 27, 2016. url: http://www.networkworld.com/article/3100968/internet/why-belgium-leads-the-world-in-ipv6-adoption.html (visited on Nov. 27, 2016).
[43] Measurements. Nov. 9, 2016. url: http://www.worldipv6launch.org/measurements/ (visited on Nov. 20, 2016).
[44] S. Müller. BitTorrent IPv6 Distribution. Nov. 29, 2016. url: http://www.simondo.ch/ipv6 (visited on Nov. 29, 2016).
[45] T. Narten, G. Huston, and L. Roberts. IPv6 Address Assignment to End Sites. BCP 157. RFC Editor, Mar. 2011. url: https://www.rfc-editor.org/rfc/rfc6177.txt (visited on Nov. 24, 2016).
[46] Network Address Translation. url: https://cisco-lessons.wikispaces.com/Network+Address+Translation (visited on Nov. 17, 2016).
[47] Number Resources. url: https://www.iana.org/numbers (visited on Nov. 17, 2016).
[48] OECD. The Economics of Transition to Internet Protocol version 6 (IPv6). OECD Digital Economy Papers 244. Nov. 6, 2014.
[49] M. Petazzoni. Ttorrent, BitTorrent library in Java. 2013. url: http://mpetazzoni.github.io/ttorrent/ (visited on Oct. 10, 2016).
[50] J. Postel. Internet Protocol. STD 5. RFC Editor, Sept. 1981. url: http://www.rfc-editor.org/rfc/rfc791.txt (visited on Dec. 3, 2016).
[51] O. van der Spek. BEP15: UDP Tracker Protocol for BitTorrent. Feb. 13, 2008. url: http://www.bittorrent.org/beps/bep_0015.html (visited on Oct. 12, 2016).
[52] The 8472. BEP11: (PEX). Oct. 29, 2015. url: http://bittorrent.org/beps/bep_0011.html (visited on Oct. 16, 2016).
[53] The 8472. BEP14: Local Service Discovery. Oct. 29, 2015. url: http://www.bittorrent.org/beps/bep_0014.html (visited on Oct. 30, 2016).
[54] E. Vyncke. IPv6-enabled BitTorrent Peers. url: https://www.vyncke.org/ipv6status/p2p.php (visited on Nov. 14, 2016).
[55] J. Weil, V. Kuarsingh, C. Donley, C. Liljenstolpe, and M. Azinger. IANA-Reserved IPv4 Prefix for Shared Address Space. BCP 153. RFC Editor, Apr. 2012. url: http://www.rfc-editor.org/rfc/rfc6598.txt (visited on Nov. 29, 2016).

Abbreviations

API    Application Programming Interface
AS     Autonomous System
ASN    Autonomous System Number

BEP    BitTorrent Enhancement Proposal
BT     BitTorrent

CGN    Carrier-Grade NAT

DHT

HTTP   HyperText Transfer Protocol

IP     Internet Protocol
IPv4   Internet Protocol Version 4
IPv6   Internet Protocol Version 6
ISP    Internet Service Provider

KAT    KickAssTorrents

NAT    Network Address Translation

P2P    Peer-To-Peer
PEX    Peer Exchange Protocol

RSS    Rich Site Summary / Really Simple Syndication

TBP    ThePirateBay
TCP    Transmission Control Protocol

UDP    User Datagram Protocol
URL    Uniform Resource Locator
uTP    uTorrent Transport Protocol

VIOLA  VIdeo consumption in OverLAy networks

List of Figures

2.1 IPv4 Address Run-Down Model illustrating the global exhaustion of the IPv4 address space [27].
2.2 Measurement of users opening Google via native IPv6 connections [29].
2.3 Illustration of static NAT (adapted from [46]).
2.4 Illustration of dynamic NAT (adapted from [46]).
2.5 Illustration of NAT overloading (adapted from [46]).
2.6 Simplified illustration of a CGN setup (adapted from [39]).
2.7 Simplified overview of VIOLA (adapted from [36]).
4.1 Timeline of VIOLA measurement starting on October 7, 2016 and ending on October 14, 2016.
5.1 Pie charts showing the IP and NAT distributions in the measurement.
5.2 Geochart renders of IP data from measurement. Interactive versions available at [44].
5.3 Google's general per-country IPv6 adoption statistic [29]. Red and orange countries signify reliability or latency issues with IPv6.
5.4 Comparison of IPv6 distribution in BT from 2012 to 2016.
5.5 GeoChart render of the global NAT distribution. Interactive version available at [44].
5.6 Scatterplot for ASes. Interactive version available at [44].
5.7 Scatterplot for countries. Interactive version available at [44].

List of Tables

3.1 Proposed UDP tracker format extension (adapted from [18]).
5.1 Top ten IPv6 countries in the BitTorrent network by absolute numbers.
5.2 Top ten IPv6 countries in the BitTorrent network by percentage (Cuba < 6 total addresses).
5.3 Top five countries using NATed addresses.

Appendix A

VIOLA Configuration Example

In order for the added custom tracker feature to work, a new configuration option custom.trackers was added to VIOLA. Trackers have to be assigned to this option with their exact announce address. If multiple trackers are specified, they need to be separated by commas. Listing A.1 shows an example configuration of the krakenmaster.properties file with the custom.trackers option.

#Kraken Properties File
viola.bind.address=192.168.1.1
viola.server.port=1234
storage.run.dir=/home/example/run
storage.archive.dir=/home/example/archive
storage.geoip.dir=/home/example/GeoIP
#Kraken RSS properties
collector.interval=1200000
collector.feeds=pirate tv,http://thepiratebay.org/top/205
#Kraken tracker properties
custom.trackers=udp://explodie.org:6969,http://tracker.nwps.ws:6969/announce

Listing A.1: Example configuration of krakenmaster.properties file.
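To illustrate how the comma-separated custom.trackers value can be read, the following Python sketch parses a properties fragment and splits the option into individual announce addresses. It is a simplified illustration and not the VIOLA implementation, which is written in Java. The helper names are hypothetical, and the parser only handles plain key=value lines and # comments rather than the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def custom_trackers(props):
    """Split the custom.trackers option into a list of announce addresses."""
    raw = props.get("custom.trackers", "")
    return [t.strip() for t in raw.split(",") if t.strip()]


config = """#Kraken tracker properties
custom.trackers=udp://explodie.org:6969,http://tracker.nwps.ws:6969/announce
"""
trackers = custom_trackers(parse_properties(config))
```

If the option is missing or empty, the sketch returns an empty list, so a caller would simply add no custom trackers.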

Appendix B

Contents of the CD

The enclosed CD contains the following data.

/
  Zusfsg.txt                  Abstract in German
  Abstract.txt                Abstract in English
  thesis.pdf                  The bachelor thesis in PDF format
  sources
    visualization_data.zip    Data sources for visualizations
    website.zip               Source code for statistics website [44]
