<<

BitTorrent file using -like hidden services

R.J. Ruigrok Delft University of Technology

BitTorrent using Tor-like hidden services

by

Rob Ruigrok

in partial fulfillment of the requirements for the degree of

Master of Science in Computer Science

at the Delft University of Technology, to be defended publicly on August 31, 2015 at 14:00 PM.

Student number: 1371746 Project duration: June 11, 2014 – August 31, 2015 Supervisor: Dr. ir. J. A. Pouwelse Thesis committee: Prof. dr. ir. H. J. Sips, TU Delft Dr. ir. J. A. Pouwelse, TU Delft Dr. ir. S. E. Verwer, TU Delft

An electronic version of this thesis is available at http://repository.tudelft.nl/.

Preface

I would like to thank Johan Pouwelse for his enthusiasm, comments and sugges- tions. Furthermore, I would like to thank Egbert Bouman, Niels Zeilemaker, Elric Milon and Lipu Fei. They were always available for recommendations, and assisted me with the programming work in . Last but not least, I want to thank my friends, colleagues, family and wife for their support.

Rob Ruigrok Delft, The Netherlands August 23, 2015

iii

Abstract

The is a large public network of networks and computers. When no coun- termeasures are taken, all information and activities taking place on the public Internet are subject to traffic analysis, threatening personal freedom and privacy. The first measure to take is securing all transferred information by applying encryp- tion on it, which makes it at least very hard to tap the contents of the transferred information. But encryption does not add any value when it comes to . Although the contents of the message are not known (because it is encrypted), it is still possible to track down where a message comes from, and where it is going to by analyzing the network infrastructure. Governments or Internet providers may apply censorship or block any kind of network traffic coming or going from some- where. This is where hidden services come in. The idea of hidden services was described by the authors of Tor (The onion router). The hidden services protocol hides the location of both the sender and receiver by transferring and encrypting messages over multiple hops, without revealing the true identity or contents of the messages in transit.

This thesis proposes a design for implementing Tor-like hidden services in a de- centralized peer-to-peer system, enabling the possibility of downloading and seed- ing anonymously. A proof-of-concept will be implemented into Tribler, a BitTorrent developed at Delft University of Technology. Tribler already supports anony- mous downloading using encryption over circuits with multiple hops, but to make hidden services work, an end-to-end encrypted circuit between both the seeder and downloader needs to be established. This is not as easy as it seems, because the seeder and downloader do not know each other. The implementation details of hidden services are part of this work, as well as experiments on the performance of the system. Furthermore, a number of issues related to transforming the original idea of hidden services into a fully distributed context are solved.

v

Contents

Preface iii Abstract v 1 Introduction 1 1.1 Motivation and contribution ...... 1 1.2 Document structure ...... 3 2 Related work 5 2.1 Tor project ...... 5 2.2 BitTorrent ...... 6 2.3 Tribler ...... 7 2.4 Anonymity in Tribler ...... 8 2.5 Overview of research on communication ...... 10 3 Problem Description 17 3.1 Requirements on finding peers...... 17 3.2 Boosting performance ...... 19 3.3 Resilience to attacks ...... 19 3.4 Acquiring required keys for encryption ...... 21 3.5 Avoid exit nodes ...... 21 3.6 Connectability Problem ...... 22 3.7 Handling churn ...... 23 3.8 Banning corrupt peers...... 23 4 Design and Implementation 25 4.1 Integration into Tribler ...... 26 4.2 Decentralized discovery of hidden seeders ...... 27 4.3 Circuit setup ...... 27 4.4 Dispersy message cells ...... 27 4.4.1 Setting up Introduction Points ...... 28 4.4.2 Finding peers to download from ...... 28 4.4.3 Getting keys ...... 29 4.4.4 Create end-2-end ...... 30 4.4.5 Requesting end-to-end connection ...... 30 4.4.6 Linking end-to-end connection...... 31 4.5 Generic overview of message interaction ...... 32 4.6 Circuit reuse ...... 33 4.7 Protocol-specific extensions...... 34

vii viii Contents

5 Experiments and performance evaluation 39 5.1 Environment specification and assumptions ...... 39 5.2 Experiments on latency, speed and performance ...... 40 5.3 Experiment on fault resilience ...... 46 5.4 Experiment on multi-swarm collaboration ...... 50 5.5 Concluding remarks on experiments ...... 52 5.6 Yappi profiling of the system ...... 53 6 Future work 55 7 Conclusion 57 Appendices 59 A Gumby experiments 61 A.1 hiddenservices-1-gigabyte-1-hop.scenario ...... 61 A.2 hiddenservices-1-hop-multiple-seeders.scenario ...... 61 A.3 hiddenservices-1-hop-seeder.scenario ...... 62 A.4 hiddenservices-2-hop-seeder.scenario ...... 62 A.5 hiddenservices-3-hop-seeder.scenario ...... 62 Glossary 63 Bibliography 65 1 Introduction

If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place. Eric Schmidt

Do people really care about their privacy? When looking at messages shared on platforms it seems that most people do not care, and if asked some respond with having nothing to hide. The elders of the Internet are actually warning us for the threat of losing privacy. At the time of writing, a new cold war is emerging. [6] The conflict in Ukraine dominates headlines and the NATO and European Union take countermeasures to withstand upcoming attacks from Russia. But the renewed tensions between east and west differ greatly from tensions in the past. Since the end of the first cold war, the use of Internet became mainstream for both parties, news headlines can spread rapidly. But spying on the Internet advances too, all traffic on the Internet can be intercepted by government agencies, like the US’s Prism [20] and Russia’s Sorm [35] programs. Sharing controversial or political files over the Internet becomes dangerous, as there is a high chance that government officials in your country or abroad are watching over your back, knowing more than you know about yourself. Lots of tools pretend a false sense of security, and are essentially futile. This work contributes the goal of making file-sharing a Putin-proof experience, with nobody knowing who you are and what you’re doing.

1.1. Motivation and contribution The goal of this work is to enhance the use of BitTorrent with security, privacy and anonymity, by mapping the protocol used for Tor hidden services [12] on a peer- to-peer network. The approach differs greatly from its Tor counterpart, as it is fully based on self-organization (without any central servers) in academically-pure P2P

1 2 1. Introduction

style. The one way anonymous downloading and streaming functionality was made 1 available to public in 2014, based on prior work described in [28] and [33].

Figure 1.1: Logo of the Tribler

The next logical step is to go beyond anonymous downloading, by default pro- tection of seeders, and without leaking any content in transit. This is established by creating end-to-end encrypted circuits between seeders and downloaders. These circuits should preserve anonymity, but also act fast and reliable. No information is allowed to leak somewhere between the seeder and downloader, and the need for exit nodes should be eliminated as they are scarce. The purpose of this thesis work is pointed out in the following research question:

How to achieve a fully-decentralized architecture for anonymous downloading and seeding in BitTorrent?

To answer this question, a number of subquestions are formulated: 1. Which existing anonymity protocols are proposed in the past, and how do they evaluate? 2. How to map the ideas and concepts behind Tor hidden services into Tribler?

3. How to work around the connectability problem? 4. What is the performance impact of design decisions within this architecture?

A design specification for a proof of concept concerning a system answering the above mentioned research questions is contained in the upcoming chapters. 1.2. Document structure 3

1.2. Document structure 1 The contents of this thesis are as follows. The first chapter 1 starts with an intro- duction on this work, and an overview of research questions and motivation. Chap- ter 2 provides an overview of prior work on anonymity systems on the Internet. The problem description in chapter 3 describes the requirements on the anonymity system for downloading and seeding anonymously in a distributed network. The chosen approach with details about the protocol based on hidden services is de- scribed in chapter 4. In chapter 5 experiments on performance and fault resilience are conducted, combined with insights resulting from the experiments. Chapter 6 proposes improvements for future work on the anonymous system, and chapter 7 concludes this work with the results and consequences achieved on the proposed anonymity system.

2 Related work

Privacy may actually be an anomaly.

Vint Cerf

Making the Internet anonymous is hard to achieve, due to its nature of being public and open to everyone. In the past decades, various theoretical designs for anony- mous peer-to-peer systems have been proposed, but no established peer-to-peer application with a large user-base is currently using any of those designs. This chapter describes the basics of (2.1) and BitTorrent and its related terminology (2.2). Furthermore, it contains a describes the Tribler framework (sec- tion 2.3) in which a proof of concept will be implemented. Section 2.4 describes the existing implementation related to creating circuits and applying cryptography which is already covered in prior work. The chapter ends with an overview of ap- proaches of earlier anonymity systems in section 2.5.

2.1. Tor project Tor is an acronym, and stands for The onion router. It is a protocol designed for setting up anonymous connections on the Internet. The original design was first described in [12] by , as an improved version of the systems existing at those days. Their goal was to create a system in which privacy was guaranteed, and censorship is impossible. Compared to earlier Onion Routing systems, a number of improvements are proposed, like support for perfect forward secrecy, per-hop congestion control, and location-hidden services by using the concept of rendezvous points in a circuit.

5 6 2. Related work

2.2. BitTorrent BitTorrent is a distribution system for files over a network of peers. Peers are rewarded for donating bandwidth using a tit-for-tat mechanism. BitTorrent was described in 2003 by (see [5]). A static . is created, and 2 contains information about the files contained in the torrent, sizes per file, check- sum per file, and the URL of a tracker which helps to find peers in the swarm for the torrent. Each .torrent is assigned an info hash. This is a SHA-1 hash calcu- lated on the contents of the info-section of the .torrent file. The info hash ensures correctness of the downloaded files. Downloaded pieces where the hashes do not match, are rejected. This makes it impossible to change the contents of a torrent (e.g. for adding malware), because a change in files will change the info hash too.

Distributed hash table (DHT) BitTorrent uses a DHT, which is a decentralized and . It works like any other database, but is decentralized in its nature. Updates to this database are communicated as announcements, and the database synchronizes with peers near the client. The DHT is used for storing information about other peers in the swarm for torrents when no tracker is available. Each peer that uses the DHT becomes a kind of tracker. A DHT lookup can be performed by submitting the SHA- 1-hash of the info-section of the .torrent file. The DHT responds to this lookup-hash with addresses of peers in the swarm yielding the info hash.

Figure 2.1: Simplified example of a DHT lookup in a ring

Figure 2.1 shows a simplified ring example of the DHT. In this example, peers know their successor. When the actual node 1 wants to find a peer with id 15, it asks its successor (2). The successor returns the closest id it knows about, which is 4. This node will also return the closest successor it knows about, until 15 will 2.3. Tribler 7 eventually be found within a number of iterations. The DHT used by BitTorrent is based on the SHA-1-hash to retrieve peers by finding the closest peer via a binary tree representation of the hash. (PeX) 2 PeX is a on top of the BitTorrent protocol, to allow peers to collaborate with each other. A peer can ask other peers in the swarm for knowledge about peers they are connected to. PeX increases speed and efficiency, making the BitTorrent protocol more robust. For example, a peer exchange could consist of the addresses of new / disconnected peers that were recently updated. Most implementations of PeX are limited to a maximum of 50 peers per exchange, and do not peers more often than once per minute. One of the downsides of PeX is the need to find at least the first peer. This peer can be found via the tracker or via the bootstrap node of the DHT.

Magnet links Magnet links are special links, consisting of one or more parameters. They contain at least the (base-encoded SHA-1) info hash of a .torrent file, and additional infor- mation like tracker urls, or but not the torrent itself. Given this hash, peers can be found via DHT, the .torrent can be downloaded via one of those peers, and then start downloading the files contained in this .torrent file. Magnet links result in less bandwidth for the torrent websites. They are not a replacement for .torrent files, as the torrent contains essential information like trackers, hashes and files. An examle magnet link is magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C. The order of parameters is not significant, and the parameters are formatted as query strings like they are formatted in HTTP urls with GET parameters.

2.3. Tribler Tribler is a peer-to-peer file-sharing client, developed as a research project since 2005 by members of the Parallel and Distributed Systems at Delft University of Technology. Tribler uses the well-known BitTorrent network, and is improved and expanded with streaming video from magnet links, search functionality (via commu- nities), anonymous downloading and reputation-management (BarterCast). Vision and Mission as stated on the website emphasize the scientific research goal:

Push the boundaries of self-organising systems, robust reputation sys- tems and craft collaborative systems with millions of active participants under continuous attack from spammers and other adversarial entities.

Tribler differs from other clients in the way that all its features are fully distributed and decentralized, while staying compatible with the original BitTorrent specifica- tion. 1

1About Tribler - http://www.tribler.org/about.html, see [26] 8 2. Related work

2

Figure 2.2: Screenshot of Tribler v6.3.3

Dispersy communities Some extra features of Tribler, like reputation-management or setting up anony- mous circuits, require more information about nodes in a swarm. Dispersy is de- signed to solve this problem. It is an elastic database system, capable of sending messages around in a decentralized peer-to-peer network. Groups of nodes can be clustered into a so-called community, where a database can be shared by having peers sending messages around in the community. Communities are designed to discover new candidates automatically, and introduce candidates with other candi- dates. From time to time, candidates are replaced in favour of new candidates.

2.4. Anonymity in Tribler In late 2013, a first attempt towards anonymity in Tribler was implemented 2. The current version of Tribler offers full support for anonymous downloading over cir- cuits. The main parts and ideas behind the Tor protocol are ported to Tribler. Circuits can be built over the UDP protocol, NAT traversal is supported, and there is end-to-end congestion control which makes circuits fast enough to make streaming video possible. Key part of the idea is that every Tribler peer may become a relay in a circuit. Someone who wants to download a file anonymously, looks for other peers in the dispersy tunnel-community. Seven cell types are required for setting up a circuit:

CREATE is send to the the last hop in the circuit. It asks for a list of neighbours

2https://github.com/Tribler/tribler/wiki/Anonymous-Downloading-and-Streaming-specifications, see [27] 2.4. Anonymity in Tribler 9

to which the circuit can be extended.

CREATED acknowledges the CREATE message, with a list of public keys of neigh- bours to extend to. EXTEND is relayed to the last node in a circuit, and tells this node to extend to the 2 neighbour contained in the payload.

EXTENDED acknowledges the EXTEND message.

DATA for sending data messages over the circuits.

PING messages are relayed over the circuit, and keep the circuit alive (as UDP is connectionless), it helps to detect broken down circuits.

PONG acknowledges PING messages, and are relayed down in the circuit.

More information about the establishment of the protocol and circuits can be found in prior work [33].

Encryption layers and handshake In an onion routing circuit, messages sent over the circuit are (partially) encrypted. The current circuit creation in Tribler is based on public key Diffie Hellman Elliptic Curve Elgamal encryption. For each hop in a circuit, an extra encryption layer is added by encrypting the message with the public key of the last extended node. The only unencrypted part in a message is the circuit-id, which is necessary for hops in between to give a clue which circuit belongs to which message. All hops in between can’t decrypt the contents without knowing the secret key of the last node. Each hop is only capable of decrypting the layer of the messsage that is encrypted with their own public key, and will relay the message onwards on the circuit, until the exit node is reached. After decrypting at the exit node, the original message is readable again. A hop shouldn’t know its role in a circuit, it only knows its predecessor and successor in the circuit. The only exception is the exiting hop. This hop can see what data is sent through the circuit, and will be considered as the originating source for the outside world when data is exited on the Internet. Every candidate returned by the dispersy community generates a long term curve25519 key (푏, 퐵), and a unique identifier on startup. Those identifiers are persistent, and shared through the community, to inform other peers about their existence and ability to become part of a circuit as a relay or exit. Diffie Hellman secret 푔 mod 퐺 are calculated for each hop in the circuit. ECElgamal code uses a 푘 which is relatively prime to the modulus of the curve. A handshake similar to ntor with AES_GCM is used for encrypting packets. In figure 2.3, a schematic overview is included which shows how a circuit is used to connect to a service over a SOCKS5 server via 3 hops. The exact working of the cryptography is beyond the scope of this thesis. An initial write-up on the used cryptography is described in prior work [28], although the code is fundamentally rewritten by Niels Zeilemaker after the Christmas 2014 criticism (see https://github.com/Tribler/tribler/ issues/1066). 10 2. Related work

2

Figure 2.3: Encryption and decryption in an onion routing network with 3 hops

2.5. Overview of research on anonymous communi- cation Many research on anonymous communication had been conducted in the past decades. The common unit playing an important role in each proposed system is the relay, separating the source and destination. This section describes briefly the idea behind anonymity protocols evolving through time. Note that most of the protocols never got beyond a prototype implementation (if at all), only a few (e.g. Tor, ) are actually deployed as fully-functional anonymity software for the mass.

Chaumian Mix Network In 1981, David Chaum proposed the first design of an anonymous communication network, called the Chaumian Mix network [3]. The sender of a messages encrypts the body and corresponding recipient of the message, and sends the message to a server afterwards. The server obfuscates and reorders the messages (this is called mixing), making it impossible for outstanders to know who the sender of a particular message is. The process of mixing is repeated a number of times, until the message is delivered at the recipient, where it is fully decrypted.

Mixmaster Mixmaster [7] (1994) is a centralized anonymity system specification for a secure transfer protocol for protecting electronic mail against traffic analysis. It is based on Chaum’s mix-net protocol. The set of mixes is maintained by a single trusted authority. An email message is forwarded through a sequence of remailers (mixes), where each remailer forwards the message encrypted with public key cryptography. It is hard for an adversary to observe where a message comes from, where it is going to, and what the contents of the message are. 2.5. Overview of research on anonymous communication 11

Babel Babel [18] (1996) is similar to Mixmaster, but supports the specification of a return- path to allow bidirectional communication. The return-path consists of the return- address of each of the mixes to be used in the path. Afterwards, each return- address is encrypted with public key encryption. Babel mixes are not aware of each other, and can’t learn from the messages they process. 2

Mixminion Mixminion was proposed as an extension of Mixmaster in 2003 [9], by adding single- use reply blocks to messages. The original design of the Chaumian mix network supports reply blocks to be sent back to the sender. But these reply blocks are reusable, thus subject to a replay attack with which the route to the origin of the message can be retrieved by traffic analysis. Mixminion solves this problem by making reply blocks indistinguishable from regular messages in the network. This results in a system where both sender and receiver anonymity is ensured, and perfect forward secrecy prevents adversary mixes to manipulate any message.

DC-nets DC refers to the Dining Cryptographers problem, where multiple participants in a network perform a secure computation using the boolean OR-function. This prob- lem is proposed by David Chaum in 1988. The best way for illustrating this problem, is by quoting the example proposed by David Chaum himself: Three cryptographers are sitting down to dinner at their favorite three- star restaurant. Their waiter informs them that arrangements have been made with the maitre d’hotel for the bill to be paid anonymously. One of the cryptographers might be paying for the dinner, or it might have been NSA (U.S. National Security Agency). The three cryptographers respect each other’s right to make an anonymous payment, but they wonder if NSA is paying. They resolve their uncertainty fairly by carrying out the following protocol: Each cryptographer flips an unbiased coin behind his menu, between him and the cryptographer on his right, so that only the two of them can see the outcome. Each cryptographer then states aloud whether the two coins he can see - the one he flipped and the one his left-hand neighbor flipped - fell on the same side or on different sides. If one of the cryptographers is the payer, he states the opposite of what he sees. An odd number of differences uttered at the table indicates that a cryptographer is paying; an even number indicates that NSA is paying (assuming that the dinner was paid for only once). Yet if a cryptographer is paying, neither of the other two learns anything from the utterances about which cryptographer it is. A DC-net is an anonymous communication network based on this problem. [2]. Participants can publicize message in an anonymous way, and adversaries can’t see who publicized a message. DC-nets are not efficient because only one user at 12 2. Related work

the time is able to send messages, but DC-nets provide both sender and receiver anonymity.

XOR-Trees 2 XOR-Trees where proposed in 2000 [13] as a variant of a DC-net where com- munication takes place over a spanning tree. It provides both sender and re- ceiver anonymity. XOR-Trees have a low performance and lack a trade-off between anonymity and communication efficiency, resulting in unscalability. Because only a single user in the network is able to send messages at the same time. When the number of users increases, the number of collisions in the network also increases.

푃 Peer-to-Peer Personal Privacy Protocol (푃) is a decentralized and scalable anonymity system and provides both sender- and receiver anonymity [31]. It is based on the original ideas of XOR-Trees and DC-nets. A sender sends messages to a channel, which contains an addressable subset of groups. Secure and anonymous connec- tions are established through a spanning tree of broadcast channels. Each user is able to trade-off anonymity with efficiency. Unfortunately 푃 is not perfect. It is not feasible for high rates, and communication efficiency decreases when new users join a channel.

Tarzan Tarzan [15] (2002) is a decentralized peer-to-peer anonymous IP network over- lay. It is general purpose and transparent to applications communicating over the network. Tarzan provides sender anonymity, but no receiver anonymity. The sys- tem consists of sets of neighbours, where neighbouring nodes are nodes with IP addresses in the same subnet. To cover traffic, a continuous stream of dummy messages is transferred between subnet groups. Anonymous circuits are built by repeatedly choosing a node from the neighbouring set. On each hop key-pairs are generated to be used for communicating over the circuit. Tarzan uses a gossip protocol to share information about other nodes through the network.

MorphMix MorphMix (2002) is an anonymity mix-net system based on Tarzan [30]. Each user in the network is a mix. Sending data is performed in exactly the same way as relaying data, thus forms a global anonymity set. MorphMix learns about the existence of other peers in the network from previous sessions, through gossipping, and peers that advertize theirselves. No central authority exists for managing peer discovery. MorphMix differs from Tarzan in the way of achieving client anonymity. The last hop in a sequence does not extend its sequence by selecting the next relay itself, but extends by asking a trusted neighboring peer for the next relay (which makes this neighbor a witness for relay selection). One problem with the design of MorphMix is the possibility where cliques in the sequence of mixes occur. MorphMix is resistant to some sorts of collisions but not to all. 2.5. Overview of research on anonymous communication 13

AP3 AP3 was proposed in 2004, and stands for Anonymizing Peer-to-Peer Proxy [23]. It is a cooperative providing anonymous communication services for its users. It uses DHT lookups to locate relays for anonymous communication. Messages in AP3 are anonymously delivered through a random path in the network. Each node has its own authenticated pseudonymous identifier to make identities 2 in the network secure and anonymous. AP3 supports channels for setting up a persistent and anonymous connection circuit between nodes.

Freenet [4] was proposed in 2001 as a system for building networks and of anonymous users, without a central authority. Anonymous routing is provided by a variant on Chaumian mixes in a redundant tree-based structure. Resources on the network are stored on random nodes, and locating these resources is achieved by issuing queries that are relayed through the network to nodes closer to the resource. Relay nodes do not have knowledge about the issuer of a query.

ECRS ECRS is a content encoding scheme implemented in GNUNet to achieve secure, anonymous and censorship-resistant peer-to-peer content sharing network. Users can share data and queries over the network in an encrypted way. Each node in the network contributes storage to the GNUNet network. Each node maintains an identification number, representing the location where a content part is stored, and is connected in a strictly structured way to other close nodes. [17]

PipeNet PipeNet was proposed in 1996 as a low-latency anonymity network [8], with some features of the mix-net protocol in common. Connections in the network are es- tablished by selecting a path in the network, consisting of a sequence of nodes. The initiator is called the caller, and the endpoint of the path is the receiver. Each node in between is called a switch. Each node in the path shares a public key with the caller, and knows how far it is away from the caller. PipeNet provides sender- anonymity, but no receiver-anonymity. After establishment of a path through the network, anonymous communication is possible. The caller start a constant stream of same-sized packets to the first switch in the path. This switch will add a sequence number to the packet, encrypt the packet with the associated public key, shuffle the order of received packets, and forwards these packets to the next switch in the path, repeating the procedure. When the packet arrives at the receiver, it will decrypt and unshuffle the packets.

Onion routing Onion Routing (1999) [16] is a distributed overlay network, and makes anonymous communication possible by building up a circuit through a network of relaying nodes. Data sent over the circuit is encrypted with the public key of each hop in the circuit. Each hop decrypts its own layer (like peeling layers of an onion) and relays the 14 2. Related work

message down to the next node in the circuit. When the message arrives at the exit-node, the message is fully decrypted. Because each relay node only knows its predecessor and successor, and does not have knowledge about the data it is relaying, identity of other nodes, or its location in the circuit, anonymity is ensured. 2 Tor Tor [12] (2003) achieves anonymity by building circuits over multiple hops, the data sent over the circuit is per-hop encrypted. This makes it very unlikely that someone can eavesdrop unencrypted data sent over the circuit, or modify any data without being detected. Above all, it does not reveal anything about the origin of the data, because the circuit works on multiple hops. Each hop in a circuit only knows about its own upstream and downstream neighbour, and the origin knows a shared secret with each hop. All data sent over the circuit is encrypted and decrypted with each hop, like peeling and adding layers around an onion. The detailed design specification of Tor is described in [10].

PIR-Tor PIR-Tor was introduced in 2011 to address the scalability problem in Tor [24]. Clients use Private Information Retrieval to obtain a subset of relay nodes, elim- inating the need of obtaining the entire set of available relay nodes as Tor does. On downloading a small set of descriptors for a circuit, only the client knows about these descriptors, thus making route fingerprinting impossible. Clients anonymity is furthermore preserved by following the same techniques Tor uses for building circuits.

Salsa Salsa [25] (2006) is a peer-to-peer anonymity system, and works in a similar way like Tor does. It constructs a circuit of random chosen relay nodes. Salsa uses a custom distributed hash table to find these relay nodes for anonymous commu- nication, and uses redundancy checking mechanisms to protect against bias and adversary attacks in the network. But this redundancy checking leaks information about the lookups, and is vulnerable to a denial of service attack.

MORE MORE (2007) is a centralized anonymity system, which stands for dynamic Multi- path Onion RoutEr for peer-to-peer overlay networks [22]. The approach of MORE extends the static onion routing paradigm by allowing packets to travel along dif- ferent paths through the network. Anonymity is guaranteed by letting the sender and receiver independently select the hops for the first part and second part of the path. A central trusted service directory provides the lookup service for discovering nodes and path sections leading to a hidden service. 2.5. Overview of research on anonymous communication 15

I2P I2P is an acronym for The Invisible Internet Project, and is based on a private network with anonymous websites. These websites can only be reached by users within the I2P network. It is an anonymous network layer, and designed in such a way that other software is able to communicate over the network. Applications may send and receive messages anonymously and securely. Downside of I2P is 2 that it not suitable for massive data transfer, and the network is private, so it can’t connect to services on computers not running I2P. This is a showstopper when it comes to file-sharing over BitTorrent, because nothing can be downloaded when there are no seeders around within the small network of I2P users. [34]

3 Problem Description

It’s very difficult for governments to dominate the Internet because it’s so difficult to control. People want to be free. People want to hear multiple voices. They want to make their own decisions. And people who see things will report things. Eric Schmidt

In the previous chapter on related work, different protocols and platforms have been described. Working towards a file-share system based on BitTorrent with security, privacy and anonymity as key features is the goal of this work, and requires a perfect combination of the ideas behind different platforms working together. This chapter describes the problems faced when combining the decentralized behaviour of BitTorrent and DHT with the anonymity provided by the centralized Tor to make a real anonymous peer-to-peer experience over the BitTorrent protocol possible.

3.1. Requirements on finding peers In a global network with peers spread around the world, finding peers to connect with or to relay data to is a hard but essential task. All peers in the network should somehow be able to find each other, without the need to reveal their identity while finding peers.

Finding seeding peers without a central authority Tor is supposed to know about every node in the network. For the purpose of hidden services in Tribler, there is no need to do this. Just a sufficient amount of candidates for initiating the circuits is necessary. Those circuits will be built by candidates through the dispersy tunnel-community. Each candidate within this community may become part of a circuit. No central directory server is needed in

17 18 3. Problem Description

contrary to Tor, where the code contains some pointers to central directory servers for bootstrapping purposes. Taking down the Tor network is as hard as taking down those hard-coded servers, they can be found in config.c 1, see Figure 3.1. Tribler used to have also a number of fixed servers in its code, but recently the DiscoveryCommunity class in Tribler takes care of this rendering the bootstrapping dispersy servers useless. The only way to take Tribler down is by taking the Internet down. 3

Figure 3.1: Fixed pointers in sourcecode of Tor to service authorities.

Leading libraries such as support DHT announces. The DHT is a key- value based distributed storage. When a peer announces, it notifies other peers in its routing table about the fact that it owns the torrent identified by the announced info hash. Based on the ’closeness’ of the info hash in the routing table of peers, other peers may query the DHT to find peers for a given torrent info hash. The DHT in its traditional form does not work in a based on hidden services. This is because peers in a darknet are not actually the peers seeding a torrent, they are just introduction points used for introducing a downloader to a seeder over an exit socket. The introduction peer is not allowed to be involved in the end-to-end circuit to be established between the downloader and seeder later on.

1See https://gitweb.torproject.org/tor.git/tree/src/or/config.c 3.2. Boosting performance 19

Offering PeX like functionality Peer Exchange (PeX) functionality offers a higher chance of discovering all the seed- ers in a swarm combined with DHT. Peers exchange peers they know about to each other. The PeX implemented in torrenting libraries can’t be applied directly onto a darknet based on hidden services, with the same reasons as the DHT has: the introduction points are the only entry-points to connect to seeders. Furthermore another great difference in the working of PeX exists: a seeder can’t find other seeders directly as it does not maintain connections to other seeding peers. To get high performing experiences using hidden services, a good peer exchanging mechanism is essential, given that DHT does not yield enough peers compared to 3 traditional torrent downloading.

3.2. Boosting performance The Tor software offers anonymous communication, but the problem with Tor is that it is too slow, average speeds for downloading are around 200 kilobytes per second, which is by far too slow for practical use of BitTorrent and for streaming video.

Blocking Peer-to-peer systems depend on a pool of peers where all participants are incentivized to share resources in a tit-for-tat fashion. All peers need to help with relaying data over the onion network. To overcome free-riding behavior, peers should be blocked for downloading more data from the com- munity than they shared resources to the community.

LEDBAT The BitTorrent protocol uses the LEDBAT algorithm for congestion control to transfer data very fast throughout the network. When data is relayed through tunnels, the latency will increase. But latency should be limited as much as possible.

Benchmark One of the first implementations of the tunnel-community in Tribler was able to achieve a bandwidth of 5 MByte per second with cryptography disabled and running all relays on a single machine. Although this is not a practical situation, it’s a good benchmark for comparing bandwidth in real- world applications.

3.3. Resilience to attacks An anonymous system should be resilient to attacks. Various attacks are possible depending on the objective of the attacker. Those attacks can be either passive (by observing) or active (by participating). This section describes a number of attacks where a mature anonymous system should be resilient to.

Sybil Attack and masquerading A peer to peer system is susceptible to Sybil attacks, due to the lack of a cen- tral authority [14]. A peer to peer network is self-organizing and peers rely on each other for all information. In a Sybil attack, some hostile peers in the network 20 3. Problem Description

counterfeit multiple identities. Peers can’t detect or ask some central identification authority whether a remote identity is valid or compromised. One way for dealing with this problem is to coordinate the validation of identities throughout the system by somehow collectively maintaining a trusted source of valid identities. This so- lution increases the amount of network traffic, and only works on the assumption of a large-scale distributed system where most of the peers are loyal, and com- promised peers do not have sufficient resources for influencing the trusted source. Several ideas how such a fully decentralized system for a trusted source should look like have been proposed, but none of these ideas moved beyond the proposal and 3 simulation stage. One possible solution for preventing a Sybil Attack is to apply a reputation mech- anism for building up a web-of-trust. BarterCast is a reputation mechanism, where peers propagate information about other peers’ behavior through gossiping. And to ensure propagated information is valid throughout the network, a block-chain for saving the information would be valuable.

Eclipse Attack An adversary node can propagate false information about other peers in the net- work. For example by introducing wrong information about peers in the network, by dropping information relayed through them, or by sending bogus data through the network. The Eclipse attack is based on these kind of Byzantine faults and Byzantine failures caused by adversaries.

Traffic confirmation attack Because the anonymous features are performed with high upload and download rates, the anonymity is vulnerable to a traffic confirmation attack, where adversaries use statistics about the amount of data that’s been transferred through a circuit. When both ends of a circuit are watched and the traffic on both edges is equal, it is very likely that both ends of a circuit are communicating. A trivial solution for solving this problem is to have every peer to send cover traffic over the circuits. But this costs an enormous amount of bandwidth to the volunteering relays and exits, and is therefore not worth the price. And even in the case of cover traffic the attack is possible (only harder) by blocking the padding for periods and look for patterns afterwards.

Correlation attack The exchange of keys for the Diffie Hellman handshake is subject to a correlation attack. This attack can be applied on any encryption method that is based on a stream cipher. An unencrypted cell consists of a stream of message symbols, and each symbol is individually encrypted. The unencrypted plain-text message is 푚, the stream cipher takes the secret key 푘 and encrypts 푚 into a key-stream 푧. Basically, if someone is able to intercept both the unencrypted stream 푚 and encrypted stream 푧 from the network, a plain-text attack can be applied. The idea behind this kind of attack is to find a correspondence between the plain-text 푚 and encrypted 푧, which can be used to determine a key 푘. When the unencrypted cell 3.4. Acquiring required keys for encryption 21 is not known, a correlation attack is still possible but much harder. Ways to find a correlation between the encrypted message 푧 is to assume one or more block- parts known of an encrypted message. A key can be guessed transforming the unencrypted known text into the encrypted cell. This key can be used to reveal the remaining plain text parts contained in the rest of the encrypted message blocks.

Timing attack Private key operations for the Diffie Hellman encryption consist of computing 푦 ⋅ mod 푛, with 푛 is the public key, and 푥 is the private key. 푦 can be found by eavesdropping the network. For the timing attack, the private key can be derived 3 statistically from the time needed to compute several 푦 ⋅ mod 푛 known to the at- tacker. Some implementations of Diffie Hellman are vulnerable, but it is unlikely to work in an environment of circuits where message are relayed down, since no ac- curate time measurements are available. The encryption layer on each hop makes it even harder to attack. [21]

3.4. Acquiring required keys for encryption One of the major problems in implementing hidden services in the existing BitTor- rent network is the lack of information about the public keys from introduction points and service key of the seeder. In the following sections a number of solutions are proposed. In the case of establishing an end-to-end encrypted circuit between two peers, a circuit needs to be created between the downloader and introduction point of the seeder. But for applying cryptography on this circuit, the public key of the seeder needs to be known. In the original Tor design, this problem is circumvented by asking so-called Hidden service directories (HSDirs) for all keys, but no such a central server exists in a fully decentralized network of BitTorrent peers.

3.5. Avoid exit nodes When Tor-like circuits in Tribler are used for anonymous downloading, exit nodes are the final relay, end-points of circuits. Exit nodes blame for all the traffic passed through a circuit, because the outside world sees the IP address of the exit node as the source of the traffic. In Tor, exit nodes are known for trafficking all kinds of content, including malicious, objectionable and illegal content. Release 6.4 of Tribler was released in December 2014, and had the anonymous downloading func- tionality enabled. With this version, everybody running the Tribler application may become an exit node for someone else’s anonymous downloads without knowing this. Shortly after the release posts appeared on various blogs and fora where peo- ple suffer from lawyer-based attacks and police raids. Although the lacking security was not supposed as a scientific approach, it gave good insights on the effects of failing anonymity in the community of Tribler users, lots of users got DMCA warn- ings and some even experienced law enforcement raids with guns at their homes. To protect Tribler users in the future from those kinds of attacks, privacy should be the default, and users may never become exit-node without their knowledge or approval. Downloading torrents should only be allowed via hidden seeding services 22 3. Problem Description

or via exit-nodes that explicitly allow being an exit-node, furthermore traditional unprotected seeding will be disabled entirely. Although Tribler cannot guarantee to protect its users against spooks and government agencies, it should come very close in almost every case.

3.6. Connectability Problem Another major issue with hidden services is to work around the connectability prob- lem on connecting to a hidden service. This problem is caused by Internet users 3 behind a or NAT. These users are by default not connectible, some ports are closed, thus making it impossible to connect from an outbound address. See the example in figure 3.2: if B is able to communicate to both A and C, it is not necessarily true that A and C can also communicate with each other, since C is behind a NAT and thus not connectible.

Figure 3.2: Example of the connectability problem

NAT Puncturing A workaround for the connectability problem is to use NAT puncturing. This method is used to ‘puncture a hole’ through the NAT, enabling communication with an un- connectable node [19]. A node that’s already connected to the unconnectable node can be asked to introduce itself to the unconnectable node. When the un- connectable node got introduced to the outbound node, literally a hole is punc- tured through the NAT, and the unconnectable node is connectable for this par- ticular node. In Tribler a method for NAT puncturing is implemented. The candi- date walker in dispery introduces new peers to unconnectable peers, and because the unconnectable peer takes the initiative to connect, holes are punctured in the NAT to allow communication between unconnectable peers. In the case of hidden seeding, where a downloader tries to connect to a particular introduction point or rendezvous-point, the introduction point needs to be selected as a connectible end- 3.7. Handling churn 23 point of a circuit somehow, otherwise the introduction point is useless and nobody is able to connect to the hidden service over the introduction point.

3.7. Handling churn In the context of peer-to-peer systems, churn is defined as the dynamics of peer participation in the network, thus arrival and leaving of peers [32]. This is a critical part for correct working of hidden seeding services, as a peer that leaves, also causes all the circuits where it is participating in will break down, and in the case when the peer is an introduction point, a seeder can’t be reached anymore, while the 3 introduction point is possible still being announced at other places on the network. The regular Tor network also suffers from this issue. The DHT and exchanged information about peers in the network should be resilient in such a way that the effect of leaving peers is limited to a minimum, by having the seeder creating new introduction points to maintain a stable network. In essence, there is not much difference with traditional BitTorrent here, as this also suffers from leaving peers. The only exception is for security limiters when circuits are automatically breaking down after a fixed amount of time. The DHT should be able to keep up with such changes.

3.8. Banning corrupt peers Tribler uses libtorrent for interaction with the BitTorrent network. The library is written in C++, and supports torrent sessions over TCP, UDP and 휇TP. In a peer- to-peer community, everyone is able to send data - including corrupt data. [1] Integrity of blocks of data are checked against the SHA-1 hashes defined in the original .torrent file. Peers sending data that fails these integrity checks need to be banned. Various solutions for banning peers are applied in Libtorrent. ratio ban This ban counts for each peer how much pieces succeeded, and how much pieces failed. A peer is dropped when the ratio between successes / failures triggers a threshold. smart ban When a piece fails, the peer is logged with the hash of the correspond- ing blocks. Other peers sending a block of which the hash is not matching the content are immediately banned. After successfully downloading a piece, all logged peers that sent invalid blocks are banned.

4 Design and Implementation

It’s the Industrial Revolution and the growth of urban concentrations that led to a sense of anonymity. Vint Cerf

The main issue with classical BitTorrent swarms is the lack of protection. The iden- tity of peers within a swarm is literally exposed to everybody. In almost every BitTorrent client it is possible to view the list of connected peers with their IP ad- dresses. Hidden seeding services would solve this issue and ensure strong privacy within the darknet. All inbound and outbound connections should use onion routing circuits. They preserve full anonymity because connections are encrypted from end- to-end, thus there are no exit nodes that may be subject of being eavesdropped. Hidden services ensure that people who offer controversial video material are pro- tected online, their identity remains hidden. The design specification described in this section is mostly based on the ideas and excellent work of the people behind the Tor Project [11]. Tor Hidden ser- vices are the leading solution for anonymous web-hosting, but unsuitable for video streaming - like YouTube - because it is too slow. Tor also depends on a number of ‘trusted’ central directory servers, whereas Tribler is fully decentralized. The Tribler approach uses the UDP protocol and does not rely on central directory servers like Tor does. It prefers downloading from hidden seeders using end-to-end encryption, but for achieving high performance and speed also the classical downloading from exit-nodes is enabled. This choice is made on purpose, because in the beginning anonymous seeding will have zero or just a few seeders and suffers from a bootstrap problem due to the lack of anonymous seeders. While hidden seeding is limited by the peers within the darknet, anonymous downloading over exit-nodes supports downloading from the traditional public seeders. As long as this is the case, there is no reason for disabling downloading over exit-nodes. Traditional ‘naked’ seeding

25 26 4. Design and Implementation

will be disabled by default, all seeding downloads in Tribler are automatically trans- formed into hidden seeding downloads after they finish downloading, no matter whether the download proceeded anonymously or not.

4.1. Integration into Tribler

Starting in June 2014, Rutger Plak and Risto Tanaskoski finished their thesis work on building a P2P onion router in Tribler. This work consisted of 11000 lines of code in 702 commits. In the summer, an intensive code refactoring took place in the core of Tribler, and the tunnel-community was rebuilt from scratch by Egbert Bouman. In September the idea of hidden services was integrated into the tunnel- 4 community in Tribler. The release of Tribler 6.4 in December 2014 brought up major problems related to the cryptography used in the anonymous downloading over proxies, see https://github.com/Tribler/tribler/issues/1066. Niels Zeilemaker replaced most of the broken and experimental code, and refactored the tunnel-community and hidden-services again. From this moment on, work was continued to make the tunnel-community and hidden-services production ready until the release of 6.5 in July 2015. All improvements to build a production ready hidden services were realized in a total of 30 pull requests consisting of 70 commits with a total of around 4000 lines of code in the Tribler and Dispersy repositories on GitHub. Another 1000 lines of code were involved in 2 pull requests for setting up a clear experiment framework for circuit related experiments using a fully functional Tribler session in Gumby.

Figure 4.1: June 29th 2015, an example of end-to-end downloading of a torrent with speeds of 1.9 MB/s 4.2. Decentralized discovery of hidden seeders 27

4.2. Decentralized discovery of hidden seeders In the original Tor design, central directory servers called HSDirs are used for re- trieving information about a hidden service, like the service-key and public keys of the introduction points. In a peer-to-peer environment such a central server does not exist, the protocol works in a fully decentralized network of BitTorrent peers. Our solution for retrieving the essential information to connect to a hidden seeding service is by adding an extra message in the protocol: key-request. When an anony- mous torrent got announced in the DHT, the introduction point should be asked for e.g. the public key of the session of the seeder by means of a key-request message. Both the seeder and downloader use circuits for this key requesting mechanism, but with the current implementation the introduction point on itself knows which info hash is shared and what the rendezvous point will be. This is a known weak- ness in the protocol, but is to be solved later on in future work, when opportunistic 4 encryption in a web of trust is reality.

4.3. Circuit setup The introduction points and rendezvous point for downloading over hidden seeding services should always be connectible, to allow a downloader to build a circuit and connect to the introduction point of the seeder, and to allow the seeder to build a circuit to the rendezvous-point of the downloader. The current approach in the tunnel-community is to require every exit-node in the network to be connectible. In this case, there is no doubt about the connectability of the introduction point, as the introduction point itself is in fact an exit-node of a circuit initiated by the seeder. This also solves the connectability problem for the rendezvous point, as the rendezvous point is in fact an exit-node of a circuit initiated by the downloader. Solving the connectability problem for the introduction and rendezvous points is not essentially the same: the introduction point always needs to be connectible for strangers, but an unconnectable rendezvous point can be punctured by letting the hop that needs to connect to the rendezvous point propagate its identity back to the seeder, via the circuit to the introduction point relayed to the downloader, than propagated to the rendezvous-point which on its turn sends a dispersy puncture to the last hop. This will only work for the rendezvous point because there is already an existing circuit around. For the introduction point it will always be necessary to be connectible for the outside world.

4.4. Dispersy message cells Most message cells consists of a circuit_id and identifier in their payload, these are required to identify to which circuit a particular message belongs, and for acknowl- edgement of messages. The circuit_id field is 4 bytes long, and identifier field is 2 bytes long. The following sections describe a scenario where Bob owns some files, and wants to share them via peer to peer over the BitTorrent network. Alice is interested in downloading this file. 28 4. Design and Implementation

4.4.1. Setting up Introduction Points In preparation of seeding files over the BitTorrent network, Bob builds up at least one anonymous circuit to let another node serve as introduction point for his seeding services. The original info hash of the torrent is prepended with a string ‘tribler anonymous download’, and the SHA1 hash is calculated over this string, resulting in a modified info hash to be used for finding hidden services. For each file Bob is seeding, a unique keypair is generated and stored in the session. By sending an establish-intro message with the modified info hash of the torrent file to the last hop of a circuit, this last hop becomes an introduction point for the modified info hash. The payload byte format of the establish-intro message is shown in figure 4.2.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 4 circuit_id identifier info_hash info_hash

Figure 4.2: Byte format for payload of the establish-intro cell

After receiving an establish-intro message, the introduction point responds with an intro-established acknowledgement message back to the seeder. The introduc- tion point will also announce the torrents’ info hash on its dispersy port into the Mainline DHT. This way, the DHT can be queried to return introduction points for a given info hash. The intro-established message does not have any additional payload. The payload is shown in figure 4.3.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 circuit_id identifier

Figure 4.3: Byte format for payload of the intro-established cell

4.4.2. Finding peers to download from When Alice knows the info hash of the torrent file seeded by Bob, she can calculate the modified info hash by prepending ‘tribler anonymous download’ and calculating the SHA1 hash on this string. By querying the DHT for the modified info hash, she finds Bob in the list of introduction points. A direct DHT lookup by Alice reveals her interest in the torrent file and leaks her privacy. To prevent this leakage, Alice asks for peers over any circuit from the pool. She sends a dht-request message with the modified info hash over this circuit. The byte format of the payload is shown in figure 4.4.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 circuit_id identifier info hash info hash

Figure 4.4: Byte format for payload of the dht-request cell 4.4. Dispersy message cells 29

The last hop receiving the dht-request cell will do a DHT lookup for the info hash, and the returned peers are packed into a dht-response cell. The byte format of the payload is shown in figure 4.5.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 circuit_id identifier info hash info hash ... encoded list of peers ...... encoded list of peers ...

Figure 4.5: Byte format for payload of the dht-response cell

When Alice receives the dht-response message, she will likely find the introduc- tion point chosen by Bob in the list of peers. 4

4.4.3. Getting keys When Alice knows the location of the introduction point of Bob, she needs to know the session key of the info hash seeded by Bob. She sends a key-request message to the introduction point. The introduction point will forward this message to the seeder. The byte format of the payload is shown in figure 4.6.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 identifier info hash

Figure 4.6: Byte format for payload of the key-request cell

Note that the circuit-id is omitted in this cell. This is by design, the key-request cell is exited as a dispersy message over the exit socket of an existing circuit with the introduction point as its final destination. Therefore no circuit is involved, as the introduction point does not receive the message over a circuit. The byte format of the payload stays the same after the relaying by the introduction point. The only difference in payload is the cache identifier which is replaced by an identifier pointing to the relay message cache of the introduction point. On receiving the key-request cell, the seeder replies with his session key for the info hash in a key- response cell. This cell contains the session key of the seeding torrent chosen by the seeder, and an encoded list of other peers and keys the seeder already knows about. The byte format of the payload is shown in figure 4.7.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 identifier |sesskey| ... session key ...... session key ...... exchanged peers and keys ...... exchanged peers and keys ...

Figure 4.7: Byte format for payload of the key-response cell 30 4. Design and Implementation

When Alice receives the key-response message, she has all the necessary infor- mation to create an end-to-end encrypted circuit to Bob, and if Bob appended more peers to the key-response, she is able to initiate more end-to-end encrypted circuits to other seeders if Bob supplied some exchanging peers in the payload (analogous with PeX).

4.4.4. Create end-2-end Alice sends a create-e2e message over a circuit that will exit over the exiting socket into the introduction point of Bob. The byte format of the payload is shown in figure 4.8.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 4 identifier info hash ...... info hash |푝푢푏푘푒푦| |푘푒푦| node-id ...... node-id pubkey ...... pubkey ... key ...

Figure 4.8: Byte format for payload of the create-e2e cell

If the introduction point recognizes the info hash received by the create-e2e, the message is relayed onto the introduction circuit leading to Bob.

4.4.5. Requesting end-to-end connection Bob establishes a rendezvous circuit to a rendezvous point (RP). This circuit is re- quired to have a connectable hop at the end of the circuit, as the rendezvous point is required to accept inbound connections from the downloaders’ circuit. After build- ing the circuit, an establish-rendezvous message will be send over this circuit to the last hop, with a random chosen single-use rendezvous-cookie as payload. The rendezvous point is now waiting for an inbound connection with a valid cookie, to link the end-to-end circuit. The byte format of the payload is shown in figure 4.9.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 circuit_id identifier cookie ...... cookie

Figure 4.9: Byte format for payload of the establish-rendezvous cell

After receiving the establish-rendezvous message, the node is marked as a ren- dezvous point, it will associate the circuit it is connected to the received rendezvous- cookie. It will then reply with a rendezvous-established message back to Bob. The byte format of the payload of this message is shown in figure 4.10. 4.4. Dispersy message cells 31

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 circuit_id identifier ip address port

Figure 4.10: Byte format for payload of the rendezvous-established cell

When Bob receives the rendezvous-established, he can acknowledge the create- e2e message with a created-e2e message. This message contains all the informa- tion needed by Alice to build a circuit ending in the rendezvous point chosen by Bob. It is sent over the circuit ending in the introduction point of Bob. The introduction point will look up the corresponding circuit identifier in the payload, and relay the message downwards into the exiting socket from the exit-circuit initiated by Alice. The byte format of the payload for this message is shown in figure 4.11. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 4 identifier |푘푒푦| cookie ...... cookie ... key ... key rendezvous sock

Figure 4.11: Byte format for payload of the created-e2e cell

4.4.6. Linking end-to-end connection When Alice receives the Alice builds a new circuit ending at the rendezvous-point of Bob, and sends a link-e2e message along this circuit. The byte format of the payload is shown in figure 4.12.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 circuit_id identifier cookie ...... cookie

Figure 4.12: Byte format for payload of the link-e2e cell

When the rendezvous point receives a link-e2e message and the rendezvous- cookie provided in the payload is the same as the cookie that the seeder commu- nicated earlier, the two circuits are combined into 1 circuit, where inbound and outbound data is relayed into the circuits replacing the exit sockets. The linking of the circuits is acknowledged by a linked-e2e cell back to Alice, without additional payload. The byte format of the payload is shown in figure 4.13.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 circuit_id identifier

Figure 4.13: Byte format for payload of the linked-e2e cell

When Alice receives a linked-e2e message, the handshake is completed. Alice initiates the downloading via a libtorrent session that gets its data over the circuit, 32 4. Design and Implementation

via the rendezvous point, from Bob. This circuit is end-to-end encrypted between Alice and Bob, making it impossible for outstanders to see what data is transferred over the circuit. Moreover, Bob and Alice are communicating with each other, but don’t know each others real identity.

4.5. Generic overview of message interaction

To set up a hidden seeding service, tunnels from different entities have to be created in parallel with each other. Table 4.1 explains which messages are transferred for 4 setting up a connection between a seeder and downloader.

Seeder Downloader DHT IP RP EXIT establish-intro FROM TO dht-announce TO FROM intro-established TO FROM dht-request FROM TO dht-response TO FROM key-request TO FROM RELAY key-response FROM TO RELAY create-e2e TO FROM RELAY establish-rendezvous FROM TO rendezvous-established TO FROM created-e2e FROM TO RELAY link-e2e FROM TO linked-e2e TO FROM

Table 4.1: Relay cells required for setting up an end-to-end encrypted connection between a downloader and a seeder.

Figure 4.14 provides a schematic overview of the message interaction of all dispersy message cells in the protocol. 4.6. Circuit reuse 33

4

Figure 4.14: Overview of message interaction in a hidden seeding service.

4.6. Circuit reuse A hidden service needs at least 4 circuits for building up an end-to-end encrypted connection between two peers. In some cases it is possible to reuse existing circuits from another job for new purposes if the circuit is not in use. Tor uses the word cannibalization for this. Cannibalization is safe in a few instances, e.g. the dht- lookup and key-request may use the same circuit, but not for all purposes. For 34 4. Design and Implementation

instance, the circuit to an introduction point should never be used as circuit to a rendezvous-point. The hidden services protocol as implemented in Tribler depends on the existence of the following circuits to initiate an end-to-end encrypted circuit between two peers:

1. The seeder uses a circuit to establish an introduction-point.

2. The downloader uses a circuit for getting peers from the DHT (can be reused).

3. The downloader uses a circuit for getting the seeders’ keys (can be reused).

4. The downloader uses a circuit for the create-e2e message (can be reused).

5. The seeder uses a circuit to set up a rendezvous-point.

4 6. The downloader uses a circuit for the link-e2e message.

4.7. Protocol-specific extensions This section enumerates a number of decisions and insights gathered during the development of the protocol in Tribler.

opt-in for exit-nodes Each user is able to select whether to be an exit-node or not. Allowing to be an exit-node should be explicitly configured by the user, by checking a box in the anonymity-section in the configuration panel of Tribler, see figure 4.15. The value of this setting is passed as a flag in the introduction message. According to the value of this flag, other peers know how to deal with a particular candidate: using it as a relay, or either using it for exiting data to the Internet. When a circuit is extended with an extra hop, depending on the type of circuit a choice is made which candidates are eligible for extension, depending on the flag. The final goal is to eliminate exit-nodes completely, but this will only work if enough users are available in the anonymous swarms.

initiator selects the next hop in the circuit When the circuit is extended with a new hop, it returns a list of candidates where the circuit can possibly be extended to. In the past, candidate selection was performed by the last hop in the circuit. Since the implementation of hidden services, the choice is made to move the selection logic to the initiator of the circuit. This also provides the opportunity to build a circuit to peers that are not returned in the list of possible candidates, which is necessary for hidden services when it comes to linking the downloader circuit to the rendezvous-point of the seeder.

checking connectability Some circuits should end in a connectable endpoint in order to support becoming an introduction or rendezvous point. This con- nectability check takes place when the next hop is chosen. It first tries to find a connectable node, before falling back to any other node. Experiments are performed in the wild where a downloader and seeder are not connectable. As long as there are any connectable candidates around in the community, 4.7. Protocol-specific extensions 35

4

Figure 4.15: Screenshot of the panel where opt-in for exit-nodes is explicitly configured.

the downloader and seeder will find each other via the DHT and start a torrenting-session over an end-to-end encrypted circuit. Because the con- nectability check is performed by design, the connectability problem where two different circuits need to communicate with each other is solved. The ex- periments turned up unforeseen issues where the exit-socket communicates directly with the endpoint of another circuit. But a reply from the endpoint to the exit-socket sometimes used another port, causing the connectability problem to occur. This issue was revealed by performing the experiments with multiple hops on two different machines, with both machines behind a NAT thus unconnectable. By requiring the endpoint of these circuits to be con- nectable (which can be easily determined by a flag in the dispersy candidate object), the connectability problem turned out to be solved. circuits of length one The default number of hops in a circuit is set to 1. Tribler users are free to change this setting to a higher value, but the value is on purpose set to 1 to improve the user-experience and download-speeds, and in second place to reduce the load on the network by saving bandwidth and computing power, as most users do not need the maximum anonymity for all of their downloads. Tests revealed various issues with circuits of length one. As the first-hop is also the last-hop in the circuit, all requirements (like being connectable, or allowing to be an exit-node) should be met. Otherwise 36 4. Design and Implementation

the circuit will certainly fail and break the protocol. This issue is solved while experimenting with different number of hops.

circuit recovery Circuits have some security limiters enabled by default, after a fixed number of transferred bytes, on a fixed amount of time after the time of creation, and when messages are not propagated correctly anymore through the circuit, the circuit will be destroyed and recreated. This recreation pre- vents eavesdropping and improves overall privacy. But for circuits related to hidden services, security limiters can be problematic. Breaking down a circuit to an introduction point results in an unreachable seeder, and break- ing down an end-to-end circuit over a rendezvous-point will render hidden services useless when the end-to-end circuit initialization is not performed again. Especially the breakdown of an introduction-point is problematic, as 4 the DHT may still announce the introduction-point even if it is not existing anymore. Experiments with security limiters enabled and where a large file is downloaded are the perfect way for testing recovery of hidden services after circuits breaking down due to reaching security thresholds.

DHT-based introduction-peer discovery All peers that are introduction-points for seeds within the darknet, can be discovered from the DHT by their modified info-hash. But from time to time, peers might also disappear, and the DHT is required to catch up with those changes on time, because otherwise hidden services will utterly break when the DHT returns too much non-existing peers.

peer exchange via key request/response BitTorrent supports PeX for peer ex- change. But peers in BitTorrent are not the same as peers in the hidden services darknet, because the introduction points are the only entry-point for connecting to a seeder. Figure 4.16 shows how introduction points are part of the darknet, and how anonymity for both seeders and downloaders is guar- anteed. Seeders by itself do not have knowledge about other peers in the swarm, and that’s where PeX comes in. The key-request and key-response messages used for sharing keys for setting up an end-to-end circuit can be appended with information about other candidates and their keys. Periodi- cally seeders update their information by fetching peers from the DHT and performing key-requests, and by responding to other peers with a fixed set (currently maximized to 50 peers) of candidates and keys they already know about. This feature should speed up downloading in real-world examples. Although it is implemented and working in the experiments, no additional re- view and analysis had been performed on this PeX implementation, because in the experiments the DHT was already able to find enough candidates. The real value of PeX comes in large-scale experiments, but is beyond the scope of this work. 4.7. Protocol-specific extensions 37

4

Figure 4.16: Not locations of seeders, but locations of introduction points are exchanged in peer ex- change (PeX), and are passed when a downloader requests the keys from a single introduction point.

5 Experiments and performance evaluation

You have to fight for your privacy or you lose it. Eric Schmidt

This chapter will focus on performance of hidden seeding services as it is imple- mented in Tribler. Performance of a system like this can be measured in a number of ways. We investigate the overall bandwidth performance for hidden services, the effects of multiple seeders in a swarm, and the ability to recover from faults when circuits break down. We present experimental results for the hidden services ar- chitecture. Aim of the experiments is to get high-performing end-to-end encrypted circuits with a bandwidth capable of supplying high-definition video streaming to Tribler users. In practice, this means at least 5 megabit per second streaming over circuits should be reached. But for an Ultra HD quality experience, speeds of 25 megabits per seconds are required.

5.1. Environment specification and assumptions All experiments are performed with Gumby. Gumby is an experiment running framework for performing experiments in Tribler and Dispersy. The experiments in this section are performed on machines running SMP Debian 3.12 64 bits, with 6 Intel CPU dualcores on 1.6 GHz with Hyper-threading enabled, resulting into 24 cores. Each server has 8 gigabytes of memory, and 2.7 terabytes of disk space. Scenario-files are deployed where Tribler peers are created, and downloads / seed- ers of auto-generated test-files can be created instantly during the experiments. All performed experiments use the real DHT.

39 40 5. Experiments and performance evaluation

Analysis of the experiment results provided great insights during the develop- ment of hidden services in Tribler. Essential information about the (correct) working of the protocol, reliability of message interaction, and issues with the design were mostly revealed during the running of experiments. We will look into these de- sign choices and measure the performance and latency of the system for initiating a download, which consists of looking up peers from the DHT, getting the corre- sponding keys of the seeder, and initiating the end-to-end circuit. For all conducted experiments, all Tribler-instances are forced introduced to each other. This causes the experiment to introduce all available peers to each other without using the time-consuming candidate discovery in dispersy. Candidates are required for building circuits, and the time needed for discovering the right can- didates can’t be predicted. By adding the candidates manually, the experiments will bootstrap in a reasonable and constant time, and do not influence the results when running the same experiments multiple times or when comparing the results of different experiments.

5 5.2. Experiments on latency, speed and performance We start on experiments on speed and performance, by producing download traces with respectively 1, 2 and 3 hops on both the downloader and seeder side. Those traces yield graphs showing the network bandwidth, progress of downloading, and the load on the processors. There is an oversupply of candidates in the network, there are 20 candidates to choose from for building circuits, while only a few of them will actually participate in the circuits. The test starts with a 1 hop seeder experiment. The amount of hops for the circuits of the downloader are changed with values between 1 and 3 hops. The experiment thus uses 1 + 1, 2 + 1 and 3 + 1 hops. For the purpose of this test, a scenario is set up in gumby. The actual scenario file can be found in appendix A.3

• The experiment starts 2 instances of Tribler where the exit-node functionality is enabled. Also, 18 instances of Tribler with exit-node functionality disabled are started.

• Security delimiters are disabled to prevent influences from the security mech- anisms in the circuits (e.g. limiting circuits after transferring 50 megabytes)

• One instance will start seeding a 100 megabyte binary file over 1 hop

• One instance will start downloading file over 1 hop, from a 1 hop seeder. When the experiment is repeated later on: the seeder will respectively be set to 2 and 3 hops.

• After finishing the 1 hop download, another instance will start downloading the file over 2 hops

• After finishing the 2 hop download, yet another instance will start downloading the file over 3 hops 5.2. Experiments on latency, speed and performance 41

5

Figure 5.1: Progress of a download from a 1 hop seeder

Figure 5.1 depicts the progress of downloading a 100 megabyte file using various hops as a function of time, where 3 consecutive downloads are performed with respectively 1, 2 and 3 hops. The vertical bars in this diagram mark the start of each download. For the first download (downloading over 1 hop) it takes about 45 seconds before it actually starts downloading. This is the time needed to find a suitable seeding peer from the DHT, and to initiate an end-to-end encrypted circuit. It can be stated that when the implementation will be adopted by millions of real- world users, this will be the latency for initiating a torrent download. After the end-to-end encrypted circuit is built, it needs 95 seconds to complete the download of 100 megabytes. It is clear to see that the progress continues from 0 to 100% without any notable hickups. The second download experiment (downloading over 2 hops) just needs 10 seconds before it starts downloading. This is because the DHT directly returned the correct introduction point and circuits can be built instantly. The decline in latency for initiating a download is significant, and is about 10 seconds when the introduction peers are already known to the downloader. When peers via PeX are exchagned after the first DHT lookup, latency is reduced drastically because no new DHT lookups are involved anymore. The actual download takes 117 seconds, which is roughly 28% slower than the downloading over 1 hop. The third and last download experiment (downloading over 3 hops) used 134 seconds for downloading the full 100 megabytes. This is again slower, 42% than the experiment over 1 hop and 11% slower than the experiment over 2 hops. Note that the difference in percentage of progress declines when the number of hops is 42 5. Experiments and performance evaluation

5

Figure 5.2: Download speeds for downloading from a 1 hop seeder

increased. In the following sections we will see that this is caused by computational overhead for the downloader which forms a bottleneck in this experiment. Figure 5.2 reveals the download speed for the experiment. These rates are gathered per second from the libtorrent session. As libtorrent reports a smooth rate, this will also reflect into the graphs. This will be also the case in the upcoming experi- ments, but is not changing the conclusions to be made during the experiments. It is clear to see that the downloading speed gradually decreases when more hops are involved. Over 1 hops it averages on 1150 kilobytes per second, over 2 hops it av- erages on 1000 kilobytes per second, and over 3 hops it averages on 850 kilobytes per second. The worst performance, 850 kilobytes per second converts to a 6.8 megabit connection, which still suits for high-defintion video streaming. The best performing connection over 1 hop converts to a 9.2 megabit connection, which is more than enough for high-defintion streaming, but not enough for Ultra HD video (which requires a 25 megabit connection). Those downloading speeds should be reflected in the diagram of kilobytes uploaded. Figure 5.3 confirm the downloading speeds, because the upload speeds are the same. This diagram also reveals that no swarming effect is involved in the experiment, just one seeder and one down- loader exist during the experiment, and it’s running isolated, not influenced by any other peer. The use of circuits for downloading a torrent is compute-intensive, this can be seen in figure 5.4. It shows the user times for each process involved in the Tribler experiment, consisting of the downloader, seeder, and relay nodes used in the circuits. User time is the amount of CPU time that is spend in user-mode (thus 5.2. Experiments on latency, speed and performance 43

5

Figure 5.3: Upload speeds for downloading from a 1 hop seeder

outside the kernel) within the executing process. Furthermore, the downloader has the advantage of creating more end-to-end circuits to the seeder, the existence of multiple circuits is visible in the experiment using 3 hops. This will improve perfor- mance in real-world examples where the bandwidth available in a circuit is poor or fluctuates due to the amount of hops involved in the circuit, and in cases where circuits break down if an intermediate relay leaves.

The experiment is conducted on a single machine, and therefore we only see one line in the Node part of this diagram. The cumulative user-time for all pro- cesses together is depicted in this particular part of the diagram. When down- loading from a single seeder over 1 hop, it is using 100% of the processing power. This confirms the performance is limited by computational power, meaning that the measured bandwidth would be the maximum possible bandwidth for this computer. The downloader process is causing most of the load, and is fully saturating a CPU core. Libtorrent is inherently single threaded, and so is the tunnel-community. Even when there are idle CPU cores available in the system, the current code download speeds will not improve because of this single thread bottleneck. 44 5. Experiments and performance evaluation

5 Figure 5.4: User times for downloading from a 1 hop seeder System times are the times that a process spends CPU-resources in the kernel, e.g. when performing system calls on I/O. The system times for this experiment are shown in figure 5.5. They fluctuate around 0.05 and 0.10 seconds, this load is caused by the system calls issued by Tribler, for example when doing I/O operations on disk and on the network.

Figure 5.5: System times for downloading from a 1 hop seeder

We repeated the same experiment, but now with a seeder using 2 hops for its circuits. The experiment thus uses 1 + 2, 2 + 2 and 3 + 2 hops. The actual scenario file can be found in appendix A.4. The expected result of this experiment will be 5.2. Experiments on latency, speed and performance 45 a decrease of the download speed. In figure 5.6 the download speed is shown as a function of time. From this figure it can be confirmed that the download speed decreased slightly, but the impact is limited, and in the case of the 2 hop seeder + 2 hop downloader even improved a bit. The bottleneck is likely to be related to the bottleneck which the downloader and seeder are facing when encrypting data packets over the circuit.

5

Figure 5.6: Download speeds for downloading from a 2 hop seeder

The same experiment is repeated with a seeder using 3 hop circuits. The actual scenario file can be found in appendix A.5. This experiment uses 1 + 3, 2 + 3 and 3 + 3 hops. When we look at the download speeds in figure 5.7, we see a further decrease of the download speed, with the 3+3 combination performing the worst of all. The total of 6 hops produces a maximum of 650 kilobytes per second, converting to a 5.2 megabit connection. This is still enough for high-definition video streaming, but strongly discouraged for use as it puts a high load on the network (although it gives a lot of anonymity for both the downloader and seeder). 46 5. Experiments and performance evaluation

5

Figure 5.7: Download speeds for downloading from a 3 hop seeder

All combinations of hops chosen by the seeder and downloader are shown in table 5.1. As expected, the 1 hop downloader combined with the 1 hop seeder is the fastest option, whether the 3 hop downloader combined with the 3 hop seeder takes much longer.

1 hop downloader 2 hop downloader 3 hop downloader 1 hop seeder 91 seconds 110 seconds 123 seconds 2 hop seeder 97 seconds 108 seconds 125 seconds 3 hop seeder 131 seconds 127 seconds 168 seconds

Table 5.1: Time needed for finishing a 100 megabyte download over end-to-end encrypted circuits with different combinations of hop-counts.

5.3. Experiment on fault resilience In a distributed environment, peers come and go. There is no certainty about candidates whether they will be still online in the next few minutes, but as those candidates participate in circuits, the anonymity system should fully recover from faults occurring in the circuit setup. The following experiment is designed to cover fault resilience. By creating a 1 gigabyte download with the security limiters en- abled, circuits will be re-established after 50 megabytes of transferred data or after 600 seconds of activity. The expected result of the experiment is that the down- 5.3. Experiment on fault resilience 47 loading of a 1 gigabyte file The actual gumby scenario file for this experiment can be found in appendix A.1

• The experiment starts 2 exit-enabled instances of Tribler.

• The experiment starts 18 non-exit-enabled instances of Tribler.

• Security delimiters (e.g. limiting circuits after transferring 50 megabytes) are enabled

• One instance will start seeding a 1 gigabyte binary over hidden services

• Another instance will start downloading this 1 gigabyte binary from the seeder

Figure 5.8 shows the progress of downloading a 1 gigabyte file using 1 hop on both the seeder and downloader side. It takes about 45 seconds before it actually starts downloading. This latency is the same as the latency for the earlier experiment. The actual download of a 1 gigabyte file needs 1100 seconds to complete. It is clear to see that the progress continues from 0% to 100% with hick-ups. These hick- 5 ups are caused by the security limiters, after each 50 megabytes or 10 minutes a circuit is destroyed. Because downloaders are able to maintain multiple end-to-end circuits to the same seeder, the number of hick-ups is less than = 20, we see approximately 7 hick-ups, thus it was concurrently downloading over 3 end-to-end circuits and load was divided over these 3 instances.

Figure 5.8: Progress of a 1 gigabyte download over 1 hop 48 5. Experiments and performance evaluation

Because circuit limiters are fixed, and for the purpose of this experiment all circuits and candidates are created on the same time, all circuits are likely to break down on the same moment. This is not in line with how it works on the real Internet, as peers would arrive and leave on random moments, and each peer would have different seeding rates, thus the chances that circuits would all together break down like in this experiment are small. A simulation of this kind of fuzziness in the experiment is possible, but decided not to do, as the experiment now produces a worst-case scenario: it proves a resilience of the system even when all circuits break down. Figure 5.9 shows the user time of all the running processes. While the download progresses there is a high load on the system, and when the security limiters break down circuits, the load becomes low again. In the pausing moments Tribler is busy with finding new candidates for the community, doing DHT lookups to find hidden seeders, and of course building a new end-to-end encrypted circuit to download from. The DHT is able to catch up with changes, because the experiment also works after 600 seconds when the introduction point circuit breaks down. After this circuit is established it resumes the download again causing high load, this 5 load is mostly caused by the encrypting and decrypting of packets to be transferred through the circuit of hops.

Figure 5.9: User times for downloading 1 gigabyte over 1 hop

Figure 5.10 clearly depicts the bandwidth usage (in kilobytes uploaded) for each process that’s part of the experiment. This bandwidth is measured as the raw band- width by dispersy. For transferring a 1 gigabyte file, the actual network bandwidth for the seeder is 1.2 gigabytes. This extra overhead on the data is caused by the encryption and overhead of payload in the packets sent through the dispersy overlay. 5.3. Experiment on fault resilience 49

5

Figure 5.10: Upload bandwidth usage on a 1 gigabyte upload

The download speed during the experiment is shown in figure 5.11. The max- imum speed achieved is around 1250 kilobytes per second on this machine. A machine with more computing power per CPU core would yield better results.

Figure 5.11: Download speeds for downloading 1 gigabyte over 1 hop 50 5. Experiments and performance evaluation

5.4. Experiment on multi-swarm collaboration In BitTorrent, swarms need to collaborate when not all peers own all chunks of a torrent. Peers can only complete download a torrent when all chunks are available in the current swarm. The following experiment is designed to show how load is distributed evenly when multiple seeders in a swarm are available, and a single downloader should find those available seeders from the DHT, and start download- ing from them over end-to-end encrypted circuits. For the purpose of this test, the following scenario is set up in gumby. The actual scenario file can be found in appendix A.2.

• The experiment starts 2 exit-enabled instances of Tribler.

• The experiment starts 18 non-exit-enabled instances of Tribler.

• Security delimiters (e.g. limiting circuits after transferring 50 megabytes) are 5 enabled • One instance will start seeding the 100 megabyte binary file ‘A’ over hidden services

• Two instances will start seeding the 100 megabyte binary file ‘B’ over hidden services

• Three instances will start seeding the 100 megabyte binary file ’C’ over hidden services

• One instance will start downloading file ‘A’ from 1 seeder

• After downloading of ‘A’ is finished, another instance will start downloading file ‘B’ from 2 seeders

• After downloading of ‘B’ is finished, yet another instance will start downloading file ‘C’ from 3 seeders

The experiment utilizes 3 different swarms, to ensure it starts with a clean measure- ment after each download, as DHT announcements take place after each download and those would influence the upcoming experiment with seeders that would not exist anymore. The download bandwidth graph in figure 5.12 shows the speed for downloading from 1 seeder is approximately 1200 kilobytes per second, while downloading bandwidth from 2 and 3 seeders is slightly slower, 1150 kilobytes per second. The reason for this might be caused by the fact that downloading from multiple seeders means that there is more often context switching for encrypting and decrypting messages over different tunnels to different seeders. 5.4. Experiment on multi-swarm collaboration 51

5

Figure 5.12: Download speeds for downloading from multiple seeders

Figure 5.13: Upload speeds for downloading from multiple seeders 52 5. Experiments and performance evaluation

The uploading bandwidth as shown in figure 5.13 nicely depicts that all available seeders are involved equally for downloading the torrent. Where the first (pink) curve shows the single seeder uploading the file, the green and yellow curves show how the bandwidth for uploading is divided by the two available seeders. The downloader maintains the same downloading speed, but divides the load over all available seeders. In the downloading from 3 seeders it’s hard to see that there are three curves, a light-blue and dark-blue curve almost above each other, and a purple curve that’s a bit lower. But all three curves together again reach the same (maximum) download speed that the downloader is able to manage with its available resources.

5

Figure 5.14: User times for downloading from multiple seeders

The user time needed by the processes is shown in figure 5.14. The process that reaches 1.00 seconds of user time is the downloading process. For the case where one seeder is available, the loads for the processes of the seeder and relaying hops is between 0.75 and 0.90, while the loads for these processes gradually decrease when there are respectively 2 and 3 seeders. This means that the amount of seeders in the swarm is scalable when enough relaying candidates are around. The results of this experiment clearly show that a bottleneck occurs at the downloader. For the experiments with 1, 2 and 3 seeders it utilizes all the available CPU resources, without achieving higher throughput rates. It currently would not make any sense to do experiments with even more seeders because the resources of the downloader are already saturated.

5.5. Concluding remarks on experiments The conducted experiments provide useful insights in the current bottlenecks of the tunnel-community in Tribler. Measuring the time needed for setting up a hidden service involves the creation 5.6. Yappi profiling of the system 53 of all the circuits, negotiating messages, until an end-to-end connection is available between two peers. The creation of a hidden service is far more complex than just setting up a single circuit for anonymous downloading over a few hops. Also varying the number of hops in a circuit impacts performance, because using more hops consumes more time and bandwidth, and using too much hops in circuits could badly influence performance, and is expensive in terms of overall network bandwidth and CPU power, thus will cause a high load in the network. The limiting factors on performance have been extensively researched, and it turns out to be mainly caused by the cryptography applied on packets. The in- creasing number of hops in a circuit is further negatively impacting performance, because the seeder and downloader both communicate over a circuit where they are responsible for successive library calls for RSA encrypting / decrypting packets, analogous with the onion routing algorithm for each hop that’s part of the circuit. The maximum bandwidth reached in the experiments was 1200 kilobytes per sec- onds, converting to a 9.5 megabit connection. This suits perfectly for HD video streaming, but unfortunately not for Ultra HD video streaming. 5 5.6. Yappi profiling of the system A deeper analysis on the bottlenecks in the system is performed by Quinten Stokkink and Harmjan Treep (see [29]). They used Yappi, a profiler for the Python pro- gramming langauge in which the Tribler code is written. Yappi is able to analyze per-thread and gather stats on CPU-time per function-call. This analysis revealed the compute-intensive parts of the tunnel-community. Some time-consuming calls can be easily improved by applying caching for storing information (like converting ip-adresses from binary to string format which is done often). But the most time- consuming functions can be categorized into two groups: cryptography-related and networking-related functions. Both sets of functions take responsibility for more than 50% of the CPU time in the tunnel-community, and parallelization of these functions would be beneficious for the performance. But it turns out that paral- lelization is not a solution for everything, as the successive encrypting of packets is the most time-consuming part of the system and can’t be parallelized in an easy way as most of the compute intensive functionality is in external libraries (e.g. the python cryptography package, and the libtorrent library).

6 Future work

The current implementation of hidden services in Tribler provides a basic level of anonymity, but there are still points for improvement to be addressed before one can rely on the anonymity without any doubts. This requires extension of the hidden services protocol with mechanisms that prevent freeriding, sybil attacks and reputation management. Some of these techniques are already implemented in Tribler but not integrated with hidden services as this requires well-thought-out solutions, out of the scope of this work. The following sections will point out various issues to be resolved, resulting in Tribler to become a mature anonymity system.

Long-term tit-for-tat for anonymity Each Tribler user gains anonymity because other Tribler users are volunteering bandwidth for relaying data through the network of tunnels. To disseminate how much data a user volunteered for relaying data of others, an open community that records this information for each user on a long term is required, for example by combining the Bartercast reputation mechanism with a Bitcoin-like blockchain for validation. This makes it possible for peers to determine the reliability of other peers without revealing the identity of the peers. As a result of this functionality it will be also possible to fight against freeriders who consume bandwidth in the network without contributing bandwidth back into the community. Those freeriders can be punished by disabling the anonymous downloading functionality, just to motivate users to share their bandwidth for helping to relay data of others.

Secure the key-requesting mechanism In the current approach, the introduction point knows too much and adversary peers acting as introduction point can reveal the identity of users. The introduction point announces an info hash to the DHT using its dispersy-port. Hence, when a downloader finds this peer, it discovers the dispersy port. An alternative solution would be to use the port of the exit-socket of the tunnel towards the seeder, because

55 56 6. Future work

this would allow a downloader to directly communicate the key-request/create-e2e towards the seeder without having the introduction point to know the info hash in order to choose the correct circuit for relaying the message. This requires the hop leading to the exit-socket to be connectable, as the connectability problem might cause issues here. Another obvious improvement would be the encryption of the key-request related messages. A possible solution for this is to encrypt the key- request and key-response messages with a hash of the info hash that’s known by both the downloader and seeder, but not known to the introduction point (when using the exit-socket solution mentioned above).

Circuit status reporting The circuit status reporting was built in 2014, and only shows circuits created by the own process. It does not reveal information about the circuits it is participating as a relay or exit socket, or the type of circuit it is participating in. Furthermore, circuits related to hidden services may be displayed with the wrong length (as the circuit length in the darknet is unknown). A new design for displaying the status of the actual circuits is required, as the current user interface does not suit the possibilities of hidden services anymore.

6 7 Conclusion

Privacy on the Internet is an illusion. The goal of this work is to enhance the use of BitTorrent with security, privacy and anonymity. In the past decades, various ideas have been proposed for anonymity systems. The next step in anonymous communication is a darknet where senders and receivers share resources without the need to know each other and nobody knows who’s talking to who. A fully decentralized and anonymous peer-to-peer system for sharing files over Bit-torrent, inspired by the Hidden Services of the Tor project is implemented in Tribler. The current implementation of hidden services in Tribler is just a proof of con- cept, but already superior to Tor in terms of speed and censorship-resistance. opening opportunities for streaming high-definition video from BitTorrent in a fully- decentralized way. There are weak spots in the system to be solved, like adding protection against traffic analysis or Sybil attacks. To prevent a false sense of security, users should still be warned until improvements for the weak spots are incorporated and the anonymity system has proven itself mature. The conducted experiments reveal that the current implementation is limited by the availability of CPU resources, and in less extent to the available network bandwidth. This is due to the strong cryptography applied on data. Future research should be conducted on improving security and performance. There is no central authority in Tribler. However, freedom is not absolute. The current implementation in Tribler gives opportunities for both legal and illegal ac- tivity. Seeders and downloaders can both act anonymous in the network. Tribler is a social platform for transporting data anonymously through the network, and like any other means of communication, transporters are not responsible for the content transmitted, especially when the data is encrypted and not visible for inspection. All responsibility for the available content lies with the users in the community. Objec- tionable content can become essentially impossible to find in the network by voting down the channels that are already available in Tribler. This renders it useless for anyone that would use Tribler for criminal and cruel activities.

57

Appendices

59

A Gumby experiments

A.1. hiddenservices-1-gigabyte-1-hop.scenario

# This experiment starts seeding a 1024MB file. Another processes will # download from this seeder using respectively 1 hops. All security limiters # will be active, so every now and then circuits will break down. # @0:0 set_master_member 3081a7301006072a8648ce3d020106052b81040027038192 ... @0:2 start_session @0:3 set_test_file_size 1073741824 @0:3 set_security_limiters True @0:5 init_community exit crypto {1-2} @0:6 init_community no_exit crypto {3-20} @0:10 online @0:11 introduce_candidates @0:20 reset_dispersy_statistics @0:20 setup_seeder 1hopsgigabyte 1 {4} @0:100 start_download 1hopsgigabyte 1 {5} @0:1500 stop

A.2. hiddenservices-1-hop-multiple-seeders.scenario

# This experiment creates files seeded by respectively 1, 2 and 3 seeders. # The download processes will download from these seeder using 1 hops. # @0:0 set_master_member 3081a7301006072a8648ce3d020106052b81040027038192 ... @0:2 start_session @0:5 init_community exit crypto {1-2} @0:6 init_community no_exit crypto {3-20} @0:10 online @0:11 introduce_candidates @0:20 reset_dispersy_statistics @0:20 setup_seeder a1hops100mb 1 {4} @0:30 setup_seeder b1hops100mb 1 {5} @0:40 setup_seeder b1hops100mb 1 {6} @0:50 setup_seeder c1hops100mb 1 {7} @0:60 setup_seeder c1hops100mb 1 {8} @0:70 setup_seeder c1hops100mb 1 {9} @0:100 start_download a1hops100mb 1 {10} @0:250 start_download b1hops100mb 1 {11}

61 62 A. Gumby experiments

@0:400 start_download c1hops100mb 1 {12} @0:600 stop

A.3. hiddenservices-1-hop-seeder.scenario

# This experiment runs a 1 hop seeder seeding a 100mb file. Other processes # will download from this seeder using respectively 1, 2 and 3 hops. # @0:0 set_master_member 3081a7301006072a8648ce3d020106052b81040027038192 ... @0:2 start_session @0:5 init_community exit crypto {1-2} @0:6 init_community no_exit crypto {3-20} @0:10 online @0:11 introduce_candidates @0:20 reset_dispersy_statistics @0:20 setup_seeder 1hops100mb 1 {4} @0:100 start_download 1hops100mb 1{5} @0:250 start_download 1hops100mb 2 {6} @0:400 start_download 1hops100mb 3 {7} @0:600 stop

A.4. hiddenservices-2-hop-seeder.scenario

# This experiment runs a 2 hop seeder seeding a 100mb file. Other processes will # download from this seeder using respectively 1, 2 and 3 hops. # @0:0 set_master_member 3081a7301006072a8648ce3d020106052b81040027038192 ... @0:2 start_session @0:5 init_community exit crypto {1-2} @0:6 init_community no_exit crypto {3-20} @0:10 online @0:11 introduce_candidates @0:20 reset_dispersy_statistics @0:20 setup_seeder 2hops100mb 2 {4} @0:100 start_download 2hops100mb 1 {5} @0:250 start_download 2hops100mb 2 {6} @0:400 start_download 2hops100mb 3 {7} @0:600 stop

A.5. hiddenservices-3-hop-seeder.scenario

# This experiment runs a 3 hop seeder seeding a 100mb file. Other processes will # download from this seeder using respectively 1, 2 and 3 hops. # @0:0 set_master_member 3081a7301006072a8648ce3d020106052b81040027038192 ... @0:2 start_session @0:5 init_community exit crypto {1-2} @0:6 init_community no_exit crypto {3-20} @0:10 online @0:11 introduce_candidates @0:20 reset_dispersy_statistics @0:20 setup_seeder 3hops100mb 3 {4} @0:100 start_download 3hops100mb 1 {5} @0:300 start_download 3hops100mb 2 {6} @0:500 start_download 3hops100mb 3 {7} @0:800 stop Glossary

Adversary An adversary is an entity within a distributed network trying to break the system by forging messages or dropping messages..

BarterCast BarterCast is a reputation system integrated into Tribler, currently used for exchanging the amount of uploaded and downloaded bytes for users in the network.. BitTorrent A network protocol that is designed for distributing large files over a network of peers.

Channel A channel in Tribler is used for discovering swarms. It is used to prevent fake content and reduce spam.

DHT A decentralized and distributed hash table. In BitTorrent, a DHT is used for looking up other nodes in a swarm for a given info hash. Dispersy Distributed Permission System, an elastic database system for use in communities in a decentralized network.

H(m) A cryptographic hash of 푚.

LEDBAT LEDBAT stands for ‘Low Extra Delay Background Transport’, a congestion control algorithm for transferring data on the Internet based on limiting delays while using full bandwidth..

Peer A peer is a client in the BitTorrent swarm, to which other peers can connect.

Receiver anonymity A system providing receiver anonymity ensures that it is im- possible for a sender to ascertain the identity of the receiver of the message..

Sender anonymity A system providing sender anonymity ensures that it is im- possible for a receiver to ascertain the identity of the sender of the message.. Swarm A swarm consists of all simultaneous online clients downloading or upload- ing (parts of) the same files.

Tit-for-tat Tit-for-tat is the rewarding mechanism for donating bandwidth in a distributed network. Download speed is optimized for peers cooperating more than others..

63 64 Glossary

Tor The Onion Router, free software for protecting anonymity on the Internet by sending packets over circuits of hops in a distributed network of relays..

Tracker A tracker is a server in a peer-to-peer network assisting peers to connect with each other..

Tribler A fully decentralized peer-to-peer file-sharing client for the BitTorrent net- work, built as a scientific research project. Bibliography

[1] Libtorrent blog. Smart-ban. Nov. 2011. URL: http://blog.libtorrent. org/2011/11/smart-ban/. [2] David Chaum. “The dining cryptographers problem: Unconditional sender and recipient untraceability”. In: Journal of cryptology 1.1 (1988), pp. 65–75. [3] David L. Chaum. “Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms”. In: Commun. ACM 24.2 (Feb. 1981), pp. 84–90. ISSN: 0001- 0782. DOI: 10.1145/358549.358563. URL: http://doi.acm.org/ 10.1145/358549.358563. [4] Ian Clarke et al. “Freenet: A distributed anonymous information storage and retrieval system”. In: Designing Privacy Enhancing Technologies. Springer. 2001, pp. 46–66. [5] Bram Cohen. “Incentives build robustness in BitTorrent”. In: Workshop on Economics of Peer-to-Peer systems. Vol. 6. 2003, pp. 68–72. [6] Eve Conant. Is the Cold War Back? - National Geographic News. Sept. 2014. URL: http : / / news . nationalgeographic . com / news / 2014 / 09 / 140912-cold-war-geography-russia-ukraine-sanctions/. [7] Lance Cottrell. Mixmaster and remailer attacks. 1994. [8] Wei Dai. “PipeNet 1.1”. In: post, August (1996). [9] George Danezis, Roger Dingledine, and Nick Mathewson. “Mixminion: Design of a type III anonymous remailer protocol”. In: Security and Privacy, 2003. Proceedings. 2003 Symposium on. IEEE. 2003, pp. 2–15. [10] Roger Dingledine. Tor Protocol Specification. URL: https : / / gitweb . torproject.org/torspec.git/blob_plain/HEAD:/tor- spec. txt. [11] Roger Dingledine. Tor Rendezvous Specification. URL: https://gitweb. torproject.org/torspec.git?a=blob_plain;hb=HEAD;f=rend- spec.txt. [12] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: The second- generation onion router. Tech. rep. DTIC Document, 2004. [13] Shlomi Dolev and Rafail Ostrobsky. “Xor-trees for efficient anonymous mul- ticast and reception”. In: ACM Transactions on Information and System Se- curity (TISSEC) 3.2 (2000), pp. 63–84. [14] John R Douceur. “The sybil attack”. In: Peer-to-peer Systems. Springer, 2002, pp. 251–260.

65 66 Bibliography

[15] Michael J Freedman and Robert Morris. “Tarzan: A peer-to-peer anonymizing network layer”. In: Proceedings of the 9th ACM conference on Computer and communications security. ACM. 2002, pp. 193–206. [16] David Goldschlag, Michael Reed, and Paul Syverson. “Onion routing”. In: Communications of the ACM 42.2 (1999), pp. 39–41. [17] Christian Grothoff et al. An encoding for censorship-resistant sharing. Tech. rep. Technical report http://www. cs. helsinki. fi/u/jtlindgr/stuff/ecrs. ps. SPI- DER ICT4D Series| Increasing transparency and fighting corruption through ICT, 2003. [18] Ceki Gulcu and Gene Tsudik. “Mixing E-mail with Babel”. In: Network and Distributed System Security, 1996., Proceedings of the Symposium on. IEEE. 1996, pp. 2–16. [19] Gertjan P. Halkes and Johan A. Pouwelse. “UDP NAT and Firewall Punctur- ing in the Wild.” In: Networking (2). Ed. by Jordi Domingo-Pascual et al. Vol. 6641. Lecture Notes in Computer Science. Springer, 2011, pp. 1–12. ISBN: 978-3-642-20797-6. URL: http : / / dblp . uni - trier . de / db / conf/networking/networking2011-2.html#HalkesP11. [20] Gwyn Topham Ian Traynor Sam Jones. Prism NSA surveillance ’did not collect European data in bulk’. June 2013. URL: http://www.theguardian. com/world/2013/jun/14/prism-nsa-surveillance-european- data. [21] Paul C. Kocher. “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”. In: Proceedings of the 16th Annual International Cryptology Conference on Advances in Cryptology. CRYPTO ’96. London, UK, UK: Springer-Verlag, 1996, pp. 104–113. ISBN: 3-540-61512-1. URL: http: //dl.acm.org/citation.cfm?id=646761.706156. [22] Olaf Landsiedel et al. “Dynamic multipath onion routing in anonymous peer- to-peer overlay networks”. In: Global Telecommunications Conference, 2007. GLOBECOM’07. IEEE. IEEE. 2007, pp. 64–69. [23] Alan Mislove et al. “AP3: Cooperative, decentralized anonymous communi- cation”. In: Proceedings of the 11th workshop on ACM SIGOPS European workshop. ACM. 2004, p. 30. [24] Prateek Mittal et al. “PIR-Tor: Scalable Anonymous Communication Using Pri- vate Information Retrieval.” In: USENIX Security Symposium. 2011. [25] Arjun Nambiar and Matthew Wright. “Salsa: a structured approach to large- scale anonymity”. In: Proceedings of the 13th ACM conference on Computer and communications security. ACM. 2006, pp. 17–26. [26] Tribler Organization. About Tribler. Oct. 2014. URL: http://www.tribler. org/about.html. [27] Tribler Organization. Anonymous Downloading and Streaming specifications. June 2014. URL: https://github.com/Tribler/tribler/wiki/ Anonymous-Downloading-and-Streaming-specifications. Bibliography 67

[28] R.S. Plak. “Anonymous Internet, Anonymizing peer-to-peer traffic using ap- plied cryptography”. MA thesis. the Netherlands: Delft University of Technol- ogy, 2014. [29] Johan Pouwelse Quinten Stokkink Harmjan Treep. Performance analysis of a Tor-like onion routing implementation. 2015. URL: http://arxiv.org/ /1507.00245v1.pdf. [30] Marc Rennhard and Bernhard Plattner. “Introducing MorphMix: peer-to-peer based anonymous Internet usage with collusion detection”. In: Proceedings of the 2002 ACM workshop on Privacy in the Electronic Society. ACM. 2002, pp. 91–102. [31] Rob Sherwood, Bobby Bhattacharjee, and Aravind Srinivasan. “P 5: A protocol for scalable anonymous communication”. In: Journal of Computer Security 13.6 (2005), pp. 839–876. [32] Daniel Stutzbach and Reza Rejaie. “Understanding Churn in Peer-to-peer Net- works”. In: Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement. IMC ’06. Rio de Janeriro, Brazil: ACM, 2006, pp. 189–202. ISBN: 1-59593-561-4. DOI: 10.1145/1177080.1177105. URL: http: //doi.acm.org/10.1145/1177080.1177105. [33] R. Tanaskoski. “Anonymous HD Video Streaming”. MA thesis. the Nether- lands: Delft University of Technology, 2014. [34] Juan Pablo Timpanaro, Isabelle Chrisment, and Olivier Festor. “A Bird’s Eye View on the I2P Anonymous File-sharing Environment”. English. In: Proceed- ings of the 6th International Conference on Network and System Security. Wu Yi Shan, China, Nov. 2012. URL: http : / / hal . inria . fr / hal - 00744919. [35] Shaun Walker. Russia to monitor ’all communications’ at Winter Olympics in Sochi - The Guardian. Oct. 2013. URL: http://www.theguardian.com/ world/2013/oct/06/russia-monitor-communications-sochi- winter-olympics. 68 Bibliography