The fifteen year struggle of decentralizing privacy-enhancing technology
Rolf Jagerman, Wendo Sabée, Laurens Versluis, Martijn de Vos, Johan Pouwelse (course supervisor)


arXiv:1404.4818v1 [cs.CY] 18 Apr 2014

Abstract—Ever since its introduction, the internet has been void of any privacy. The majority of internet traffic currently is, and always has been, unencrypted. A number of communication overlay networks exist whose aim is to provide privacy to their users. However, due to the nature of the internet, it is very difficult to make these networks both decentralized and anonymous. We list reasons for having anonymous networks, discern the problems in achieving decentralization, and sum up the biggest initiatives in the field and their current status. To do so, we use one exemplary network, Tor. We explain how Tor works, what vulnerabilities it currently has, and which attacks could be used to violate privacy and anonymity. Tor is used as the key comparison network in the main part of the report: a tabular overview of the major anonymous networking technologies in use today.

1 INTRODUCTION

All feelings of privacy concerning browsing the internet, talking on the telephone or location tracking of cellphones are an illusion. In recent years the need for privacy-enhancing technology has become more apparent. Revelations by Edward Snowden of government misconduct and constitutional violations have sent shock waves through the internet.

We failed to make an internet that is secure and private. Although much research has been done on anonymous internet communication, only a few systems have actually been implemented and only one is actively being used. One of the most important factors that impact anonymity in a communication system is the number of users. A sufficiently large number of users is required for a system to make guarantees about its ability to protect the privacy of its users. This makes it difficult to introduce a new anonymous internet system, due to the initial lack of users. To support a large number of anonymous users, such a network has to be decentralized; a lack of decentralization would otherwise result in bottlenecks that place constraints on the number of users. Relying on server bandwidth donations has proven to be a difficult strategy to sustain.

The most widely used anonymous communication system is Tor. In this technical report we analyse Tor and its semi-centralized nature. Tor struggles to keep up with the bandwidth demands of its users. As the number of users increases, the need to decentralize Tor becomes more urgent. Decentralizing Tor is not an easy task: after fifteen years of decentralization attempts, the network is still partially centralized. Only a few decentralized alternatives to Tor exist, and they lack the user base to be considered safe and useful. Examples include [32], [19] and Tapestry.

This technical report is structured as follows. In section 2 we give an introduction to and overview of Tor. Known vulnerabilities in Tor are discussed in section 3. After that, we discuss decentralization and its problems in section 4. The current state of decentralized internet systems is discussed in section 5. A comparison of existing decentralized networks is made in section 6. Finally, we conclude and discuss our findings in section 7.

2 INTRODUCTION TO TOR

The implementation of The Onion Router (TOR) was first described in 1996 by the U.S. Naval Research Laboratory, as a means to protect government communications from digital as well as physical attacks by hiding the location of the communicating party or parties [15]. The underlying idea traces back to 1981, when Chaum described it in his famous paper "Untraceable electronic mail, return addresses, and digital pseudonyms" [9].

In 2002, the old code base was discontinued and the project was re-implemented as Tor, the second-generation onion router. The re-implementation introduced perfect forward secrecy, directory servers, hidden services and more [10]. In this section, we explain the various components of Tor, the structure of the network, circuit creation and the disadvantages of Tor.

2.1 Onion routing

As described in the original design paper of The Onion Router [15], network traffic is forwarded through a circuit of nodes, where each node only knows the previous and the next node in the circuit. With a sufficiently long circuit of (independent) nodes, two communicating parties can remain oblivious of each other's physical location.

Say we have a circuit consisting of four nodes: our trusted client node (C), an entry or guard relay node (X), a middle relay node (Y) and an exit relay node (Z). In this case there are three relay nodes, but this is not necessarily always the case, as more middle relay nodes can be added. A visualization of this path can be seen in figure 1. Each node has its own public key and a corresponding private key. When building the circuit, our client generates a distinct secret for each of these nodes. More information about the circuit setup can be found in section 2.4.

The payload of each packet flowing through the circuit is first encrypted with the distinct secret for the last node Z, then with the distinct secret for node Y, and finally with the secret for node X. With each layer of encryption, a header is added containing the address of the next node in the circuit, plus the distinct secret for that layer, encrypted with the corresponding node's public key.

After node X receives this packet from our trusted node C, it decrypts the attached secret with its private key and uses that secret to decrypt the rest of the packet. The result is a header with the address of node Y and a payload encrypted with the secrets of the following nodes in the circuit, which is forwarded to node Y. As a node receives a packet from the previous node, it peels off another layer of encryption, much as you can peel an onion layer by layer, and forwards the result to the next node in the circuit (as specified in the decrypted header). When exit node Z decrypts the last layer, it forwards the payload outside the network to the original destination that our trusted client tried to contact, acting as a traditional proxy.

When exit node Z receives a response, this whole process is applied in reverse order: each node encrypts the payload with its secret along the way instead of decrypting it. When our client C receives the packet, it peels off all the encryption layers to retrieve the unencrypted payload.

With the second-generation onion routing used in Tor, a modified algorithm called telescoping path-building is used to derive the encryption keys, which also provides perfect forward secrecy. This algorithm is described in section 2.4.

2.2 Directory servers

The original Onion Router used an unsafe, decentralized node discovery mechanism called in-band status updates. During such a status update each node broadcasts the nodes it knows to its neighbours. An attacker could exploit this to isolate a client and limit its knowledge of the network, forcing connections through malicious nodes. Another disadvantage is that in-band status updates take longer to propagate throughout the network and to create a global consensus.

To mitigate these concerns, directory servers were introduced to Tor during its reimplementation. These directory servers keep a redundant central consensus about the network. They act as HTTP servers to which Tor nodes can publish signed information about themselves. Tor clients can in turn download this information, as seen in figure 1. The information distributed by the directory servers is signed. The keys to verify these signatures are preloaded in the Tor software, along with the list of directory servers. This implies that the Tor client trusts the directory servers.
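The layered encryption described in section 2.1 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: each relay is assumed to already share a symmetric secret with the client (for example, one established with the handshake of section 2.4), the public-key wrapping of those secrets is omitted, and a SHA-256-based XOR keystream stands in for Tor's real cipher. It is not secure, it is not Tor's cell format, and the helper names `wrap` and `peel` are hypothetical.

```python
import hashlib
import json
import os


def keystream(secret: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(secret || counter) blocks. NOT a secure cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(secret + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(secret: bytes, data: bytes) -> bytes:
    """XOR with the keystream; applying it twice with the same secret restores the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(secret, len(data))))


def wrap(payload: bytes, path: list, destination: str) -> bytes:
    """Client side: encrypt for the exit node first and for the guard node last."""
    packet, next_hop = payload, destination
    for name, secret in reversed(path):
        # Each layer's header names the next hop; only this node's secret opens it.
        layer = json.dumps({"next": next_hop, "data": packet.hex()}).encode()
        packet, next_hop = xor_crypt(secret, layer), name
    return packet


def peel(secret: bytes, packet: bytes) -> tuple:
    """Relay side: remove one layer and learn only the next hop."""
    layer = json.loads(xor_crypt(secret, packet))
    return layer["next"], bytes.fromhex(layer["data"])


if __name__ == "__main__":
    path = [("X", os.urandom(16)), ("Y", os.urandom(16)), ("Z", os.urandom(16))]
    packet = wrap(b"GET / HTTP/1.1", path, "example.org:80")
    for name, secret in path:
        next_hop, packet = peel(secret, packet)
        print(name, "forwards to", next_hop)  # X -> Y, then Y -> Z, then Z -> example.org:80
    print(packet)  # b'GET / HTTP/1.1'
```

Each relay learns only its successor, only the exit node sees the destination, and only the client knows the whole path. For the response direction, the same layers would be added hop by hop and removed by the client.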

[Figure 1 diagram: User, Directory Server, Guard Node, Relay Node, Exit Node, Internet]

Fig. 1: The components of the Tor network. After downloading the node list from the Directory Server, the user creates a circuit through a guard node, a relay node and an exit node. This circuit is used to communicate (anonymously) with the internet.

2.3 Relay and exit nodes

The Tor network consists of several components. The clients in the Tor network are known as onion proxies. The software to run an onion proxy is available for free on the Tor website [37] and is easy for users to configure. The onion proxies are responsible for downloading the directory information, establishing circuits across the network and handling connections from user applications.

The routing in the network is done by onion routers, also called relay nodes. The relay nodes relay the data from the onion proxy to the web server across a circuit (circuits are described in section 2.4). Each onion router is connected to every other onion router with a TLS connection [17]. Each circuit has three types of onion routers [20]:

• The entrance Tor router: this router is directly connected to an onion proxy and can observe the origin of a request through the Tor network. The entrance router sends the packet to the middle Tor router.
• The middle Tor router: this router is connected to the entrance router and the exit router.
• The exit Tor router: this router is connected to the web server. Note that the exit Tor router is the only router that can observe the final destination of the request.

The first router in a circuit is the entrance router. The entrance router sends the data to one of the middle routers, which forwards the data to the exit router.

2.4 Circuit creation

Data on the Tor network travels over several relay nodes before it reaches its destination. Such a selection of nodes is called a circuit. To ensure both good performance and anonymity, a path is chosen using a sophisticated path selection algorithm. This algorithm selects nodes based on their bandwidth [35]: nodes that have more bandwidth have a higher probability of being chosen for the circuit. The same node cannot be used more than once in a single circuit.
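The bandwidth-weighted choice described above can be sketched as weighted sampling without replacement. This is a simplified model, not Tor's actual path-selection code: the relay names and bandwidth values are made up, and Tor's real algorithm applies further constraints that are not modeled here.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def select_path(bandwidth: Dict[str, float], hops: int = 3,
                rng: Optional[random.Random] = None) -> List[str]:
    """Pick `hops` distinct relays, each with probability proportional to its bandwidth."""
    if hops > len(bandwidth):
        raise ValueError("not enough relays for a circuit of this length")
    rng = rng or random.Random()
    remaining = dict(bandwidth)
    path = []
    for _ in range(hops):
        names = list(remaining)
        choice = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        path.append(choice)
        del remaining[choice]  # the same relay may not appear twice in one circuit
    return path


if __name__ == "__main__":
    relays = {"fast": 9000, "medium": 900, "slow": 100}  # hypothetical bandwidth values
    rng = random.Random(1)
    first_hops = Counter(select_path(relays, rng=rng)[0] for _ in range(10_000))
    print(first_hops)  # "fast" is picked first roughly 90% of the time
```

This weighting is also what the low-resource routing attack of section 3.2 exploits: a relay that over-reports its bandwidth is selected more often.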
Nodes that work than they contribute by running an have more bandwidth, have a higher probabil- onion router. This means that these users ity to be chosen for the circuit creation. The are slowing the network down as they use same node cannot be used more than once in more traffic than giving back. A possible a single circuit. solution for this is to throttle certain high- Suppose Alice is an onion proxy that wants bandwidth protocols such as at to connect through the Tor network to a web exit nodes or at onion proxies. server. Circuit creation uses the Diffie-Hellman • The Tor network doesn’t have the capacity key-exchange protocol [16] to establish a shared to handle all the users that want privacy on secret between nodes. To create a new circuit, the internet. According to the Tor Metrics Alice first sends a create cell with the first half of project [36], it takes about 6 seconds to the Diffie-Hellman handshake to the first node download 1 MiB of data. in her selected path (for example, OR1). OR1 Due to the fact that traffic travels over sends a created cell back with the second half of multiple Tor nodes, the total amount of the key along with a hash of the final key. Now transferred data in the network multiplies. both Alice and OR1 have a shared key they use This is illustrated in figure 2. Normally, to encrypt and decrypt data sent between them. when 1 GiB is transferred over the internet, Alice now has a connection with the first it has a network cost of 2 GiB1. By having onion router in the circuit. To extend the circuit n hops, the amount of transferred traffic to OR2, Alice first sends a relay extend cell to would be multiplied by 2(n + 1). As Tor OR1. This cell contains the address of the next uses 3 hops by default, this means that a onion router in the circuit and the first half of 1 GiB transfer would result in a network the key to use in the communication between cost of 8 GiB. By increasing the amount of her and OR2. 
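2.5 Disadvantages

While Tor offers its users a high level of anonymity, there are some disadvantages to using it. According to Dingledine et al. [11], there are six reasons why Tor is not optimal. In this section, we summarize these reasons and explain what could be done to fix them.

• Tor's congestion control does not work well. The network has problems handling bulk transfers, such as downloading large files or streaming high-quality videos. The congestion control could be improved by using an unreliable protocol such as UDP for links between Tor relays. Goldberg et al. [2] have proposed PCTCP, which could improve the response time of the Tor network by 60% and the download time of files by 30%.
• Some Tor users put more traffic on the network than they contribute by running an onion router. These users slow the network down, as they use more traffic than they give back. A possible solution is to throttle certain high-bandwidth protocols at exit nodes or at onion proxies.
• The Tor network does not have the capacity to handle all the users that want privacy on the internet. According to the Tor Metrics project [36], it takes about 6 seconds to download 1 MiB of data. Because traffic travels over multiple Tor nodes, the total amount of data transferred in the network multiplies, as illustrated in figure 2. Normally, a 1 GiB transfer over the internet has a network cost of 2 GiB¹. With n hops there are n + 1 links, so the traffic in the network becomes 2(n + 1) times the size of the transfer. As Tor uses 3 hops by default, a 1 GiB transfer results in a network cost of 8 GiB. Adding onion routing nodes to the Tor network increases its capacity. Incentives such as LIRA [18] could make more users run an onion router, thus increasing the capacity of the network and making it faster.
• The current path selection algorithm of Tor does not distribute the load evenly over the network. The problem is that the current selection strategy is only optimal when the network is fully loaded, which is not always the case. A better path selection algorithm could increase the capacity of the network and improve the overall user experience.
• Tor clients are not optimal at handling latency and connection failures. For example, if extending a circuit fails, the entire circuit is abandoned. An improvement would be to first try to extend the circuit to other nodes, and to abandon the circuit only if that fails as well. A better timeout mechanism for building circuits could also be chosen.
• Much of the overhead of the network is in downloading the directory information. There is also overhead in the TLS connections between the nodes in the network. According to Dingledine et al., removing the empty TLS application record could reduce the overhead in the TCP/IP header by 6.3%.

The directory service generates overhead on the network. Replacing this central authority with a decentralized component could reduce the overhead of the network and improve performance. In conclusion, we would like to see Tor decentralized. Although much research has been done on the decentralization of Tor, it still uses centralized components today.

1. 1 GiB upload by the sender and 1 GiB download by the receiver, or 2 GiB in total.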
[Figure 2 diagram: Source -> Hop 1 -> Hop 2 -> ... -> Hop n -> Target, with the links numbered 1 to n+1]

Fig. 2: As traffic moves over Tor nodes, the total amount of bandwidth used in the network increases. By using n hops, the total amount of network traffic is multiplied by 2(n + 1).
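The cost model behind figure 2 is simple arithmetic: a path with n hops has n + 1 links, and every link carries the payload once as an upload and once as a download. A minimal sketch (the function name `network_cost` is illustrative):

```python
def network_cost(payload_gib: float, hops: int) -> float:
    """Total traffic (uploads plus downloads) needed to move `payload_gib` over `hops` relays."""
    links = hops + 1                # source -> hop 1 -> ... -> hop n -> target
    return 2 * links * payload_gib  # each link: one upload and one download


if __name__ == "__main__":
    for n in (0, 1, 3):
        print(n, network_cost(1, n))
    # 0 2
    # 1 4
    # 3 8
```

With zero hops this is the plain 2 GiB of a direct transfer; with Tor's default of three hops it reproduces the 8 GiB figure from section 2.5, and the cost grows linearly with every extra hop.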

2.6 Tor stinks?

Tor is not only used by human rights activists. It is also used by distributors of illegal content and providers of illegal services, because it is deemed untraceable. This serious problem is damaging Tor's public image. A recent example is the shutdown of an online market for trading prohibited substances and other illegal goods [26]. The market operated as a Tor hidden service. Such services are accessed through an onion address rather than an IP address, which hides the physical location of the service. This makes it very hard for agencies to track and shut down these operations.

Recent research on the content and popularity of Tor's hidden services [7] has shown that although there are Tor hidden services that distribute illegal content, many hidden services are resources devoted to human rights, freedom of speech and information that is prohibited in some countries. It is not clear which type of service is more popular on Tor.

Unless hidden services are used, content travels unencrypted through a Tor exit node. Users running a Tor exit node could be held responsible for distributing illegal content, and the possibility of being seen as the originator of illegal content discourages users from running an exit node. A possible solution could be to filter the illegal content from the legal content. However, it is an open question whether content filtering is in line with the principles of Tor and the internet.

3 TOR VULNERABILITIES AND ATTACKS

Besides the disadvantages mentioned in the previous section, Tor also suffers from several vulnerabilities that can be exploited through attacks [1], [13], [6]. In this section we summarize some of the best-known problems with Tor, grouped into the following categories of attacks: browser-based attacks, low-resource routing attacks, Sybil attacks and replay attacks.

3.1 Browser-based attacks

Traffic analysis can be used to attack the anonymity of a user (a Tor client) browsing the web using Tor [1]. By misusing the exit policy of Tor, one can reduce the time required to perform the analysis from O(nk) to O(n + k), where n is the number of exit nodes and k is the number of entry guards.

By running an HTTP exit node and a Tor router that will eventually act as an entry node in the network, an adversary can discover the identity of a user. The exit node injects an invisible iframe containing some JavaScript into any web page that passes through it; each injected script sends a unique ID to a malicious web server. Every ten minutes the Tor client chooses a new circuit, and eventually an unlucky Tor client picks and uses the malicious entry node that was placed in the network. By performing traffic analysis to compare the unique IDs received by the web server with the circuits passing through the entry node, a user can be identified. Disabling JavaScript does not mitigate this, because a similar attack can be set up using only the HTML meta refresh tag.

To increase the odds of a user choosing the malicious exit node, one can run the exit node on unpopular ports. There are usually only a few exit nodes that allow traffic to certain file-sharing ports, such as 4661 to 4666. Since Tor prefers older circuits, using a denial of service attack against the older exit nodes forces Tor into creating a circuit with the malicious exit node.

The solution for the JavaScript injection attack is disabling active content systems in the browser. For the HTML-only variant one would have to use HTTPS to prevent man-in-the-middle attacks.

3.2 Low-resource routing attacks

Another possible attack on the anonymity of Tor is the so-called low-resource routing attack [6]. With this attack, an adversary can perform end-to-end traffic analysis with minimal resources, thus compromising the anonymity Tor provides. The idea of this attack is that a malicious onion router can lie about its bandwidth, advertising a much higher bandwidth than it actually has. Because the Tor path selection algorithm prefers high-bandwidth nodes, the chance that a malicious entry and exit node are chosen is high.

Once a malicious entry and exit node have been chosen, the traffic can be analysed to link onion proxies with the web servers they communicate with. Experimental research in a test setting has shown that with a total of 66 non-malicious and 6 malicious nodes, it is possible to compromise 46% of the built circuits. At the request of the Tor community, this attack has not been tested on the live Tor network.

There are some extensions and improvements to this attack. It is possible to perform a denial of service attack on well-used entry nodes, forcing Tor clients to choose a new one. This improves the chance that a Tor client chooses a malicious entry node.

Bauer et al. proposed several solutions in the same paper. The first one is to actually verify the resources of the nodes by, for example, measuring the bandwidth and/or uptime of a node. Bandwidth can be checked in a centralized or a decentralized way: the disadvantage of a centralized bandwidth check is that it generates much overhead on the network. With distributed bandwidth verification, Tor routers monitor each other, but this is not enough to detect selectively malicious nodes. Another solution is to restrict the number of routers that can run on a single IP address. The last proposed solution is to change the routing strategy.

3.3 Sybil attacks

The Sybil attack is an attack where a single attacker presents itself as millions of nodes in a peer-to-peer system. By abusing this, the attacker is able to propagate false assumptions about the network to other nodes. Douceur, who first described the attack [13], mathematically proves that it is always possible without a central authority that certifies participating nodes in one way or another. The exception to this rule is what he calls "extreme and unrealistic assumptions of resource parity and coordination among entities", or in other words: requiring all participants to do something expensive (in terms of resources) to identify themselves. This must be done within a time frame small enough that an attacker cannot do it in sequence; all nodes must do it in parallel.

A fully distributed network that implements such a solution is the Bitcoin network [23], in which computing power, and not the number of nodes, is what matters for the general network consensus.

3.4 Replay attacks

A replay attack [28] happens when a malicious entry node duplicates cells and sends them again. Since Tor uses the counter mode of the Advanced Encryption Standard (AES-CTR) for encryption and decryption, the counter will be wrong when the duplicated cell arrives, causing the circuit to be destroyed. Using this, an accomplice exit router can, in cooperation with the entry node, discover the relationship between sender and receiver. This attack can also be used as a denial of service attack. According to Pries et al., defending against this attack is quite challenging and requires further research.
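Why a duplicated cell breaks a counter-mode circuit can be shown with a toy model. Assumptions of this sketch: SHA-256 over (key, block index) stands in for the AES-CTR keystream, cells are 16 bytes, and a hypothetical 4-byte zero marker plays the role of Tor's integrity checks; none of this is Tor's actual cell format. Sender and receiver each keep their own position in the keystream, so a replayed cell advances only the receiver's counter, and the duplicate as well as every later cell is decrypted with the wrong keystream.

```python
import hashlib

MARKER = bytes(4)  # hypothetical integrity marker standing in for Tor's cell checks


class CounterModeStream:
    """Toy counter-mode stream cipher: keystream block i = SHA-256(key || i)."""

    def __init__(self, key: bytes):
        self.key = key
        self.offset = 0  # bytes of keystream consumed so far (the running counter)

    def process(self, data: bytes) -> bytes:
        out = bytearray()
        for byte in data:
            block = hashlib.sha256(self.key + (self.offset // 32).to_bytes(8, "big")).digest()
            out.append(byte ^ block[self.offset % 32])
            self.offset += 1
        return bytes(out)


def make_cell(payload: bytes) -> bytes:
    return MARKER + payload.ljust(12)[:12]


def simulate(replay: bool) -> list:
    """Deliver two cells; with replay=True the entry node duplicates the first one."""
    key = b"shared-circuit-key"
    sender, receiver = CounterModeStream(key), CounterModeStream(key)
    cells = [sender.process(make_cell(p)) for p in (b"hello", b"world")]
    arriving = [cells[0], cells[0], cells[1]] if replay else cells
    results = []
    for cell in arriving:
        plain = receiver.process(cell)
        # None marks a failed check; in Tor this would lead to the circuit being torn down.
        results.append(plain[4:] if plain[:4] == MARKER else None)
    return results


if __name__ == "__main__":
    print(simulate(replay=False))  # [b'hello       ', b'world       ']
    print(simulate(replay=True))
```

The visible failure is exactly the signal the colluding exit router watches for: the circuit breaks right after the entry node replays a cell, linking the two ends.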
This attack The third proposal is Reciprocity-Based can also be used as a denial of service attack. Schemes. Using this approach, a peer main- According to Pries et al., defending against this tains a behaviour history of other peers in attack is quite challenging and requires further the network. These schemes can be based on research. two somewhat reciprocities: direct reciprocity or indirect reciprocity. The former are more suitable for longer relationships between peers. 4 PROBLEMSWITHDECENTRALIZING The latter is more scalable but they rely on third Decentralization is a difficult research prob- party and must handle trust issues themselves. lem. A trusted central authority simplifies boot- strapping, key management and user reputa- 4.2 NAT traversal tion. If one of these authorities were taken control of, anonymity could be compromised. A truly decentralized system requires the par- When decentralizing a central authority, its ticipating nodes to have direct connection to functionality needs to be dispersed across the each other. Because of the limited availability of peers in the network. In this section we will IPv4 network addresses, most consumer grade explain the various problems involving Tor internet connections only provide one network decentralization. address per subscriber, shared by all the de- vices connected to the subscribers network us- ing Network Address Translation (NAT). With 4.1 Incentives in decentralized systems IPv4 network addresses getting more scarce, Tax evasion and environmental pollution can some Internet Service Providers even put more be seen as forms of free-riding, a phenomenon than one subscribers behind a single network that is prominent in Tor. People predominantly address, using a carrier-grade NAT. use more Tor bandwidth than they donate, A NAT-based system works by creating a see section 2.5. 
There are multiple proposals local, private network which a NAT-enabled [12], [18] to introduce incentives into Tor, all router connects to the internet. The local net- have failed. If one would also have to build an work uses network addresses from the private incentive system into a decentralized system, ranges (e.g. 10.0.0.0/8 or 192.168.0.0/16). When they would have to find a way to manage the a local device sends a packet to the internet, ratings of each client in the network, in such the router replaces these private addresses, in- a way that they cannot be falsely modified. cluding the source ports, with its own public In other words, the reputation data has to be address before forwarding it to the internet. It accurate and reliable. Besides the integrity of saves these translations in a local table. Once this data, the traffic it generates on the network it receives packets, it looks in this table and should have minimal impact on the overall replaces the public network address and port performance. with the associated private network address Rahman [30] proposes several options to and port. If no corresponding entry exists in build incentives in a peer-to-peer network. The the routers table, the packet is dropped. This first proposal that is described is the so called means that when contacting a device behind a Warm-glow Model. This model determines the NAT, there must be an existing entry in this percentage of free-rides based on the proba- table. bilistic population distribution. If the percent- There are several techniques to add an entry age is above a certain threshold, the system will to the routers NAT table. Universal Plug and show signs of diminishing marginal returns. Play (UPnP) is one of those techniques, where the local device uses an HTTP request to the for more efficient key exchange methods. One router to associate a port with the devices of them is ACE, an one-way authenticated key private network address. 
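The translation table described above can be sketched as two dictionaries. This is a simplified model with hypothetical names and documentation-range addresses: it maps each private (address, port) pair to one public port regardless of destination, whereas real NAT implementations differ in how they allocate and filter mappings, which is why techniques such as hole punching and relaying are needed.

```python
import ipaddress
import itertools


class NatTable:
    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, src_ip: str, src_port: int) -> tuple:
        """Rewrite a private source address to the public one, adding a table entry if needed."""
        if not ipaddress.ip_address(src_ip).is_private:
            raise ValueError(f"{src_ip} is not a private address")
        key = (src_ip, src_port)
        if key not in self._outbound:
            port = next(self._next_port)
            self._outbound[key] = port
            self._inbound[port] = key
        return self.public_ip, self._outbound[key]

    def translate_inbound(self, public_port: int):
        """Return the private destination, or None: no entry means the packet is dropped."""
        return self._inbound.get(public_port)


if __name__ == "__main__":
    nat = NatTable("203.0.113.7")
    print(nat.translate_outbound("192.168.1.10", 5000))  # ('203.0.113.7', 40000)
    print(nat.translate_inbound(40000))                  # ('192.168.1.10', 5000)
    print(nat.translate_inbound(40001))                  # None
```

An unsolicited inbound packet finds no entry and is dropped, which is the obstacle a peer faces when it tries to contact a node behind a NAT first.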
4.3 Bootstrapping new nodes

If a Tor user decides to donate some of his bandwidth by running a bridge or a relay, and thus creates a new node in the network, there has to be a starting point where this new node can discover neighbours in the network to connect with. In Tor, a directory server can tell the new node who its neighbours are and where to find them [10].

Moving this system to a peer-to-peer base is difficult: Dingledine et al. stated that this is indeed still an open problem. In decentralized systems, there is no central directory server to tell a new node where to locate neighbours. Some systems, such as Tarzan, MorphMix and Pastry [33], [31], are decentralized, but they suffer from performance issues.

4.4 Key exchange

In a decentralized network, using a centralized authority for managing the keys is not possible. This means that for secure communication, peers have to exchange keys directly with each other, without a trusted party between them. Diffie-Hellman is a very popular algorithm for exchanging keys between two parties; it is used, for example, in circuit creation in onion routing (see section 2.4). However, without authentication an adversary can manipulate the keys exchanged between two parties, making the protocol vulnerable to a man-in-the-middle attack. This weakness makes it possible for an adversary to decrypt all messages sent between the two parties.

Tor currently uses an interactive forward-secret key-exchange protocol called the Tor Authentication Protocol (TAP) [5]. This protocol uses telescoping, which means that the initiator negotiates session keys with each successive hop in the circuit. There are several proposals for more efficient key-exchange methods. One of them is ACE, a one-way authenticated key-exchange protocol. Its authors claim a 46% efficiency improvement on the client side and nearly 19% on the side of the onion routers. ACE requires clients to send one extra element in the key exchange. This does not introduce any overhead, however, because the element fits in the unused space in a cell.

5 DECENTRALIZED PRIVACY-ENHANCING SYSTEMS

Fully decentralized systems with large-scale usage are without exception based on the peer-to-peer paradigm. Many such systems have been proposed, yet only some have been implemented and are currently in use and actively maintained [22], [31], [3], [33], [24], [14], [32], [4]. Here we focus on fully decentralized networks, with the exception of Torsk (which is almost fully decentralized but still requires a neighbourhood authority).

5.1 Gnutella

Gnutella is a decentralized peer-to-peer network used for distributed search of files. Since the network is fully decentralized, peers in the network are called servents, a combination of the words servers and clients. Each peer can act both as a server, answering queries, and as a client, requesting and executing search queries. To bootstrap, a new peer connects to one of several known hosts that are almost always available². Once the peer has joined, there are several messages a servent can send out:

• A peer sends a PING message to its neighbours to announce its presence. This message is forwarded to other peers, and each peer sends a PONG message back.
• A peer can issue QUERY messages. Other peers respond with a QUERY RESPONSE message to specify whether the file that was requested in the query was found or not.
• To transfer items between peers, the GET and PUSH messages are used.

2. These peers can be found on http://gnutellahosts.com.

Gnutella is an unstructured network, which means that the placement of data items is not based on any knowledge of the network topology or of the contents of the files. To search for a file, a flooding algorithm is used.
5.2 Freenet

In Freenet, each data item is represented by a key that is independent of the location of the file; for this reason Freenet is called a loosely structured network. To issue a query, the request is passed from client to client, where each client decides where to send the request next.

There are three types of file keys in Freenet. The first is the Keyword-Signed Key (KSK), which is derived from a short description of the file. The second is the Signed-Subspace Key (SSK), which enables personal namespaces. This key contains a public and a private key: the private key is used to store the data and the public key is used in queries for the file. The third type is the Content-Hash Key (CHK), which is used for updating and splitting contents.

The routing algorithm for storing and retrieving data is dynamic and can adjust to the topology of the network. Each peer only has knowledge about its neighbours. Each request has a Hops-To-Live counter which indicates how many peers the request may traverse. Each peer decrements the counter by one, and when it reaches zero the request is no longer forwarded. Results of queries are cached in intermediate nodes to reduce the time for a query response. To prevent requests from looping, each request contains a random identifier. The peers that the request travels through keep track of these identifiers and reject a request if it has already been answered by the peer.

5.3 Tapestry

Tapestry is based on the Plaxton mesh data structure, which maintains pointers to nodes in the network whose IDs match the elements of a tree-like structure of ID prefixes up to a digit position. A property of Tapestry is that it offers load distribution and routing locality.

As in Gnutella, peers can take the role of a client, issuing requests, and the role of a server, where objects are stored. A peer can also function as a router, which forwards an incoming message. The routing algorithm is based on the destination ID of the packet: routers use local routing maps to route messages to the destination ID digit by digit. The routing system ensures that each peer in the system can be found in a logarithmic number of hops.

Tapestry is a fundamental component of OceanStore, a decentralized storage system. Tapestry is also used in systems such as Bayeux and SpamWatch, a decentralized spam-filtering system.

5.4 Pastry

Pastry is very similar to Tapestry, but there are some small differences, for example in the handling of network locality and data object replication. Pastry also uses the Plaxton mesh data structure for its routing algorithm. Each peer in the network is assigned a random 128-bit identifier that is uniformly sampled from the key space. Each node can be found in about log(n) steps.

The Pastry overlay network is used in several applications, such as Scribe, Squirrel and PAST. Scribe is a system that has been built to send multicast messages. Instead of relying on a multicast infrastructure, multicast messages are sent using only unicast services, and Pastry is used to create and manage multicast groups. Scribe makes use of the organization, robustness and reliability of the Pastry network.

Squirrel is a decentralized peer-to-peer web cache. It uses Pastry to locate its objects and for routing. Squirrel allows users to share their web cache with other users in the network, creating a large decentralized web cache. Squirrel does, however, introduce some overhead when searching the cache; the challenge is to keep this overhead as low as possible.

PAST is a large-scale persistent peer-to-peer network that has been designed to store files. It is built upon the Pastry network, and the main focus of PAST is providing performance, scalability and security.
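Digit-by-digit prefix routing, as used by Tapestry and Pastry, can be sketched with short hexadecimal IDs. Simplifications, all assumptions of this sketch: four-digit IDs instead of Pastry's 128-bit identifiers, routing tables built from global knowledge of all IDs, and no leaf sets or fallback state, so it only routes towards IDs of existing nodes. Every hop forwards to a node that shares at least one more leading digit with the destination, so the number of hops is bounded by the ID length; with n randomly assigned IDs the shared prefixes, and hence the routes, are about log(n) digits long.

```python
from typing import Dict, List, Tuple

RoutingTable = Dict[Tuple[int, str], str]  # (row, next digit) -> node ID


def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def build_table(node: str, all_ids: List[str]) -> RoutingTable:
    """Row r holds nodes that share exactly r leading digits with `node`, keyed by their digit r."""
    table: RoutingTable = {}
    for other in sorted(all_ids):
        if other != node:
            row = shared_prefix_len(node, other)
            table.setdefault((row, other[row]), other)
    return table


def route(source: str, dest: str, tables: Dict[str, RoutingTable]) -> List[str]:
    """Forward hop by hop; each hop matches at least one more digit of `dest`."""
    path = [source]
    while path[-1] != dest:
        row = shared_prefix_len(path[-1], dest)
        next_hop = tables[path[-1]].get((row, dest[row]))
        if next_hop is None:
            raise LookupError("no routing entry; real systems fall back to other state here")
        path.append(next_hop)
    return path


if __name__ == "__main__":
    ids = ["1a2b", "1a2f", "1a3c", "1f00", "7b21", "7b2f", "c0de"]
    tables = {node: build_table(node, ids) for node in ids}
    print(route("c0de", "1a2f", tables))  # ['c0de', '1a2b', '1a2f']
```

Each node stores only one entry per (row, digit) pair rather than the whole membership list, which is what keeps the routing state small as the network grows.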
structure, which maintains pointers to nodes in PAST is a large scale persistent peer-to-peer the network whose IDs match the elements of network that has been designed to store files. a tree-like structure of ID prefixes up to a digit It is built upon the Pastry network and the position. A property of Tapestry is that it offers main focus of PAST is providing performance, load distribution and routing locality. scalability and security. 5.5 MorphMix message delivery, anonymous channels and MorphMix [31] functions similar to both Tor secure pseudonyms. Users are able to send and MIX networks. It relies on nested encryp- and receive unicast, multicast and anycast mes- tion and routing traffic over multiple nodes to sages anonymously. The strategy that AP3 is ensure anonymity of its users’ communication. using for message delivery is similar to that Additionally, MorphMix uses the typical be- of Tarzan: it relies on a network of peers to haviour of a MIX network, where it reorders forward messages. A node along the request messages that enter a node before sending path, does not know whether the node from them out. which it receives a message is the message’s MIX networks are typically high latency and originator or simply another forwarding peer. traffic in it moves slow. Messages will have to be stored on a node until enough messages 5.7 Tarzan have arrived to start sending them out in a Tarzan [14] is a fully distributed peer-to-peer random order. Often, cover traffic is used to anonymity network. It implements a network generate enough messages for the network to address translator (NAT) to bridge between obfuscate the real communication, which in nodes running Tarzan and the internet. This turn generates a lot of bandwidth overhead. means that services don’t have to be aware of MorphMix has been developed as a low latency the fact they are running through Tarzan. and high performance network. 
As such, it will Tarzan requires knowledge of a few exist- not hold messages for very long, nor will it use ing nodes to bootstrap and uses a gossiping cover traffic. protocol to discover other nodes. Nambiar et In contrast to Tor, MorphMix does not fea- al. showed however that this does not scale ture a centralized node discovery mechanism. beyond roughly 10,000 nodes [25], [24]. Instead, every node is free to chose a set of Once Tarzan has knowledge of enough next nodes that will be the continuation of an nodes, it achieves its anonymity with much like anonymous tunnel. A malicious node could se- a Chaumian mix, with layered encryption and lect a colluding node to continue the tunnel and routing through multiple hops. In contrast to therefore control the entire tunnel. To prevent Tor and other networks, Tarzan uses cover traf- this, a witness node is appointed to mediate the fic to provide protection against traffic analysis setup of an anonymous tunnel. Although this by a global advisory to find an initiator. makes the possibility of such an attack more difficult, it doesn’t make it impossible. 5.8 An important part of MorphMix is the ability to detect malicious tunnels. Tunnels that are set Tribler is a social-based peer-to-peer file shar- up by well behaving nodes will select the next ing system backwards compatible with the Bit- nodes in the tunnel randomly. Malicious col- Torrent protocol [27]. Tribler considers social luding nodes, however, will specifically select phenomena and the sense of community as nodes that are part of the malicious network important parts of file sharing. Although the to continue the tunnel. This will reveal itself system is completely decentralized, it does not in the fact that the probability of the selection yet provide its users with anonymity. However, of certain nodes is increased. By using this a modified Tor-like protocol for decentralized information it becomes possible, to an extent, use is currently in beta. 
to detect malicious tunnels. Tribler introduces a novel protocol called BuddyCast. Peer and content discovery use this protocol, which disseminates information 5.6 AP3 epidemically. Additionally, the protocol allows AP3 (Anonymizing Peer-to-Peer Proxy) [22] users to find taste buddies, which are peers makes cooperative, decentralized anonymous that share similar interests in files. This enables communication possible. The AP3 system pro- quick finding of content that a user is interested vides clients with three primitives: anonymous in and builds on the idea of social phenomena. Although not an anonymization network, node discovery traffic with roughly 1.2 million Tribler does accomplish a lot on the topic of clients. decentralization. The BuddyCast protocol uses Instead of the directory servers, it uses a BitTorrent infohashes to spread information DHT and a new neighbourhood authority. The completely decentralized throughout the net- DHT is a combination of DHT and work. With future work on anonymization, this Myrmic DHT. Kademlia DHT was chosen be- can be a promising approach to anonymous file cause it is already widely used and it has sharing. proven itself for a large number of users. The Myrmic DHT runs on top of Kademlia, and introduces the neighbourhood authority. 5.9 NISAN This authority issues certificates to nodes that NISAN, or Network Information Service for participate in the DHT, but it does not par- Anonymization Networks [3], is an anonymiza- ticipate in the DHT itself. The neighbourhood tion network which implements a distributed authority makes this solution not a fully decen- node discovery. Not only does a central node tralized one, but its role is a lot smaller than the administrator (the directory server in Tor) im- current directory servers. This does not solve ply trust in those servers, Panchenko et al. also the trust issue, but it does solve the scalability argue that a central node administration (the issue. 
directory server in Tor) does not scale. The cur- rent directory server protocol was already im- proved two times to reduce bandwidth costs, 5.11 Comparison with a fairly low amount of users. Using the properties of each of the previously NISAN implements a DHT-based approach described networks, we can now draw a com- (Kademlia) for distributing node information, parison between these networks. This is done in such a way that does not require the client to in the form of a tabular overview, see table 1. know about all the nodes in the network (such The networks are compared according to the as in Tor). To build a circuit, NISAN generates following features: random IDs and searches for the closest hit • Compatibility with Tor: Is the network throughout the network. This makes it possible compatible with Tor? Could the network to build a path with nodes picked in a random, or some features of the network be used uniform way among all nodes, without the for the decentralization of Tor? trust of a third party. • Public implementation: Does a publicly This does not fully protect against finger- available implementation exist? printing or bridging attacks (passive attacks), • Used in practice: Is the network used in and suggest to do random walks throughout practice? the network to mitigate that. The authors admit • Attack resistance: What weaknesses does however this decreases the protection against the network have and which attacks are an active attack. possible? • Unlinkability: Does the network hide the 5.10 Torsk identities of the sender and/or receiver? Torsk [21] is an extension to Tor, designed to be an interoperable replacement for the circuit 6 THEDOCUMENTEDSTRUGGLEOF creation and directory servers as used by Tor. 
ALTERNATIVE INTERNET PROJECTS The authors argue that the current directory servers do not scale, with the percentage of The largest repository of decentralization at- the traffic in a network dedicated to node dis- tempts is located at redecentralize.org. The aim covery growing as the number of nodes grow. of this repository is to ‘get decentralized prod- With the 2009 version of Tor, they argue that ucts into the hands of billions’[29]. One of the 100% of the networks traffic would consist of ways they do this, is by maintaining a Github aim to provide unlinkability (Freenet, Tor and GNUnet), plus one that is currently in the pro- cess of implementing such a feature (Tribler). Furthermore we notice that, once above a cer- tain threshold, none of the statistics have a clear effect on a projects popularity. For example, Public implementation Used in practice (D)DoS protection Tor interoperability Sybil attack protection Unlinkability Name Year MITM protection Freenet and Tor have similar statistics, while Gnutella 2000 x X X x x x x the number of Tor users [36] is several orders of Freenet 2001 x X X X x x X magnitude higher than the number of Freenet Tapestry 2001 x X X X x x x Pastry 2001 x X X X x x x users [8]. MorphMix 2002 x X x X x ? X So although clear differences aren’t directly AP3 2004 x x x ? ? ? X noticeable, we hope that this comparison will x x x Tarzan 2002 ? X ? X provide more insight into which systems are Tribler 2008 x X X X x x x NISAN 2009 x x x ? ? X X more serious and mature than others, while Torsk 2009 X x x X X X X also showing which ones are still actively main- TABLE 1: A comparison of decentralized peer- tained. to-peer overlay networks. 7 CONCLUSION We explained how the leading privacy- repository3 with projects that in some way help enhancing technology Tor works and which to decentralize the internet. The struggle and components define a Tor network. 
Further- pains of these projects illustrates the difficulty more, we looked at the main disadvantages of decentralization. No projects succeeded in and problems Tor is currently facing. We in- creating an alternative internet infrastructure. vestigated the issues around decentralization The projects range from self hosted cloud ap- and compared systems that currently have a plications, to crypto currencies, to anonymous decentralized structure and/or mechanism. networks. From table 1, we conclude that there is no We made a significant contribution to this fully decentralized system capable of offering project index. We created a table containing Tor anonymity today. Decentralized systems each of the listed systems that shows statistics that do exist such as Tarzan, , Torsk or such as the total lines of code (LOC), age, Gnutella, show promising attempts to decen- number of contributors and commits, activity tralize and anonymize the internet. Yet each of and (main) programming language, sourced these systems either lacks in performance or is from Ohloh. This should make it easier to filter vulnerable to some type of attack. out poorly maintained or otherwise deprecated For the first time we document in detail, projects. The table is available on the same the amount of wasted effort and pain spent Github repository. in decentralization. The current generation of An excerpt of the table is included as table 2. technology lead by Tor still has room for im- In this instance, we chose to sort the table on provement, while the next generation is only the number of commits, because we found that just appearing on the horizon. The major prob- this most accurately represents both the matu- lems involving decentralization are excruciat- rity and activity of the projects. Other statistics ingly difficult to overcome. 
None of the projects such as LOC might not be very relevant on have succeeded in making the internet secure their own, because some projects include big and private. libraries or other projects in their repository. In the table we notice that three of the entries are privacy-enhancing networks that REFERENCES [1] Timothy G Abbott, Katherine J Lai, Michael Lieberman, 3. Found at ://github.com/redecentralize/alternative- and Eric C Price. Browser-based attacks on tor. In Privacy internet Enhancing Technologies, pages 184–199. Springer, 2007. # Name Language Age Last activity LOC Commits Contributors 1 ownCloud PHP 6 years 2014-03-15 1,297 K 34,391 392 2 Freenet Java 13 years 2014-03-15 442 K 32,009 183 3 Tor C 12 years 2014-03-13 329 K 29,200 184 4 GNUnet C 8 years 2014-03-13 427 K 21,398 37 5 StatusNet PHP 6 years 2014-03-14 230 K 14,877 92 6 Diaspora* Ruby 3 years 2014-03-14 51 K 14,202 368 7 SlapOS Python 8 years 2014-03-15 588 K 14,046 93 8 Tahoe-LAFS Python 7 years 2014-03-13 158 K 11,565 54 9 Tribler Python 8 years 2014-03-15 148 K 11,521 48 10 Lorea PHP 6 years 2014-03-12 873 K 11,066 96 TABLE 2: The top 10 projects from the Alternative Internet repository by number of commits (as of 2014-03-15).

[2] Mashael AlSabah and . Pctcp: per-circuit tcp-over-ipsec transport for anonymous communication overlay networks. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 349–360. ACM, 2013.
[3] Arne Rache Andriy Panchenko, Stefan Richter. In NISAN: Network Information Service for Anonymization Networks, 2006.
[4] Stephanos Androutsellis-Theotokis and Diomidis Spinellis. A survey of peer-to-peer content distribution technologies. ACM Computing Surveys (CSUR), 36(4):335–371, 2004.
[5] Michael Backes, Aniket Kate, and Esfandiar Mohammadi. Ace: an efficient key-exchange protocol for onion routing. In Proceedings of the 2012 ACM workshop on Privacy in the electronic society, pages 55–64. ACM, 2012.
[6] Kevin Bauer, Damon McCoy, Dirk Grunwald, Tadayoshi Kohno, and Douglas Sicker. Low-resource routing attacks against tor. In Proceedings of the 2007 ACM workshop on Privacy in electronic society, pages 11–20. ACM, 2007.
[7] Alex Biryukov, Ivan Pustogarov, and Ralf-Philipp Weinmann. Content and popularity analysis of tor hidden services. arXiv preprint arXiv:1308.6768, 2013.
[8] Generated by operhiem1 using pyProbe. Freenet statistics. Available at http://asksteved.com/stats/.
[9] David L Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. Communications of the ACM, 24(2):84–90, 1981.
[10] , Nick Mathewson, and Paul Syverson. Tor: The second-generation onion router. Technical report, DTIC Document, 2004.
[11] Roger Dingledine and Steven J Murdoch. Performance improvements on tor or, why tor is slow and what we're going to do about it. Online: http://www.torproject.org/press/presskit/2009-03-11-performance.pdf, 2009.
[12] Roger Dingledine, Dan S Wallach, et al. Building incentives into tor. In Financial Cryptography and Data Security, pages 238–256. Springer, 2010.
[13] John R Douceur. The sybil attack. In Peer-to-peer Systems, pages 251–260. Springer, 2002.
[14] Michael J Freedman and Robert Morris. Tarzan: A peer-to-peer anonymizing network layer. In Proceedings of the 9th ACM conference on Computer and communications security, pages 193–206. ACM, 2002.
[15] David M Goldschlag, Michael G Reed, and Paul F Syverson. Hiding routing information. In Information Hiding, pages 137–150. Springer, 1996.
[16] IETF. The diffie-hellman key agreement method. Available at https://www.ietf.org/rfc/rfc2631.txt.
[17] IETF. The (tls) protocol. Available at http://tools.ietf.org/html/rfc5246.
[18] Rob Jansen, Aaron Johnson, and Paul Syverson. Lira: Lightweight incentivized routing for anonymity.
[19] Eng Keong Lua, Jon Crowcroft, Marcelo Pias, Ravi Sharma, Steven Lim, et al. A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys and Tutorials, 7(1-4):72–93, 2005.
[20] Damon McCoy, Kevin Bauer, Dirk Grunwald, Tadayoshi Kohno, and Douglas Sicker. Shining light in dark places: Understanding the tor network. In Privacy Enhancing Technologies, pages 63–76. Springer, 2008.
[21] Jon McLachlan, Andrew Tran, Nicholas Hopper, and Yongdae Kim. Scalable onion routing with torsk. In Proceedings of the 16th ACM conference on Computer and communications security, pages 590–599. ACM, 2009.
[22] Alan Mislove, Gaurav Oberoi, Ansley Post, Charles Reis, Peter Druschel, and Dan S Wallach. Ap3: Cooperative, decentralized anonymous communication. In Proceedings of the 11th workshop on ACM SIGOPS European workshop, page 30. ACM, 2004.
[23] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. Consulted, 1:2012, 2008.
[24] Arjun Nambiar and Matthew Wright. Salsa: a structured approach to large-scale anonymity. In Proceedings of the 13th ACM conference on Computer and communications security, pages 17–26. ACM, 2006.
[25] Andriy Panchenko, Stefan Richter, and Arne Rache. Nisan: network information service for anonymization networks. In Proceedings of the 16th ACM conference on Computer and communications security, pages 141–150. ACM, 2009.
[26] Charles Poladian. Silk road shut down by fbi, owner ross william ulbricht, 'dread pirate roberts,' arrested. Available at http://www.ibtimes.com/silk-road-shut-down-fbi-owner-ross-william-ulbricht-dread-pirate-roberts-arrested-1413966.
[27] Johan A Pouwelse, Pawel Garbacki, Jun Wang, Arno Bakker, Jie Yang, Alexandru Iosup, Dick HJ Epema, Marcel Reinders, Maarten R Van Steen, and Henk J Sips. Tribler: a social-based peer-to-peer system. Concurrency and Computation: Practice and Experience, 20(2):127–138, 2008.
[28] Ryan Pries, Wei Yu, Xinwen Fu, and Wei Zhao. A new replay attack against anonymous communication networks. In Communications, 2008. ICC'08. IEEE International Conference on, pages 1578–1582. IEEE, 2008.
[29] Redecentralize Project. About redecentralize.org. Available at http://redecentralize.org/about/.
[30] Muntasir Raihan Rahman. A survey of incentive mechanisms in peer-to-peer systems, 2009.
[31] Marc Rennhard and Bernhard Plattner. Introducing morphmix: peer-to-peer based anonymous internet usage with collusion detection. In Proceedings of the 2002 ACM workshop on Privacy in the Electronic Society, pages 91–102. ACM, 2002.
[32] Matei Ripeanu. Peer-to-peer architecture case study: Gnutella network. In Peer-to-Peer Computing, 2001. Proceedings. First International Conference on, pages 99–100. IEEE, 2001.
[33] Antony Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Middleware 2001, pages 329–350. Springer, 2001.
[34] Arno Wacker, Gregor Schiele, Sebastian Holzapfel, and Torben Weis. A mechanism for peer-to-peer networks. In Peer-to-Peer Computing, pages 81–83, 2008.
[35] Tao Wang, Kevin Bauer, Clara Forero, and Ian Goldberg. Congestion-aware path selection for tor. In Financial Cryptography and Data Security, pages 98–113. Springer, 2012.
[36] Tor Metrics Project website. Tor project: Anonymity online. Available at https://metrics.torproject.org.
[37] Tor Project website. Tor project: Anonymity online. Available at http://torproject.org.
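The Hops-To-Live timer and random-identifier loop prevention described for Freenet in Section 5.2 can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions: the `Node` class and the `query` helper are hypothetical, neighbours are tried in a fixed depth-first order, and Freenet's actual key-closeness routing and wire protocol are not modelled.

```python
import uuid
from typing import Dict, List, Optional, Set


class Node:
    """Toy Freenet-like peer (hypothetical) that only knows its direct neighbours."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.neighbours: List["Node"] = []
        self.store: Dict[str, bytes] = {}  # local data, including cached results
        self.seen: Set[str] = set()        # identifiers of requests already handled

    def request(self, key: str, htl: int, request_id: str) -> Optional[bytes]:
        # Loop prevention: a request whose identifier was seen before is rejected.
        if request_id in self.seen:
            return None
        self.seen.add(request_id)
        if key in self.store:
            return self.store[key]
        # Hops-To-Live exhausted: the request is not forwarded any further.
        if htl == 0:
            return None
        for neighbour in self.neighbours:
            # Each forwarding step decrements the Hops-To-Live timer by one.
            result = neighbour.request(key, htl - 1, request_id)
            if result is not None:
                # Cache the result on the return path to speed up later queries.
                self.store[key] = result
                return result
        return None


def query(origin: Node, key: str, htl: int) -> Optional[bytes]:
    """Start a new request carrying a fresh random identifier."""
    return origin.request(key, htl, uuid.uuid4().hex)
```

For example, in a ring a → b → c → d → a with the data stored at d, a query from a needs a Hops-To-Live of at least 3, and the ring does not cause endless forwarding because every node remembers the request identifier.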
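The digit-by-digit prefix routing that Tapestry and Pastry inherit from the Plaxton mesh (Sections 5.3 and 5.4) can be sketched as follows. The assumptions are deliberately simple: IDs are hypothetical 4-digit hexadecimal strings rather than Pastry's 128-bit identifiers, and every routing table is built from a global list of IDs for brevity, whereas real nodes learn their entries incrementally. Each hop reaches a node sharing at least one more leading digit with the destination, so a route takes at most as many hops as an ID has digits, which is logarithmic in the size of the ID space.

```python
from typing import Dict, List, Tuple

DIGITS = 4  # hypothetical ID length in hexadecimal digits

Table = Dict[Tuple[int, str], str]


def shared_prefix(a: str, b: str) -> int:
    """Number of leading digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_table(self_id: str, all_ids: List[str]) -> Table:
    """Entry (row, digit) points at a node sharing `row` leading digits
    with self_id and having `digit` at position `row`."""
    table: Table = {}
    for other in sorted(all_ids):
        if other == self_id:
            continue
        row = shared_prefix(self_id, other)
        table.setdefault((row, other[row]), other)
    return table


def route(src: str, dest: str, tables: Dict[str, Table]) -> List[str]:
    """Forward digit by digit; every hop fixes at least one more digit of dest."""
    path = [src]
    current = src
    while current != dest:
        row = shared_prefix(current, dest)
        nxt = tables[current].get((row, dest[row]))
        if nxt is None:  # no closer node known: deliver at the current node
            break
        path.append(nxt)
        current = nxt
    return path
```

Picking any entry with the right prefix is enough for correctness here; the real systems additionally prefer entries that are close in the underlying network, which is where their routing-locality properties come from.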
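The MIX behaviour contrasted with MorphMix in Section 5.5 (holding messages until enough have arrived, then sending them out in a random order) corresponds to what is commonly called a threshold mix. The sketch below assumes that batching policy and uses a hypothetical `ThresholdMix` class; cover traffic and the layered decryption a real mix performs are left out.

```python
import random
from typing import List, Optional


class ThresholdMix:
    """Toy threshold mix (hypothetical): hold messages until `batch_size`
    have arrived, then flush the whole batch in a random order."""

    def __init__(self, batch_size: int, rng: Optional[random.Random] = None) -> None:
        self.batch_size = batch_size
        self.pool: List[str] = []
        self.rng = rng or random.Random()

    def receive(self, message: str) -> List[str]:
        self.pool.append(message)
        if len(self.pool) < self.batch_size:
            # Keep holding: this waiting is the source of a MIX network's latency.
            return []
        batch, self.pool = self.pool, []
        # Shuffle so the output order does not reveal the arrival order.
        self.rng.shuffle(batch)
        return batch
```

A larger `batch_size` hides each message among more others but makes every message wait longer, which is exactly the latency trade-off that MorphMix avoids by not holding messages.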
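NISAN's circuit construction in Section 5.9 (generate random IDs, then look up the node closest to each one in a Kademlia DHT) can be sketched with Kademlia's XOR distance metric. For brevity this sketch assumes a local list of node IDs and takes the minimum over it, standing in for the iterative DHT lookup a real client would perform without knowing every node; `closest_node` and `pick_path` are hypothetical helpers, and the bound checks NISAN uses against malicious lookup replies are not modelled.

```python
import secrets
from typing import List, Sequence

ID_BITS = 160  # Kademlia identifiers are 160 bits long


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric between two identifiers."""
    return a ^ b


def closest_node(target: int, node_ids: Sequence[int]) -> int:
    """Node whose ID is closest to `target` under the XOR metric.
    Stand-in for an iterative DHT lookup (assumption for brevity)."""
    return min(node_ids, key=lambda n: xor_distance(n, target))


def pick_path(node_ids: Sequence[int], length: int = 3) -> List[int]:
    """Pick `length` distinct relays by resolving fresh random IDs."""
    if len(set(node_ids)) < length:
        raise ValueError("not enough distinct nodes for the requested path length")
    path: List[int] = []
    while len(path) < length:
        relay = closest_node(secrets.randbits(ID_BITS), node_ids)
        if relay not in path:  # skip repeats so each hop is a different node
            path.append(relay)
    return path
```

Because the random lookup targets are generated locally, no third party chooses the relays; a node's chance of selection still depends on how much of the ID space lies closest to it.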