University of Nevada, Reno

Anonymous Communication Systems: Usage Analysis and Attack Mechanisms

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

by

Esra Erdin

Dr. Mehmet Hadi Gunes/Thesis Advisor

May, 2012

THE GRADUATE SCHOOL

We recommend that the thesis prepared under our supervision by

ESRA ERDIN

entitled

Anonymous Communication Systems: Usage Analysis And Attack Mechanisms

be accepted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Mehmet Hadi Gunes, Ph. D., Advisor

George Bebis, Ph. D., Committee Member

Murat Yuksel, Ph. D., Committee Member

Cahit Evrensel, Ph. D., Graduate School Representative

Marsha H. Read, Ph. D., Dean, Graduate School

May, 2012

i

Abstract

Privacy in communication has been an essential security requirement with the rapid growth of . This has resulted in growing interest in methods for anonymous communication. Several system designs have been developed with the common aim of preserving communication privacy within the shared network environment. In this study, the structure of these systems is explained and an analysis of current technologies; proxy servers, remailers, JAP, and , is presented. Because Tor is the largest anonymity system currently in use, a detailed analysis is also presented. Moreover, growing interest in anonymity systems resulted in several de-anonymization techniques developed to exploit these anonymous communication systems. These techniques are also explained to better understand the security threats of anonymous communication. ii

Acknowledgments

I would like to express my deepest gratitude to my advisor Dr. Mehmet Hadi Gunes for his guidance, understanding and patience he showed during this study.

I also would like to thank to my colleagues BingDong Li and Christopher Zachor for their effort and help to put together this project.

Finally, I could never have been able to finish this thesis without the endless support and love of my family. With all my sincerity, I would like to thank to my family.

This work was supported in part by National Institute of Justice with grant number 2010-DN-BX-K248.

Esra Erdin

University of Nevada, Reno May 2012 iii

Contents

Abstract i

Acknowledgments ii

List of Figures v

Chapter 1 Introduction 1

Chapter2 AnonymizationTechnologyUsage 6

2.1 ProxyServer...... 6 2.2 Remailer...... 8

2.3 MixNetwork ...... 11 2.4 OnionRouting ...... 13

2.4.1 TheOnionRouting...... 14 2.4.2 InvisibleInternetProject...... 21

Chapter 3 Tor Usage Analysis 24

3.1 RelatedWork ...... 24 3.2 TorMeasurementStudy ...... 25

Chapter 4 Application Perspective 30 iv Chapter 5 Deanonymization 33

5.1 Clickjacking ...... 33 5.2 CodeInjection...... 36

5.3 WebsiteFingerprinting ...... 36 5.4 ActiveDocuments...... 37

5.4.1 PdfDocuments ...... 37 5.4.2 MacroWordAttacks ...... 38 5.5 URIMethods ...... 40

5.6 Network-LevelAttacks ...... 42 5.6.1 TimingAttack ...... 42

5.6.2 PredecessorAttack ...... 43 5.6.3 IntersectionAttack ...... 43

5.6.4 MultiplicationAttack...... 43 5.6.5 CircuitClogging ...... 44

Chapter6 ConclusionsandFutureWork 45

Bibliography 47 v

List of Figures

2.1 ProxyServer...... 7

2.2 Geographic Proxy Distribution (log-scale) ...... 9 2.3 Geographic Proxy Distribution (cont.) (log-scale) ...... 10

2.4 Remailer Geo-Distribution (log-scale) ...... 11 2.5 AMixProcess...... 12

2.6 JAPGeo-Distribution(logscale) ...... 13 2.7 TheOnionRouter(Tor)communication ...... 15 2.8 Geographic Tor Distribution (log-scale) ...... 17

2.9 Geographic Tor Server Distribution (cont.) (log-scale)...... 18 2.10 Geographic Tor Bandwidth Conribution (log-scale) ...... 19

2.11 Geographic Tor Bandwidth Conribution (cont.) (log-scale) ...... 20 2.12 Geographic I2P Server Distribution (log-scale) ...... 22

2.13 Geographic I2PBandwidth Contribution (log-scale) ...... 23

3.1 TorUsage(log-scale) ...... 27 3.2 GeographicTORServerDistribution ...... 28

3.3 GeographicTORUserDistribution ...... 28 3.4 Version Distribution (log-scale) ...... 29 vi 4.1 CombinedSpamData ...... 32

5.1 A link on the website that provides clickjacking ...... 34 5.2 JavaScriptCode...... 35

5.3 FacebookLikepage...... 36 5.4 CreatingButtoninPdfFile ...... 38

5.5 FeaturesofButtonField ...... 39 5.6 EmbeddedVideoFile...... 40 5.7 FeaturesofEmbeddingOperation ...... 41 1

Chapter 1

Introduction

Privacy in communication has been an essential security requirement with the rapid growth of the Internet. Each time when we visit a website or send an , our client sends packets of data through the Internet that contain essential information like who is sending the message and who is the receiver of the packet. Although the data transmitting via several hops from source to the destination may be encrypted,

IP header is still visible to an observer. Adversaries can access significant information about the traffic carried between the sender and the receiver of the data. By observing communication through a link, third parties can easily identify who is communicating with whom based on source and destination IP addresses. The term anonymizer refers to varios tools that help the users to keep their activities private. Many situations like , anticensorship issues caused a growing interest in anonymous communication through the Internet. There are different situations in which someone wants to communicate anonymously on the Internet. Some organizations such as governments, military services may want some topics to remain secret as they are so critical and significant and their exposure may 2 be inconvenient or harmful to an investigation. Law enforcement agencies may want

to design online forms that people can provide hint or other information they may have without the fear of retribution or punishment. Even private citizens may want

to browse the Web without advertisers collecting statistics on their personal browsing habits. There are also people living under oppressive regimes that try to limit the

rights of their citizens especially on the Internet [17]. Citizens of oppressive countries can express their thoughts, ideas through anonymizer networks. Because of all the situations mentioned above, researchers design anonymity

systems that build an overlay network running on top of the Internet. By using anonymizer systems, a user can communicate with another one without revealing

their identities like IP addresses and location. Since milestone research on anonymous communications Chaum [12], research on these systems has extended to many areas. Anonymous communication, traf- fic analysis, provable shuffles, anonymous , anonymous publications, private information retrieval, taxonomy, security and improvement, communication censor- ship, and anonymous voting are some of the areas that researchers have focused on [3,14,17,27]. Several anonymity services are provided by either commercial com- panies or open source developers. Anonymizer.com is one of the commercial websites that requires its users to pay a subscription fee to keep their identity private while browsing web via Anonymizer.com servers. Moreover, Tor Onion (TOR) [16]

and Invisible Internet Project (I2P) [4] are open source systems which are hosted by volunteers.

A communication network is composed of messages, senders, that send mes- sages, and recipients, that receive messages. Sender and recipient can be any client,

server or peer in a communication network. These subjects exchange messages via 3 public communication channels. There is also a possibility to be an attacker who may

be interested in monitoring the traffic transmitted between sender and recipient. The attacker can be outside of the communication network, or inside of the communica-

tion network. In order to better understand anonymous communication systems, it is essential to understand the terms anonymity, unlinkability, and unobservability.

Anonymity: In their proposal Pfitzman and Hansen describe anonymity as ¨the state of being not identifiable within a set of subjects, the anonymity set” [37]. All subjects that can cause an action constitute the anonymity set. In other words, a message can possibly be sent by any of the senders who belong to his/her sender anonymity set and, similarly, a message can be received by any recipient who might be anonymous within a set of recipient anonymity set. Here, we can consider a subject as a client, server or peer in a network. All subjects might have their own anonymity sets. Unlinkability: A subject of the system can make multiple uses of a resource without an adversar being able to link to these uses to the particular subject [41].

In other words, an adversary observing the senders and recipients of a network is unable to identify the communicating parties and the ability of understanding the communication between joined parties does not increase by observing the network. The relation between anonymity and unlinkability is that anonymity of an item in a system is not linkable to any particular identity and the anonymity of a particular identity is not linkable to any item [41]. More clearly, any particular message sent through a network is not linkable to a particular sender in the sender set which ensures sender anonymity. Unobservability: Observers cannot observe the subjects of interest from any other subjects in the communication network. As anonymity sets, unobservability sets 4 of subjects are defined with respect to unobservability [37]. Sender unonservability

assures that it is not noticable whether any sender within the unobservability set sends the message. Similarly, recipient unonservability assures that it is not noticable

whether any recipient within the unobservability set receives the message. As indicated earlier, there has been considerable research effort on providing

anonymous communications; however, not much information is covered on the current utilization of deployed anonymity systems. It is crucial to understand the deployment and usage of anonymous networks. In this thesis, I cover the analysis and usage

of current anonymizer technologies, including; proxy servers, remailers, JAP mix network, TOR, and I2P.

Additionaly, I analyze Tor usage in detail, as it is currently the largest anonymity system in use. Two data sets are used for Tor analysis: data collected by setting up a Tor relay as an exit node, and historical Tor measurements reported at Tor project website [7]. Moreover, the usage of anonymity technology from the application perspective; specifically spam-email is studied. For this, IP addresses of spam-emails are compared to the collected IP addresses of the anonymity servers during our studies.

Growing interest in anonymity systems has increased the attention to develop several attack systems to deanonymize these systems. Finally, various mechanisms that can be used to identify a user of an anonymizer network including ; clickjacking, website fingerprinting, active documents, URI methods, network-level attacks are discussed.

Main contributions of this thesis is that, in addition to overviewing existing anonymous communication systems, it presents the analysis of these systems and gives a detailed analysis of Tor network, which is the most popular network among 5 all. Usage and server geo-location distribution of these systems are studied in de- tail. Moreover, the usage of anonymity technology from the application perspective; specifically spam-email, is studied which has not been seen in the literature previ- ously. We also proved the applicability of several de-anonymization mechanisms to accessing the users of anonymizer systems. 6

Chapter 2

Anonymization Technology Usage

Anonymous communication designs can be categorized in several categories based on network type, protocol type, anonymity properties, or adversary capability. However, generally, they are often divided into two categories: high-latency systems and low- latency systems. High-latency anonymity systems provide strong anonymity. These systems are mostly used for non-interactive applications that can tolerate high delays while communicating. On the other hand, low-latency systems often provide better performance as the delays of the communication is low. They are suitable for real- time applications like instant messaging and web browsing. The analysis of popular deployed anonymity systems will be provided in this section.

2.1 Proxy Server

A proxy server simply forwards all incoming traffic without any packet reordering, acting as an intermediary for requests from clients [17]. A client connects to the proxy server requesting some service. The proxy server evaluates the request and relays the requests user to its intended destination as the proxy is requesting and relays the 7 responses to the user as in Figure 2.1. Significant improvement in the performance of

the anonymous communication can be gained by using a single proxy. However, the proxy server knows about both the source and destination which allows it to trace

the traffic between the source and destination. The improved performance of these systems can also increase the systems’ susceptibility to certain traffic analysis attacks

by third parties [17].

1 2

4 3

Source Proxy Server Destination

Figure 2.1: Proxy Server

Web proxies are the most popular among all other proxy servers. Although there are many volunteer-based proxies, some commercial companies also offer proxy service such as Anonymizer.com and GoTrusted.com. Clients pay a subscription fee and then can relay their traffic through the servers operated by the company.

Figure 2.2 and 2.3 present the geographic location of 7,246 public proxy servers obtained on March 10-15, 2011. These proxies were collected from major announce- ment lists (i.e. anonymousproxylists.net, freeproxy.ru, fresh-proxy-list.net, free-proxy-server-list.com, multiproxy.org, nntime.com, proxies4u.com, proxynet.org, proxy4free.com, proxy.org, proxylistservice.com,tech-faq.com,

proxyserverprivacy.com, www.samair.ru, and publicproxyservers.com,) and are a sample of available public proxy systems. Hence, this is not a complete list of public

proxy servers but a representative sample for the purpose of this study. Note that, the figure is in logarithmic scale. Among the available public proxy servers from 112 8 countries, most were located in the U.S. (i.e., 1,869) and in China (i.e., 710). More-

over, 46 countries hosted more than 10 public proxy servers while 21 hosted a single server.

2.2 Remailer

Remailers are the servers that receive messages with several instructions embedded into it and forward the messages to the destination without revealing any information of the sender. In particular, they remove the identifying email headers before sending

the message to the destination. The first widespread public implementation of remailers is remail- ers [24,36]. These are also called Type I remailers. These remailers forward messages between several systems before sending to the actual destination. Later, Mixmaster systems, which are also called Type II remailers, are implemented to be more resistant to traffic analysis [13]. These remailers include message , message pools and some other features to prevent emails to be monitored by an adversary [41]. Finally,

Type III remailer, called , was proposed in [15]. Free routing algorithm makes Type III remailers more secure compared to other ones. Every email transmit- ting through the connection passes several mixes and no single mix can see any other mix besides the adjacent mixes. Hence, no mix can link the message sender with any recipient. 9

Number of Proxy Servers

10 100 1000 10000

United States China Brazil Indonesia United Kingdom Canada Germany Russian Federation India Japan France South Korea Taiwan Colombia Thailand Hong Kong Iran Ukraine Poland Spain Argentina Netherlands Egypt Venezuela Kazakhstan Czech Republic Turkey Chile South Africa Bulgaria Kenya Vietnam Philippines Saudi Arabia Peru Latvia Mexico Italy Malaysia Slovakia Ecuador Sweden Ireland Pakistan Hungary Australia

10 100 1000 10000 Figure 2.2: Geographic Proxy Distribution (log-scale) 10

Number of Proxy Servers

1 10

Denmark Austria Singapore Israel Portugal Georgia Panama Greece Romania Finland Ghana Bangladesh Bolivia Switzerland Angola Nigeria Moldova Belgium Norway Dominican Republic United Arab Emirates Morocco New Zealand Paraguay Guatemala El Salvador Croatia Albania Puerto Rico Mauritius Kuwait Bosnia and Herzegovina Yemen Zimbabwe Costa Rica Belarus Sri Lanka Lebanon Lithuania Slovenia Madagascar Cuba Sudan Benin Iraq

1 10 Figure 2.3: Geographic Proxy Distribution (cont.) (log-scale) Number of Remailers 11 During our web/blog search in Oct 1 10 2010, we could only identify 15 active re- United States mailers as shown in Figure 2.4. Due to Germany heavy usage of remailers by spammers or Netherlands Austria scammers, they are no more actively de- United Kingdom ployed. Canada Australia

2.3 Mix Network 1 10

Chaums’s seminal work in [12] was a base Figure 2.4: Remailer Geo-Distribution system for secure mix systems. Mix sys- (log-scale) tems use mix servers to relay the mes- sages to the destination. The principal

idea is that all the messages that is desired to be anonymized relays through these mix nodes. We can consider mix networks as public- cryptographic primitive that takes

a number of ciphertexts as input encypted with the public keys of the mix nodes. Each mix has its own RSA public key, and messages are encrypted with these public keys

after divided into blocks. The first mostly includes the address of the other mix which acts as a header of the message. When the first mix node gets the message, it decrypts all the blocks with its public key, extract the first block of remaining blocks

which includes the address of the recipient and appends a block of random bits at the end of the message. The random bit block is added to the message to make the

size of the data fixed. Figure 2.5 shows the working mechanism of mix networks. In the figure, path

P of a message M consists of 3 mix process Mix-1, Mix-2, and Mix-3. The client 12 Mix 1 Mix 2 Mix 3

M M M M

Figure 2.5: A Mix Process

builds ciphertext C by encrypting message M along with random text R using each

mix’s public key K. The ciphertext (e.g., E1(AMix−2,R1+E2(AMix−3,R2+E3(D,R3+

M)))) specifies the exact path the message will take through the mix network. Each mix node (e.g., Mix-1) receives the ciphertext decodes one layer to find next hop destination (e.g. AMix−2) and forwards payload (e.g., E2(AMix−3,R2 + E3(D,R3 +

M))). Decyrption of ciphertexts and padding functionality are essential for mix net- works as they aim to provide bitwise unlinkability. A link between the encrypted message arriving at the mix node and decrypted message departing from the mix should not be related by an attacker of the system. Flushing algorithms are also used in mix networks to enhance anonymity of users. Flushing algorithms determine which message will be forwarded to the other mix node, and when to flush. The al- gorithm buffer incoming messages into a pool and forward messages in rounds. This also provides unlinkability and defends against traffic analysis attacks. 13 There are different variations of mix net- Number of Jap Servers works that were deployed over the time such as 1 10 Crowds [40], Tarzan [19], and JAP [9]. Germany JAP () is a low-latency United States mix cascade that uses servers provided by vol- Netherlands unteers. The clients run JAP software to send United Kingdom the packets to the destination, mix cascades within JAP network forwards the traffic coming 1 10 from the client. JAP cascades encrypted pack- ets through several mixes and keep the traffic in Figure 2.6: JAP Geo-Distribution a constant rate to avoid rate-based traffic anal- (log scale) ysis. The JAP program displays active mixes and users are able to select JAP cascades from those active mixes. Figure 2.6 presents the geographic location distribution of 11 JAP servers that were active on 12-19 Oct 2010. Compared to based systems Tor and I2P, JAP seems to have minimal usage at the time of our analysis.

2.4 Onion Routing

Onion routing was first developed by Reed, Syverson, and Goldschlag [21,22,39]. It is the most prevelant design for low-latency anonymous communication [21]. The structure of these systems is composed of layer upon layer of which encap- sulates the message, like an onion. The basic idea of onion routing is similar to the structure of mix cascades which is analogous to message transmission with multiple envelopes constructed by a sender, with the address of the first mix to be the outer- most envelope. However; there is such a difference that, onion routing clients do not 14 necessarily forward the traffic for other clients. In onion routing mechanism, there is

a set of servers called onion routers that relay the traffic for clients [17]. Like in mix networks, each node keeps a public and private key. The public key

is known by the client to be able to set the path of the communication. The first step a client does in an onion routing system is to construct an encrypted tunnel, which is

called circuit, through the network by using public-key cryptography. After setting up the circuit, symmetric key cryptography is used to transfer the data. Another big difference between mix networks and onion routing is that mix networks use flushing

algorithms which may cause to keep the packet for an amount of time while waiting to receive adequate number of packets to mix together. The onion routing work real-time

mechanism and relays the traffic as it arrives to the node. Although this decreases the delay time in onion routing, it increases the vulnerability of these systems to possible

attack mechanisms like timing analysis attacks [26]. There are different variations of onion routers such as The Onion Router (Tor) [16], and Invisible Internet Project (I2P) [4]. These systems differ based on how the routing servers are organized; how the encryption algorithms are applied; how the tunnels are established; whether the transport-layer protocol uses TCP or

UDP; or whether the clients relay traffic to other clients or not.

2.4.1 The Onion Routing

Tor, shown in Figure 2.7, is the most popular anonymity system, based on the num- ber of users, as it combines the best parts of the previous methods (e.g., the directory discovery of routing servers for clients, telescopic circuit establishment, and location hiding). Directory servers are responsible for distributing signed information about known relays in the network [17]. Authoritative directory servers, currently 7 systems 15 trusted by Tor developers [23], are utilized to determine three-hop paths among volun- teer servers using secured TCP connections. User messages are then encrypted using symmetric key encryption with a telescoping key establishment scheme. Encrypted messages forwarded through the established circuit to the dedicated exit router, which forwards the message to the final destination and relays replies back to the source.

To reduce latency, onion routers neither add intentional delays nor batch messages. Hence, they are vulnerable to timing analysis attacks in general [26]. Entrance and exit nodes are particularly important as they know the source and the destination of the communication, respectively. Hence, to protect client profiling, authoritative directory servers pick only a subset of existing systems, which seems to be reliable, to become entry nodes. Moreover, packets originate from the exit system from the destination’s perspective and may be questioned regarding user actions. Hence, Tor provides an option of not becoming an exit node in case the user does not want to deal with possible complaints.

Directory Server

Onion Proxy

Entry Node

Middle Node Exit Node

Figure 2.7: The Onion Router (Tor) communication 16 Figure 2.8 and 2.9 presents a snapshot of Tor servers based on their geographic

location during Oct 20, 2010 - March 15 2011. The values for each country represent the daily average number of servers with standard deviation value for that country.

For this analysis, we monitored the authoritative directory servers to determine the total number and geographical location of Tor servers. During the sampling period,

we identified 61,798 unique servers at 103 countries while Tor system has approxi- mately 1,500 active public relays at a given time. Most of Tor relays are located in few countries. Similar to earlier studies [11,33], United States and Germany had highest

number of volunteers and consequently higher daily average number of servers. Con- sidering continents, which are indicated with different lines and colors, Europe had

the highest number of servers. We also present the bandwidth contribution of each country in Figure 2.10 and 2.11. Similar to the geographic server distribution graphs, we calculated daily average bandwidth and standard deviation values for each country. Similar to the number of servers, United States and Germany has the highest bandwidth. Moreover, we observe that a considerable number of countries have a significant variance in contributed bandwidth while the number of servers is stable. 17 Number of Tor Servers

1 10 100 1000

United States Germany France Netherlands Russian Federation Sweden United Kingdom Canada Italy Austria Czech Republic Switzerland Poland Ukraine Finland Japan Australia Denmark Spain Luxemburg Norway Israel Belgium Brazil Hong Kong Singapore Ireland Bulgaria Portugal Taiwan Hungary America Malaysia Europe Argentina Asia Australia 1 10 100 1000

Figure 2.8: Geographic Tor Server Distribution (log-scale) 18 Number of Tor Servers

0.1 1 10

Greece Thailand Estonia Slovakia South Korea India Romania New Zealand Egypt Turkey Iran Mexico Slovenia Lithuania Croatia Belarus South Africa China Saudi Arabia Latvia Costa Rica Panama Chile VietNam Colombia United Arab Emirates Indonesia Serbia Barbados Moldova Isle of Man Venezuela America Europe Mongolia Asia Cote d’Ivoire Africa Australia 0.1 1 10

Figure 2.9: Geographic Tor Server Distribution (cont.) (log-scale) 19 Bandwidth Value (KB/sec)

0.1 1 10 100 1000

United States Germany Netherlands France Sweden United Kingdom Russian Federation Iran Canada Luxemburg Taiwan Switzerland Denmark Czech Republic Austria Slovakia Chile Poland Italy Ukraine Israel Hong Kong Finland Portugal Japan Hungary Croatia Ireland Cote d’Ivoire America Singapore Europe South Korea Asia Africa 0.1 1 10 100 1000

Figure 2.10: Geographic Tor Bandwidth Conribution (log-scale) 20 Bandwidth Value (KB/sec)

0.01 0.1 1 10

Australia Greece Spain Bulgaria Norway Latvia Malaysia New Zealand India Isle of Man Thailand Lithuania Argentina Romania Egypt Belgium Estonia Vietnam Mongolia Slovenia Turkey Brazil South Africa Mexico Indonesia Moldova Venezuela Belarus Costa Rica America Europe Panama Asia Saudi Arabia Africa Australia 0.01 0.1 1 10

Figure 2.11: Geographic Tor Bandwidth Conribution (cont.) (log-scale) 21 2.4.2 Invisible Internet Project

Invisible Internet Project (I2P) is another system offering anonymization services that identity-sensitive applications can use. The I2P network is strictly message based (i.e.,

UDP) but there are libraries that allow reliable streaming communication on top of the I2P network. I2P works by routing traffic through other peers and mainly is a

closed system. Many applications can interact with I2P including mail, peer-to-peer, and IRC chat. However, as I2P does not focus on end-to-end delay it is not preferred for low latency applications.

Similar to Tor, I2P announces its peers. To analyze I2P usage, we collected active I2P relays by joining the system during Oct 11, 2010 - March 15, 2011. Fig- ure 2.12 presents the geographic distribution of servers in 44 countries 1 ordered by daily average number of servers. We also calculated standard deviation for each country to show the variance in the number of servers. Even though we had a longer

sampling of I2P, we observed fewer servers than the Tor system. Moreover, like to Tor, Germany, U.S. and France are the top three when ordered according to daily

average number of volunteers. Figure 2.13 shows the bandwidth of I2P servers volunteering for each country.

Similar to other figures, it is ordered by the daily average bandwidth of active servers we had monitored and also shows standard deviation for each country. We observe a considerable variation in both the number of servers and allocated bandwidth of I2P

relays. This indicates instability of the volunteering nodes which may hamper longer communications.

1Countries were looked up from ipinfodb.com database. 22 Number of I2P Servers

0.1 1 10 100

Germany France United States Russian Federation Sweden Ukraine Netherlands United Kingdom Denmark Finland Japan Canada Norway Luxemburg Estonia Switzerland China Austria Spain Australia Latvia Poland Czech Republic Brazil Singapore HongKong Bulgaria Taiwan Belgium Romania Serbia Ireland Argentina Hungary New Zealand Italy Puerto Rico Venezuela Malaysia Portugal Belarus Lithuania Israel Namibia America Europe Djiboti Asia Moldova Africa Turkey Australia 0.1 1 10 100

Figure 2.12: Geographic I2P Server Distribution (log-scale) 23 Bandwidth Value (KB/sec)

0.1 1 10 100 1000 10000

Germany France United States Russian Federation Sweden Ukraine Netherlands United Kingdom Finland Japan Canada Denmark Norway Estonia China Luxemburg Switzerland Spain Austria Australia Latvia Romania Brazil Bulgaria Argentina Poland Hungary Hong Kong Czech Republic Belgium Italy Djiboti Ireland Moldova Turkey Taiwan New Zealand Puerto Rico Serbia Portugal Namibia Malaysia Singapore Venezuela America Europe Israel Asia Lithuania Africa Belarus Australia 0.1 1 10 100 1000 10000

Figure 2.13: Geographic I2PBandwidth Contribution (log-scale) 24

Chapter 3

Tor Usage Analysis

In this chapter, we analyze Tor usage, as it is currently the largest anonymity system. We use two data sets for Tor usage analysis: data collected by setting up a Tor relay as an exit node, and historical Tor measurements reported at Tor project website [7].

3.1 Related Work

There have been some earlier studies that analyzed the Tor usage as it gained popu- larity [11,29,31–33]. One of the earliest studies by McCoy et al. analyzed how Tor is being used, how it is being mis-used, and who are its users [33]. In their experiments, the authors analyzed application-level protocols that use their nodes as exit node. According to their finding, interactive protocols, such as HTTP, make up 92% of the connections and consume 58% of bandwidth. Similarly, bit-torrent traffic consumes 40% of band- with even though it accounts for 3.3% of the connections. The authors also pointed to malicious usage of Tor routers and developed a method to detect malicious logging at exit routers. Moreover, they indicated that Tor has a global user base based on 25 its client distribution. Our results in Chapter 2 also indicate that Tor has the largest

volunteer base among the anonymity systems. Similarly, Chaabane et al. analyzed the applications that use Tor [11]. Authors monitored traffic on six Tor servers which were pairwise located in the U.S., Europe and Asia to inspect geo-diverse relays. Authors analyzed the HTTP and traffic in depth. They pointed out that BitTorent consumes significant resources both in terms of the number of packets and the carried traffic volume. Finally, authors pointed that some users take advantage of the Tor servers as 1-hop SOCKS proxies and presented a method to detect such use. Additionaly, Loesing observed the trends in the Tor network by analysing the public directory information [31]. The results of the study indicated that there are some problems in the Tor network that need to be addressed such as the efficient usage of bandwidth capacity, support for dynamic IP addresses, and upgrade of the Tor software version by relays. Finally, Loesing et al. provided guidelines for a statistical analysis of Tor data focusing on countries of connecting clients and exit traffic by port [32]. Pointing to pri- vacy issues, the authors derived guidelines for measuring sensitive data in anonymity networks. Moreover, they pointed to interesting cases such as the increase in the Tor usage by Iranian IP space in June 2009 after the Iranian elections; and Tor blocking by China and consequent increase in bridge usage by Chinese IP addresses.

3.2 Tor Measurement Study

In our analysis, first, with my colleague we set up one Tor exit node with the de- fault policy at a dedicated Windows server using Tor 0.2.3.4-alpha version during

September 5 - October 5 2011. We captured packet headers, i.e., IP addresses and 26 port numbers for both source and destination, and payload size. We used this data

to estimate the usage ratio of Tor among Internet users. As indicated earlier, user privacy were protected as no personal information was logged.

Figure 3.1 presents Tor usage of the top 50 countries along with the average number of Tor users and relay servers from these countries. We calculate the usage ra-

tio of Tor in each country as the ratio of the number of observed clients in the data set over the estimated number of the Internet users obtained from http://internetworld- stats.com. Then, we normalized the values over the average where a value of x in-

dicates x times the average usage. Interestingly, Italy, Ukraine and Finland had the highest ratios of Tor usage relative to its Internet users while U.S.A and Turkey are at the average. Next, we used historical Tor measurements reported at Tor project website [7] to analyze the evolution of Tor deployment. Loesing presents statistical analysis of the Tor system [31], but in this study we focused on Tor usage. Figure 3.2 and Figure 3.3 present the number of Tor users and relay servers for the top 10 countries.

We excluded other countries as they occluded the figures. We observe that since 2007 most relays are hosted in the United States and Germany. They also have the highest number of clients connecting to the Tor. However, considering Internet users, Tor usage is average for the United States. Moreover, South Korea and Iran have 3rd and 8th, respectively, highest Tor users while they are not among the top 10 Tor relays. 27 0.01 0.1 1 10 100 1000 10000

Italy Ukraine Finland Germany Switzerland Netherlands Austria Sweden Denmark Romania Belgium Ireland France Canada Estonia Russian Federatio Poland Hungary Portugal Czech Republic Bulgaria United States Turkey Croatia Iran Greece Taiwan Spain Lithuania Israel Australia Saudi Arabia Norway Thailand Argentina Slovakia United Arab Emirates China United Kingdom Pakistan Chile Nigeria Japan Korea Republic Serbia India Brazil Mexico Servers New Zealand Users Malaysia Usage Ratio 0.01 0.1 1 10 100 1000 10000

Figure 3.1: Tor Usage (log-scale) 28

700 700 600

600 500

400 500 300 400 200

300 100 Number of Servers 0 200

100

0 3/11 United States 12/10 Germany 9/10 6/10 France 3/10 Sweden 12/09 9/09 Netherlands 6/09 United Kingdom 3/09 12/08 Russian Federation 9/08 Date Canada 6/08 Italy 3/08 China 12/07

Figure 3.2: Geographic TOR Server Distribution

70000 70000 60000

60000 50000

40000 50000 30000 40000 20000

30000 10000 Number of Users 0 20000

10000

0 United States 3/11 Germany 12/10 South Korea 9/10 France China 6/10 Italy 3/10 Russian Federation 12/09 Date Iran 9/09 United Kingdom Japan 6/09

Figure 3.3: Geographic TOR User Distribution 29 Number of Clients

1 10 100 1000

0.2.2.35 0.2.2.34 0.2.1.30 0.2.3.10-alpha 0.2.1.29 0.2.2.33 0.2.1.31 0.2.2.32 0.2.1.26 0.2.3.9-alpha 0.2.3.8-alpha 0.2.1.19 0.2.3.7-alpha 0.2.1.32 0.2.1.22 0.2.0.34 0.2.1.28 0.2.1.25 0.2.3.10-alpha-dev 0.2.3.5-alpha 0.2.1.27 0.2.0.35 0.2.3.4-alpha 0.2.2.30-rc

Client Versions 0.2.2.27-beta 0.2.2.24-alpha 0.2.1.20 0.2.3.9-alpha-dev 0.2.3.1-alpha 0.2.2.29-beta 0.2.2.25-alpha 0.2.1.21 0.2.3.8-alpha-dev 0.2.3.6-alpha 0.2.2.28-beta 0.2.2.23-alpha 0.2.2.20-alpha 0.2.2.15-alpha 0.2.2.13-alpha 0.2.2.10-alpha 0.2.0.32 0.2.0.30

1 10 100 1000

Figure 3.4: Client Version Distribution (log-scale)

Finally, we analyzed the versions of Tor clients on June 23, 2011. Figure 3.4 presents the top 25 client versions of the day. As seen in the figure, majority of clients are the most recent stable versions but there are considerable number of earlier versions, some of which has known security vulnerabilities. Likewise, there are many early adapters of the alpha/beta versions of the client. 30

Chapter 4

Application Perspective

In this section, the usage of anonymity technology from the application perspective; specifically spam-emails is studied. For this, we compared the observed IP addresses of applications to the collected IP addresses of the anonymity servers discussed in

Chapter 2. Spam email data was collected using two approaches. First, we trimmed the IP addresses of spam emails collected from the department during Oct 2010 to Novem- ber 2011. Then, we gathered publicly available spam email IP addresses of recent spammers from the Internet. An important issue was to obtain recent data sets as anonymizer server IP addresses change over the time (except for the commercial systems).

We collected 58,969 unique IP addresses of spam e-mails from departmental mail servers during Oct 2010 to November 2011. In this data, 154 IP addresses belonged to anonymity servers indicating that the spam e-mail was sent through an anonymity network. Figure 4.1-(a) presents the distribution of the utilized anonymizer technology. In this data set, proxies were utilized as spam relays more than the other 31 systems. Moreover, Figure 4.1-(e) indicates the geo-location of relays of spam servers,

including the public data sets. China and United States are the main relays of spam emails in the local data set.

We also collected IP addresses of publicly available data sets (156,371 IP from address lists) that were recently marked as spam generators by public systems in- cluding spam-ip.com, okean.com, projecthoneypot.org, spam-ip-list.com, and blogspot.com. Among these IP addresses, 1,845 belonged to an anonymity network. We analyzed the distribution of utilized anonymizer technology and the server geo- location for spammer IP addresses separately for each data set, and combined all data set from these web sites. Figure 4.1-(b)-(c) represent data sets from spam-ip.com and okean.com, respectively. While most spams reported by spam-ip.com utilized prox- ies, the spams listed by okean.com utilized Tor.

Moreover, Figure 4.1-e presents the distribution of utilized anonymizer technol- ogy and the server geo-location for spammer IP addresses in combining all data from these web sites together with local data set. Germany, China, Canada and U.S. were

the four major relay nodes for spammers among the 100 countries observed and ac- count for 66% of all servers. Proxy servers and Tor servers were utilized the most and

account for 96.5% . Interestingly, the commercial anonymizer system goTrusted.com was utilized by spammers to send spam e-mails. According to the figure we can conclude that, overall Tor, proxies and I2P are

the three major sources utilized by spammers to relay e-mails about 76.2%, 20.2%, 3.3% respectively and in total 99.7%. Moreover, servers in Germany, China, Canada,

and U.S. were the main exit nodes of spam e-mails about 49.3%, 7.9%, 4.5%, and 4.4% respectively and in total 66.1%. 32 Number of spam-emails

1 10 100 1000 1 4 5 Proxies Germany United States TOR China 47 Algeria Gotrusted Brazil Australia 97 I2P Others Canada Anonymizer Russia Japan Belgium (a) Local spam data Italy United Kingdom 8 Indonesia France 73 Korea Sweden Proxies Nigeria TOR Ukraine GoTrusted Thailand South Africa India 721 Netherlands Taiwan Poland (b) SpamIP.com Bulgaria Philippines Czech Republic 2 Colombia 49 Chile Iran Proxies Somalia Malaysia TOR Hong Kong Argentina I2P Switzerland Slovakia 460 Venezuela Turkey Vietnam (c) okean.com Serbia Saudi Arabia Romania 12 7 1 Latvia Proxies Kazakhstan Ireland TOR Spain 610 Pakistan Gotrusted Mexico 1010 I2P Luxembourg Greece Ghana All Spam Data Anonymizer Local Spam Data Denmark Spamip.com Angola Okean.com (d) Combined spam data 1 10 100 1000 (e) Geo-location Figure 4.1: Combined Spam Data 33

Chapter 5

Deanonymization

Growing interest in anonymity systems has increased the attention to develop several attack systems to deanonymize these systems. In this chapter, we analyze various mechanisms that can be used to identify a user of an anonymizer network. In all of these attacks, a global adversar might modify and monitor flow through a subset of anonymizer nodes, so that, the attacker identifies certain anonymous users.

5.1 Clickjacking

Clickjacking is a malicious technique of tricking a web user to click on a button or a link on a different page from what the user perceives he/she is clicking on. Attacker uses transparent layers to achieve this attack. By this method the control of the computer of the user can be taken by the attacker by routing the user to another page owned by another application. An adversary can also direct the request of the user to a malicious server using this method and can get information about the user. For example, imagine an attacker who builds a web site that has a button on it that says ‘click here for a free iPod’. However, on top of that web page, the 34 attacker has loaded an iframe with your mail account, and lined up exactly the ‘delete

all messages’ button directly on top of the ‘free iPod’ button. The victim tries to click on the ‘free iPod’ button but instead actually clicked on the invisible ‘delete all

messages’ button.

Figure 5.1: A link on the website that provides clickjacking

An explotation of this attack can be done by abusing Facebook’s ‘Like’ and Google’s ‘+1’ functionality. We implemented a JavaScript code which seeks to make a user automatically ‘Like’ a unique link on their Facebook profile via clicking on any 35 link on the website. All we need to do is to include the JavaScript code to the webpage and create a link on the website as seen in the Figure 5.1. A piece of JavaScript code is showed in Figure 5.2. When the user clicks on the link, JavaScript embedded in the webpage automatically runs and makes the user automatically ‘Like’ the website as seen in Figure 5.3. The assumption with this attack is that the user is already signed into his/her Facebook account.

Figure 5.2: JavaScript Code

The same attack can be launched with Google+. A JavaScript code, which automatically makes the user ‘+1’ the website is embedded in the website. With the user’s clicking on any point on the website, JavaScript code runs immediately making the user ‘+1’ the website without the permission of the user. 36 In either case, malicious website can provide a unique link to each anonymous visitor so that they are identified.

Figure 5.3: Facebook Like page

5.2 Code Injection

Code injection is the exploration of a computer bug that is caused by processing invalid data. Code injection technique can be used to get the history and cookies of the web browser by injecting HTML or script to the web browser of the client. An adversary may achieve to get essential information of the user by executing malicious script through a vulnerable website. Moreover, in anonymizer systems, an exit node can inject HTML tags, scripts or flash as a component of the relayed webpage that can help in identifying the user [43]. In either case, a new connection to a predetermined server is initiated when the user runs the injected component on his/her client and the attacker can easily obtain the identity of the user.

5.3 Website Fingerprinting

Although the data sent through anonymous systems is encrypted, an attacker can still identify who is communicating with whom by observing how much data is transmitted through the network [25]. If the attacker has a list of guesses as to which site is 37 targeted, by observing the quantity and length of data packets received in response, he/she can build a fingerprint of what the web site’s response traffic will look like. By observing target user’s traffic, attacker can compare observed pattern against stored

fingerprint database to identify possible end points. Most modern anonymity systems split messages into equal sized packets to prevent such attacks.

5.4 Active Documents

A webserver owner can identify the clients connecting to that server even if they are using anonymous networks by running active documents on his/her system.

5.4.1 Pdf Documents

Acrobat forms allow one to embed a JavaScript in the Pdf file and can define fields in pdf documents. By getting Field object of these fields, one can trigger JavaScript using these objects and perform the desired action. By using this feature, one can create fields; button in this example, as in Figure 5.4. The user can define desired functions for these buttons by giving directions while defining these fields. In this ex- ample, we create a ‘Submit’ button, and give the command ‘Open a Web Link’ as can be seen in Figure 5.5. When the user clicks on ‘Submit’ button, the embedded action runs and directs the user to specified website, ‘www.google.com’ in this example. Many actions can be defined for these fields, such as ‘Open a file’ , ‘Play a sound’ , ‘Execute a menu item’. The user is unaware of what will actually happen if he/she clicks on the button which might be named anything. By using this feature an attacker can direct the user coming to his/her website to any other page he/she wants and may infer user identity through the new connection that bypasses proxy settings. 38

Figure 5.4: Creating Button in Pdf File

Newer versions of Adobe Acrobat have the functionality to embed flash, sound, video files as can be seen in Figure 5.6. An adversary can attack the users visiting a website, by inserting a pdf file to the website which plays a streaming video that can be located in any server. Figure 5.7 shows the Insert Flash tool in pdf file. The user initiates the connection with the server by accessing the pdf file. However, after starting to play the video, the video hosting server continues to communicate with the user by bypassing established proxy tunnel. Again a webpage can direct the user to any malicious server that may be able to identify the identity and reveal location of the user.

5.4.2 Macro Word Attacks

Macro is a series of commands that are recorded so that they can be played or executed later automatically. Many applications, such as Microsoft Word and Excel, support macro languages. A macro virus attacks these kinds of applications, infecting the 39

Figure 5.5: Features of Button Field

documents and template files. Hence, if your computer has a macro virus, and you use a word template file to make a new document, it will also become infected by the virus. One of the common ways, a macros virus can harm your computer is by

replacing the normal functioning macros, and causing a series of automatic actions 40

Figure 5.6: Embedded Video File

that destructs your files. These macros can also be used to send user identity to a malicious server. In either case, active documents bypass proxy settings and establish a new connection that reveals user identity.

5.5 URI Methods

Uniform Resource Identifier (URI), is the generic term for all types of names and ad- dresses referring to objects on the (WWW). It can be defined as the identification of an object enabling interaction with representations of the resource over WWW. Browsers have URI handling, which enables developers to pass applica- 41

Figure 5.7: Features of Embedding Operation

tion commands through the browser and install software on the user’s computer. It can be used in many applications, such as establishing a skype call to a person directly from a website. Because many browser implementations do not properly sanitize the data passed via the URIs, with quotes and null bytes, it is possible to circumvent the browsers’ ability to handle the incoming data and slip out of the sandbox. Hence, an attacker may be able to access the local file system of the victim. Skype [5] is one of the applications that has released a security bulletin to address a remote vulnerability [6]. This vulnerability was due to an error in the 42 handling of “file:” URIs. By convincing a user to click on a specially crafted “file:”

URI, a remote, unauthenticated attacker may be able to execute arbitrary code on victim’s system. This code can help reveal user identity.

Similar to Skype, Apple QuickTime [1] is also prone to a vulnerability that allows remote attackers to launch arbitrary applications and access files [2]. The issue arises because of improper handling of the “file:” URI. The data is passed to ‘.dll!FileProtocolHander’ and can result in the processing of non-HTTP filetypes, if the file type in a URI supplied via the “qt:next” attribute isn’t recognized by the application. By supplying malicious QuickTime content to be processed by a vulnerable user, an attacker can exploit this vulnerability. The attacker can execute arbitrary malicious codes on the victim’s computer and may gain unauthorized remote access control.

In either case, the attacker can run ipconfig or ifconfig to obtain anonymous user’s real IP address and transmit it over the anonymizer network.

5.6 Network-Level Attacks

There are several attacks that a global adversary, who is able to monitor some portion of the anonymizer network, launch against anonymizer networks.

5.6.1 Timing Attack

An adversary who is able to observe the connections entering and exiting the anonymity network can link inputs and outputs based on packet inter-arrival times. Because low- latency anonymity systems like onion routing do not introduce any delay or alter the timing characteristics of an anonymous system, they are mostly vulnerable to such timing attacks [17,28,35,42,46]. 43 5.6.2 Predecessor Attack

The attacker tracks a stream of communications over a number of rounds. In each round, the attacker simply logs any node that sends a message that is part of the tracked stream. By this mechanism the attacker may be able to compromise the initiator of the communication [45].

5.6.3 Intersection Attack

By watching the communication between each user and cascades’ first mix, an attacker can see when the user is active and when he/she is idle or off-line. The attacker also watches all messages that leave the cascade. Some of these messages can be linked to previous messages and recognized as belonging to a sessions. This reveals when a session is active or inactive [10]. Using these analysis, the attacker can zoom on certain users to identify targeted anonymous user.

5.6.4 Multiplication Attack

This attack is a variation of injection attack where an exit node modifies replied messages that would trigger several new requests. Then, entry nodes may identify the user using the pattern of requested objects [44]. An attacker running several nodes in an anonymizer network can also accomplish multiplication attack on the network if entry and exit nodes of his/her circuit belong to the same communication circuit which the client builds before sending the message. The exit node controlled by the attacker marks the cells with an identifier and forwards them to the anonymizer network. If the entry node of the circuit is also driven by the attacker, when the node gets the packet it recognizes the packet and may find who is communicating with whom [30]. Similarly, replaying a cell or dropping a cell helps identifying the flow between entry and exit node [20,38]. 44 5.6.5 Circuit Clogging

To detect anonymizer nodes that are active in a flow destined to the server, circuit clogging can be performed by the malicious [18,34]. By sending bursty traffic to the user, the server can monitor anonymizer nodes using a direct connection to itself. The anonymizer nodes that have higher delay during burst periods but not during silent times are the ones that provide the connection to the end user. This attack can also be done by a third party to observe the traffic between end users [8]. 45

Chapter 6

Conclusions and Future Work

Privacy in communication has been an essential security requirement of communica- tion. The rapid expansion of Internet, resulted in growing interest in methods for anonymous communication. Several system designs have been developed with the

common aim of preserving communication privacy within the shared network envi- ronment. In this thesis, I explained the structure of anonymizer systems and present

an analysis of the state of the current anonymity technologies including; proxy servers, remailers, JAP, Tor and I2P. According to the results, proxy servers and Tor are the most used anonymity systems, while remailers and JAP have minimal usage.

I also analyzed where each system’s infrastructure is located at and where each system’s users originate from. Although these systems have a worldwide usage, there are huge differences of usage ratios between different countries. For example, Germany and U.S. are among the top countries in terms of the number of servers and the number of users in all anonymizer systems. Moreover, the number of servers in these countries is extremely larger than the number of servers in a country that is listed towards the end of the list. 46 Additionally, because it is the largest anonymity system currently in use, I

presented a detailed analysis of the Tor network. In the study, we observed that Internet users at European countries utilize the Tor network more than other countries

as evidenced by the top 13 highest usage being from Europe even though they are not the top countries hosting the Tor routers. For example, Italy has the highest ratio of

Tor usage even it is not among the top of the list of Tor server. We observe that, newer anonymity systems have higher popularity because of the increased security mechanisms of these systems. This fact also affects the earlier systems to become extinct as anonymity requires crowds of users. I also analyzed spam e-mails to observe the anonymity system usage by spam- mers. Although we do not observe much spam activity through anonymizer networks, it can be concluded that proxy servers are mostly used to send spam e-mails. More- over, Germany and USA are the top countries these spam e-mails originated from. Growing interest in anonymity systems also increased the attention to develop de-anonymization techniques. I also presented an overview of these techniques in- cluding clickjacking, code injection, website fingerprinting, active documents, URI methods, and network-level attacks.

The rapid development in anonymous systems is a sign of that they will have much wider area of applications in the future. A deeper analysis of these systems helps the Internet users seeking for privacy preserving methods for communication.

Because higher popularity of Tor network, it would be beneficial to conduct the application level analysis of Tor network. Also detailed analysis of de-anonymizing techniques helps anonymous system developers to understand and detect possible vulnerabilities that their system has. These systems can be improved as more attack models identified. 47

Bibliography

[1] Apple quicktime. www.apple.com/quicktime/.

[2] Apple quicktime ’file:’ uri file execution vulnerability. http://netscreen.com/security/auto/vulnerabilities/vuln29650.html.

[3] Free haven selected papers in anonymity. http://www.freehaven.net/anonbib/date.html.

[4] I2p anonymous network. www.i2p2.de.

[5] Skype. www.skype.com.

[6] Skype file uri security bypass code execution vulnerability. http://cyberinsecure.com/skype-file-uri-security-bypass-code-execution- vulnerability/.

[7] Tor metrics portal picked up. ://metrics.torproject.org.

[8] A. Back, U. M¨oller, and A. Stiglic. Traffic analysis attacks and trade-offs in anonymity providing systems. In I. S. Moskowitz, editor, Proceedings of In- formation Hiding Workshop (IH 2001), pages 245–257. Springer-Verlag, LNCS 2137, April 2001.

[9] O. Berthold, H. Federrath, and S. K˝opsell. Web mixes: A system for anonymous and unobservable internet access. In International Workshop on Designing Pri- vacy Enhancing Technologies, pages 115–129. Springer-Verlag New York, Inc., 2001.

[10] O. Berthold and H. Langos. Dummy traffic against long term intersection attacks. In R. Dingledine and P. Syverson, editors, Proceedings of Privacy Enhancing Technologies workshop (PET 2002). Springer-Verlag, LNCS 2482, April 2002.

[11] A. Chaabane, P. Manils, and M. Kaafar. Digging into anonymous traffic: A deep analysis of the tor anonymizing network. In Network and System Security (NSS), 2010 4th International Conference on, pages 167 –174, 09 2010. 48 [12] D. L. Chaum. Untraceable electronic mail, return addresses, and digital . Commun. ACM, 24(2):84–90, 1981.

[13]L. Cottrell. Mixmaster and remailer attacks. http://www.obscura.com/loki/remailer/remailer.essay.html.

[14] G. Danezis and C. Diaz. A survey of anonymous communication channels, 2008.

[15] G. Danezis, R. Dingledine, D. Hopwood, and N. Mathewson. Mixminion: Design of a type iii anonymous remailer protocol. In In Proceedings of the 2003 IEEE Symposium on Security and Privacy, pages 2–15, 2003.

[16] R. Dingledine, N. Mathewson, and P. Syverson. Tor: the second-generation onion router. In SSYM’04: Proceedings of the 13th conference on USENIX Security Symposium, pages 21–21, Berkeley, CA, USA, 2004. USENIX Association.

[17] M. Edman and B. Yener. On anonymity in an electronic society: A survey of anonymous communication systems. ACM Comput. Surv., 42(1):1–35, 2009.

[18] N. Evans, R. Dingledine, and C. Grothoff. A practical congestion attack on tor using long paths. In Proceedings of the 18th USENIX Security Symposium, August 2009.

[19] M. J. Freedman and R. Morris. Tarzan: a peer-to-peer anonymizing network layer. In CCS ’02: Proceedings of the 9th ACM conference on Computer and communications security, pages 193–206, New York, NY, USA, 2002. ACM.

[20] X. Fu and Z. Ling. One cell is enough to break tor’s anonymity. In Black Hat DC, 2009.

[21] D. Goldschlag, M. Reed, and P. Syverson. Onion routing. Communications of the ACM, 42:39–41, 1999.

[22] D. Goldschlag, M. Reed, and P. Syverson. Onion routing for anonymous and private internet connections. Communications of the ACM, 42:39–41, 1999.

[23] S. Hahn and K. Loesin. Privacy-preserving ways to estimate the number of tor users. Technical report, TOR project, November 2010.

[24] J. Helsingius. Anon.penet.fi release. http://www.penet.fi/press-english.html.

[25] A. Hintz. Fingerprinting websites using traffic analysis. In Workshop on Privacy Enhancing Technologies, 2002. 49 [26] A. Johnson, J. Feigenbaum, and P. Syverson. Preventing active timing attacks in low-latency anonymous communication. In Proceedings of the 10th Privacy Enhancing Technologies Symposium (PETS 2010), Berlin, Germany, July 2010.

[27] D. Kelly. A taxonomy for and analysis of anonymous communications networks. Technical report, Air Force Institute of Technology, March 2009.

[28] B. N. Levine, M. K. Reiter, C. Wang, and M. Wright. Timing attacks in low- latency mix systems (extended abstract). In Proceedings of the 8th International Financial Cryptography Conference (FC 2004), Key West, FL, USA, Febru- ary 2004, Volume 3110 of Lecture Notes in Computer Science, pages 251–265. Springer, 2004.

[29] B. Li, E. Erdin, M. H. Gunes, G. Bebis, and T. Shipley. An analysis of anonymity technology usage. In Proceedings of the Third international conference on Traf- fic monitoring and analysis, TMA’11, pages 108–121, Berlin, Heidelberg, 2011. Springer-Verlag.

[30] Z. Ling, J. Luo, W. Yu, X. Fu, D. Xuan, and W. Jia. A new cell counter based attack against tor. In CCS ’09: Proceedings of the 16th ACM conference on Computer and communications security, pages 578–589, New York, NY, USA, 2009. ACM.

[31] K. Loesing. Measuring the tor network from public directory information, 2009.

[32] K. Loesing, S. J. Murdoch, and R. Dingledine. A case study on measuring sta- tistical data in the tor anonymity network. In Workshop on Ethics in Research, Jan 2010.

[33] D. Mccoy, T. Kohno, and D. Sicker. Shining light in dark places: Understanding the tor network. In In Proceedings of the 8th Privacy Enhancing Technologies Symposium, 2008.

[34] S. J. Murdoch and G. Danezis. Low-cost traffic analysis of Tor. In Proceedings of the 2005 IEEE Symposium on Security and Privacy. IEEE CS, May 2005.

[35] S. J. Murdoch and P. Zieliski. Sampled traffic analysis by internet-exchange- level adversaries. In In Privacy Enhancing Technologies (PET), LNCS. Springer, 2007.

[36] S. Parekh. Prospects for remailers, First Monday, 1(2). http://www.firstmonday.dk/issues/issue2/remailers, 1996. 50 [37] A. Pfitzmann, T. Dresden, and M. Hansen. Anonymity, unlinkability, unde- tectability, unobservability, pseudonymity, and identity management - a consol- idated proposal for terminology, 2008.

[38] R. Pries, W. Yu, X. Fu, and W. Zhao. A new replay attack against anony- mous communication networks. In IEEE ICC Information and Network Security Symposium, 2008.

[39] M. G. Reed, P. F. Syverson, and D. M. Goldschlag. Anonymous connections and onion routing. IEEE Journal on Selected Areas in Communications, 16:482–494, 1998.

[40] M. K. Reiter and A. D. Rubin. Anonymous web transactions with crowds. Com- mun. ACM, 42(2):32–48, 1999.

[41] J. Ren and J. Wu. Survey on anonymous communications in computer networks. Computer Communications, 33(4):420–431, Mar. 2010.

[42] V. Shmatikov and M.-H. Wang. Timing analysis in low-latency mix networks: attacks and defenses. In IN: PROCEEDINGS OF ESORICS, pages 18–33, 2006.

[43] R. Singel. The onion router (tor) is leaky (leeky). Wired, October 2006.

[44] X. Wang, J. Luo, M. Yang, and Z. Ling. A novel flow multiplication attack against tor. In CSCWD ’09: Proceedings of the 2009 13th International Con- ference on Computer Supported Cooperative Work in Design, pages 686–691, Washington, DC, USA, 2009. IEEE Computer Society.

[45] M. K. Wright, M. Adler, B. N. Levine, and C. Shields. The predecessor attack: An analysis of a threat to anonymous communications systems. ACM Trans. Inf. Syst. Secur, 7:2004, 2004.

[46] Y. Zhu, X. Fu, B. Graham, R. Bettati, and W. Zhao. On flow correlation attacks and countermeasures in mix networks. In in Proceedings of Privacy Enhancing Technologies workshop, pages 26–28, 2004.